manuzhang commented on code in PR #8251:
URL: https://github.com/apache/iceberg/pull/8251#discussion_r1289480160
##########
docs/spark-procedures.md:
##########
@@ -283,11 +283,27 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
 | `options` | ️ | map<string, string> | Options to be used for actions|
 | `where` | ️ | string | predicate as a string used for filtering the files. Note that all files that may contain data matching the filter will be selected for rewriting|
 
+#### Options
+
+| Name | Default Value | Description |
+|------|---------------|-------------|
+| `max-concurrent-file-group-rewrites` | 5 | Maximum number of file groups to be simultaneously rewritten |
+| `partial-progress.enabled` | false | Enable committing groups of files prior to the entire rewrite completing |
+| `partial-progress.max-commits` | 10 | Maximum amount of commits that this rewrite is allowed to produce if partial progress is enabled |
+| `use-starting-sequence-number` | true | Use the sequence number of the snapshot at compaction start time instead of that of the newly produced snapshot |
+| `rewrite-job-order` | none | Force the rewrite job order based on the value (one of bytes-asc, bytes-desc, files-asc, files-desc, none) |
+| `target-file-size-bytes` | default value of `write.target-file-size-bytes` from [table properties](../configuration/#write-properties) | Target output file size |
+| `min-file-size-bytes` | 75% of target file size | Files under this threshold will be considered for rewriting regardless of any other criteria |
+| `max-file-size-bytes` | 180% of target file size | Files with sizes above this threshold will be considered for rewriting regardless of any other criteria |
+| `min-input-files` | 5 | Any file group exceeding this number of files will be rewritten regardless of other criteria |
+| `rewrite-all` | false | Force rewriting of all provided files overriding other options |
+| `max-file-group-size-bytes` | 107374182400 | Largest amount of data that should be rewritten in a single file group |

Review Comment:
Added.
It's 100GB actually ;)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
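For anyone double-checking the numbers in the table, here is a minimal sketch of the arithmetic. It assumes binary units (1 GB = 1024^3 bytes), a 512 MiB target file size chosen only as an example, and truncation toward zero for the percentage-based defaults; the helper name `default_thresholds` is hypothetical and not part of Iceberg.

```python
# Sanity check of the size-related defaults quoted in the options table.
# Assumptions: binary units, truncating integer math, example 512 MiB target.
GIB = 1024 ** 3


def default_thresholds(target_file_size_bytes: int) -> dict:
    """Hypothetical helper: derive min/max file size defaults from the target size."""
    return {
        "min-file-size-bytes": target_file_size_bytes * 75 // 100,   # 75% of target
        "max-file-size-bytes": target_file_size_bytes * 180 // 100,  # 180% of target
    }


# Default for max-file-group-size-bytes as listed in the table.
max_file_group_size_bytes = 107374182400
print(max_file_group_size_bytes // GIB)  # 100

# Example target of 512 MiB (an assumed value, not read from any table).
thresholds = default_thresholds(512 * 1024 * 1024)
print(thresholds)
```

With these assumptions the `max-file-group-size-bytes` default works out to exactly 100 in binary gigabytes, which matches the comment above.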
