mxm commented on code in PR #15566:
URL: https://github.com/apache/iceberg/pull/15566#discussion_r2939796707
##########
docs/docs/flink-maintenance.md:
##########
@@ -361,9 +409,64 @@ CREATE TABLE db.tbl (
 );
 ```
+### IcebergSink Maintenance Configuration (SQL)
+
+These keys are used in SQL (SET or table WITH options) or via `IcebergSink.Builder.set()` / `setAll()`.
+
+#### Enable Flags
+
+| Key | Description | Default |
+|-----|-------------|---------|
+| `flink-maintenance.rewrite.enabled` | Enable compaction (rewrite data files) | `false` |
+| `flink-maintenance.expire-snapshots.enabled` | Enable expire snapshots | `false` |
+| `flink-maintenance.delete-orphan-files.enabled` | Enable delete orphan files | `false` |
+
+#### Rewrite Data Files Configuration
+
+| Key | Description | Default |
+|-----|-------------|---------|
+| `flink-maintenance.rewrite.schedule.commit-count` | Trigger after N commits | `10` |
+| `flink-maintenance.rewrite.schedule.data-file-count` | Trigger after N data files | `1000` |
+| `flink-maintenance.rewrite.schedule.data-file-size` | Trigger after total data file size (bytes) | `107374182400` (100GB) |
+| `flink-maintenance.rewrite.schedule.interval-second` | Trigger after time interval (seconds) | `600` |
+| `flink-maintenance.rewrite.max-bytes` | Maximum bytes to rewrite per execution | `Long.MAX_VALUE` |
+| `flink-maintenance.rewrite.partial-progress.enabled` | Enable partial progress commits | `false` |
+| `flink-maintenance.rewrite.partial-progress.max-commits` | Maximum commits for partial progress | `10` |
+
+#### Expire Snapshots Configuration
+
+| Key | Description | Default |
+|-----|-------------|---------|
+| `flink-maintenance.expire-snapshots.schedule.commit-count` | Trigger after N commits | `10` |
+| `flink-maintenance.expire-snapshots.schedule.data-file-count` | Trigger after N data files | `1000` |
+| `flink-maintenance.expire-snapshots.schedule.data-file-size` | Trigger after total data file size (bytes) | `107374182400` (100GB) |
+| `flink-maintenance.expire-snapshots.schedule.interval-second` | Trigger after time interval (seconds) | `600` |
+| `flink-maintenance.expire-snapshots.max-snapshot-age-seconds` | Maximum age of snapshots to retain (seconds) | Not set |
+| `flink-maintenance.expire-snapshots.retain-last` | Minimum number of snapshots to retain | Not set |
+| `flink-maintenance.expire-snapshots.delete-batch-size` | Batch size for deleting expired files | `1000` |
+| `flink-maintenance.expire-snapshots.clean-expired-metadata` | Remove expired metadata (partition specs, schemas) | Not set |
+| `flink-maintenance.expire-snapshots.planning-worker-pool-size` | Worker pool size for planning | Shared pool |
+
+#### Delete Orphan Files Configuration
+
+| Key | Description | Default |
+|-----|-------------|---------|
+| `flink-maintenance.delete-orphan-files.schedule.commit-count` | Trigger after N commits | `10` |
+| `flink-maintenance.delete-orphan-files.schedule.data-file-count` | Trigger after N data files | `1000` |
+| `flink-maintenance.delete-orphan-files.schedule.data-file-size` | Trigger after total data file size (bytes) | `107374182400` (100GB) |
+| `flink-maintenance.delete-orphan-files.schedule.interval-second` | Trigger after time interval (seconds) | `600` |
+| `flink-maintenance.delete-orphan-files.min-age-seconds` | Minimum age of files to consider for deletion (seconds) | `259200` (3 days) |
+| `flink-maintenance.delete-orphan-files.delete-batch-size` | Batch size for deleting orphan files | `1000` |
+| `flink-maintenance.delete-orphan-files.location` | Location to start recursive listing | Table location |
+| `flink-maintenance.delete-orphan-files.use-prefix-listing` | Use prefix listing for file discovery | `false` |
+| `flink-maintenance.delete-orphan-files.planning-worker-pool-size` | Worker pool size for planning | Shared pool |
+| `flink-maintenance.delete-orphan-files.equal-schemes` | Equivalent schemes (format: `s3n=s3,s3a=s3`) | Not set |

Review Comment:
   Adjusted.
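
For readers of the quoted tables, here is a minimal sketch of how a handful of these keys might be collected and rendered as table `WITH` options. The class name `MaintenanceOptionsExample`, both helper methods, and the chosen values (for example `retain-last` = `5`) are hypothetical and only for illustration; the keys themselves are copied from the tables in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Iceberg or Flink.
final class MaintenanceOptionsExample {

  // Keys come from the quoted tables; the values are illustrative, not recommendations.
  static Map<String, String> maintenanceOptions() {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("flink-maintenance.rewrite.enabled", "true");
    options.put("flink-maintenance.rewrite.schedule.commit-count", "10");
    options.put("flink-maintenance.expire-snapshots.enabled", "true");
    options.put("flink-maintenance.expire-snapshots.retain-last", "5");
    return options;
  }

  // Renders the map as a WITH (...) clause of quoted key/value pairs,
  // preserving the insertion order of the map.
  static String toWithClause(Map<String, String> options) {
    return options.entrySet().stream()
        .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
        .collect(Collectors.joining(",\n  ", "WITH (\n  ", "\n)"));
  }

  public static void main(String[] args) {
    System.out.println(toWithClause(maintenanceOptions()));
  }
}
```

According to the quoted documentation, the same key/value pairs can also be supplied through `IcebergSink.Builder.set()` / `setAll()`. The exact signatures of those methods are not shown in this message, so the sketch stops at building the map.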
