mxm commented on code in PR #15566:
URL: https://github.com/apache/iceberg/pull/15566#discussion_r2939794451
##########
flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java:
##########
@@ -626,6 +633,92 @@ public Builder setSnapshotProperty(String property, String value) {
return this;
}
+ /**
+ * Enables or disables compaction (rewriting data files) as a post-commit maintenance task.
+ *
+ * @param enabled whether to enable compaction
+ * @see RewriteDataFilesConfig for the default config.
+ * @deprecated See {@code rewriteDatafiles(..)}
+ */
+ @Deprecated
+ public Builder compaction(boolean enabled) {
+ writeOptions.put(FlinkWriteOptions.COMPACTION_ENABLE.key(), Boolean.toString(enabled));
+ return this;
+ }
+
+ /**
+ * Enables or disables compaction (rewriting data files) as a post-commit maintenance task.
+ *
+ * @param enabled whether to enable compaction
+ * @see RewriteDataFilesConfig for the default config.
+ */
+ public Builder rewriteDataFiles(boolean enabled) {
+ writeOptions.put(FlinkWriteOptions.COMPACTION_ENABLE.key(), Boolean.toString(enabled));
+ return this;
+ }
+
+ /**
+ * Enables or disables compaction (rewriting data files) as a post-commit maintenance task.
+ *
+ * @param enabled whether to enable compaction
+ * @param config task-specific configuration, see {@link RewriteDataFilesConfig} for available
+ * keys
+ */
+ public Builder rewriteDataFiles(boolean enabled, Map<String, String> config) {
+ rewriteDataFiles(enabled);
+ writeOptions.putAll(config);
+ return this;
+ }
+
+ /**
+ * Enables or disables expire snapshots as a post-commit maintenance task.
+ *
+ * @param enabled whether to enable expire snapshots
+ * @see ExpireSnapshotsConfig for the default config.
+ */
+ public Builder expireSnapshots(boolean enabled) {
+ writeOptions.put(FlinkWriteOptions.EXPIRE_SNAPSHOTS_ENABLE.key(), Boolean.toString(enabled));
Review Comment:
That makes sense. I've made these adjustments based on your feedback:
ExpireSnapshotsConfig
- schedule.commit-count: 10
- schedule.interval-second: 3600 (1 hour)
- max-snapshot-age-seconds: not set
- retain-last: not set
- delete-batch-size: 1000
- clean-expired-metadata: true
- planning-worker-pool-size: not set (shared pool)
DeleteOrphanFilesConfig
- schedule.interval-second: 3600 (1 hour)
- min-age-seconds: 259200 (3 days)
- delete-batch-size: 1000
- location: not set (table location)
- use-prefix-listing: true
- planning-worker-pool-size: not set (shared pool)
- equal-schemes: s3n=s3, s3a=s3
- equal-authorities: not set
- prefix-mismatch-mode: ERROR
> For max-snapshot-age-seconds and retain-last, the defaults are both
> null. Do we need to set a default value for them?
I find it difficult to set defaults for these, since they are usually
configured per table. If we don't set them, the table values will be used,
which looks like the right thing to do.
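
To illustrate the fallback described above, here is a minimal, self-contained sketch. It is not the code in this PR. The task keys are the names from the list above, used without any prefix, and the class and method names are hypothetical. The table property names and fallback defaults (5 days, 1 snapshot) are the standard Iceberg `history.expire.*` table properties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves the effective expire-snapshots settings
// when the task-level keys are left unset.
class ExpireSnapshotsFallbackSketch {
  // Standard Iceberg table properties.
  static final String TABLE_MAX_AGE_MS = "history.expire.max-snapshot-age-ms";
  static final String TABLE_MIN_SNAPSHOTS = "history.expire.min-snapshots-to-keep";

  // Task-level keys as named in the list above (prefix omitted, assumption).
  static final String TASK_MAX_AGE_SECONDS = "max-snapshot-age-seconds";
  static final String TASK_RETAIN_LAST = "retain-last";

  // Task value wins if present; otherwise use the table property,
  // and finally the table property default of 5 days.
  static long effectiveMaxAgeMs(Map<String, String> task, Map<String, String> table) {
    String taskValue = task.get(TASK_MAX_AGE_SECONDS);
    if (taskValue != null) {
      return Long.parseLong(taskValue) * 1000L;
    }
    return Long.parseLong(table.getOrDefault(TABLE_MAX_AGE_MS, "432000000"));
  }

  // Same precedence for the number of snapshots to keep (table default: 1).
  static int effectiveRetainLast(Map<String, String> task, Map<String, String> table) {
    String taskValue = task.get(TASK_RETAIN_LAST);
    if (taskValue != null) {
      return Integer.parseInt(taskValue);
    }
    return Integer.parseInt(table.getOrDefault(TABLE_MIN_SNAPSHOTS, "1"));
  }

  public static void main(String[] args) {
    Map<String, String> table = new HashMap<>();
    table.put(TABLE_MAX_AGE_MS, "86400000"); // 1 day, configured on the table
    table.put(TABLE_MIN_SNAPSHOTS, "5");

    Map<String, String> task = new HashMap<>(); // nothing set on the task

    // With no task-level values, the table values are used.
    System.out.println(effectiveMaxAgeMs(task, table));
    System.out.println(effectiveRetainLast(task, table));

    // An explicit task-level value overrides the table value.
    task.put(TASK_MAX_AGE_SECONDS, "7200");
    System.out.println(effectiveMaxAgeMs(task, table));
  }
}
```

Leaving both keys unset keeps a single source of truth on the table, so a job does not silently override a retention policy the table owner configured.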
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]