nsivabalan commented on code in PR #7632:
URL: https://github.com/apache/hudi/pull/7632#discussion_r1066307179
##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##
@@ -344,6 +351,11 @@ public Builder withInlineLogCompaction(Boolean inlineLogCompaction) {
    return this;
  }
+  public Builder withOfflineCompaction(Boolean useOfflineCompaction) {
Review Comment:
Let's align the method name with the variable name.
##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##
@@ -67,6 +67,13 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("When set to true, logcompaction service is triggered after each write. While being "
           + " simpler operationally, this adds extra latency on the write path.");
+  public static final ConfigProperty DISABLE_ASYNC_COMPACT_FOR_SPARK_STREAMING = ConfigProperty
Review Comment:
We have other streaming configs in DataSourceOptions, e.g.:
```
val STREAMING_RETRY_CNT: ConfigProperty[String] = ConfigProperty
  .key("hoodie.datasource.write.streaming.retry.count")
  .defaultValue("3")
  .withDocumentation("Config to indicate how many times streaming job should retry for a failed micro batch.")

val STREAMING_RETRY_INTERVAL_MS: ConfigProperty[String] = ConfigProperty
  .key("hoodie.datasource.write.streaming.retry.interval.ms")
  .defaultValue("2000")
  .withDocumentation(" Config to indicate how long (by millisecond) before a retry should issued for failed microbatch")
```
So let's move this config there and honor the prefix:
"hoodie.datasource.write.streaming.disable.compaction"
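Under this suggestion, the relocated property might look something like the sketch below, following the `STREAMING_RETRY_*` pattern above. The field name and documentation string are illustrative assumptions, not the final merged code:

```scala
// Hypothetical sketch of the config relocated into DataSourceOptions,
// using the suggested "hoodie.datasource.write.streaming." prefix.
val STREAMING_DISABLE_COMPACTION: ConfigProperty[String] = ConfigProperty
  .key("hoodie.datasource.write.streaming.disable.compaction")
  .defaultValue("false")
  .withDocumentation("By default, async compaction is enabled for MOR tables with the Spark streaming sink. "
    + "Setting this to true disables it; users are then expected to schedule and execute compaction in a separate job.")
```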
##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##
@@ -67,6 +67,13 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("When set to true, logcompaction service is triggered after each write. While being "
           + " simpler operationally, this adds extra latency on the write path.");
+  public static final ConfigProperty DISABLE_ASYNC_COMPACT_FOR_SPARK_STREAMING = ConfigProperty
+      .key("hoodie.compact.disable.for.spark.streaming")
+      .defaultValue("false")
+      .withDocumentation("When set to true, compaction will not run inline nor asynchronously. Compaction can take up significant resources "
Review Comment:
Some wordsmithing:
```
By default, async compaction is enabled for MOR tables with the Spark streaming sink. By setting this config to true, we can disable it; the expectation is that users will schedule and execute compaction in a separate process/job altogether. Some users may wish to run it separately to manage resources across table services and the regular ingestion pipeline, so this could be preferred in such cases.
```
##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala:
##
@@ -117,7 +119,8 @@ class HoodieStreamingSink(sqlContext: SQLContext,
       retry(retryCnt, retryIntervalMs)(
         Try(
           HoodieSparkSqlWriter.write(
-            sqlContext, mode, updatedOptions, data, hoodieTableConfig, writeClient, Some(triggerAsyncCompactor), Some(triggerAsyncClustering),
+            sqlContext, mode, updatedOptions, data, hoodieTableConfig, writeClient,
+            if (disableCompaction) None else Some(triggerAsyncCompactor), Some(triggerAsyncClustering),
Review Comment:
Is it possible to write tests in TestStructuredStreaming to test the functionality?
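A rough sketch of what such a test could look like. The option key, test-harness helpers (`commonOpts`, `basePath`, and the streaming write loop), and timeline check are all assumptions for illustration, not confirmed API of the Hudi test suite:

```scala
// Hypothetical sketch for TestStructuredStreaming: run micro-batches with
// compaction disabled and assert no compaction commits land on the timeline.
@Test
def testStructuredStreamingWithDisabledCompaction(): Unit = {
  val opts = commonOpts ++ Map(
    DataSourceWriteOptions.TABLE_TYPE.key -> "MERGE_ON_READ",
    "hoodie.datasource.write.streaming.disable.compaction" -> "true") // assumed key
  // ... run a few micro-batches through the streaming sink with `opts` ...
  val metaClient = HoodieTableMetaClient.builder()
    .setConf(spark.sessionState.newHadoopConf()).setBasePath(basePath).build()
  // On MOR, compaction produces "commit" instants (vs. deltacommits for ingestion),
  // so a disabled compactor should leave the commit timeline empty.
  assertTrue(metaClient.getActiveTimeline.getCommitTimeline.filterCompletedInstants.empty())
}
```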