flashJd commented on PR #9113: URL: https://github.com/apache/hudi/pull/9113#issuecomment-1623593792
> > How about following the Spark behavior? We should respect the Spark configuration `spark.sql.sources.partitionOverwriteMode`: if it's `static`, overwrite the whole table; if it's `dynamic`, overwrite only the changed partitions.
> >
> > It appears this is also how Iceberg works.
>
> Agreed, I will check the logic to respect the Spark configuration.

@boneanxs @KnightChess @danny0405

1) I tried to respect the Spark configuration `spark.sql.sources.partitionOverwriteMode`, but found it is only supported in DataSource V2: since `HoodieInternalV2Table` only supports the `V1_BATCH_WRITE` `TableCapability`, it cannot implement the `SupportsDynamicOverwrite` interface.
2) For Iceberg, I found it respects the Spark config by implementing the `SupportsDynamicOverwrite` interface, but it also provides its own configuration to control the static/dynamic overwrite semantics: https://github.com/apache/iceberg/blob/1f1ec4be478feae79b04bcea3e9a8556d8076054/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L106
3) Since V2 `BATCH_WRITE` is not supported yet, we can first use the Hudi config `hoodie.datasource.write.operation = insert_overwrite_table/insert_overwrite` to implement the static/dynamic overwrite semantics, and then respect the Spark configuration once V2 write is supported.

What do you think?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
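For illustration, the interim approach in (3) could look like this from the writer side (a sketch only, not tested against this PR; `df`, `basePath`, and the table/partition names are hypothetical, and it assumes a Spark session with the Hudi bundle on the classpath):

```scala
// Dynamic-style overwrite: only the partitions present in df are replaced.
df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "insert_overwrite")
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .mode("append")
  .save(basePath)

// Static-style overwrite: the whole table is replaced.
df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "insert_overwrite_table")
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .mode("append")
  .save(basePath)
```

Once V2 write is supported, the same choice could instead be driven by `spark.sql.sources.partitionOverwriteMode`, as it is for Iceberg.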