[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546422024 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1545726848 @bvaradar CI is green, could you please take a look at it again? Thanks!
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542814478 @hudi-bot run azure
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442298082 Hi Hudi devs, I would appreciate a review, thanks!
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441751764 There seems to be a bug in non-strict insert mode when using the Spark datasource. It keeps duplicates only in overwrite mode, or in append mode when data is written to the table for the first time. If I insert in append mode a second time, it deduplicates the dataset as if it were running in upsert mode.
```
from pyspark.sql.functions import expr

# Assumes an active SparkSession named `spark` (e.g. the pyspark shell).
path = "/tmp/huditbl"  # hypothetical table location, not given in the original comment

opt_insert = {
    'hoodie.table.name': 'huditbl',
    'hoodie.datasource.write.recordkey.field': 'keyid',
    'hoodie.datasource.write.table.name': 'huditbl',
    'hoodie.datasource.write.operation': 'insert',
    'hoodie.sql.insert.mode': 'non-strict',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'hoodie.combine.before.upsert': 'false',
    'hoodie.combine.before.insert': 'false',
    'hoodie.datasource.write.insert.drop.duplicates': 'false'
}

df = spark.range(0, 10).toDF("keyid") \
    .withColumn("age", expr("keyid + 1000"))

df.write.format("hudi"). \
    options(**opt_insert). \
    mode("overwrite"). \
    save(path)

spark.read.format("hudi").load(path).count()  # returns 10

df = df.union(df)  # creates duplicates

df.write.format("hudi"). \
    options(**opt_insert). \
    mode("append"). \
    save(path)

spark.read.format("hudi").load(path).count()  # returns 10 but should return 20
```
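A minimal pure-Python sketch of the batch-level behavior the reproduction above expects. It rests on several assumptions. The helpers `should_combine`, `combine_by_key` and `prepare_batch` are hypothetical; they are not Hudi's API or this PR's implementation. They model only the rule in the PR title (combine an upsert batch only when `hoodie.combine.before.upsert` is true) and the analogous rule for inserts with `hoodie.combine.before.insert`. Real Hudi picks the surviving duplicate using the precombine field, while this toy simply keeps the last occurrence. It also ignores any merging against records already stored in the table.

```python
from typing import Dict, List

# Hypothetical record shape mirroring the "keyid"/"age" columns above.
Record = Dict[str, int]


def should_combine(operation: str,
                   combine_before_insert: bool,
                   combine_before_upsert: bool) -> bool:
    """Toy rule: dedupe the incoming batch only if this operation's flag is on."""
    if operation == "upsert":
        return combine_before_upsert
    if operation == "insert":
        return combine_before_insert
    return False


def combine_by_key(records: List[Record], key: str) -> List[Record]:
    """Keep one record per key (last seen wins); a simplified stand-in for precombine."""
    latest: Dict[int, Record] = {}
    for rec in records:
        latest[rec[key]] = rec
    return list(latest.values())


def prepare_batch(records: List[Record], operation: str,
                  combine_before_insert: bool, combine_before_upsert: bool,
                  key: str = "keyid") -> List[Record]:
    """Return the batch that would be written, deduplicated only when the flag allows."""
    if should_combine(operation, combine_before_insert, combine_before_upsert):
        return combine_by_key(records, key)
    return list(records)


# Same shape as df.union(df): keys 0..9, each appearing twice.
base = [{"keyid": i, "age": i + 1000} for i in range(10)]
batch = base + base

print(len(prepare_batch(batch, "insert", False, False)))  # 20: no combining
print(len(prepare_batch(batch, "upsert", False, True)))   # 10: combined by key
```

With both flags set to `'false'`, as in `opt_insert`, this model leaves all 20 incoming rows in the batch. That is the batch-level behavior the report expects and does not observe.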
kazdy commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438816953 The GitHub Actions tests failed, but I don't see why; they passed before. Azure failed on TestHoodieDeltaStreamerWithMultiWriter, which seems unrelated, and that test also passed on the first run. It looks like some flaky tests are causing issues.