[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-05-12 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546422024

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-05-12 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1545726848

   @bvaradar CI is green, could you please take a look at it again? thanks





[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-05-10 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542814478

   @hudi-bot run azure





[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-23 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442298082

   Hi Hudi devs, I would appreciate a review, thanks!





[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-23 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441751764

   There seems to be a bug with non-strict insert mode.
   When using the Spark datasource, it inserts duplicates only in overwrite
   mode, or in append mode when data is written to the table for the first
   time. If I insert in append mode a second time, it deduplicates the dataset
   as if it were running in upsert mode.
   
   ```
   # Assumes an active SparkSession `spark` and a `path` variable
   # pointing at the table location.
   from pyspark.sql.functions import expr

   opt_insert = {
       'hoodie.table.name': 'huditbl',
       'hoodie.datasource.write.recordkey.field': 'keyid',
       'hoodie.datasource.write.table.name': 'huditbl',
       'hoodie.datasource.write.operation': 'insert',
       'hoodie.sql.insert.mode': 'non-strict',
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2,
       'hoodie.combine.before.upsert': 'false',
       'hoodie.combine.before.insert': 'false',
       'hoodie.datasource.write.insert.drop.duplicates': 'false'
   }

   df = spark.range(0, 10).toDF("keyid") \
       .withColumn("age", expr("keyid + 1000"))

   # First write: overwrite mode
   df.write.format("hudi"). \
       options(**opt_insert). \
       mode("overwrite"). \
       save(path)

   spark.read.format("hudi").load(path).count()  # returns 10

   df = df.union(df)  # creates duplicates
   # Second write: append mode
   df.write.format("hudi"). \
       options(**opt_insert). \
       mode("append"). \
       save(path)

   spark.read.format("hudi").load(path).count()  # returns 10 but should return 20
   ```
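
   To make the distinction concrete, here is a toy model in plain Python (no
   Spark or Hudi involved). The helper names `combine_by_key` and
   `prepare_batch` are hypothetical; they only illustrate what combining an
   incoming batch by record key does to its row count, and are not how Hudi
   implements the step.

```python
# Toy model only: lists of dicts stand in for an incoming batch of records.
# combine_by_key and prepare_batch are hypothetical names, not Hudi APIs.

def combine_by_key(batch, key="keyid"):
    """Keep one row per key (the last one seen wins)."""
    latest = {}
    for row in batch:
        latest[row[key]] = row
    return list(latest.values())


def prepare_batch(batch, combine):
    """Return the rows that would be written, with or without combining."""
    return combine_by_key(batch) if combine else list(batch)


rows = [{"keyid": k, "age": k + 1000} for k in range(10)]
batch = rows + rows  # same duplication as df.union(df)

kept = prepare_batch(batch, combine=False)    # duplicates preserved
deduped = prepare_batch(batch, combine=True)  # one row per keyid

print(len(kept), len(deduped))  # 20 10
```

   With combining disabled the batch keeps all 20 rows; with it enabled the
   batch collapses to 10, which is the kind of row count I see on the second
   append.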
   





[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-21 Thread via GitHub


kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438816953

   GH Actions tests failed, but I don't see why; they passed before.
   Azure failed on TestHoodieDeltaStreamerWithMultiWriter, which looks
   unrelated and also passed on the first run. It looks like some flaky tests
   are causing issues.

