voonhous commented on PR #8579:
URL: https://github.com/apache/hudi/pull/8579#issuecomment-1523206446

   > > If preCombine is invoked with the same key when an old data {price: 
11.00, _ts:999} is received together with a new data {price: null, _ts: 1001}, 
the old data's column value might overwrite the existing newer data {price: 
10.0, _ts: 1000}.
   > 
   > This is expected right ? We always ignore the nulls while merging, 
shouldn't the `#combineAndGetUpdateValue` follow the same convention?
   
   `#combineAndGetUpdateValue` does follow the same convention. The crux of the 
gotcha is that if a batch contains multiple records with the same key, running 
`#preCombine` on the batch first produces a different result than applying 
`#combineAndGetUpdateValue` to each record individually.
   
   **NOTE:**
   My bad, the precombineField value in the table's initial state was wrong in 
my previous examples. I've edited them.
   
   Let me provide an example again:
   
   
   # preCombine + combineAndGetUpdateValue
   ```
   Table initial state (1):
   [1    a1_0    10.0    1000]
   
   Table performs an update with an incoming batch that contains the following 
records (2):
   (preCombine + combineAndGetUpdateValue)
   [
     [1    a1_0    11.0     999],
     [1    a1_0    null     1001]
   ]
   
   After applying preCombine to the records in (2), we get (3):
   [1    a1_0    11.0    1001]
   
   (3) is then merged with (1) via combineAndGetUpdateValue to produce:
   > (1) + (3)
   
   [1    a1_0    11.0    1001]
   ```
   
   # combineAndGetUpdateValue only
   
   ```
   Table initial state (1):
   [1    a1_0    10.0    1000]
   
   Table performs an update (2): 
   (combineAndGetUpdateValue)
   [1    a1_0    11.0     999]
   
   to produce (3) [NO CHANGE, since 999 < 1000]:
   [1    a1_0    10.0     1000]
   
   Table performs another update (4): 
   (combineAndGetUpdateValue)
   [1    a1_0   null     1001]
   
   End state of the table:
   > (3) + (4):
   
   [1    a1_0   10.0    1001]
   ```
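
To make the difference concrete, here is a toy Python model of the two paths. It is not Hudi's actual payload code: `overlay`, `pre_combine` and `combine_and_get_update_value` are hypothetical helpers whose semantics are only inferred from the two examples above (nulls never overwrite a value, `preCombine` keeps non-null columns from the older record, and `combineAndGetUpdateValue` ignores an incoming record whose precombine value is older than the stored one).

```python
from functools import reduce
from typing import Optional, Tuple

# (id, name, price, ts); ts plays the role of the precombine field.
Record = Tuple[int, str, Optional[float], int]


def overlay(older: Record, newer: Record) -> Record:
    """Take each field from `newer`, falling back to `older` where `newer` is None."""
    return tuple(n if n is not None else o for o, n in zip(older, newer))


def pre_combine(a: Record, b: Record) -> Record:
    """Assumed batch-level merge: neither record is discarded, nulls are filled in."""
    older, newer = (a, b) if a[3] <= b[3] else (b, a)
    return overlay(older, newer)


def combine_and_get_update_value(stored: Record, incoming: Record) -> Record:
    """Assumed table-level merge: an incoming record with an older ts is ignored."""
    if incoming[3] < stored[3]:
        return stored
    return overlay(stored, incoming)


initial = (1, "a1_0", 10.0, 1000)
batch = [(1, "a1_0", 11.0, 999), (1, "a1_0", None, 1001)]

# Path 1: preCombine the batch, then merge the result with the table.
batched = combine_and_get_update_value(initial, reduce(pre_combine, batch))

# Path 2: merge each incoming record with the table, one at a time.
one_by_one = reduce(combine_and_get_update_value, batch, initial)

print(batched)     # (1, 'a1_0', 11.0, 1001)
print(one_by_one)  # (1, 'a1_0', 10.0, 1001)
```

Under these assumptions, the stale `11.0` survives in the first path only because `preCombine` attaches it to the record carrying `1001` before the table's `1000` is ever compared against `999`.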
   


