vishalpathak1986 opened a new issue #2095:
URL: https://github.com/apache/hudi/issues/2095


   I am trying to write one insert and one update to a partitioned Hudi table. I 
am interested in the MoR RO (read-optimized) view. My workload profile looks like the following:
   WorkloadProfile {globalStat=WorkloadStat {numInserts=1, numUpdates=1}, 
partitionStat={dept=test_part1=WorkloadStat {numInserts=1, numUpdates=0}, 
dept=test_part2=WorkloadStat {numInserts=0, numUpdates=1}}}
   
   To do this job 
[upsertRecordsInternal](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java#L449)
 is invoked.
   
   This invokes 
[getPartitioner](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java#L481)
 method, which in turn invokes 
[getUpsertPartitioner](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L146).
   
   This creates an object of 
[UpsertPartitioner](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L543).
   
   Everything until this point is as expected.
   
   The constructor of UpsertPartitioner invokes 
[assignUpdates](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L589)
 and 
[assignInserts](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L608).
 While assignUpdates invokes 
[addUpdateBucket](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L597)
 and sets the [BucketType to 
UPDATE](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L601),
 assignInserts sets the [BucketType to 
INSERT](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L666).
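
   To make the bucket assignment concrete, here is a minimal sketch of how I understand it, not the actual Hudi code. `BucketSketch`, `PartitionStat`, and `assignBuckets` are hypothetical names I made up for illustration; only the `UPDATE` and `INSERT` bucket types come from the linked source. The sketch assumes one bucket per partition and ignores any small-file handling the real partitioner may do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketSketch {
    // Mirrors the two bucket types named in the linked source.
    enum BucketType { UPDATE, INSERT }

    // Hypothetical stand-in for the per-partition WorkloadStat.
    static final class PartitionStat {
        final long numInserts;
        final long numUpdates;

        PartitionStat(long numInserts, long numUpdates) {
            this.numInserts = numInserts;
            this.numUpdates = numUpdates;
        }
    }

    // Assumption: update buckets are assigned first, then insert buckets,
    // matching the order in which the constructor calls the two methods.
    static List<String> assignBuckets(Map<String, PartitionStat> profile) {
        List<String> buckets = new ArrayList<>();
        for (Map.Entry<String, PartitionStat> e : profile.entrySet()) {
            if (e.getValue().numUpdates > 0) {
                buckets.add(e.getKey() + " -> " + BucketType.UPDATE);
            }
        }
        for (Map.Entry<String, PartitionStat> e : profile.entrySet()) {
            if (e.getValue().numInserts > 0) {
                buckets.add(e.getKey() + " -> " + BucketType.INSERT);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The workload profile from this issue.
        Map<String, PartitionStat> profile = new LinkedHashMap<>();
        profile.put("dept=test_part1", new PartitionStat(1, 0));
        profile.put("dept=test_part2", new PartitionStat(0, 1));
        assignBuckets(profile).forEach(System.out::println);
    }
}
```

   With my workload profile, this puts `dept=test_part2` in an `UPDATE` bucket and `dept=test_part1` in an `INSERT` bucket.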
   
   Because of this, in 
[handleUpsertPartition](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L253)
 (invoked through 
[upsertRecordsInternal](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java#L472)),
 
[handleInsert](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L260)
 causes the insert record to be written to a Parquet file and 
[handleUpdate](https://github.com/apache/hudi/blob/c51ac6553e7faa40a9a41ad40330cccd34554149/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L262)
 causes the update record to be written to an Avro file.
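
   The dispatch I am describing can be sketched roughly as follows. This is a simplified model of what I observe, not the Hudi implementation: `UpsertDispatchSketch`, `FileKind`, and `visibleInReadOptimizedView` are hypothetical names, and the rule that the RO view only surfaces Parquet files is my assumption based on the query results.

```java
public class UpsertDispatchSketch {
    enum BucketType { UPDATE, INSERT }

    // Hypothetical labels for the two kinds of files I see being written.
    enum FileKind { PARQUET_BASE_FILE, AVRO_LOG_FILE }

    // Models the branch on bucket type: inserts go through the insert
    // handler (Parquet), updates go through the update handler (Avro).
    static FileKind handleUpsertPartition(BucketType type) {
        switch (type) {
            case INSERT:
                return FileKind.PARQUET_BASE_FILE;
            case UPDATE:
                return FileKind.AVRO_LOG_FILE;
            default:
                throw new IllegalArgumentException("Unknown bucket type: " + type);
        }
    }

    // Assumption: the RO view reads only Parquet files, so records that
    // exist only in Avro files stay invisible until compaction runs.
    static boolean visibleInReadOptimizedView(FileKind kind) {
        return kind == FileKind.PARQUET_BASE_FILE;
    }

    public static void main(String[] args) {
        for (BucketType type : BucketType.values()) {
            FileKind kind = handleUpsertPartition(type);
            System.out.println(type + " -> " + kind
                + ", visible in RO view: " + visibleInReadOptimizedView(kind));
        }
    }
}
```

   Under these assumptions the insert is visible in the RO view and the update is not, which matches what I see.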
   
   Now, when I query the MoR RO view, I see the inserted record in the 
output but not the updated record. This gives the impression of inconsistent 
data. While the behavior of the updated record is expected, is it expected to see 
the inserted record in the MoR RO view without running compaction?



