PhatakN1 commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618292312


   If MOR inserts go to a parquet file but updates go to a log file, then a 
query on the _ro table will show the inserts made since the last compaction but 
not the updates. Isn't that an inconsistent view of the data? So I still see 
all inserts since the last compaction but none of the updates?
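
   To make the question concrete, here is a toy model of the two views as I 
understand them. It is not Hudi code: the dicts only stand in for a base 
parquet file and its log file, and the record values other than key 11 and 
"Op": "D" are made up for illustration.

```python
# Toy model only: plain dicts stand in for a base parquet file and a log file.
# Record contents are hypothetical apart from key "11" and its "Op": "D" entry.
base_file = {
    "11": {"tran_id": 11, "quantity": 15, "Op": "I"},
    "12": {"tran_id": 12, "quantity": 3, "Op": "I"},
}
log_file = [
    {"_hoodie_record_key": "11", "tran_id": 11, "quantity": 15, "Op": "D"},
]


def read_optimized_view(base):
    """_ro style read: only the base file is consulted, the log is ignored."""
    return dict(base)


def realtime_view(base, log):
    """_rt style read: overlay each log record on the base record with the same key."""
    merged = dict(base)
    for record in log:
        merged[record["_hoodie_record_key"]] = record
    return merged


ro = read_optimized_view(base_file)
rt = realtime_view(base_file, log_file)
print(ro["11"]["Op"], rt["11"]["Op"])
```

   The sketch only overlays the latest log record per key. It does not decide 
whether a record carrying "Op": "D" should disappear from the result, which is 
the part I am unsure about.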
   
   These are the contents of the log file, as printed by show logfile records in hudi-cli:
   {"_hoodie_commit_time": "20200422083923", "_hoodie_commit_seqno": 
"20200422083923_1_2", "_hoodie_record_key": "11", "_hoodie_partition_path": 
"2019-03-14", "_hoodie_file_name": "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0", 
"dms_received_ts": "2020-04-22T08:38:36.873970Z", "tran_id": 11, "tran_date": 
"2019-03-14", "store_id": 5, "store_city": "CHICAGO", "store_state": "IL", 
"item_code": "XXXXXX", "quantity": 15, "total": 106.25, "Op": "D"}
   
   This is the log file metadata:
   ║ 20200422083923 │ 1           │ AVRO_DATA_BLOCK │ 
{"SCHEMA":"{\"type\":\"record\",\"name\":\"retail_transactions\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"dms_received_ts\",\"type\":\"string\"},{\"name\":\"tran_id\",\"type\":\"int\"},{\"name\":\"tran_date\",\"type\":\"string\"},{\"name\":\"store_id\",\"type\":\"int\"},{\"name\":\"store_city\",\"type\":\"string\"},{\"name\":\"store_state\",\"type\":\"string\"},{\"name\":\"item_code\",\"type\":\"string\"},{\"name\":\"quantity\",\"type\":\"int\"},{\"name\":\"total\",\"type\":\"float\"},{\"name\":\"Op\",\"type\":\"string\"}]}","INSTANT_TIME":"20200422083923"}
 │ {}             ║
   
   The name of the parquet file in the partition is 
c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet and the 
log file name is 
.c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33
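
   As a sanity check that the log file belongs to the same file group as the 
parquet file, the two names can be split into their parts. This is a small 
sketch that assumes the naming layout as I understand it 
(<fileId>_<writeToken>_<instantTime>.parquet for the base file and 
.<fileId>_<baseInstant>.log.<version>_<writeToken> for the log file); the 
regular expressions and field names are my own, not a Hudi API.

```python
import re

# Assumed layouts (not taken from Hudi source code):
#   base file: <fileId>_<writeToken>_<instantTime>.parquet
#   log file:  .<fileId>_<baseInstant>.log.<version>_<writeToken>
BASE_RE = re.compile(
    r"^(?P<file_id>.+)_(?P<write_token>\d+-\d+-\d+)_(?P<instant>\d+)\.parquet$"
)
LOG_RE = re.compile(
    r"^\.(?P<file_id>.+)_(?P<base_instant>\d+)\.log\."
    r"(?P<version>\d+)_(?P<write_token>\d+-\d+-\d+)$"
)

base_name = "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet"
log_name = ".c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33"

base = BASE_RE.match(base_name).groupdict()
log = LOG_RE.match(log_name).groupdict()

# The log file should point at the same file ID and the same base instant.
same_file_id = base["file_id"] == log["file_id"]
same_instant = base["instant"] == log["base_instant"]
print(same_file_id, same_instant, log["version"])
```

   Under that assumed layout, both names carry the same file ID and the same 
base instant 20200422072539, so the log file appears to be attached to this 
parquet file.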
   
   The partition metadata contents are:
   commitTime=20200422072539
   partitionDepth=1
   
   Not sure why a query on the _rt table does not reflect the delete. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

