It looks like you are querying the RO table. If so, the query only hits the
parquet file, which was probably generated during the first upsert; all
later upserts went to the log. Unless compaction runs, they won't show up in
the RO table.

If you want the latest merged view you need to query the RT table.

Does that sound applicable?
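For reference, inline compaction can be turned on at write time. This is only a
sketch against the 0.4.x (com.uber.hoodie) API used in your snippet; the
`hoodie.compact.inline` / `hoodie.compact.inline.max.delta.commits` keys and the
`_rt` table suffix assume a setup like yours, so adjust names to your environment:

```scala
// Same write as in your snippet, with inline compaction enabled so delta
// logs get merged into parquet after each commit (sketch, not tested here).
ds.write.format("com.uber.hoodie")
  .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
  .option("hoodie.upsert.shuffle.parallelism", 4)
  .option("hoodie.compact.inline", "true")                // compact inline
  .option("hoodie.compact.inline.max.delta.commits", "1") // after every delta commit
  .mode(SaveMode.Append)
  .save("/apps/hive/warehouse/emp_mor_26")
```

Alternatively, if the table is synced to Hive, the realtime view is typically
registered with an `_rt` suffix (e.g. `SELECT * FROM emp_mor_26_rt`), which
merges the logs on read without waiting for compaction.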



On Fri, Apr 26, 2019 at 3:02 AM [email protected] <
[email protected]> wrote:

> Writing to Hudi as set up below:
>
> ds.withColumn("emp_name",lit("upd1
> Emily")).withColumn("ts",current_timestamp).write.format("com.uber.hoodie")
> .option(HoodieWriteConfig.TABLE_NAME,"emp_mor_26")
> .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY,"emp_id")
> .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY,"MERGE_ON_READ")
> .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
> .option("hoodie.upsert.shuffle.parallelism",4)
> .mode(SaveMode.Append)
> .save("/apps/hive/warehouse/emp_mor_26")
>
>
> 1st run - write record 1,"hudi_045",current_timestamp as ts
> read result -- 1, hudi_045
> 2nd run - write record 1,"hudi_046",current_timestamp as ts
> read result -- 1,hudi_046
> 3rd run -- write record 1, "hoodie_123",current_timestamp as ts
> read result --- 1,hudi_046
> 4th run -- write record 1, "hdie_1232324",current_timestamp as ts
> read result --- 1,hudi_046
>
> After multiple updates to the same record,
> the generated log.1 has multiple instances of the same record.
> At this point the updated record is not fetched.
>
> 14:45
> /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
> - has record that was updated in run 1
> 15:00
> /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
> - has record that was updated in run 2 and run 3
> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
> 14:41
> /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
>
>
> So is there any compaction to be enabled before reading or while writing?
>
>
