https://github.com/apache/incubator-hudi/issues/652#issuecomment-487016906
Looks like Nishith and you were chatting about this here.
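
To make the RO/RT distinction in the quoted reply below concrete, here is a toy sketch in plain Scala. It is only a mental model under stated assumptions: `Record`, `FileGroup`, and their methods are hypothetical names, not Hudi APIs, and the merge rule (later log entries overwrite earlier ones) is a simplification.

```scala
// Toy model only: plain Scala collections stand in for Hudi's files.
// None of these names are Hudi APIs; they are made up for illustration.
final case class Record(empId: Int, empName: String)

// One file group of a merge-on-read dataset: a base (parquet-like) file
// plus an append-only log of later upserts.
final case class FileGroup(base: Map[Int, Record], log: Vector[Record]) {
  // Upserts after the first write are appended to the log; the base is untouched.
  def upsert(r: Record): FileGroup = copy(log = log :+ r)

  // Read-optimized (RO) view: reads the base file only.
  def readOptimized: Map[Int, Record] = base

  // Real-time (RT) view: merges the log over the base at query time.
  // Simplifying assumption: a later log entry overwrites an earlier one.
  def realTime: Map[Int, Record] =
    log.foldLeft(base)((acc, r) => acc.updated(r.empId, r))

  // Compaction: fold the log into a new base and clear the log.
  def compact: FileGroup = FileGroup(realTime, Vector.empty)
}

object MorViewsDemo {
  def main(args: Array[String]): Unit = {
    val afterFirstWrite = FileGroup(Map(1 -> Record(1, "hudi_045")), Vector.empty)
    val afterUpdates = afterFirstWrite
      .upsert(Record(1, "hoodie_123"))
      .upsert(Record(1, "hdie_1232324"))

    println(afterUpdates.readOptimized(1).empName)         // value from the base file
    println(afterUpdates.realTime(1).empName)              // latest value from the log
    println(afterUpdates.compact.readOptimized(1).empName) // RO sees it after compaction
  }
}
```

In this model the RO read keeps returning the base-file value until `compact` runs, while the RT read always reflects the newest log entry.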

On Fri, Apr 26, 2019 at 6:00 AM Vinoth Chandar <vin...@apache.org> wrote:

> Looks like you are querying the RO table? If so, the query only hits the
> parquet file, which was probably generated during the first upsert; all
> later writes went to the log. Unless compaction runs, they won't show up
> on the RO table.
>
> If you want the latest merged view you need to query the RT table.
>
> Does that sound applicable?
>
>
>
> On Fri, Apr 26, 2019 at 3:02 AM satish.sidnakoppa...@gmail.com <
> satish.sidnakoppa...@gmail.com> wrote:
>
>> Writing the Hudi dataset as below:
>>
>> ds.withColumn("emp_name",lit("upd1 Emily")).withColumn("ts",current_timestamp)
>> .write.format("com.uber.hoodie")
>> .option(HoodieWriteConfig.TABLE_NAME,"emp_mor_26")
>> .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY,"emp_id")
>> .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY,"MERGE_ON_READ")
>> .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
>> .option("hoodie.upsert.shuffle.parallelism",4)
>> .mode(SaveMode.Append)
>> .save("/apps/hive/warehouse/emp_mor_26")
>>
>>
>> 1st run - write record 1,"hudi_045",current_timestamp as ts
>> read result -- 1, hudi_045
>> 2nd run - write record 1,"hudi_046",current_timestamp as ts
>> read result -- 1,hudi_046
>> 3rd run -- write record 1, "hoodie_123",current_timestamp as ts
>> read result --- 1,hudi_046
>> 4th run -- write record 1, "hdie_1232324",current_timestamp as ts
>> read result --- 1,hudi_046
>>
>> After multiple updates to the same record,
>> the generated log.1 has multiple instances of the same record.
>> At this point the updated record is not fetched.
>>
>> 14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
>>   - has record that was updated in run 1
>> 15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
>>   - has record that was updated in run 2 and run 3
>> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
>> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
>>
>>
>> So is there any compaction that needs to be enabled before reading or while writing?
>>
>>
