Yes, as this needed discussion, the thread was created in Google Groups for inputs. I am unable to read from the rt table after multiple updates.
14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1 - has the record that was updated in run 1
15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1 - has the records that were updated in run 2 and run 3
14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet

On Sat, Apr 27, 2019 at 7:24 PM SATISH SIDNAKOPPA <[email protected]> wrote:

> No, the issue is faced with the rt table created by the sync tool.
>
> On Fri 26 Apr, 2019, 11:53 PM Vinoth Chandar <[email protected]> wrote:
>
>> Once you registered the rt table, is this working now for you?
>>
>> On Fri, Apr 26, 2019 at 9:36 AM SATISH SIDNAKOPPA <[email protected]> wrote:
>>
>>> I am querying the real-time view of the table.
>>> This table (emp_mor_26_rt) was created after running the sync tool.
>>> So the record from the first update is fetched from the log.1 file.
>>> Only after the third update are both updates placed in log files.
>>>
>>> On Fri 26 Apr, 2019, 6:30 PM Vinoth Chandar <[email protected]> wrote:
>>>
>>>> Looks like you are querying the RO table? If so, the query only hits the
>>>> parquet file, which was probably generated during the first upsert, and
>>>> all the others went to the log. Unless compaction runs, they won't show
>>>> up on the RO table.
>>>>
>>>> If you want the latest merged view, you need to query the RT table.
>>>>
>>>> Does that sound applicable?
>>>>
>>>> On Fri, Apr 26, 2019 at 3:02 AM [email protected] <[email protected]> wrote:
>>>>
>>>>> Writing to Hudi is set up as below:
>>>>>
>>>>> ds.withColumn("emp_name", lit("upd1 Emily"))
>>>>>   .withColumn("ts", current_timestamp)
>>>>>   .write.format("com.uber.hoodie")
>>>>>   .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
>>>>>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
>>>>>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
>>>>>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
>>>>>   .option("hoodie.upsert.shuffle.parallelism", 4)
>>>>>   .mode(SaveMode.Append)
>>>>>   .save("/apps/hive/warehouse/emp_mor_26")
>>>>>
>>>>> 1st run - write record 1, "hudi_045", current_timestamp as ts
>>>>>           read result -- 1, hudi_045
>>>>> 2nd run - write record 1, "hudi_046", current_timestamp as ts
>>>>>           read result -- 1, hudi_046
>>>>> 3rd run - write record 1, "hoodie_123", current_timestamp as ts
>>>>>           read result -- 1, hudi_046
>>>>> 4th run - write record 1, "hdie_1232324", current_timestamp as ts
>>>>>           read result -- 1, hudi_046
>>>>>
>>>>> After multiple updates to the same record, the generated log.1 has
>>>>> multiple instances of the same record. At this point the updated record
>>>>> is not fetched.
>>>>>
>>>>> 14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1 - has the record that was updated in run 1
>>>>> 15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1 - has the records that were updated in run 2 and run 3
>>>>> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
>>>>> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
>>>>>
>>>>> So is there any compaction to be enabled, either before reading or
>>>>> while writing?
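On the compaction question at the bottom of the thread: for MERGE_ON_READ, log files only fold into parquet when a compaction runs, and it can be triggered inline as part of each write. Below is a minimal sketch, assuming the com.uber.hoodie inline-compaction configs hoodie.compact.inline and hoodie.compact.inline.max.delta.commits; the option names and values should be checked against the build in use, and ds is the same DataFrame as in the original snippet.

// A hedged sketch, not a verified fix: enable inline compaction so the
// write itself compacts once enough delta commits have accumulated.
ds.withColumn("ts", current_timestamp)
  .write.format("com.uber.hoodie")
  .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
  .option("hoodie.compact.inline", "true")                 // compact as part of the write
  .option("hoodie.compact.inline.max.delta.commits", "1")  // assumed: compact after every delta commit
  .option("hoodie.upsert.shuffle.parallelism", 4)
  .mode(SaveMode.Append)
  .save("/apps/hive/warehouse/emp_mor_26")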

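Once a compaction has completed, the read-optimized view should reflect the merged record even without going through the rt table. A read-back sketch through the Spark datasource follows; the three-level glob is an assumption based on the yyyy/MM/dd partition layout shown in the listings above.

// Hedged read-back: if compaction merged the logs into a new parquet file,
// the latest emp_name (hdie_1232324 after run 4) should show up here.
val roDf = spark.read
  .format("com.uber.hoodie")
  .load("/apps/hive/warehouse/emp_mor_26/*/*/*")
roDf.filter("emp_id = 1").select("emp_id", "emp_name", "ts").show(false)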