No, the issue is with the rt table created by the sync tool.

On Fri 26 Apr, 2019, 11:53 PM Vinoth Chandar <vin...@apache.org> wrote:

> Once you registered the rt table, is this working for you now?
>
> On Fri, Apr 26, 2019 at 9:36 AM SATISH SIDNAKOPPA <
> satish.sidnakoppa...@gmail.com> wrote:
>
> > I am querying the real-time view of the table.
> > This table (emp_mor_26_rt) was created by the run-sync tool.
> > So only the first updated record is fetched, from the log.1 file.
> >
> > Only after the third update are both updates placed in the log files.
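> >
> > For reference, this is roughly how I query it (a minimal sketch, assuming
> > a SparkSession with Hive support, since emp_mor_26_rt is registered in
> > the Hive metastore by the sync tool):
> >
> > // query the realtime (merged) view registered by the sync tool
> > val latest = spark.sql("select emp_id, emp_name, ts from emp_mor_26_rt")
> > latest.show(false)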
> >
> >
> >
> >
> > On Fri 26 Apr, 2019, 6:30 PM Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > Looks like you are querying the RO table? If so, the query only hits
> > > the parquet file, which was probably generated during the first
> > > upsert; all the other updates went to the log. Unless compaction runs,
> > > they won't show up on the RO table.
> > >
> > > If you want the latest merged view, you need to query the RT table.
> > >
> > > Does that sound applicable?
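> > >
> > > To make the difference concrete, a minimal sketch (assuming the
> > > com.uber.hoodie Spark datasource and your table names; adjust paths
> > > as needed):
> > >
> > > // RO view via the Spark datasource: reads only the compacted parquet files
> > > val ro = spark.read.format("com.uber.hoodie")
> > >   .load("/apps/hive/warehouse/emp_mor_26/*/*/*")
> > > // RT view: merges parquet and log files on read; query the synced _rt table
> > > val rt = spark.sql("select emp_id, emp_name, ts from emp_mor_26_rt")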
> > >
> > >
> > >
> > > On Fri, Apr 26, 2019 at 3:02 AM satish.sidnakoppa...@gmail.com <
> > > satish.sidnakoppa...@gmail.com> wrote:
> > >
> > > > Writing to Hudi is set up as below:
> > > >
> > > > import com.uber.hoodie.DataSourceWriteOptions
> > > > import com.uber.hoodie.config.HoodieWriteConfig
> > > > import org.apache.spark.sql.SaveMode
> > > > import org.apache.spark.sql.functions.{current_timestamp, lit}
> > > >
> > > > ds.withColumn("emp_name", lit("upd1 Emily"))
> > > >   .withColumn("ts", current_timestamp())
> > > >   .write.format("com.uber.hoodie")
> > > >   .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
> > > >   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
> > > >   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
> > > >   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
> > > >   .option("hoodie.upsert.shuffle.parallelism", 4)
> > > >   .mode(SaveMode.Append)
> > > >   .save("/apps/hive/warehouse/emp_mor_26")
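> > > >
> > > > (Side note: nothing above sets the precombine field explicitly. A
> > > > hedged addition, assuming com.uber.hoodie defaults this key to "ts",
> > > > which the current_timestamp column populates:)
> > > >
> > > > // make the precombine field explicit; it is used to pick the latest
> > > > // version of a record when merging multiple updates with the same key
> > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")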
> > > >
> > > >
> > > > 1st run -- write record 1, "hudi_045", current_timestamp as ts
> > > >            read result -- 1, hudi_045
> > > > 2nd run -- write record 1, "hudi_046", current_timestamp as ts
> > > >            read result -- 1, hudi_046
> > > > 3rd run -- write record 1, "hoodie_123", current_timestamp as ts
> > > >            read result -- 1, hudi_046
> > > > 4th run -- write record 1, "hdie_1232324", current_timestamp as ts
> > > >            read result -- 1, hudi_046
> > > >
> > > > After multiple updates to the same record, the generated log.1 file
> > > > has multiple instances of the same record. At this point the updated
> > > > record is not fetched.
> > > >
> > > > 14:45  /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
> > > >        - has the record that was updated in run 1
> > > > 15:00  /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
> > > >        - has the records that were updated in runs 2 and 3
> > > > 14:41  /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
> > > > 14:41  /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
> > > >
> > > >
> > > > So is there any compaction that needs to be enabled before reading,
> > > > or while writing?
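> > > >
> > > > (A minimal sketch of enabling inline compaction through write
> > > > options, assuming these com.uber.hoodie config keys; please verify
> > > > against your version:)
> > > >
> > > > // compact inline every couple of delta commits, merging logs into parquet
> > > > .option("hoodie.compact.inline", "true")
> > > > .option("hoodie.compact.inline.max.delta.commits", "2")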
> > > >
> > > >
> > >
> >
>
