Hi Satish,

Those files make sense, i.e. it seems your second update went to a log file, then compaction was scheduled, and the third one went to a new log file. The only issue could be that the query is not picking up the right InputFormat, or that the Hive table is not registered with the correct InputFormat/RecordReader.
I know you and Nishith were chatting about this on GitHub, but did you get the table registered using the sync tool and follow the steps here: http://hudi.apache.org/querying_data.html#hive-rt-view ? Otherwise, it could be that the query engine (I am assuming Hive?) is simply ignoring all the log files, since they begin with ".", and only reading the parquet files.

thanks
Vinoth

On Mon, Apr 29, 2019 at 9:33 PM SATISH SIDNAKOPPA <satish.sidnakoppa...@gmail.com> wrote:

> Hi Vinoth,
>
> Missed while copying. PFB the list of files:
>
> 14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
>       - has the record that was updated in run 1
> 15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
>       - has the records that were updated in runs 2 and 3
> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
> 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
>
> On Mon, Apr 29, 2019 at 8:26 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Hi Satish,
> >
> > There are no parquet files? Can you share the full listing of files in the partition?
> >
> > Thanks
> > Vinoth
> >
> > On Mon, Apr 29, 2019 at 7:22 AM SATISH SIDNAKOPPA <satish.sidnakoppa...@gmail.com> wrote:
> >
> > > Yes,
> > > As this needed discussion, the thread was created in Google Groups for inputs.
> > > I am unable to read from the rt table after multiple updates.
> > > 14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
> > >       - *has the record that was updated in run 1*
> > > 15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
> > >       - *has the records that were updated in runs 2 and 3*
> > > 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
> > > 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
> > >
> > > On Sat, Apr 27, 2019 at 7:24 PM SATISH SIDNAKOPPA <satish.sidnakoppa...@gmail.com> wrote:
> > >
> > > > No, the issue is faced with the rt table created by the sync tool.
> > > >
> > > > On Fri, 26 Apr 2019, 11:53 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > >
> > > > > Once you registered the rt table, is this working now for you?
> > > > >
> > > > > On Fri, Apr 26, 2019 at 9:36 AM SATISH SIDNAKOPPA <satish.sidnakoppa...@gmail.com> wrote:
> > > > >
> > > > > > I am querying the real-time view of the table.
> > > > > > This table (emp_mor_26_rt) was created after running the sync tool.
> > > > > > The first updated record is fetched from the log.1 file.
> > > > > > Only after the third update are both updates placed in the log files.
> > > > > >
> > > > > > On Fri, 26 Apr 2019, 6:30 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > >
> > > > > > > Looks like you are querying the RO table? If so, the query only hits the parquet file, which was probably generated during the first upsert; all the others went to the log. Unless compaction runs, they won't show up in the RO table.
> > > > > > >
> > > > > > > If you want the latest merged view, you need to query the RT table.
> > > > > > >
> > > > > > > Does that sound applicable?
> > > > > > > On Fri, Apr 26, 2019 at 3:02 AM satish.sidnakoppa...@gmail.com <satish.sidnakoppa...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Writing the Hudi dataset as below:
> > > > > > > >
> > > > > > > > ds.withColumn("emp_name", lit("upd1 Emily"))
> > > > > > > >   .withColumn("ts", current_timestamp)
> > > > > > > >   .write.format("com.uber.hoodie")
> > > > > > > >   .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
> > > > > > > >   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
> > > > > > > >   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
> > > > > > > >   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
> > > > > > > >   .option("hoodie.upsert.shuffle.parallelism", 4)
> > > > > > > >   .mode(SaveMode.Append)
> > > > > > > >   .save("/apps/hive/warehouse/emp_mor_26")
> > > > > > > >
> > > > > > > > 1st run - write record 1, "hudi_045", current_timestamp as ts
> > > > > > > >   read result -- 1, hudi_045
> > > > > > > > 2nd run - write record 1, "hudi_046", current_timestamp as ts
> > > > > > > >   read result -- 1, hudi_046
> > > > > > > > 3rd run - write record 1, "hoodie_123", current_timestamp as ts
> > > > > > > >   read result -- 1, hudi_046
> > > > > > > > 4th run - write record 1, "hdie_1232324", current_timestamp as ts
> > > > > > > >   read result -- 1, hudi_046
> > > > > > > >
> > > > > > > > After multiple updates to the same record, the generated log.1 file has multiple instances of that record. At this point the updated record is not fetched.
> > > > > > > > 14:45 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144153.log.1
> > > > > > > >       - has the record that was updated in run 1
> > > > > > > > 15:00 /apps/hive/warehouse/emp_mor_26/2019/09/22/.278a46f9--87a_20190426144540.log.1
> > > > > > > >       - has the records that were updated in runs 2 and 3
> > > > > > > > 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/.hoodie_partition_metadata
> > > > > > > > 14:41 /apps/hive/warehouse/emp_mor_26/2019/09/22/278a46f9--87a_0_20190426144153.parquet
> > > > > > > >
> > > > > > > > So, is there any compaction to be enabled, either before reading or while writing?
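Vinoth's theory at the top of the thread (wrong InputFormat, or Hive skipping the dot-prefixed log files) can be checked from the Hive side. A minimal sketch, assuming the table and database names from the thread; the expected InputFormat class name is an assumption based on the com.uber.hoodie packaging of that era, so verify it against the Hudi version actually deployed:

```shell
# Hedged sketch (not from the thread): check which InputFormat the _rt
# table is registered with. For a MERGE_ON_READ realtime view this should
# be Hudi's realtime InputFormat (assumed here to live under
# com.uber.hoodie.hadoop.realtime for this Hudi version), not a plain
# parquet InputFormat.
hive -e "DESCRIBE FORMATTED emp_mor_26_rt" | grep -i inputformat

# The querying_data.html#hive-rt-view page linked above also requires the
# plain HiveInputFormat (not CombineHiveInputFormat) for realtime queries:
hive \
  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
  -e "SELECT emp_id, emp_name FROM emp_mor_26_rt WHERE emp_id = 1"
```

If the first command shows a parquet-only InputFormat, the query would read just the 14:41 parquet file and never merge the log files, which matches the symptom of runs 2-4 not showing up.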
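On the closing question about compaction: for MERGE_ON_READ, the RT view should merge log files at query time without compaction, so compaction is only needed to make updates visible in the RO view. A sketch of the writer from the thread with inline compaction enabled, assuming `hoodie.compact.inline` and `hoodie.compact.inline.max.delta.commits` are honored by this com.uber.hoodie version (both names are assumptions; check HoodieCompactionConfig for the release in use):

```scala
// Hedged sketch: same writer as in the thread, with inline compaction
// enabled so delta log files are folded back into parquet after commits.
// The two "hoodie.compact.*" keys are assumptions to be verified against
// the deployed Hudi version's HoodieCompactionConfig.
ds.withColumn("ts", current_timestamp)
  .write.format("com.uber.hoodie")
  .option(HoodieWriteConfig.TABLE_NAME, "emp_mor_26")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, "MERGE_ON_READ")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_by")
  .option("hoodie.compact.inline", "true")                // assumed: compact as part of the write
  .option("hoodie.compact.inline.max.delta.commits", "1") // assumed: compact after every delta commit
  .mode(SaveMode.Append)
  .save("/apps/hive/warehouse/emp_mor_26")
```

Note this would not fix a broken RT view: if the _rt table is registered with the wrong InputFormat, enabling compaction only masks the problem by pushing updates into parquet.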