[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar reassigned HUDI-651:
-----------------------------------

    Assignee: sivabalan narayanan  (was: Vinoth Chandar)

> Incremental Query on Hive via Spark SQL does not return expected results
> ------------------------------------------------------------------------
>
>                 Key: HUDI-651
>                 URL: https://issues.apache.org/jira/browse/HUDI-651
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.7.0
>
>
> Using the docker demo, I added two delta commits to a MOR table and was hoping to consume them incrementally, as with Hive QL. Something is amiss:
> {code}
> scala> spark.sparkContext.hadoopConfiguration.set("hoodie.stock_ticks_mor_rt.consume.start.timestamp","20200302210147")
> scala> spark.sparkContext.hadoopConfiguration.set("hoodie.stock_ticks_mor_rt.consume.mode","INCREMENTAL")
> scala> spark.sql("select distinct `_hoodie_commit_time` from stock_ticks_mor_rt").show(100, false)
> +-------------------+
> |_hoodie_commit_time|
> +-------------------+
> |20200302210010     |
> |20200302210147     |
> +-------------------+
> scala> sc.setLogLevel("INFO")
> scala> spark.sql("select distinct `_hoodie_commit_time` from stock_ticks_mor_rt").show(100, false)
> 20/03/02 21:15:37 INFO aggregate.HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
> 20/03/02 21:15:37 INFO aggregate.HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
> 20/03/02 21:15:37 INFO memory.MemoryStore: Block broadcast_44 stored as values in memory (estimated size 292.3 KB, free 365.3 MB)
> 20/03/02 21:15:37 INFO memory.MemoryStore: Block broadcast_44_piece0 stored as bytes in memory (estimated size 25.4 KB, free 365.3 MB)
> 20/03/02 21:15:37 INFO storage.BlockManagerInfo: Added broadcast_44_piece0 in memory on adhoc-1:45623 (size: 25.4 KB, free: 366.2 MB)
> 20/03/02 21:15:37 INFO spark.SparkContext: Created broadcast 44 from
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Reading hoodie metadata from path hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@5a66fc27, file:/etc/hadoop/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1645984031_1, ugi=root (auth:SIMPLE)]]]
> 20/03/02 21:15:37 INFO table.HoodieTableConfig: Loading table properties from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO mapred.FileInputFormat: Total input paths to process : 1
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Found a total of 1 groups
> 20/03/02 21:15:37 INFO timeline.HoodieActiveTimeline: Loaded instants [[20200302210010__clean__COMPLETED], [20200302210010__deltacommit__COMPLETED], [20200302210147__clean__COMPLETED], [20200302210147__deltacommit__COMPLETED]]
> 20/03/02 21:15:37 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2018/08/31, #FileGroups=1
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=1, FileGroupsCreationTime=0, StoreTimeTaken=0
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Total paths to process after hoodie filter 1
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Reading hoodie metadata from path hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@5a66fc27, file:/etc/hadoop/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1645984031_1, ugi=root (auth:SIMPLE)]]]
> 20/03/02 21:15:37 INFO table.HoodieTableConfig: Loading table properties from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO timeline.HoodieActiveTimeline: Loaded instants [[20200302210010__clean__COMPLETED], [20200302210010__deltacommit__COMPLETED], [20200302210147__clean__COMPLETED], [20200302210147__deltacommit__COMPLETED]]
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: Building file system view for partition (2018/08/31)
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: #files found in partition (2018/08/31) =3, Time taken =1
> 20/03/02 21:15:37 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2018/08/31, #FileGroups=1
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=3, FileGroupsCreationTime=0, StoreTimeTaken=0
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: Time to load partition (2018/08/31) =2
> 20/03/02 21:15:37 INFO realtime.HoodieParquetRealtimeInputFormat: Returning a total splits of 1
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)