[jira] [Updated] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables
[ https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-258:
-------------------------------------
    Labels: bug-bash-0.6.0 help-requested query-eng user-support-issues  (was: bug-bash-0.6.0 help-requested user-support-issues)

> Hive Query engine not supporting join queries between RT and RO tables
> ----------------------------------------------------------------------
>
>                 Key: HUDI-258
>                 URL: https://issues.apache.org/jira/browse/HUDI-258
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Balaji Varadarajan
>            Assignee: Nishith Agarwal
>            Priority: Major
>              Labels: bug-bash-0.6.0, help-requested, query-eng, user-support-issues
>
> Description:
> [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>
> Root Cause: Hive tracks getSplits() calls by dataset base path and does not
> take the InputFormatClass into account, so getSplits() is called only once.
> The RO and RT tables share the same dataset base path but differ in their
> InputFormatClass. As a result, a Hive join query between them returns
> incorrect results.
>
> =====
> The result of the demo is very strange (Step 6(a)).
>
> {{select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor_rt where symbol = 'GOOG';
> select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor where symbol = 'GOOG';}}
> return the results shown in the demo, BUT the following join returns an empty result:
>
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key=b.key where a.ts != b.ts
> ...
> +--------+-------+-------+--+
> | a.key  | a.ts  | b.ts  |
> +--------+-------+-------+--+
> +--------+-------+-------+--+}}
>
> {{0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join; maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
> +---------------------+----------------------+----------------------+--+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +---------------------+----------------------+----------------------+--+
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: /tmp
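
The root cause stated in the issue (split computation tracked by base path alone, without the InputFormatClass) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Hive or Hudi code, the function names, the path, and the idea that the second table simply reuses the first table's splits are all hypothetical simplifications, and the real code paths in Hive are more involved.

```python
from typing import Callable, Dict, List, Tuple

# A table scan in this toy model: (alias, base path, split function).
# The split functions stand in for the RO and RT InputFormatClass.
# All names and paths are hypothetical; none of this is a Hive or Hudi API.
Scan = Tuple[str, str, Callable[[str], List[str]]]


def ro_splits(base_path: str) -> List[str]:
    """Stand-in for the read-optimized input format: base files only."""
    return [f"{base_path}/base-file"]


def rt_splits(base_path: str) -> List[str]:
    """Stand-in for the realtime input format: base files plus log files."""
    return [f"{base_path}/base-file+log-file"]


def plan_keyed_by_path(scans: List[Scan]) -> Dict[str, List[str]]:
    """Compute splits once per base path, ignoring the input format."""
    cache: Dict[str, List[str]] = {}
    plan: Dict[str, List[str]] = {}
    for alias, base_path, split_fn in scans:
        if base_path not in cache:
            # Only the first table seen for this path has its splits computed.
            cache[base_path] = split_fn(base_path)
        plan[alias] = cache[base_path]
    return plan


def plan_keyed_by_path_and_format(scans: List[Scan]) -> Dict[str, List[str]]:
    """Compute splits once per (base path, input format) pair."""
    cache: Dict[Tuple[str, str], List[str]] = {}
    plan: Dict[str, List[str]] = {}
    for alias, base_path, split_fn in scans:
        key = (base_path, split_fn.__name__)
        if key not in cache:
            cache[key] = split_fn(base_path)
        plan[alias] = cache[key]
    return plan


if __name__ == "__main__":
    # Two aliases over the same (hypothetical) base path, as in the RO/RT join.
    scans: List[Scan] = [
        ("a", "/hypothetical/base/path", ro_splits),
        ("b", "/hypothetical/base/path", rt_splits),
    ]
    print(plan_keyed_by_path(scans))
    print(plan_keyed_by_path_and_format(scans))
```

In this model, keying by path alone gives both aliases the same view of the data, which would be consistent with the joins above showing a.ts equal to b.ts; keying by path and input format keeps the two views separate.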
[jira] [Updated] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables
[ https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-258:
-------------------------------------
    Labels: bug-bash-0.6.0 help-requested user-support-issues  (was: bug-bash-0.6.0 help-requested)
[jira] [Updated] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables
[ https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-258:
--------------------------------
    Labels: bug-bash-0.6.0 help-requested  (was: )