BALAJI VARADARAJAN created HUDI-258:
---------------------------------------

             Summary: Hive Query engine not supporting join queries between RT 
and RO tables
                 Key: HUDI-258
                 URL: https://issues.apache.org/jira/browse/HUDI-258
             Project: Apache Hudi (incubating)
          Issue Type: Bug
          Components: Hive Integration
            Reporter: BALAJI VARADARAJAN


Description : 
[https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]

 

Root Cause: Hive tracks getSplits() calls by dataset basePath and does not 
take the InputFormatClass into account, so getSplits() is called only once per 
basePath. The RO and RT tables share the same dataset basePath but differ in 
their InputFormatClass. As a result, a Hive query that joins the two tables 
returns incorrect results.
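
To make the root cause concrete, here is a minimal model of the tracking 
behaviour described above. It is a sketch only, not Hive's actual code: the 
function names, the base path, and the input format names are all hypothetical 
and chosen for illustration.

```python
from typing import Callable, Hashable, List, Tuple

# (table, basePath, input format): the path and format names are illustrative.
Scan = Tuple[str, str, str]

SCANS: List[Scan] = [
    ("stock_ticks_mor", "/data/stock_ticks_mor", "ReadOptimizedInputFormat"),
    ("stock_ticks_mor_rt", "/data/stock_ticks_mor", "RealtimeInputFormat"),
]


def formats_asked_for_splits(
    scans: List[Scan], key_fn: Callable[[str, str], Hashable]
) -> List[str]:
    """Return the input formats whose getSplits() would be invoked.

    A scan is skipped when its tracking key has already been seen,
    mimicking a planner that calls getSplits() once per key.
    """
    seen = set()
    invoked = []
    for _table, base_path, input_format in scans:
        key = key_fn(base_path, input_format)
        if key in seen:
            continue  # splits for this key are assumed to exist already
        seen.add(key)
        invoked.append(input_format)
    return invoked


def by_path(base_path: str, input_format: str) -> Hashable:
    # Tracking key as described in the root cause: basePath only.
    return base_path


def by_path_and_format(base_path: str, input_format: str) -> Hashable:
    # Hypothetical alternative key that also includes the input format.
    return (base_path, input_format)
```

In this model, formats_asked_for_splits(SCANS, by_path) invokes only the first 
input format, so the second table never gets splits from its own format. With 
by_path_and_format, both formats are invoked.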

 

=============

The demo (Step 6(a)) produces unexpected results when the two tables are 
joined.

 

{{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
stock_ticks_mor_rt where  symbol = 'GOOG';
 select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
stock_ticks_mor where  symbol = 'GOOG';}}

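These queries return the same results as the demo.

However, joins between the two tables behave inconsistently: the join on 
a.ts != b.ts returns no rows, and both sides report the same ts, which changes 
with the join order: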

 

{{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  on 
a.key=b.key where a.ts != b.ts
...
+--------+-------+-------+--+
| a.key  | a.ts  | b.ts  |
+--------+-------+-------+--+
+--------+-------+-------+--+}}

 

{{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from 
stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: 
/tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
2019-07-18 09:13:20 Starting to launch local task to process map join;  maximum 
memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
file: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
2019-07-18 09:13:21 Uploaded 1 File to: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
 (317 bytes)
2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
+---------------------+----------------------+----------------------+--+
|        a.key        |         a.ts         |         b.ts         |
+---------------------+----------------------+----------------------+--+
| GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
+---------------------+----------------------+----------------------+--+
1 row selected (7.207 seconds)
0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor a 
join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: 
/tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
2019-07-18 09:13:51 Starting to launch local task to process map join;  maximum 
memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into 
file: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
2019-07-18 09:13:53 Uploaded 1 File to: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
 (317 bytes)
2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
+---------------------+----------------------+----------------------+--+
|        a.key        |         a.ts         |         b.ts         |
+---------------------+----------------------+----------------------+--+
| GOOG_2018-08-31 10  | 2018-08-31 10:59:00  | 2018-08-31 10:59:00  |
+---------------------+----------------------+----------------------+--+}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
