[jira] [Created] (HIVE-16499) [Tez] CommonMergeJoin Operator is taking longer to join rows as compared to MR

Adesh Kumar Rao (JIRA) Fri, 21 Apr 2017 05:05:22 -0700

Adesh Kumar Rao created HIVE-16499:
--------------------------------------

             Summary: [Tez] CommonMergeJoin Operator is taking longer to join rows as compared to MR
                 Key: HIVE-16499
                 URL: https://issues.apache.org/jira/browse/HIVE-16499
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Adesh Kumar Rao



The issue can be reproduced with a reduce-side join. (Apply the patch available 
in HIVE-16498 first; otherwise the time spent reading useless data masks the 
slowdown described here.)
The data for large_table is generated by the following shell script, and a 
table can be created from the resulting file `large.txt`:
{code:java}
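# 20 values of j, each with 1,000,000 rows: 20 million lines of the form "i,j"
for (( j=1; j<=20; j++ ))
do
  for (( i=1; i<=1000000; i++ ))
  do
    echo "$i,$j"
  done
done > large.txt   # open the output file once instead of once per line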
{code}
{code:java}
create external table large_table ( i int, j int) row format delimited fields 
terminated by ',' location "hdfs://<some-hdfs-location>";

-- Disable map-join conversion so that a reduce-side join is used instead of MapJoin
set hive.auto.convert.join=false;

select * from large_table a join large_table b on a.j = b.j limit 100;
{code}
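
For a sense of scale, the arithmetic below works out how many row pairs the self-join on j matches, using only the sizes from the generator script above (20 values of j, 1,000,000 rows per value). It is a back-of-the-envelope sketch, not a diagnosis of where Tez spends its time; the variable names are made up for illustration.

```shell
# Sizes taken from the generator script in this report.
keys=20                # distinct values of j
rows_per_key=1000000   # rows that share each value of j

# Total rows in large_table.
total_rows=$(( keys * rows_per_key ))

# In a self-join on j, every row with a given j matches every row
# with the same j, so each key contributes rows_per_key^2 pairs.
pairs_per_key=$(( rows_per_key * rows_per_key ))
total_pairs=$(( keys * pairs_per_key ))

echo "rows in large_table: $total_rows"
echo "matches per key:     $pairs_per_key"
echo "total join matches:  $total_pairs"
```

That is 20 million input rows, 10^12 matches per key, and 2 x 10^13 matches overall, of which the query asks for only the first 100 via `limit 100`.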

This issue is different from HIVE-16498: here Tez spends the time in the join 
operator rather than in reading extra data.
With the patch from HIVE-16498 applied, the join query above takes around 30-40 
minutes on Tez, compared to 5 minutes on MR.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
