
Re: Review Request 64688: HIVE-18218

2018-02-09 Deepak Jaiswal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
---

(Updated Feb. 10, 2018, 5:48 a.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Changes
---

Added the SparkOnYarn test result for auto_sortmerge_join_16.q, which was
missing from the previous update.
Fixed test bucket_mapjoin_mismatch1.q to work with the new file name format
check.


Repository: hive-git


Description
---

Bucket-based join: handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that every bucket has at
least one split, whereas in Tez an empty bucket can have no splits.
The patch also falls back to a reduce-side join if the small table has more
buckets than the big table.
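A minimal Java sketch of the idea, under simplifying assumptions: each split carries the bucket ID of the file it came from, and splits are grouped into one list per bucket. A bucket that received no splits (for example, an empty bucket file) gets an empty list instead of being assumed to exist. `BucketSplitGrouping`, `Split` and `groupByBucket` are hypothetical names, not the actual CustomPartitionVertex code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group splits by bucket without assuming every bucket has a split.
public class BucketSplitGrouping {

    // Simplified stand-in for a Tez input split: source file plus its bucket ID.
    static final class Split {
        final String path;
        final int bucketId;

        Split(String path, int bucketId) {
            this.path = path;
            this.bucketId = bucketId;
        }

        @Override
        public String toString() {
            return path + "@" + bucketId;
        }
    }

    // Returns one entry per bucket 0..numBuckets-1; buckets with no splits map to an empty list.
    static Map<Integer, List<Split>> groupByBucket(List<Split> splits, int numBuckets) {
        Map<Integer, List<Split>> byBucket = new TreeMap<>();
        for (int b = 0; b < numBuckets; b++) {
            byBucket.put(b, new ArrayList<>());
        }
        for (Split s : splits) {
            if (s.bucketId < 0 || s.bucketId >= numBuckets) {
                throw new IllegalArgumentException("bucket ID out of range: " + s);
            }
            byBucket.get(s.bucketId).add(s);
        }
        return byBucket;
    }

    public static void main(String[] args) {
        // Three buckets, but bucket 1 produced no splits (e.g. its file was empty).
        List<Split> splits = List.of(
                new Split("000000_0", 0),
                new Split("000002_0", 2),
                new Split("000002_0_copy_1", 2));
        groupByBucket(splits, 3).forEach((bucket, list) ->
                System.out.println("bucket " + bucket + " -> " + list));
    }
}
```

The key choice here is to pre-populate every bucket 0..numBuckets-1. Downstream code can then iterate over all buckets and treat an empty list as "no work", rather than indexing splits by position.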

Finally, loading files into bucketed tables is disallowed if the file names do
not follow the bucket file name format, e.g. 00_0, 01_0_copy_1.
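The following is a hedged sketch of the kind of file name check described above. It assumes the accepted format is `<bucket>_<n>` with an optional `_copy_<m>` suffix, as in the examples 00_0 and 01_0_copy_1. The regular expression and the helper names are assumptions; the actual check in LoadSemanticAnalyzer may differ.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a bucket file name check; not Hive's actual implementation.
public class BucketFileNames {

    // Assumed format: <bucketId>_<n>[_copy_<m>], e.g. 00_0 or 01_0_copy_1.
    // The bucket ID is limited to 9 digits so Integer.parseInt cannot overflow.
    private static final Pattern BUCKET_FILE =
            Pattern.compile("(\\d{1,9})_\\d+(?:_copy_\\d+)?");

    // Returns the bucket ID encoded in the file name, or empty if the name does not match.
    static OptionalInt bucketIdOf(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    // A load into a bucketed table would be rejected when this returns false.
    static boolean isValidBucketFileName(String fileName) {
        return bucketIdOf(fileName).isPresent();
    }

    public static void main(String[] args) {
        String[] names = {"00_0", "01_0_copy_1", "000002_0", "data.csv", "part-00000"};
        for (String name : names) {
            System.out.println(name + " valid=" + isValidBucketFileName(name)
                    + " bucket=" + bucketIdOf(name));
        }
    }
}
```

Reading the bucket ID from the name, rather than from a file's position in a directory listing, is what makes an empty or missing bucket detectable at all.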


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 26afe90faa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java ef5e7edcd6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 9885038588
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java dc698c8de8
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 54f5bab6de
  ql/src/test/queries/clientnegative/bucket_mapjoin_mismatch1.q 5f653bc9bb
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q 8216b538c2
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out b9c2e6f827
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 91408df129
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out 9939e834bd
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out_spark 91408df129


Diff: https://reviews.apache.org/r/64688/diff/5/

Changes: https://reviews.apache.org/r/64688/diff/4-5/


Testing
---


Thanks,

Deepak Jaiswal

Re: Review Request 64688: HIVE-18218

2018-02-09 Deepak Jaiswal



> On Feb. 10, 2018, 2:44 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java
> > Line 548 (original), 579 (patched)
> > 
> >
> > If a bucket file is missing from the list of files, then bucketNum < 
> > numBuckets, so this will trigger the fallback loop below?

Yes. It is part of the fallback logic, which is essentially the existing logic
and works with existing user data.

If a file is missing, the join produces wrong results, just as it does today.
With the current naming convention, it is not possible to identify a bucket
file by its name.
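The sketch below illustrates why a missing file breaks the join. It rests on an assumption not confirmed by the thread, which does not show the fallback code: that the fallback assigns bucket IDs by each file's position in the sorted file list. Under that assumption, a missing file shifts every later file onto the wrong bucket. The sketch contrasts this with reading the bucket ID from a name like 000002_0, which only works when file names follow the bucket naming format. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: why a missing bucket file breaks position-based bucket assignment.
public class PositionalBucketMapping {

    // Assumed legacy behaviour: the i-th file in sorted order is treated as bucket i.
    static int[] positionalBuckets(List<String> sortedFiles) {
        int[] ids = new int[sortedFiles.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        return ids;
    }

    // Name-based assignment: the bucket ID is the number before the first '_'.
    // Assumes every name already passed a format check such as 00_0 or 01_0_copy_1.
    static int[] nameBasedBuckets(List<String> sortedFiles) {
        int[] ids = new int[sortedFiles.size()];
        for (int i = 0; i < ids.length; i++) {
            String name = sortedFiles.get(i);
            ids[i] = Integer.parseInt(name.substring(0, name.indexOf('_')));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Table declared with 3 buckets, but the file for bucket 1 is missing.
        List<String> files = List.of("000000_0", "000002_0");
        System.out.println("positional: " + Arrays.toString(positionalBuckets(files)));
        System.out.println("name-based: " + Arrays.toString(nameBasedBuckets(files)));
    }
}
```

With the bucket 1 file missing, positional assignment labels 000002_0 as bucket 1, while name-based assignment correctly reports bucket 2.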


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/#review197216
---



Re: Review Request 64688: HIVE-18218

2018-02-09 Jason Dere


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/#review197216
---




ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java
Line 548 (original), 579 (patched)


If a bucket file is missing from the list of files, then bucketNum < 
numBuckets, so this will trigger the fallback loop below?


- Jason Dere



Re: Review Request 64688: HIVE-18218

2018-02-09 Deepak Jaiswal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
---

(Updated Feb. 10, 2018, 12:41 a.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Changes
---

Added the explain plans of the query with and without SMB. The plan with SMB
performs a shuffle join.


Repository: hive-git


Description
---

Bucket-based join: handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that every bucket has at
least one split, whereas in Tez an empty bucket can have no splits.
The patch also falls back to a reduce-side join if the small table has more
buckets than the big table.

Finally, loading files into bucketed tables is disallowed if the file names do
not follow the bucket file name format, e.g. 00_0, 01_0_copy_1.


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 26afe90faa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java ef5e7edcd6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 9885038588
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java dc698c8de8
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 54f5bab6de
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q 8216b538c2
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 91408df129
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out_spark 91408df129


Diff: https://reviews.apache.org/r/64688/diff/4/

Changes: https://reviews.apache.org/r/64688/diff/3-4/


Testing
---


Thanks,

Deepak Jaiswal

Re: Review Request 64688: HIVE-18218

2018-02-09 Deepak Jaiswal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
---

(Updated Feb. 10, 2018, 12:05 a.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Changes
---

Fixed the following issues:
- Handled the case where the small table has more buckets than the big table
  by taking the modulo of the obtained bucket ID.
- Handled the fallback to the old logic when the big table has more buckets
  than the smaller table(s).
- Updated auto_sortmerge_join_16. This test used to fail with SMB enabled by
  default because of missing buckets, but now gives correct results.
- Reverted all the previously updated tests that originally tested small
  tables with more buckets.
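Here is a minimal sketch of the first item. It assumes "taking the modulo" means reducing the small-table bucket ID modulo the big table's bucket count, and that bucket IDs are non-negative. `BucketIdModulo` and `mapBucket` are hypothetical names, not code from the patch.

```java
// Hypothetical sketch of mapping a small-table bucket ID onto the big table by modulo.
public class BucketIdModulo {

    // Maps a bucket ID onto a table with targetNumBuckets buckets.
    // Bucket IDs are assumed to be non-negative.
    static int mapBucket(int bucketId, int targetNumBuckets) {
        if (targetNumBuckets <= 0) {
            throw new IllegalArgumentException("targetNumBuckets must be positive");
        }
        return bucketId % targetNumBuckets;
    }

    public static void main(String[] args) {
        int smallTableBuckets = 4;
        int bigTableBuckets = 2;
        for (int b = 0; b < smallTableBuckets; b++) {
            System.out.println("small-table bucket " + b
                    + " -> big-table bucket " + mapBucket(b, bigTableBuckets));
        }
    }
}
```

With 4 small-table buckets and 2 big-table buckets, small-table buckets 0 and 2 map to big-table bucket 0, and buckets 1 and 3 map to bucket 1.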


Repository: hive-git


Description
---

Bucket-based join: handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that every bucket has at
least one split, whereas in Tez an empty bucket can have no splits.
The patch also falls back to a reduce-side join if the small table has more
buckets than the big table.

Finally, loading files into bucketed tables is disallowed if the file names do
not follow the bucket file name format, e.g. 00_0, 01_0_copy_1.


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 26afe90faa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java ef5e7edcd6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 9885038588
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java dc698c8de8
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 54f5bab6de
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q 8216b538c2
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 91408df129
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out_spark 91408df129


Diff: https://reviews.apache.org/r/64688/diff/3/

Changes: https://reviews.apache.org/r/64688/diff/2-3/


Testing
---


Thanks,

Deepak Jaiswal

Re: Review Request 64688: HIVE-18218

2018-02-09 Jason Dere


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/#review197177
---




ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out
Line 75 (original), 56 (patched)


How does the output from this test change so much? What changed here?


- Jason Dere



Re: Review Request 64688: HIVE-18218

2018-02-09 Jason Dere



> On Feb. 9, 2018, 5:59 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
> > Lines 582 (patched)
> > 
> >
> > I thought different numbers of buckets were supposed to work as long as the
> > bucket counts were multiples of each other. So this case doesn't work even
> > if the number of small-table buckets is a multiple of the number of
> > big-table buckets?

If this case is broken for Hive on Tez, can you open a follow-up bug for it?


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/#review197145
---



Re: Review Request 64688: HIVE-18218

2018-02-09 Jason Dere


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/#review197145
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
Lines 582 (patched)


I thought different numbers of buckets were supposed to work as long as the
bucket counts were multiples of each other. So this case doesn't work even if
the number of small-table buckets is a multiple of the number of big-table
buckets?
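The arithmetic behind the multiples question can be checked with a small sketch. It assumes a row's bucket is its non-negative hash modulo the bucket count; this is an assumption, since the thread does not state the hash function. When the larger count is a multiple of the smaller, (h mod larger) mod smaller equals h mod smaller for every h, so bucket IDs can be mapped by modulo; otherwise they cannot. `BucketCountCompatibility` and its methods are hypothetical names.

```java
// Hypothetical sketch: when can bucket IDs of two tables be matched by modulo?
public class BucketCountCompatibility {

    // True if one bucket count divides the other.
    static boolean countsAreMultiples(int a, int b) {
        if (a <= 0 || b <= 0) {
            throw new IllegalArgumentException("bucket counts must be positive");
        }
        return a % b == 0 || b % a == 0;
    }

    // Brute-force check over hashes 0..maxHash: reducing the bucket on the larger
    // table modulo the smaller count must give the bucket on the smaller table.
    static boolean modMappingHolds(int larger, int smaller, int maxHash) {
        for (int h = 0; h <= maxHash; h++) {
            if ((h % larger) % smaller != h % smaller) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("4 vs 2 multiples: " + countsAreMultiples(4, 2));
        System.out.println("4 -> 2 mapping holds: " + modMappingHolds(4, 2, 1000));
        System.out.println("3 vs 2 multiples: " + countsAreMultiples(3, 2));
        System.out.println("3 -> 2 mapping holds: " + modMappingHolds(3, 2, 1000));
    }
}
```

For example, with 3 and 2 buckets, hash 3 lands in bucket 0 of 3 but in bucket 1 of 2, so `modMappingHolds(3, 2, 1000)` returns false.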


- Jason Dere



Re: Review Request 64688: HIVE-18218

2018-02-08 Deepak Jaiswal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
---

(Updated Feb. 8, 2018, 10:09 p.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Repository: hive-git


Description (updated)
---

Bucket-based join: handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that every bucket has at
least one split, whereas in Tez an empty bucket can have no splits.
The patch also falls back to a reduce-side join if the small table has more
buckets than the big table.

Finally, loading files into bucketed tables is disallowed if the file names do
not follow the bucket file name format, e.g. 00_0, 01_0_copy_1.


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 26afe90faa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java ef5e7edcd6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 9885038588
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java dc698c8de8
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 54f5bab6de
  ql/src/test/queries/clientpositive/auto_sortmerge_join_2.q e5fdcb57e4
  ql/src/test/queries/clientpositive/auto_sortmerge_join_4.q abf09e5534
  ql/src/test/queries/clientpositive/auto_sortmerge_join_5.q b85c4a7aa3
  ql/src/test/queries/clientpositive/auto_sortmerge_join_7.q bd780861e3
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out b9c2e6f827
  ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 5cfc35aa73
  ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out 0d586fd26b
  ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 45704d1253
  ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out 1959075912
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_2.q.out 054b0d00be
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_4.q.out 95d329862c
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_5.q.out e711715aa5
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_7.q.out 53c685cb11
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 8cfa113794
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out fce5e0cfc4
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 8250eca099
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out eb813c1734


Diff: https://reviews.apache.org/r/64688/diff/2/

Changes: https://reviews.apache.org/r/64688/diff/1-2/


Testing
---


Thanks,

Deepak Jaiswal

Review Request 64688: HIVE-18218

2017-12-18 Deepak Jaiswal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
---

Review request for hive, Gopal V, Gunther Hagleitner, and Jason Dere.


Repository: hive-git


Description
---

SMB join: handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that every bucket has at
least one split, whereas in Tez an empty bucket can have no splits.
The patch also falls back to a reduce-side join if the small table has more
buckets than the big table.


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 8974e9b79b
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java 5dd7bf3f1c
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java e4a6f627d1
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 0c6e1e0288
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q 12ab1fa1d1
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out cb8564fd78


Diff: https://reviews.apache.org/r/64688/diff/1/


Testing
---


Thanks,

Deepak Jaiswal
