[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302844#comment-15302844
 ] 

Gopal V commented on HIVE-13818:


The bug is limited to Fast hash tables:

{code}
hive.mapjoin.hybridgrace.hashtable=false;
hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;
{code}

> Fast Vector MapJoin not enhanced to use sortOrder when handling 
> BinarySortable keys for Small Table?
> 
>
> Key: HIVE-13818
> URL: https://issues.apache.org/jira/browse/HIVE-13818
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-13818.01.patch, HIVE-13818.02.patch, vector_bug.q, vector_bug.q.out
>
>
> Changes for HIVE-13682 did fix a bug in Fast Hash Tables, but evidently not 
> this issue according to Gopal/Rajesh/Nita.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301770#comment-15301770
 ] 

Matt McCline commented on HIVE-13818:
-

[~gopalv] Thank you very much for working on a repro.

I've attached vector_bug.q and its Tez output.  The bug isn't triggered.  Can 
you see what I did wrong?  Thanks



[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301635#comment-15301635
 ] 

Gopal V commented on HIVE-13818:


Here's the smallest scenario which triggers the issue right now.

{code}
create temporary table x (a int) stored as orc;
create temporary table y (b int) stored as orc;
insert into x values(1);
insert into y values(1);
select count(1) from x, y where a = b;

Caused by: java.io.EOFException
    at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
    at org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableDeserializeRead.readCheckNull(BinarySortableDeserializeRead.java:182)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashTable.putRow(VectorMapJoinFastLongHashTable.java:81)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.putRow(VectorMapJoinFastTableContainer.java:181)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:98)
{code}

To test the theory, I tried the same join with bigint columns, which succeeds:

{code}
create temporary table x1 (a bigint) stored as orc;
create temporary table y1 (b bigint) stored as orc;
insert into x1 values(1);
insert into y1 values(1);
select count(1) from x1, y1 where a = b;

OK
1
Time taken: 1.532 seconds, Fetched: 1 row(s)
{code}



[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301587#comment-15301587
 ] 

Gopal V commented on HIVE-13818:


TPC-DS 55 is the one that failed - I will try to get a smaller repro for this, 
but it looks like it should fail in a two row test-case (or produce incorrect 
results at least).

The larger-range join keys in hive-testbench were upgraded to BigInt sometime 
during the 100 TB testing, so you might need to undo some schema changes there 
to repro this; otherwise you might not be testing the Integer:Integer join 
scenario.

I'm repeating this from memory from debugging this a couple of nights ago 
(might build a repro tomorrow).

{code}
if (keyBinarySortableDeserializeRead.readCheckNull()) {
  return;
}

long key = VectorMapJoinFastLongHashUtil.deserializeLongKey(
    keyBinarySortableDeserializeRead, hashTableKeyType);
{code}

As explained above, it looks like BinarySortableSerDe encodes Long and Integer 
differently, so even though the join operator says LongLongInner, the 
deserializer for Long cannot be used for joins involving integers.

This is *not* an issue with LazyBinary; only BinarySortable encodes Long and 
Integer differently. In all the runs I could manage, the join worked whenever 
I cast up to bigint.

The problem seems to be that readCheckNull() does not know the actual 
hashTableKeyType here, so it reads a Long out of an encoded Int and runs out of 
bytes to read (i.e. there are only 4 bytes, not 8).

In readCheckNull(), this is where it goes off the deep end:

{code}
/*
 * We have a field and are positioned to it.  Read it.
 */
...
case INT:
  {
    final boolean invert = columnSortOrderIsDesc[fieldIndex];
    int v = inputByteBuffer.read(invert) ^ 0x80;
    for (int i = 0; i < 3; i++) {
      v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
    }
    currentInt = v;
  }
  break;
case LONG:
  {
    final boolean invert = columnSortOrderIsDesc[fieldIndex];
    long v = inputByteBuffer.read(invert) ^ 0x80;
    for (int i = 0; i < 7; i++) {
      v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
    }
    currentLong = v;
  }
  break;
{code}

The integer:integer join case hits the second case (LONG) there and throws an 
EOFException.

Changing all joins to Long:Long allows me to run queries successfully.
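
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the two decode loops quoted above. It is not Hive code: KeyWidthSketch, encodeInt and the bounds-checked read are hypothetical stand-ins, and it assumes a key is just the big-endian bytes with the sign bit flipped (no null marker, ascending order only).

```java
import java.io.EOFException;

// Simplified stand-in for the INT and LONG cases quoted above.
// NOT Hive's BinarySortableDeserializeRead: null markers and sort order are omitted.
final class KeyWidthSketch {

    // Hypothetical encoder: 4 big-endian bytes with the sign bit flipped.
    public static byte[] encodeInt(int v) {
        return new byte[] {
            (byte) ((v >> 24) ^ 0x80), (byte) (v >> 16), (byte) (v >> 8), (byte) v
        };
    }

    // Bounds-checked read, standing in for inputByteBuffer.read().
    static byte read(byte[] buf, int pos) throws EOFException {
        if (pos >= buf.length) {
            throw new EOFException("wanted byte " + pos + " of a " + buf.length + "-byte key");
        }
        return buf[pos];
    }

    // Mirrors the INT case: consumes exactly 4 bytes.
    public static int decodeInt(byte[] buf) throws EOFException {
        int v = read(buf, 0) ^ 0x80;
        for (int i = 1; i < 4; i++) {
            v = (v << 8) + (read(buf, i) & 0xff);
        }
        return v;
    }

    // Mirrors the LONG case: consumes exactly 8 bytes.
    public static long decodeLong(byte[] buf) throws EOFException {
        long v = read(buf, 0) ^ 0x80;
        for (int i = 1; i < 8; i++) {
            v = (v << 8) + (read(buf, i) & 0xff);
        }
        return v;
    }

    public static void main(String[] args) throws EOFException {
        byte[] key = encodeInt(1);  // a 4-byte key, as for an int join column
        System.out.println("as INT:  " + decodeInt(key));
        try {
            System.out.println("as LONG: " + decodeLong(key));
        } catch (EOFException e) {
            System.out.println("as LONG: EOFException (" + e.getMessage() + ")");
        }
    }
}
```

In this sketch the 4-byte key decodes correctly through the INT path, while the LONG path runs off the end of the buffer on the fifth read, which matches the shape of the EOFException in the stack trace.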



[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-25 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301573#comment-15301573
 ] 

Matt McCline commented on HIVE-13818:
-

[~gopalv] [~ndembla] [~rajesh.balamohan] I need a simple repro for this: a Q 
file and some data.  I just loaded up 1 GB of QE test data and TPCDS-15 ran 
fine, so I'm missing something.  Thanks



[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296167#comment-15296167
 ] 

Gopal V commented on HIVE-13818:


This specific case looks like it internally confuses an {{int:int}} join as a 
{{long:long}} join.

Both are handled by the same vector ops for joins, but need different ser/deser 
codepaths when reading off the SerDe into the LongColumnVector.

There's no real difference once the deserialization is complete.



[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

2016-05-23 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296154#comment-15296154
 ] 

Matt McCline commented on HIVE-13818:
-

Added a try/catch with detailed error information around deserialization in the 
Fast Vector MapJoin hash table during addRow, too.  This probably will not tell 
us much here, though.

But yes, the BinarySortableDeserializeRead is not configured with the sortOrder 
in this case and this could be the bug.
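
For illustration only, here is a small Java sketch of what a reader without the right sort order would do to a descending key. It is not Hive code: SortOrderSketch and its methods are hypothetical, and it assumes descending keys are stored with every byte inverted, which is what the read(invert) calls in BinarySortableDeserializeRead suggest. It shows a possible effect of the misconfiguration, not a confirmed cause of this issue.

```java
// Simplified model of sort-order handling for a 4-byte int key.
// NOT Hive's BinarySortable classes; the null marker byte is omitted.
final class SortOrderSketch {

    // Hypothetical encoder: sign-flipped big-endian bytes, all bytes inverted if descending.
    public static byte[] encodeInt(int v, boolean desc) {
        byte[] out = {
            (byte) ((v >> 24) ^ 0x80), (byte) (v >> 16), (byte) (v >> 8), (byte) v
        };
        if (desc) {
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) (0xff ^ out[i]);
            }
        }
        return out;
    }

    // Stand-in for inputByteBuffer.read(invert).
    static byte read(byte[] buf, int pos, boolean invert) {
        return invert ? (byte) (0xff ^ buf[pos]) : buf[pos];
    }

    // Decodes 4 bytes; 'invert' must match the order used when encoding.
    public static int decodeInt(byte[] buf, boolean invert) {
        int v = read(buf, 0, invert) ^ 0x80;
        for (int i = 1; i < 4; i++) {
            v = (v << 8) + (read(buf, i, invert) & 0xff);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] descKey = encodeInt(1, true);
        System.out.println("reader knows DESC:     " + decodeInt(descKey, true));
        System.out.println("reader assumes ASC:    " + decodeInt(descKey, false));
    }
}
```

Under these assumptions, a reader that ignores the descending flag does not fail outright; it silently decodes the bitwise complement of the key (1 comes back as -2), so the symptom would be wrong join results rather than an exception.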

[~gopalv]  thanks for your comments.
