[jira] [Commented] (HIVE-13818) Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?

Gopal V (JIRA) Wed, 25 May 2016 23:09:39 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301587#comment-15301587
 ]


Gopal V commented on HIVE-13818:
--------------------------------

TPC-DS 55 is the one that failed - I will try to get a smaller repro for this, 
but it looks like it should fail in a two row test-case (or produce incorrect 
results at least).

The larger-range join keys in hive-testbench were upgraded to BigInt sometime 
during the 100Tb testing, so you might need to undo some schema changes there 
to repro this. Otherwise you might not be testing the Integer:Integer join 
scenario.

I'm repeating this from memory from debugging this a couple of nights ago 
(might build a repro tomorrow).

{code}
    if (keyBinarySortableDeserializeRead.readCheckNull()) {
      return;
    }

    long key = VectorMapJoinFastLongHashUtil.deserializeLongKey(
                            keyBinarySortableDeserializeRead, hashTableKeyType);
{code}

As explained above, it looks like the BinarySortableSerDe handles Long and 
Integer differently. So even though the join operator says LongLongInner, the 
deserializer for Long cannot be used for joins involving integers.

This is *not* an issue with LazyBinary, only BinarySortable encodes Long and 
Integers differently. In all the runs I could manage, the join worked whenever 
I cast up to bigint.

The problem seems to be that readCheckNull() does not know what the actual 
hashTableKeyType is here, so it reads a Long out of an encoded Int and runs 
out of bytes to read (i.e. the key is not 8 bytes long).
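
To make the width mismatch concrete, here is a minimal standalone sketch (not 
Hive code). It assumes the ascending-order layout implied by the INT and LONG 
cases of readCheckNull() quoted below: big-endian bytes with the sign bit 
flipped, 4 bytes for an int and 8 for a long. The class and method names are 
made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;

// Hypothetical demo class; none of these names exist in Hive.
public class BinarySortableWidthDemo {

    // Assumed int layout: 4 big-endian bytes, sign bit flipped (ascending order only).
    public static byte[] encodeInt(int v) {
        return new byte[] {
            (byte) ((v >> 24) ^ 0x80), (byte) (v >> 16), (byte) (v >> 8), (byte) v
        };
    }

    // Assumed long layout: same scheme, but 8 bytes wide.
    public static byte[] encodeLong(long v) {
        byte[] out = new byte[8];
        out[0] = (byte) ((v >> 56) ^ 0x80);
        for (int i = 1; i < 8; i++) {
            out[i] = (byte) (v >> (56 - 8 * i));
        }
        return out;
    }

    // Returns the next byte as 0..255, or fails once the key bytes are exhausted.
    private static int readByte(ByteArrayInputStream in) throws EOFException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("ran out of key bytes");
        }
        return b;
    }

    // Same shape as the INT case: first byte with the sign bit flipped, then 3 more.
    public static int decodeInt(byte[] key) throws EOFException {
        ByteArrayInputStream in = new ByteArrayInputStream(key);
        int v = readByte(in) ^ 0x80;
        for (int i = 0; i < 3; i++) {
            v = (v << 8) + readByte(in);
        }
        return v;
    }

    // Same shape as the LONG case: first byte with the sign bit flipped, then 7 more.
    public static long decodeLong(byte[] key) throws EOFException {
        ByteArrayInputStream in = new ByteArrayInputStream(key);
        long v = readByte(in) ^ 0x80;
        for (int i = 0; i < 7; i++) {
            v = (v << 8) + readByte(in);
        }
        return v;
    }

    public static void main(String[] args) throws Exception {
        byte[] intKey = encodeInt(42);
        System.out.println("as int:  " + decodeInt(intKey));
        try {
            // Reading the 4-byte key as a long needs 8 bytes and fails part way.
            System.out.println("as long: " + decodeLong(intKey));
        } catch (EOFException e) {
            System.out.println("as long: EOF (" + e.getMessage() + ")");
        }
    }
}
```

In this sketch the int key round-trips through decodeInt(), while decodeLong() 
on the same 4 bytes throws an EOFException, which is the failure mode I am 
describing.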

From readCheckNull(), this is where it goes off the deep end.

{code}
    /*
     * We have a field and are positioned to it.  Read it.
     */
...
    case INT:
      {
        final boolean invert = columnSortOrderIsDesc[fieldIndex];
        int v = inputByteBuffer.read(invert) ^ 0x80;
        for (int i = 0; i < 3; i++) {
          v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
        }
        currentInt = v;
      }
      break;
    case LONG:
      {
        final boolean invert = columnSortOrderIsDesc[fieldIndex];
        long v = inputByteBuffer.read(invert) ^ 0x80;
        for (int i = 0; i < 7; i++) {
          v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
        }
        currentLong = v;
      }
      break;
{code}

The integer:integer join case hits the second case (LONG) there and throws an 
EOF.
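
For reference, the invert flag in the snippet above comes from the column 
sort order. The sketch below assumes that a descending column is stored with 
every byte bit-flipped and that read(invert) undoes the flip; I have not 
re-checked that against the Hive source, and the class and method names here 
are hypothetical. It shows why the reader has to be given the same sort order 
the writer used.

```java
// Hypothetical demo class; none of these names exist in Hive.
public class SortOrderInvertDemo {

    // Assumed behaviour of read(invert): flip all bits for descending columns.
    public static byte readByte(byte[] data, int pos, boolean invert) {
        byte b = data[pos];
        return invert ? (byte) (0xff ^ b) : b;
    }

    // 4 big-endian bytes with the sign bit flipped, then bit-flipped if descending.
    public static byte[] encodeInt(int v, boolean invert) {
        byte[] out = {
            (byte) ((v >> 24) ^ 0x80), (byte) (v >> 16), (byte) (v >> 8), (byte) v
        };
        if (invert) {
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) (0xff ^ out[i]);
            }
        }
        return out;
    }

    // Same loop shape as the INT case in the quoted readCheckNull() code.
    public static int decodeInt(byte[] data, boolean invert) {
        int v = readByte(data, 0, invert) ^ 0x80;
        for (int i = 1; i < 4; i++) {
            v = (v << 8) + (readByte(data, i, invert) & 0xff);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] descKey = encodeInt(5, true);
        // Matching sort order on both sides recovers the value.
        System.out.println(decodeInt(descKey, true));
        // Ignoring the sort order on read silently yields a different value.
        System.out.println(decodeInt(descKey, false));
    }
}
```

Unlike the width mismatch, a sort-order mismatch in this sketch does not 
throw; it just decodes to the wrong key, so it would show up as incorrect 
join results rather than an EOF.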

Changing all joins to Long:Long allows me to run queries successfully.

> Fast Vector MapJoin not enhanced to use sortOrder when handling 
> BinarySortable keys for Small Table?
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13818
>                 URL: https://issues.apache.org/jira/browse/HIVE-13818
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-13818.01.patch, HIVE-13818.02.patch
>
>
> Changes for HIVE-13682 did fix a bug in Fast Hash Tables, but evidently not 
> this issue according to Gopal/Rajesh/Nita.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
