[ https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301587#comment-15301587 ]
Gopal V commented on HIVE-13818:
--------------------------------

TPC-DS 55 is the one that failed - I will try to get a smaller repro for this, but it looks like it should fail in a two row test-case (or produce incorrect results at least).

The larger range join keys in hive-testbench were upgraded to BigInt sometime during the 100Tb testing, you might need to undo some schema changes there to repro this - you might not be testing the Integer:Integer join scenario.

I'm repeating this from memory from debugging this a couple of nights ago (might build a repro tomorrow).

{code}
      if (keyBinarySortableDeserializeRead.readCheckNull()) {
        return;
      }

      long key = VectorMapJoinFastLongHashUtil.deserializeLongKey(
          keyBinarySortableDeserializeRead, hashTableKeyType);
{code}

As explained above, it looks like the BinarySortableSerDe handles Long and Integer differently, so just because the Join op says LongLongInner, the deserializer for Long cannot be used for joins involving integers.

This is *not* an issue with LazyBinary; only BinarySortable encodes Longs and Integers differently. In all the runs I could manage, the join worked whenever I cast up to bigint.

The problem seems to be that readCheckNull() does not know what the actual hashTableKeyType is here, so it reads a Long out of an encoded Int and runs out of bytes to read (i.e. there are not 8 bytes).

From readCheckNull(), this is where it goes into the deep end.

{code}
/*
 * We have a field and are positioned to it. Read it.
 */
...
case INT: {
  final boolean invert = columnSortOrderIsDesc[fieldIndex];
  int v = inputByteBuffer.read(invert) ^ 0x80;
  for (int i = 0; i < 3; i++) {
    v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
  }
  currentInt = v;
}
break;
case LONG: {
  final boolean invert = columnSortOrderIsDesc[fieldIndex];
  long v = inputByteBuffer.read(invert) ^ 0x80;
  for (int i = 0; i < 7; i++) {
    v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
  }
  currentLong = v;
}
break;
{code}

The integer:integer join case hits the 2nd case expression there and throws an EOF. Changing all joins to Long:Long allows me to run queries successfully.

> Fast Vector MapJoin not enhanced to use sortOrder when handling BinarySortable keys for Small Table?
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13818
>                 URL: https://issues.apache.org/jira/browse/HIVE-13818
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-13818.01.patch, HIVE-13818.02.patch
>
>
> Changes for HIVE-13682 did fix a bug in Fast Hash Tables, but evidently not this issue according to Gopal/Rajesh/Nita.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
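
For readers without a Hive build at hand, the width mismatch described in the comment above can be illustrated with a small standalone Java sketch. This is not Hive code: `BinarySortableWidthDemo`, `ByteReader`, `encodeInt`, `roundTripInt` and `longReadHitsEof` are hypothetical names invented for the example. The two decode loops follow the shape of the INT and LONG cases quoted in the comment, only ascending sort order is modeled (no `invert`), and the sketch assumes an Integer key serializes to 4 bytes while the LONG reader expects 8.

```java
import java.io.EOFException;

/** Hypothetical standalone demo; none of these names exist in Hive. */
class BinarySortableWidthDemo {

    /** Minimal stand-in for a sequential byte reader that signals EOF when exhausted. */
    static final class ByteReader {
        private final byte[] data;
        private int pos = 0;

        ByteReader(byte[] data) {
            this.data = data;
        }

        byte read() throws EOFException {
            if (pos >= data.length) {
                throw new EOFException("wanted byte " + pos + ", only " + data.length + " present");
            }
            return data[pos++];
        }
    }

    /** Assumed encoding: 4 big-endian bytes with the sign bit flipped (ascending order only). */
    static byte[] encodeInt(int v) {
        return new byte[] {
            (byte) ((v >> 24) ^ 0x80),
            (byte) (v >> 16),
            (byte) (v >> 8),
            (byte) v
        };
    }

    /** Same loop shape as the INT case in the comment: consumes 1 + 3 = 4 bytes. */
    static int decodeInt(ByteReader in) throws EOFException {
        int v = in.read() ^ 0x80;
        for (int i = 0; i < 3; i++) {
            v = (v << 8) + (in.read() & 0xff);
        }
        return v;
    }

    /** Same loop shape as the LONG case in the comment: consumes 1 + 7 = 8 bytes. */
    static long decodeLong(ByteReader in) throws EOFException {
        long v = in.read() ^ 0x80;
        for (int i = 0; i < 7; i++) {
            v = (v << 8) + (in.read() & 0xff);
        }
        return v;
    }

    /** Encode an int and read it back with the matching 4-byte reader. */
    static int roundTripInt(int v) {
        try {
            return decodeInt(new ByteReader(encodeInt(v)));
        } catch (EOFException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if reading the key with the 8-byte LONG reader runs out of bytes. */
    static boolean longReadHitsEof(byte[] key) {
        try {
            decodeLong(new ByteReader(key));
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] key = encodeInt(42);
        System.out.println("encoded length:     " + key.length);
        System.out.println("read back as INT:   " + roundTripInt(42));
        System.out.println("LONG read hits EOF: " + longReadHitsEof(key));
    }
}
```

In this sketch the Integer key round-trips through the 4-byte reader, while the 8-byte reader exhausts the buffer on its fifth read. That is consistent with the observation that casting both sides up to bigint avoids the failure, since the encoded key then carries the 8 bytes the LONG reader expects.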