[
https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gopal V updated HIVE-7617:
--------------------------
Attachment: hashmap-wb-fixes.png
I find that this patch increased memory usage for small JOINs on my VM, and I
can't find any perf difference from the shift-size fixes.
Once the JIT kicks in, both pre-patch and post-patch have inline constant
replacements for the "ldiv".
{code}
0x00007fe284b89782: dec ebp
0x00007fe284b89783: mov edx, ebp
0x00007fe284b89785: dec ecx
0x00007fe284b89786: and edx, 0x0000000000008000
0x00007fe284b8978c: dec ecx
0x00007fe284b8978d: mov ecx, ebp
0x00007fe284b8978f: dec eax
0x00007fe284b89790: shr ecx, 0x0000000000000018
{code}
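A minimal Java sketch of why the shift-size change may make no measurable difference once the code is compiled: when the divisor is a compile-time power-of-two constant, division and remainder on non-negative values give the same results as a shift and a mask, so the JIT can be expected to emit similar code for both forms. The class name, method names, and the 32 KB buffer size are assumptions for illustration and are not taken from the patch.

```java
// Hypothetical sketch: names and the buffer size are illustrative, not from HIVE-7617.
final class ShiftVsDivide {
    static final int WB_SHIFT = 15;              // assumed log2 of the buffer size
    static final long WB_SIZE = 1L << WB_SHIFT;  // 32768 bytes

    // "ldiv"/"lrem" form: the divisor is a constant power of two.
    static long bufferIndexDiv(long offset) { return offset / WB_SIZE; }
    static long bufferOffsetRem(long offset) { return offset % WB_SIZE; }

    // Hand-written shift/mask form; equal to the above for offset >= 0.
    static long bufferIndexShift(long offset) { return offset >>> WB_SHIFT; }
    static long bufferOffsetMask(long offset) { return offset & (WB_SIZE - 1); }

    public static void main(String[] args) {
        long[] offsets = {0L, 1L, 32767L, 32768L, 100000L};
        for (long o : offsets) {
            System.out.println(o + " -> index " + bufferIndexShift(o)
                + ", offset " + bufferOffsetMask(o));
        }
    }
}
```

The two forms differ only for negative inputs, where signed division rounds toward zero, so the equivalence holds only if offsets are never negative.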
The rest is less clear to me. The new IntGetAdaptor class has turned off
inlining for the other GetAdaptor, so this is only faster if I have only int
keys in all my JOINs.
If you mix an INT key and a STRING key in the same vertex (not even the same
JOIN cond), then the JIT seems to get a bit confused and turns off all the
monomorphic optimizations that the previous impl had.
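A minimal Java sketch of the call-site shape being described, assuming a single shared loop that dispatches through an interface. All names here (KeyAdaptor, IntKeyAdaptor, StringKeyAdaptor, probeAll) are hypothetical and do not reflect the signatures in the patch. While only one implementation ever reaches the call in probeAll, the JIT can treat the site as monomorphic and inline the callee; once a second implementation flows through the same site, that specialization may be dropped.

```java
// Hypothetical sketch: these types are illustrative and not Hive's real adaptor classes.
final class CallSiteSketch {
    interface KeyAdaptor { int hashOf(Object key); }

    static final class IntKeyAdaptor implements KeyAdaptor {
        public int hashOf(Object key) { return ((Integer) key).intValue(); }
    }

    static final class StringKeyAdaptor implements KeyAdaptor {
        public int hashOf(Object key) { return key.hashCode(); }
    }

    // One shared call site: adaptor.hashOf(...) sees every receiver type
    // that any caller passes in, regardless of which JOIN it came from.
    static long probeAll(KeyAdaptor adaptor, Object[] keys) {
        long sum = 0;
        for (Object k : keys) {
            sum += adaptor.hashOf(k);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Only IntKeyAdaptor so far: the call site has one receiver type.
        System.out.println(probeAll(new IntKeyAdaptor(), new Object[] {1, 2, 3}));
        // A second receiver type now reaches the same call site.
        System.out.println(probeAll(new StringKeyAdaptor(), new Object[] {"a", "b"}));
    }
}
```

The sketch only shows the structure; whether inlining is actually lost depends on the JVM version and its profiling data, which would need to be checked with compilation logging or perf counters.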
This still triggers slow code in copyToStandardObject() before entering the
fast-path.
The first change in perf happens after about 9k rows. The sampling profiler
seems to turn off a bunch of these optimizations, which I was able to confirm
using Linux perf counters instead.
!hashmap-wb-fixes.png!
I can say one thing for sure: this would probably have helped us if we wrote
C++, where runtime recompilation with constants does not happen.
I'm not sure whether this patch is useful as long as we use the JVM.
> optimize bytes mapjoin hash table read path wrt serialization, at least for
> common cases
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-7617
> URL: https://issues.apache.org/jira/browse/HIVE-7617
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-7617.01.patch, HIVE-7617.02.patch, HIVE-7617.patch,
> HIVE-7617.prelim.patch, hashmap-wb-fixes.png
>
>
> BytesBytes hash table stores keys in the byte array for compact
> representation, however that means that the straightforward implementation of
> lookups serializes lookup keys to byte arrays, which is relatively expensive.
> We can either shortcut hashcode and compare for common types on read path
> (integral types which would cover most of the real-world keys), or specialize
> hashtable and from BytesBytes... create LongBytes, StringBytes, or whatever.
> First one seems simpler now.
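A minimal Java sketch of the first option in the description above (shortcutting hashcode and compare for an integral key), assuming keys are stored as 8 big-endian bytes and hashed with a simple 31-multiplier loop. Both the encoding and the hash function are assumptions chosen for illustration, and Hive's actual key serialization and hash differ. The point is that the lookup computes the same hash and performs the comparison directly from the long, without first serializing it into a temporary byte array.

```java
// Hypothetical sketch: encoding and hash are illustrative, not Hive's real format.
final class LongKeyFastPath {
    // Byte i (0 = most significant) of the assumed big-endian encoding.
    private static byte byteAt(long v, int i) {
        return (byte) (v >>> (56 - 8 * i));
    }

    // Write path: serialize the key into the table's byte array.
    static void writeLong(byte[] dst, int off, long v) {
        for (int i = 0; i < 8; i++) {
            dst[off + i] = byteAt(v, i);
        }
    }

    // Hash over stored bytes, as used when the key is inserted.
    static int hashBytes(byte[] b, int off, int len) {
        int h = 1;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    // Read path: same hash computed from the long, with no byte[] allocation.
    static int hashLong(long v) {
        int h = 1;
        for (int i = 0; i < 8; i++) {
            h = 31 * h + byteAt(v, i);
        }
        return h;
    }

    // Read path: compare a probe key against stored bytes in place.
    static boolean equalsStored(byte[] table, int off, long v) {
        for (int i = 0; i < 8; i++) {
            if (table[off + i] != byteAt(v, i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] table = new byte[16];
        writeLong(table, 8, 42L);
        System.out.println(hashLong(42L) == hashBytes(table, 8, 8));
        System.out.println(equalsStored(table, 8, 42L));
    }
}
```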