[jira] [Updated] (HIVE-7617) optimize bytes mapjoin hash table read path wrt serialization, at least for common cases

Gopal V (JIRA) Thu, 14 Aug 2014 00:49:16 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gopal V updated HIVE-7617:
--------------------------

    Attachment: hashmap-wb-fixes.png

With this patch applied, I see increased memory usage for small JOINs on my 
VM, and I can't find any perf difference from the shift-size fixes.

Once the JIT kicks in, both pre-patch and post-patch have inline constant 
replacements for the "ldiv".

{code}
  0x00007fe284b89782: dec   ebp
  0x00007fe284b89783: mov   edx, ebp
  0x00007fe284b89785: dec   ecx
  0x00007fe284b89786: and   edx, 0x0000000000008000
  0x00007fe284b8978c: dec   ecx
  0x00007fe284b8978d: mov   ecx, ebp
  0x00007fe284b8978f: dec   eax
  0x00007fe284b89790: shr   ecx, 0x0000000000000018
{code}
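To make the "ldiv" point concrete, here is a minimal sketch of the two forms being compared. The class, method and constant names are hypothetical and the buffer size is an assumed value, not taken from the patch. When the divisor is a constant power of two, a JIT such as HotSpot's can typically emit shifts and masks for the division form on its own, which would explain why hand-writing the shift makes no measurable difference.

```java
// Hypothetical sketch; names and the buffer size are illustrative, not from the patch.
final class PowerOfTwoIndexing {
    // Assumed buffer size: a power of two, so the shift and mask below are exact.
    static final int WB_SHIFT = 15;
    static final long WB_SIZE = 1L << WB_SHIFT;
    static final long WB_MASK = WB_SIZE - 1;

    // Division form: the divisor is a compile-time constant power of two.
    static long bufferIndexDiv(long offset) {
        return offset / WB_SIZE;
    }

    static long bufferOffsetMod(long offset) {
        return offset % WB_SIZE;
    }

    // Hand-written shift/mask form; same results for non-negative offsets.
    static long bufferIndexShift(long offset) {
        return offset >>> WB_SHIFT;
    }

    static long bufferOffsetMask(long offset) {
        return offset & WB_MASK;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, WB_SIZE - 1, WB_SIZE, 3 * WB_SIZE + 17, 1L << 40};
        for (long o : samples) {
            if (bufferIndexDiv(o) != bufferIndexShift(o)
                    || bufferOffsetMod(o) != bufferOffsetMask(o)) {
                throw new AssertionError("mismatch at offset " + o);
            }
        }
        System.out.println("division and shift/mask forms agree");
    }
}
```

The equivalence only holds for non-negative offsets; for negative values Java's `/` and `%` round toward zero, while the shift and mask do not.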

The rest is less clear to me. The new IntGetAdaptor class has turned off the 
inlining for the other GetAdaptor, so this is only faster if all my JOINs have 
only int keys.

If you mix an INT key and a STRING key in the same vertex (not even the same 
JOIN cond), then the JIT seems to get a bit confused and turns off all the 
monomorphic optimizations that the previous impl had.
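A rough sketch of the call-site shape I mean follows. All names are hypothetical stand-ins, not the real adaptor classes, and the hashing is only illustrative. While a single receiver type reaches the `adaptor.hashOf(...)` call, the JIT can usually treat the site as monomorphic and inline it; once a second implementation flows through the same site in the same JVM, that assumption no longer holds. Whether this is exactly what happens in the mapjoin probe path is my assumption.

```java
// Hypothetical stand-ins for the adaptor classes; names and hashing are illustrative only.
interface KeyAdaptor {
    int hashOf(Object key);
}

final class IntKeyAdaptor implements KeyAdaptor {
    @Override
    public int hashOf(Object key) {
        return Integer.hashCode((Integer) key);
    }
}

final class StringKeyAdaptor implements KeyAdaptor {
    @Override
    public int hashOf(Object key) {
        return key.hashCode();
    }
}

final class ProbeLoop {
    // One shared call site: adaptor.hashOf(k). The receiver types seen here
    // at runtime decide whether the site stays monomorphic.
    static long probeAll(KeyAdaptor adaptor, Object[] keys) {
        long sum = 0;
        for (Object k : keys) {
            sum += adaptor.hashOf(k);
        }
        return sum;
    }

    public static void main(String[] args) {
        Object[] intKeys = {1, 2, 3};
        Object[] stringKeys = {"a", "ab"};
        // Only IntKeyAdaptor reaches the call site here.
        System.out.println(probeAll(new IntKeyAdaptor(), intKeys));
        // A second receiver type now reaches the same call site.
        System.out.println(probeAll(new StringKeyAdaptor(), stringKeys));
    }
}
```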

This still triggers slow code in copyToStandardObject() before entering the 
fast-path.

The first change in perf happens after about 9k rows. The sampling profiler 
seems to turn off a bunch of these optimizations, so I confirmed this with my 
Linux perf counters instead.

!hashmap-wb-fixes.png!

I can say one thing for sure: this would probably have helped us if we wrote 
C++, where runtime recompilation with constants does not happen.

I'm not sure whether this patch is useful as long as we use the JVM.

> optimize bytes mapjoin hash table read path wrt serialization, at least for 
> common cases
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-7617
>                 URL: https://issues.apache.org/jira/browse/HIVE-7617
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-7617.01.patch, HIVE-7617.02.patch, HIVE-7617.patch, 
> HIVE-7617.prelim.patch, hashmap-wb-fixes.png
>
>
> BytesBytes hash table stores keys in the byte array for compact 
> representation, however that means that the straightforward implementation of 
> lookups serializes lookup keys to byte arrays, which is relatively expensive.
> We can either shortcut hashcode and compare for common types on read path 
> (integral types which would cover most of the real-world keys), or specialize 
> hashtable and from BytesBytes... create LongBytes, StringBytes, or whatever. 
> First one seems simpler now.
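For reference, a minimal sketch of what the first option in the description could look like for a long key: hash and compare straight from the primitive instead of serializing it to a byte array first. The hash function, the big-endian layout and all names here are assumptions for illustration, not Hive's actual serialization or hashing.

```java
// Hypothetical sketch; the hash and byte layout are illustrative, not Hive's.
final class LongKeyShortcut {
    // Illustrative byte-wise polynomial hash over a stored key.
    static int hashBytes(byte[] b, int off, int len) {
        int h = 1;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    // Slow path: materialize the key as 8 big-endian bytes.
    static byte[] serialize(long v) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (v >>> (56 - 8 * i));
        }
        return out;
    }

    // Shortcut: the same hash, computed from the long without allocating.
    static int hashLong(long v) {
        int h = 1;
        for (int i = 0; i < 8; i++) {
            h = 31 * h + (byte) (v >>> (56 - 8 * i));
        }
        return h;
    }

    // Shortcut compare: check the long against 8 stored bytes in place.
    static boolean equalsStored(long v, byte[] stored, int off) {
        for (int i = 0; i < 8; i++) {
            if (stored[off + i] != (byte) (v >>> (56 - 8 * i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long key = 0x0102030405060708L;
        byte[] stored = serialize(key);
        System.out.println(hashLong(key) == hashBytes(stored, 0, 8));
        System.out.println(equalsStored(key, stored, 0));
    }
}
```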



--
This message was sent by Atlassian JIRA
(v6.2#6252)
