[ 
https://issues.apache.org/jira/browse/PIG-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734545#comment-13734545
 ] 

Sergey commented on PIG-3409:
-----------------------------

I used the 128-bit murmur3 hash from Google Guava and got a 10x performance 
improvement by joining on the hashes instead of the original keys. 
In general this is not safe because of possible collisions, but in my case the 
join keys fit in 128 bits. 
I tried to read the Apache Pig code but could not follow it. If someone can 
help me and answer my questions by email, I can try to create a prototype.
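
As a rough illustration of joining on a fixed-width hash rather than on the 
original key, here is a minimal sketch. It is not the code used for the 
measurement above. The class name HashedJoinKey is hypothetical, and MD5 from 
the JDK is used only as a dependency-free stand-in for a 128-bit hash; Guava's 
murmur3 is the function actually mentioned in the comment and would normally 
be much faster.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical fixed-width join key: a 128-bit digest of the original key. */
final class HashedJoinKey {
    private final byte[] digest; // always 16 bytes (128 bits)
    private final int hash;      // computed once, in the constructor

    HashedJoinKey(String originalKey) {
        this.digest = digest128(originalKey);
        this.hash = Arrays.hashCode(this.digest);
    }

    // Assumption: MD5 stands in for a 128-bit hash such as murmur3.
    private static byte[] digest128(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    int sizeInBytes() {
        return digest.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HashedJoinKey
                && Arrays.equals(digest, ((HashedJoinKey) o).digest);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

Two different original keys can still map to the same digest, so this only 
gives a correct join when collisions can be ruled out or are acceptable for 
the data set.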

                
> org.apache.pig.data.DefaultTuple hashcode perfomance issue
> ----------------------------------------------------------
>
>                 Key: PIG-3409
>                 URL: https://issues.apache.org/jira/browse/PIG-3409
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11
>            Reporter: Sergey
>            Priority: Critical
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> I've run into a serious performance issue.
> Please see the VisualVM screenshot.
> Here is the hashCode implementation from the class:
> {code}
>     @Override
>     public int hashCode() {
>         int hash = 17;
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object o = it.next();
>             if (o != null) {
>                 hash = 31 * hash + o.hashCode();
>             }
>         }
>         return hash;
>     }
> {code}
> I don't see any reason here to iterate over the whole tuple, aggregate the 
> hash value, and then return it.
> I can fix it if it's possible to take part in the dev process. I'm new to it :(
> The idea for any join:
> If we have a plan, we know for sure which relations will be joined.
> That means we can precalculate the hashcode values.
> The difference is m+n hashcode calculations instead of m*n (current 
> implementation).
> I think it should bring a significant performance boost.
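
The precalculation idea from the description can be sketched as a tuple that 
computes its hash once and returns the stored value afterwards. This is only 
an illustration under assumptions, not Pig's DefaultTuple and not a proposed 
patch: the class CachedHashTuple and the computations counter are 
hypothetical, and the tuple is assumed to be immutable after construction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical immutable tuple that hashes its fields exactly once. */
final class CachedHashTuple {
    // Counts full passes over the fields, for illustration only.
    static int computations = 0;

    private final List<Object> fields;
    private final int hash;

    CachedHashTuple(List<Object> fields) {
        this.fields = new ArrayList<>(fields); // defensive copy
        this.hash = computeHash(this.fields);  // the only full pass
    }

    // Same formula as the hashCode quoted in the issue description.
    private static int computeHash(List<Object> fields) {
        computations++;
        int h = 17;
        for (Object o : fields) {
            if (o != null) {
                h = 31 * h + o.hashCode();
            }
        }
        return h;
    }

    @Override
    public int hashCode() {
        return hash; // no iteration on repeated calls
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashTuple
                && fields.equals(((CachedHashTuple) o).fields);
    }
}
```

With this shape, each of the m and n tuples is hashed once when it is built, 
however many times hashCode() is called during the join. If a tuple can be 
modified after construction, the stored value would have to be invalidated on 
every change.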

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
