[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024866#comment-15024866
 ] 

Zelaine Fong commented on DRILL-4119:
-------------------------------------

[~amansinha100] - when you say "the original XXHash's C implementation and 
based on an initial analysis that one produces different hash value than our 
implementation and does not seem to have the same 'even number' pattern", are 
you saying that our current hash64 implementation is different from the 
original one?  If yes, does that mean something got lost in the translation 
from C to Java?

> Skew in hash distribution for varchar (and possibly other) types of data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4119
>                 URL: https://issues.apache.org/jira/browse/DRILL-4119
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.3.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02          HashAgg(group=[{0}])
> 01-03            Project(SomeId=[$0])
> 01-04              HashToRandomExchange(dist0=[[$0]])
> 02-01                UnorderedMuxExchange
> 03-01                  Project(SomeId=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02                    HashAgg(group=[{0}])
> 03-03                      Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to