[ 
https://issues.apache.org/jira/browse/CALCITE-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391838#comment-17391838
 ] 

Haisheng Yuan commented on CALCITE-4712:
----------------------------------------

(1, null) and (2, null) have different hash values, so when redistributing 
they will go to different buckets. The flag is useful for outer joins. For 
example, in "select * from foo outer join bar on foo.a = bar.a", bar.a may 
contain a lot of null values, i.e. a skewed value. If nulls are colocated, the 
outer join has a long tail on the worker that receives all the rows with 
bar.a = null. Instead, we can request a hash distribution that doesn't 
colocate null values, so that the computation is evenly distributed. For group 
by, the hash distribution always requires "nullsColocated" to be true.
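
To make the skew argument above concrete, here is a minimal sketch of how a 
bucket could be chosen with and without null colocation. It is not Calcite 
code: the class NullSkewSketch, its methods, and the round-robin spreading of 
null-keyed rows are all assumptions made for illustration. Only the names 
"nullsColocated" and "bucketNum" come from the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Calcite.
final class NullSkewSketch {

    // Picks a bucket for one row. "keys" holds the ordinals of the
    // distribution key columns; "rowOrdinal" is the position of the row
    // in the input and is only used to spread null-keyed rows.
    public static int bucketFor(Object[] row, int[] keys, int bucketNum,
                                boolean nullsColocated, int rowOrdinal) {
        Object[] keyValues = new Object[keys.length];
        boolean hasNullKey = false;
        for (int i = 0; i < keys.length; i++) {
            keyValues[i] = row[keys[i]];
            hasNullKey |= keyValues[i] == null;
        }
        if (hasNullKey && !nullsColocated) {
            // Assumption: a null join key never matches under SQL equality,
            // so such a row may be sent to any bucket. Round-robin is one
            // simple way to spread these rows evenly.
            return Math.floorMod(rowOrdinal, bucketNum);
        }
        // Otherwise equal key values must hash to the same bucket.
        return Math.floorMod(Arrays.hashCode(keyValues), bucketNum);
    }

    // Counts how many rows land in each bucket.
    public static int[] histogram(List<Object[]> rows, int[] keys,
                                  int bucketNum, boolean nullsColocated) {
        int[] counts = new int[bucketNum];
        for (int i = 0; i < rows.size(); i++) {
            counts[bucketFor(rows.get(i), keys, bucketNum, nullsColocated, i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Eight rows of "bar" whose join key (column 1) is always null.
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            rows.add(new Object[] {i, null});
        }
        int[] keys = {1};
        System.out.println(Arrays.toString(histogram(rows, keys, 4, true)));
        System.out.println(Arrays.toString(histogram(rows, keys, 4, false)));
    }
}
```

With four buckets, all eight null-keyed rows land in a single bucket when 
nulls are colocated, and two rows land in each bucket when they are not. A 
group by could not use the second mode, because all null keys have to meet in 
the same bucket to form one group.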

> Add RelHashDistribution
> -----------------------
>
>                 Key: CALCITE-4712
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4712
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Priority: Major
>
> Add RelHashDistribution. The hash distribution should have the following 
> properties:
> {code:java}
> 1. ImmutableIntList keys; // distribution keys
> 2. ImmutableList<ImmutableBitSet> equivKeys; // equivalent keys for each distribution key
> 3. int bucketNum; // number of buckets or shards
> 4. boolean nullsColocated; // are NULLS colocated?
> 5. String hashFunc; // name or identity of hash function
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
