[ https://issues.apache.org/jira/browse/CALCITE-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391838#comment-17391838 ]
Haisheng Yuan commented on CALCITE-4712:
----------------------------------------

(1, null) and (2, null) have different hash values, so when redistributing, they go to different buckets.

The flag is useful for outer joins. For example, in "select * from foo outer join bar on foo.a = bar.a", bar.a may contain many null values, i.e. a skewed value. If nulls are colocated, the worker that receives all the bar.a = null rows becomes a long tail during the outer join. Instead, we can request a hash distribution that doesn't colocate null values, so that the computation is evenly distributed.

For group by, the hash distribution always requires "nullsColocated" to be true.

> Add RelHashDistribution
> -----------------------
>
>                 Key: CALCITE-4712
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4712
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Priority: Major
>
> Add RelHashDistribution. The hash distribution should have the following
> properties:
> {code:java}
> 1. ImmutableIntList keys; // distribution keys
> 2. ImmutableList<ImmutableBitSet> equivKeys; // equivalent keys for each distribution key
> 3. int bucketNum; // number of buckets or shards
> 4. boolean nullsColocated; // are NULLS colocated?
> 5. String hashFunc; // name or identity of hash function
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
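
To make the effect of the "nullsColocated" flag from the comment concrete, here is a minimal sketch of bucket assignment. It is not Calcite code: the class NullAwareHashDistribution, the method bucketFor, and the round-robin handling of null keys are all assumptions made for illustration, and Arrays.hashCode stands in for whatever "hashFunc" would name. Spreading null-keyed rows is only safe here because a null join key never satisfies an equality condition, so those rows need no partner on the other side.

```java
import java.util.Arrays;

/** Hypothetical sketch; not Calcite's RelHashDistribution. */
class NullAwareHashDistribution {
  private final int bucketNum;           // number of buckets or shards
  private final boolean nullsColocated;  // are NULLs colocated?
  private int nextNullBucket = 0;        // round-robin cursor (assumed strategy)

  NullAwareHashDistribution(int bucketNum, boolean nullsColocated) {
    if (bucketNum <= 0) {
      throw new IllegalArgumentException("bucketNum must be positive");
    }
    this.bucketNum = bucketNum;
    this.nullsColocated = nullsColocated;
  }

  /** Returns the bucket for a row, given the values of its distribution keys. */
  int bucketFor(Object[] keyValues) {
    if (!nullsColocated && containsNull(keyValues)) {
      // A null key cannot match in an equi-join, so any bucket is valid.
      // Rotate through the buckets to avoid a single hot worker.
      int bucket = nextNullBucket;
      nextNullBucket = (nextNullBucket + 1) % bucketNum;
      return bucket;
    }
    // Plain hash placement: equal key tuples always share a bucket.
    return Math.floorMod(Arrays.hashCode(keyValues), bucketNum);
  }

  private static boolean containsNull(Object[] keyValues) {
    for (Object value : keyValues) {
      if (value == null) {
        return true;
      }
    }
    return false;
  }
}
```

With four buckets and nullsColocated set to false, eight rows whose key is null land two per bucket. With the flag set to true they all share one bucket, which is what group by needs and what causes the long tail in the outer join case.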