[ 
https://issues.apache.org/jira/browse/DRILL-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-2803:
----------------------------------
    Assignee: Mehant Baid  (was: Jacques Nadeau)

> Severe skew due to null values in columns even when other columns are non-null
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-2803
>                 URL: https://issues.apache.org/jira/browse/DRILL-2803
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Mehant Baid
>         Attachments: DRILL-2803.patch
>
>
> If you have 2 columns that are hashed (either for distribution or for hash 
> based operators) and one of those columns has lots of null values, it can 
> result in substantial skew even if the other column has non-null values. 
> In the following query the combined hash value of 2 columns is 0 even when 1 
> column is non-null.   The reason is that if the starting value is null (for 
> cr_reason_sk all values are null in the above query), it does not matter what 
> seed is passed in.   The hash function treats the second parameter as a seed 
> and not as a combiner, so it gets ignored. 
> {code}
> select cr_call_center_sk, cr_reason_sk, hash64(cr_reason_sk, 
> hash64(cr_call_center_sk)) as hash_value from catalog_returns  where 
> cr_reason_sk is null and cr_call_center_sk is not null limit 10;
> +-------------------+--------------+------------+
> | cr_call_center_sk | cr_reason_sk | hash_value |
> +-------------------+--------------+------------+
> | 1                 | null         | 0          |
> | 1                 | null         | 0          |
> | 4                 | null         | 0          |
> | 1                 | null         | 0          |
> | 4                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> +-------------------+--------------+------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to