[ https://issues.apache.org/jira/browse/DRILL-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chang closed DRILL-2803.
-----------------------------
    Assignee: Chun Chang  (was: Jacques Nadeau)

Verified the fix:

{code}
0: jdbc:drill:schema=dfs.drillTestDirAdvanced> select t.id, t.nul, hash64(t.nul, hash64(t.id)) as hash_value from `complex.json` t where t.nul is null order by t.id limit 10;
+------------+------------+------------+
|     id     |    nul     | hash_value |
+------------+------------+------------+
| 2          | null       | 2087691676675520095 |
| 7          | null       | 1124679897442190205 |
| 8          | null       | 6873016763837711824 |
| 9          | null       | 5689101538750744972 |
| 14         | null       | 3054377820775748426 |
| 15         | null       | 7945380393500328936 |
| 18         | null       | 3527854738750367914 |
| 20         | null       | 3939724015981009335 |
| 23         | null       | 6367888492325306020 |
| 25         | null       | 1786113389955931412 |
+------------+------------+------------+
10 rows selected (13.679 seconds)
0: jdbc:drill:schema=dfs.drillTestDirAdvanced> select * from sys.version;
+------------+----------------+-------------+-------------+------------+
| commit_id  | commit_message | commit_time | build_email | build_time |
+------------+----------------+-------------+-------------+------------+
| 57a96d200e12c0efcad3f3ca9d935c42647234b1 | DRILL-2083: Fix bug in merging receiver | 27.04.2015 @ 17:12:13 EDT | Unknown     | 27.04.2015 @ 23:19:52 EDT |
+------------+----------------+-------------+-------------+------------+
{code}

Added a new test case: complex320.q
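In the verified output, hash_value is non-zero and differs per id even though nul is null, so the seed produced by hash64(t.id) now survives the null argument. The following is a minimal Java sketch of that null handling, assuming (this ticket does not show the patch) that the fix makes a null input return the seed unchanged. mix, hash64, hash64Buggy and hash64Fixed are hypothetical names, and mix is a SplitMix64-style stand-in rather than Drill's hash64, so its numbers will not match the table above.

```java
// Hypothetical sketch of null handling in a seeded 64-bit hash.
// mix() is a stand-in (SplitMix64-style finalizer), NOT Drill's hash64.
public class NullSeedHash {

    // Stand-in hash of a non-null value combined with a seed.
    static long mix(long value, long seed) {
        long z = value + seed + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Single-argument form; using seed 0 here is an assumption.
    static long hash64(long value) {
        return mix(value, 0L);
    }

    // Behavior described in DRILL-2803: a null input hashes to 0,
    // so the seed (the other column's hash) is thrown away.
    static long hash64Buggy(Long value, long seed) {
        return value == null ? 0L : mix(value, seed);
    }

    // Assumed fix: a null input passes the seed through unchanged,
    // so the combined hash still depends on the non-null column.
    static long hash64Fixed(Long value, long seed) {
        return value == null ? seed : mix(value, seed);
    }

    public static void main(String[] args) {
        long[] ids = {2, 7, 8, 9, 14};
        Long nul = null; // the always-null column t.nul
        for (long id : ids) {
            long seed = hash64(id); // inner hash64(t.id)
            System.out.printf("id=%d buggy=%d fixed=%d%n",
                    id, hash64Buggy(nul, seed), hash64Fixed(nul, seed));
        }
    }
}
```

Running main prints buggy=0 for every id, while the fixed value differs per id because it is just the hash of that id.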

> Severe skew due to null values in columns even when other columns are non-null
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-2803
>                 URL: https://issues.apache.org/jira/browse/DRILL-2803
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Chun Chang
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2803.patch
>
>
> If 2 columns are hashed (either for distribution or for hash-based
> operators) and one of them has many null values, the result can be
> substantial skew even when the other column is non-null.
> In the following query, the combined hash value of the 2 columns is 0 even
> though one column is non-null. The reason is that if the value being hashed
> (the first argument) is null, it does not matter what seed is passed in (all
> cr_reason_sk values are null in the query below). The hash function treats
> the second parameter as a seed, not as a combiner, so it is ignored.
> {code}
> select cr_call_center_sk, cr_reason_sk,
>   hash64(cr_reason_sk, hash64(cr_call_center_sk)) as hash_value
> from catalog_returns
> where cr_reason_sk is null and cr_call_center_sk is not null limit 10;
> +-------------------+--------------+------------+
> | cr_call_center_sk | cr_reason_sk | hash_value |
> +-------------------+--------------+------------+
> | 1                 | null         | 0          |
> | 1                 | null         | 0          |
> | 4                 | null         | 0          |
> | 1                 | null         | 0          |
> | 4                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> | 2                 | null         | 0          |
> +-------------------+--------------+------------+
> {code}
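To illustrate how this turns into skew, here is a hypothetical Java sketch that hash-partitions 10,000 synthetic rows into 4 buckets using hash64(reason, hash64(callCenter)), with every reason value null as in the query above. mix, combineBuggy, combineFixed and bucketCounts are illustrative names, not Drill APIs; mix is a stand-in hash, and combineFixed assumes a null-returns-seed behavior rather than reproducing the attached patch.

```java
import java.util.Arrays;

// Hypothetical sketch: how "null hashes to 0" skews hash partitioning.
// mix() is a stand-in 64-bit hash, NOT Drill's hash64.
public class NullSkewDemo {

    static long mix(long value, long seed) {
        long z = value + seed + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Reported behavior: a null value hashes to 0 whatever the seed.
    static long combineBuggy(Long value, long seed) {
        return value == null ? 0L : mix(value, seed);
    }

    // Assumed fix: a null value returns the seed unchanged.
    static long combineFixed(Long value, long seed) {
        return value == null ? seed : mix(value, seed);
    }

    // Counts rows per bucket for hash64(reason, hash64(callCenter)) mod buckets.
    static int[] bucketCounts(long[] callCenter, Long[] reason, int buckets, boolean fixed) {
        int[] counts = new int[buckets];
        for (int i = 0; i < callCenter.length; i++) {
            long inner = mix(callCenter[i], 0L); // hash64(cr_call_center_sk)
            long h = fixed ? combineFixed(reason[i], inner)
                           : combineBuggy(reason[i], inner);
            counts[(int) Math.floorMod(h, (long) buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int rows = 10_000;
        long[] callCenter = new long[rows];
        Long[] reason = new Long[rows]; // every cr_reason_sk is null, as in the report
        for (int i = 0; i < rows; i++) {
            callCenter[i] = i; // synthetic, distinct call-center keys
        }
        System.out.println("buggy: " + Arrays.toString(bucketCounts(callCenter, reason, 4, false)));
        System.out.println("fixed: " + Arrays.toString(bucketCounts(callCenter, reason, 4, true)));
    }
}
```

With the reported behavior all 10,000 rows land in bucket 0, so one receiver or hash-table partition does all the work; passing the seed through spreads the rows across all four buckets.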



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
