Aman Sinha created DRILL-4122:
---------------------------------

Summary: Create unit test suite for checking quality of hashing for hash based operators
Key: DRILL-4122
URL: https://issues.apache.org/jira/browse/DRILL-4122
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.3.0
Reporter: Aman Sinha
We have encountered substantial skew in the hash-based operators (hash distribution, hash aggregation, hash join) for certain data sets; two such issues are DRILL-2803 and DRILL-4119. It would be very useful to have a unit test suite that measures the quality of hashing.

The number of combinations is large: num_data_types x nullability x num_hash_function_types (32-bit, 64-bit, and AsDouble variations), plus the nature of the data itself. We would have to be judicious about picking a reasonable subset of this space. We should also look at open-source test suites in this area.
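As a rough illustration of what one check in such a suite might look like, here is a minimal standalone sketch. It does not use any Drill classes: the class and method names (HashQualityCheck, bucketCounts, skew, chiSquared) are hypothetical, and the MurmurHash3 64-bit finalizer (fmix64) is only a stand-in for whichever Drill hash function is under test. The idea is to hash a data set into a fixed number of buckets, the way a hash exchange or hash table assigns rows, and then measure deviation from a uniform spread using two assumed metrics: the ratio of the largest bucket to the expected bucket size, and a Pearson chi-squared statistic.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical hash-quality check; not part of Drill.
public class HashQualityCheck {

    // Stand-in hash under test: the MurmurHash3 64-bit finalizer (fmix64).
    // This is NOT Drill's hash function; substitute the function being tested.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hash each value and count how many land in each bucket, mimicking
    // how a hash-based operator assigns rows to partitions.
    static long[] bucketCounts(long[] values, int numBuckets, LongUnaryOperator hash) {
        long[] counts = new long[numBuckets];
        for (long v : values) {
            int b = (int) Long.remainderUnsigned(hash.applyAsLong(v), numBuckets);
            counts[b]++;
        }
        return counts;
    }

    // Largest bucket divided by the expected bucket size; 1.0 means perfectly even.
    static double skew(long[] counts) {
        long total = 0, max = 0;
        for (long c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        return max / ((double) total / counts.length);
    }

    // Pearson chi-squared statistic against a uniform distribution over the buckets.
    static double chiSquared(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        double expected = (double) total / counts.length;
        double sum = 0;
        for (long c : counts) {
            double d = c - expected;
            sum += d * d / expected;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int buckets = 64;
        long[] seq = new long[n];
        long[] strided = new long[n];
        for (int i = 0; i < n; i++) {
            seq[i] = i;                      // sequential keys
            strided[i] = (long) i * buckets; // every key is a multiple of the bucket count
        }
        long[] good = bucketCounts(seq, buckets, HashQualityCheck::fmix64);
        long[] bad = bucketCounts(strided, buckets, v -> v); // identity "hash" as a deliberately weak baseline
        System.out.printf("fmix64 on sequential keys: skew=%.3f chi2=%.1f%n", skew(good), chiSquared(good));
        System.out.printf("identity on strided keys:  skew=%.3f chi2=%.1f%n", skew(bad), chiSquared(bad));
    }
}
```

The strided-key case with an identity hash sends every row to bucket 0 (skew of 64.0 with 64 buckets), which is the kind of pathological pattern a suite like this should flag. A real suite would repeat the check across the data type, nullability, and hash-variant combinations listed above, with thresholds chosen per case.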