Nishant Bangarwa created HIVE-20683:
---------------------------------------
Summary: Add the Ability to push Dynamic Between and Bloom filters
to Druid
Key: HIVE-20683
URL: https://issues.apache.org/jira/browse/HIVE-20683
Project: Hive
Issue Type: New Feature
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa
To optimize joins, Hive generates a BETWEEN filter (bounded by the min and max
values) and a BLOOM filter for filtering one side of a semi-join.
Druid 0.13.0 will support Bloom filters (added via
https://github.com/apache/incubator-druid/pull/6222).
Implementation details -
# Hive generates and passes the filters as part of 'filterExpr' in TableScan.
# DruidQueryBasedRecordReader receives this filter as part of the conf.
# During the execution phase, before sending the query to Druid,
DruidQueryBasedRecordReader will deserialize this filter, translate it into
a DruidDimFilter and add it to the existing DruidQuery. The Tez executor already
ensures that all the dynamic values are initialized by the time we start
reading results from the record reader.
# Explaining a Druid query also prints the query sent to Druid as
{{druid.json.query}}, so we need to make sure the Druid query shown there is
updated with the filters as well. During explain the actual dynamic values
are not available, so we will print the dynamic expression itself in place of
each value in the Druid query.
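
As a rough illustration of steps 3 and 4, the sketch below builds the Druid filter JSON by hand. It is not the Hive implementation: the class and method names ({{DynamicFilterSketch}}, {{boundFilter}}, {{bloomFilter}}, {{combine}}, {{valueOrExpression}}) are hypothetical, and it assumes the Druid native filter shapes {{bound}}, {{bloom}} (with a Base64-encoded {{bloomKFilter}}) and {{and}}. Dimension names and values are assumed to need no JSON escaping.

```java
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Hive or Druid.
class DynamicFilterSketch {

    // BETWEEN min AND max maps to a Druid "bound" filter; bounds are inclusive by default.
    public static String boundFilter(String dimension, String min, String max, String ordering) {
        return "{\"type\":\"bound\",\"dimension\":\"" + dimension + "\","
             + "\"lower\":\"" + min + "\",\"upper\":\"" + max + "\","
             + "\"ordering\":\"" + ordering + "\"}";
    }

    // Bloom filter bytes are assumed to be passed to Druid as a Base64 string.
    public static String bloomFilter(String dimension, byte[] serializedBloom) {
        return "{\"type\":\"bloom\",\"dimension\":\"" + dimension + "\","
             + "\"bloomKFilter\":\"" + Base64.getEncoder().encodeToString(serializedBloom) + "\"}";
    }

    // Add a dynamic filter to whatever filter the query already has (may be null).
    public static String combine(String existingFilter, String dynamicFilter) {
        if (existingFilter == null) {
            return dynamicFilter;
        }
        return "{\"type\":\"and\",\"fields\":[" + existingFilter + "," + dynamicFilter + "]}";
    }

    // At explain time the value is unknown (null), so fall back to the expression text.
    public static String valueOrExpression(String resolvedValue, String expressionText) {
        return resolvedValue != null ? resolvedValue : expressionText;
    }

    public static void main(String[] args) {
        // Execution time: dynamic values are initialized, so real bounds are used.
        String between = boundFilter("key", "10", "20", "numeric");
        String bloom = bloomFilter("key", new byte[] {1, 2, 3});
        System.out.println(combine(between, bloom));

        // Explain time: print placeholder expressions instead of values.
        // "<dynamic min>" and "<dynamic max>" are made-up placeholder strings.
        String lower = valueOrExpression(null, "<dynamic min>");
        String upper = valueOrExpression(null, "<dynamic max>");
        System.out.println(boundFilter("key", lower, upper, "numeric"));
    }
}
```

A real implementation would presumably build Druid filter objects and let the JSON mapper serialize them rather than concatenating strings; the string form is used here only to keep the example free of dependencies.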
Note: This work requires Druid to be updated to version 0.13.0.