Mert Hocanin created HIVE-21419:
-----------------------------------
Summary: Partition Pruning not happening when using Apache Ranger
masking
Key: HIVE-21419
URL: https://issues.apache.org/jira/browse/HIVE-21419
Project: Hive
Issue Type: Bug
Components: Physical Optimizer, Query Planning
Affects Versions: 2.3.2
Environment: I used an AWS Cloudformation script from AWS's big data
blog[1]. The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0.
Source Table:
CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
`l_orderkey` int,
`l_partkey` int,
`l_suppkey` int,
`l_linenumber` int,
`l_quantity` double,
`l_extendedprice` double,
`l_discount` double,
`l_tax` double,
`l_returnflag` string,
`l_linestatus` string,
`l_commitdate` string,
`l_receiptdate` string,
`l_shipinstruct` string,
`l_shipmode` string,
`l_comment` string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem';
Destination Table:
CREATE EXTERNAL TABLE analyst1.test1(
l_commitdate string,
l_receiptdate string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';
Query:
insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from default.lineitem_parq_partitioned
where l_shipdate = '1992-01-02';
Ranger Masking Rule:
Hive Database: analyst1
Hive Table: lineitem_partitioned
Mask Condition Option: Custom: "XXXXXX" (replace the column with a static
string for simplicity, but our use case uses a complex UDF).
[1]
https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
Reporter: Mert Hocanin
Attachments: Operators-in-debugger-with-masking.png,
Operators-in-debugger-without-masking.png, hive-jira-schema-explain-plan.txt
I have a partitioned table, which I have a Ranger masking policy on a
non-partition column. When I am attempting to query the table that includes the
column that has masking enabled, then partition pruning no longer occurs.
To reproduce:
Create two partitioned tables. I used TPC-H tables as they are publicly
available and will provide the schemas and queries I used. Insert into the
second table from the first table. For example:
insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned
where l_shipdate = '1992-01-02';
I have attached the explain plan when a masking rule on l_commitdate is enabled
and when not enabled.
I have done a bit of deep dive and see that the pruning expression is not being
set when the masking rule is enabled.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)