Hi all,

We are adding the support for Ranger column masking and need to reach a
consensus on the behavior design.

A column masking policy is something like "only show last 4 chars of phone
column to user X". When user X reads the phone column, the value woule be
something like "xxxxx6789" instead of the real value "123456789".

The behavior is clear when the query is simple. However, there're two
different behaviors when the query contains subqueries. The key part is
where we should perform the masking, whether in the outer most select list,
or in the select list of the inner most subquery.

To be specifit, consider these two queries:
(1) subquery contains predicates on unmasked value
  SELECT concat(name, phone) FROM (
    SELECT name, phone FROM customer WHERE phone = '123456789'
  ) t;
(2) subquery contains predicates on masked value
  SELECT concat(name, phone) FROM (
    SELECT name, phone FROM customer WHERE phone = 'xxxxx6789'
  ) t;

Let's say there's actually one row in table 'customer' satisfying phone =
'123456789'. When user X runs the queries, the two different behaviors are:
(a) Query1 returns nothing. Query2 returns one result: "Bobxxxxx6789".
(b) Query1 returns one result: "Bobxxxxx6789". Query2 returns nothing.

Hive is in behavior (a) since it does a table masking that replaces the
TableRef with a subquery containing masked columns. See more in codes:
https://github.com/apache/hive/blob/rel/release-3.1.2/ql/src/java/org/apache/hadoop/hive/ql/parse/TableMask.java#L86-L155
and some experiments I did:
https://docs.google.com/document/d/1LYk2wxT3GMw4ur5y9JBBykolfAs31P3gWRStk21PomM/edit?usp=sharing

Kurt mentions that traditional dbs like DB2 are in behavior (b). I think we
need to decide which behavior we'd like to support. The pros for behavior
(a) is no security leak. Because user X can't guess whether there are some
customers with phone number '123456789'. The pros for behavior (b) is users
don't need to rewrite their existing queries after admin applies column
masking policies.

What do you think?

Thanks,
Quanlong

Reply via email to