Hi,
I wanted to check on the compatibility and recommended usage of multi-value
resources in Ranger data mask policies, specifically for the table and
column resource definitions.
Context (Trino)
- In the Ranger Admin UI, when creating a masking policy, we can select only
  one catalog, schema, table, and column at a time (
  https://github.com/apache/ranger/blob/82082f1ac6abe9a1f5d3c6974ce57110d888ab72/agents-common/src/main/resources/service-defs/ranger-servicedef-trino.json#L503
  ).
- This creates a challenge: if we need to mask 10–20 fields across multiple
  tables, we end up having to create a very large number of policies.
- If both table and column are configured as multi-value in the service
  definition, a single policy can cover multiple tables and multiple columns.
- This means the Ranger plugin would effectively expand all table–column
  combinations (e.g., if tables = [orders, customers] and columns =
  [email, phone], masking applies to orders.email, orders.phone,
  customers.email, customers.phone).
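To make the expansion concrete, here is a minimal Python sketch of the behaviour we expect, assuming a multi-value policy matches the full cross product of its table and column values. The catalog and schema names (`hive`, `sales`) are hypothetical placeholders, and this is an illustration of the semantics, not Ranger's actual matching code.

```python
from itertools import product


def expand_mask_resources(catalog, schema, tables, columns):
    """Return every (catalog, schema, table, column) tuple that one
    multi-value masking policy would cover, assuming every table/column
    combination matches (i.e. the cross product)."""
    return [(catalog, schema, t, c) for t, c in product(tables, columns)]


# Hypothetical catalog/schema; tables and columns from the example above.
targets = expand_mask_resources("hive", "sales", ["orders", "customers"], ["email", "phone"])
for target in targets:
    print(".".join(target))
```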
We are exploring changes to the service definition so that the table and
column resources support multiple values.
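For reference, the kind of change we have been experimenting with can be sketched as a small Python helper over the service definition JSON. It assumes that each data-mask resource restricts the Admin UI through a `uiHint` string such as `{ "singleValue":true }`; the helper name `allow_multi_value` and the trimmed sample definition are hypothetical, and a real definition contains many more fields.

```python
import json


def allow_multi_value(service_def, resource_names):
    """Hypothetical helper: mark the named data-mask resources as
    multi-value. Assumes dataMaskDef.resources[*].uiHint is a JSON
    string like '{ "singleValue":true }'."""
    for resource in service_def.get("dataMaskDef", {}).get("resources", []):
        if resource.get("name") not in resource_names:
            continue
        hint = json.loads(resource.get("uiHint") or "{}")
        hint["singleValue"] = False  # explicitly allow multiple values (assumed UI behaviour)
        resource["uiHint"] = json.dumps(hint)
    return service_def


# Trimmed, hypothetical sample: only the fields the helper touches.
svc_def = {
    "dataMaskDef": {
        "resources": [
            {"name": "table", "uiHint": '{ "singleValue":true }'},
            {"name": "column", "uiHint": '{ "singleValue":true }'},
        ]
    }
}
allow_multi_value(svc_def, {"column"})  # Option 1: columns only
print([r["uiHint"] for r in svc_def["dataMaskDef"]["resources"]])
```

Passing `{"table", "column"}` instead would correspond to Option 2.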
- *Option 1:* Allow multi-value for columns only (table remains
  single-value).
  - Pros: simpler, fewer conflicts, easier auditing.
  - Cons: still many policies if the same column must be masked across
    multiple tables.
- *Option 2:* Allow multi-value for both tables and columns.
  - Pros: one policy can cover multiple tables and multiple columns.
  - Cons: possible backward compatibility issues, overlapping policy
    conflicts, harder auditing/debugging, and slight performance overhead
    in Trino.
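To quantify the trade-off, the following Python sketch counts policies for a hypothetical masking plan under each approach. It assumes all tables share one catalog and schema, and that every column uses the same mask type and the same users/groups (otherwise separate policies would be needed regardless). For Option 2 it groups tables only when they mask exactly the same set of columns, which is a simple grouping rather than a guaranteed minimum. The `invoices` table and its columns are made up for illustration.

```python
from collections import defaultdict


def policy_counts(mask_plan):
    """mask_plan maps table name -> set of column names to mask.
    Returns the number of policies needed under each approach."""
    # Single-value table and column: one policy per (table, column) pair.
    single_value = sum(len(cols) for cols in mask_plan.values())
    # Option 1: one policy per table, listing all of its columns.
    option_1 = sum(1 for cols in mask_plan.values() if cols)
    # Option 2: tables with identical column sets share one policy, so the
    # policy's table x column cross product matches exactly what is wanted.
    groups = defaultdict(list)
    for table, cols in mask_plan.items():
        if cols:
            groups[frozenset(cols)].append(table)
    return {"single_value": single_value, "option_1": option_1, "option_2": len(groups)}


plan = {
    "orders": {"email", "phone"},
    "customers": {"email", "phone"},
    "invoices": {"email", "address", "tax_id"},  # hypothetical table
}
print(policy_counts(plan))  # {'single_value': 7, 'option_1': 3, 'option_2': 2}
```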
Observation from Testing
I tried modifying the service definition to make the column resource
multi-value. While testing, I noticed that Ranger did not throw any
exception when overlapping policies were created.
- Example:
  - Policy 1 → region.name
  - Policy 2 → region.name, region.key
Even though region.name appears in both policies, Ranger allowed both to be
created. This could lead to conflicts or ambiguous behavior in Trino when
deciding which mask to apply to region.name.
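Since Ranger accepted the overlap, one option is a client-side check before creating policies. This Python sketch (hypothetical helper `find_overlaps`) expands each policy's tables and columns and reports every table/column pair covered by more than one policy. It ignores catalog, schema, users/groups, and policy priority, and it says nothing about which mask Ranger would actually apply.

```python
from collections import defaultdict
from itertools import product


def find_overlaps(policies):
    """policies maps policy name -> (tables, columns).
    Returns {(table, column): [policy names]} for pairs matched by
    more than one policy."""
    covered = defaultdict(list)
    for name, (tables, columns) in policies.items():
        # set() so a policy listing a value twice does not overlap itself
        for pair in set(product(tables, columns)):
            covered[pair].append(name)
    return {pair: names for pair, names in covered.items() if len(names) > 1}


# The two policies from the test above.
policies = {
    "Policy 1": (["region"], ["name"]),
    "Policy 2": (["region"], ["name", "key"]),
}
print(find_overlaps(policies))
```

For this example it reports `('region', 'name')` as covered by both policies, while `region.key` is not flagged.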
Request
Could you advise on the following:
- Is there full compatibility in Ranger and Trino when both table and
  column are configured as multi-value?
- Are there any known limitations, best practices, or risks (particularly
  around policy evaluation and query execution in Trino)?
- What do you recommend regarding making columns and tables multi-value?
- Would you recommend multi-value for tables in addition to columns, or
  should tables remain single-value?
Thanks,
Vikash