[ 
https://issues.apache.org/jira/browse/PARQUET-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034588#comment-17034588
 ] 

Xinli Shang commented on PARQUET-1792:
--------------------------------------

[~gershinsky], this is just a simple offline tool that replaces raw column 
values with masked values. It is different from the data obfuscation feature 
we discussed earlier. The difference is that users have to run this tool 
explicitly, and they know what the data will look like after translation. 
There is no chance of it happening accidentally, implicitly, or by default.

The tool can provide several ways to translate raw data into masked values, 
and it can let users define their own if they have security concerns. We 
just provide the tool to make their work easier. In addition, ORC has already 
released a similar masking mechanism.
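
To make the idea concrete, here is a rough sketch in Python (for brevity; it 
is not the parquet-mr implementation, which would be in Java). It assumes a 
table held in memory as a dict of column name to list of values. The strategy 
names "hash", "null" and "redact", the MASKS registry and mask_columns() are 
all hypothetical names chosen for illustration, not actual tool options.

```python
import hashlib

# Hypothetical mask strategies; the names are illustrative only.
def mask_hash(value):
    """Replace a value with the SHA-256 hex digest of its string form."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def mask_null(value):
    """Replace every value with null."""
    return None

def mask_redact(value):
    """Replace each character of the value's string form with '*'."""
    if value is None:
        return None
    return "*" * len(str(value))

# Registry of built-in strategies; users could add their own entries.
MASKS = {"hash": mask_hash, "null": mask_null, "redact": mask_redact}

def mask_columns(table, rules, masks=MASKS):
    """Return a new table with the columns named in `rules` masked.

    table: dict mapping column name -> list of values (assumed layout)
    rules: dict mapping column name -> strategy name found in `masks`
    """
    out = {}
    for name, values in table.items():
        if name in rules:
            fn = masks[rules[name]]
            out[name] = [fn(v) for v in values]
        else:
            # Unchanged columns are passed through as a whole, not rewritten.
            out[name] = values
    return out
```

A user-defined strategy would just be another entry in the registry, for 
example dict(MASKS, last4=lambda v: None if v is None else str(v)[-4:]), 
passed as the masks argument.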

As mentioned earlier, I can send an email to the dev mailing list to see 
whether there is a need for this tool.

Again, this proposal is independent of the data obfuscation feature that we 
are jointly working on.

> Add 'mask' command to parquet-tools/parquet-cli
> -----------------------------------------------
>
>                 Key: PARQUET-1792
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1792
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Some personal data columns need to be masked instead of being 
> pruned (PARQUET-1791). We need a tool to replace the raw data columns with 
> masked values. The masked value could be a hash, null, a redacted value, 
> etc. Unchanged columns should be copied as a whole, as the 'merge' and 
> 'prune' commands in parquet-tools do.
>  
> Implementing this feature at the file format level is 10X faster than 
> rewriting the table data in the query engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
