[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-05 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393788#comment-17393788
 ] 

Gabor Szadovszky commented on PARQUET-2071:
---

I think it is a great idea to skip unnecessary deserialization/serialization 
steps in such cases. We already have some tools with a similar approach, such 
as trans-compression and prune columns. How about implementing a more 
universal tool where you can configure the projection schema and the 
configuration of the target file? The tool could then decide which level of 
deserialization/serialization is required. For example, trans-compression 
needs the pages to be decompressed, while encryption does not. What do you 
think?
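To make the proposal concrete, here is a minimal Java sketch of how such a universal tool might pick the cheapest processing level for each column. Everything in it is hypothetical: `RewritePlanner`, `Level`, `FileConfig` and the three levels are illustrative names, not parquet-mr APIs, and the model assumes that only the compression codec and the set of encrypted columns matter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical planner for a "universal" rewrite tool; none of these names exist in parquet-mr.
final class RewritePlanner {

    // Work needed per column chunk, ordered from cheapest to most expensive.
    enum Level {
        COPY_CHUNK,           // column chunk bytes can be copied as-is
        TRANSFORM_PAGE_BYTES, // encrypt/decrypt the still-compressed page bytes; no decompression
        DECOMPRESS_PAGES      // pages must be decompressed and recompressed (trans-compression)
    }

    // Hypothetical summary of the file settings this sketch compares.
    static final class FileConfig {
        final String codec;
        final Set<String> encryptedColumns;

        FileConfig(String codec, Set<String> encryptedColumns) {
            this.codec = codec;
            this.encryptedColumns = encryptedColumns;
        }
    }

    // Minimum level of work for one column; the deepest level also covers any encryption change.
    static Level levelFor(String column, FileConfig source, FileConfig target) {
        if (!source.codec.equals(target.codec)) {
            return Level.DECOMPRESS_PAGES;
        }
        boolean encryptedBefore = source.encryptedColumns.contains(column);
        boolean encryptedAfter = target.encryptedColumns.contains(column);
        if (encryptedBefore != encryptedAfter) {
            return Level.TRANSFORM_PAGE_BYTES;
        }
        return Level.COPY_CHUNK;
    }

    // Plans only the projected columns; columns outside the projection are simply not written.
    static Map<String, Level> plan(List<String> projection, FileConfig source, FileConfig target) {
        Map<String, Level> result = new LinkedHashMap<>();
        for (String column : projection) {
            result.put(column, levelFor(column, source, target));
        }
        return result;
    }
}
```

Under these assumptions, a codec change forces every projected column down to page decompression, an encryption-only change touches just the newly encrypted (or decrypted) columns at the page-byte level, and pruning alone never needs more than a chunk copy. A real tool would have to compare more settings (for example, key changes), which this sketch ignores.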

> Encryption translation tool 
> 
>
> Key: PARQUET-2071
> URL: https://issues.apache.org/jira/browse/PARQUET-2071
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
>
> When translating existing data to an encrypted state, we could develop a tool 
> like TransCompression that translates the data to the encrypted state at the 
> page level, without reading it into records and rewriting them. This would 
> speed up the process a lot. 
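The page-level idea can be illustrated with a minimal Java sketch that encrypts an already compressed page body as an opaque buffer using the JDK's AES-GCM cipher, so no decompression or record decoding is involved. This is an illustration under stated assumptions, not Parquet's modular encryption format: `PageBytesEncryptor` is a hypothetical name, the nonce-then-ciphertext layout is chosen only for simplicity, and `aad` stands in for whatever additional authenticated data the real format binds to each page.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: treats compressed page bytes as opaque data. Not the Parquet module layout.
final class PageBytesEncryptor {
    private static final int NONCE_LENGTH = 12; // bytes, standard GCM nonce size
    private static final int TAG_BITS = 128;    // GCM authentication tag length

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    PageBytesEncryptor(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
    }

    // Encrypts the compressed page as-is; returns nonce || ciphertext || tag.
    byte[] encrypt(byte[] compressedPage, byte[] aad) throws GeneralSecurityException {
        byte[] nonce = new byte[NONCE_LENGTH];
        random.nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        cipher.updateAAD(aad);
        byte[] sealed = cipher.doFinal(compressedPage);
        return ByteBuffer.allocate(NONCE_LENGTH + sealed.length).put(nonce).put(sealed).array();
    }

    // Reverses encrypt(); fails if the bytes or the AAD were tampered with.
    byte[] decrypt(byte[] module, byte[] aad) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, module, 0, NONCE_LENGTH));
        cipher.updateAAD(aad);
        return cipher.doFinal(module, NONCE_LENGTH, module.length - NONCE_LENGTH);
    }
}
```

The page payload is never decompressed or decoded, which is where the speedup over a record-level rewrite would come from. A real translation tool would presumably also need to handle page headers, column metadata and the footer, which this sketch leaves out.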



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-05 Thread Gidon Gershinsky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393982#comment-17393982
 ] 

Gidon Gershinsky commented on PARQUET-2071:
---

A very useful tool; I'll be glad to review the PR.



[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-05 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394098#comment-17394098
 ] 

Xinli Shang commented on PARQUET-2071:
--

Thanks, Gabor and Gidon! I think the 'universal tool' is a good idea, and it 
can serve different use cases. I opened 
https://issues.apache.org/jira/browse/PARQUET-2075 for it. 



[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-21 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402670#comment-17402670
 ] 

Xinli Shang commented on PARQUET-2071:
--

I have drafted the tool and had [~gershinsky] take an early look (thanks, 
Gidon!). It is working now, and I compared it with a regular tool (I simply 
wrote a tool that reads each record and writes it back immediately). The 
result is promising: it is 20X faster than the regular tool. 

[~gszadovszky] Are you open to merging this tool first and then refactoring 
all the existing similar tools into the universal tool? If so, I will make a 
PR shortly. 



[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-23 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403039#comment-17403039
 ] 

Gabor Szadovszky commented on PARQUET-2071:
---

[~sha...@uber.com], sure, I am fine with handling the "universal tool" and the 
required refactors under the separate Jira.
