[ 
https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134030#comment-17134030
 ] 

Gabor Szadovszky commented on PARQUET-1872:
-------------------------------------------

[~sha...@uber.com], I don't know why the PR was not linked here automatically. 
Please add it manually so that we have the reference.
I don't understand the sub-tasks. The header of PR #796 references this jira, 
yet the PR already resolves the parquet-tools related sub-task. I don't think 
adding this functionality to {{parquet-cli}} is big enough to warrant a 
separate task. (I would suggest implementing the functionality in one place and 
invoking it from both {{parquet-tools}} and {{parquet-cli}}.)
What is the bloom filter support sub-task about? I am not sure about the bloom 
filters, but the offset indexes surely have to be updated because the page 
offsets will change. Without that the feature is incorrect, so I would not 
merge a PR to master that does not implement it.
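
As a rough illustration of the offset concern (a sketch only: the page sizes, 
the starting offset, and the helper name {{recompute_page_offsets}} are made up 
for this example and are not parquet-mr code), recompressing pages changes 
their sizes, so every page after the first one moves:

```python
from typing import List, Tuple


def recompute_page_offsets(first_offset: int,
                           compressed_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return (offset, compressed_size) per page, assuming the pages of one
    column chunk are written back to back starting at first_offset."""
    locations = []
    offset = first_offset
    for size in compressed_sizes:
        locations.append((offset, size))
        offset += size  # the next page starts right after this one
    return locations


# Hypothetical column chunk with three pages, starting at byte 4.
old_locations = recompute_page_offsets(4, [100, 120, 80])  # original codec
new_locations = recompute_page_offsets(4, [60, 75, 50])    # after recompression

# Pairs of (old, new) locations whose offsets no longer match.
moved = [(o, n) for o, n in zip(old_locations, new_locations) if o[0] != n[0]]
```

Here the second and third pages end up at different offsets, so an offset 
index copied unchanged from the source file would point readers at the wrong 
bytes.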

> Add TransCompression command 
> -----------------------------
>
>                 Key: PARQUET-1872
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1872
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> As ZSTD becomes more popular, there is a need to convert existing data to 
> ZSTD compression, which can achieve a higher compression ratio. It would be 
> useful to have a tool that converts a Parquet file directly by just 
> decompressing and recompressing each page, without decoding/encoding or 
> assembling the records, because that is much faster. The initial result 
> shows it is ~5 times faster. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)