[ https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134030#comment-17134030 ]
Gabor Szadovszky commented on PARQUET-1872:
-------------------------------------------

[~sha...@uber.com], I don't know why the PR was not linked here automatically. Please add it manually so we have the reference.

I don't get the sub-tasks. In the header of PR #796 you reference this jira, while the PR already resolves the parquet-tools related sub-task. I think adding this functionality to {{parquet-cli}} is not a big enough change to split into a separate task. (I would suggest implementing the functionality in one place and invoking it from both {{parquet-tools}} and {{parquet-cli}}.)

What is the bloom filter support about? I am not sure about the bloom filters, but the offset indexes surely have to be updated, as the page offsets will change. Without that the feature is incorrect, so I would not merge a PR to master without implementing it.

> Add TransCompression command
> -----------------------------
>
>                 Key: PARQUET-1872
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1872
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> When ZSTD becomes more popular, there is a need to translate existing data
> to ZSTD compression, which can achieve a higher compression ratio. It would be
> useful if we can have a tool to convert a Parquet file directly by just
> decompressing/compressing each page without decoding/encoding or assembling
> the record because it is much faster. The initial result shows it is ~5 times
> faster.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
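
The offset index concern raised in the comment can be illustrated with a minimal sketch. It is plain Python, not parquet-mr code: {{PageLocation}} only mirrors the three fields of an offset index entry in the Parquet format (offset, compressed page size, first row index), while {{rebuild_offset_index}} and all the sizes are hypothetical and chosen for illustration. The sketch also assumes the pages of a column chunk are written back to back.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageLocation:
    """Mirrors the three fields of a Parquet offset index entry."""
    offset: int                # byte position of the page in the file
    compressed_page_size: int  # size of the page on disk, header included
    first_row_index: int       # index of the page's first row in the row group


def rebuild_offset_index(old: List[PageLocation],
                         new_sizes: List[int],
                         new_start: int) -> List[PageLocation]:
    """Recompute page locations after every page has been recompressed.

    new_sizes[i] is the on-disk size of page i after recompression and
    new_start is where the first page lands in the output file.
    Assumption: pages are laid out contiguously in the output.
    """
    if len(old) != len(new_sizes):
        raise ValueError("one new size is needed per page")
    rebuilt = []
    offset = new_start
    for loc, size in zip(old, new_sizes):
        # Rows are not re-assembled, so first_row_index carries over unchanged;
        # only the byte offset and the compressed size differ.
        rebuilt.append(PageLocation(offset, size, loc.first_row_index))
        offset += size
    return rebuilt


# Hypothetical column chunk with three pages, before recompression.
old_index = [
    PageLocation(offset=4, compressed_page_size=1000, first_row_index=0),
    PageLocation(offset=1004, compressed_page_size=1200, first_row_index=500),
    PageLocation(offset=2204, compressed_page_size=900, first_row_index=1100),
]

# Hypothetical page sizes after switching the codec.
new_index = rebuild_offset_index(old_index, new_sizes=[700, 850, 640], new_start=4)
for loc in new_index:
    print(loc)
```

Because each page's size changes, every page after the first moves to a new offset, so an offset index copied verbatim from the input file would point readers at the wrong bytes.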