[ https://issues.apache.org/jira/browse/PARQUET-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988610#comment-16988610 ]
Gabor Szadovszky commented on PARQUET-1622:
-------------------------------------------

parquet-format 2.8.0 is not released yet. To open a PR against parquet-mr master, parquet-format 2.8.0 must first be released and parquet-mr must depend on it.

For longer feature developments we use feature branches. On a feature branch, you may use a workaround that installs the current parquet-format master as a snapshot and depends on it. But if your change does not require several rounds of development, I don't think it is worth the additional effort. Usually, parquet-format releases can be done in a week or two.

Are you planning to initiate a PR to parquet-mr soon?

> Adding an encoding for FP data
> ------------------------------
>
>                 Key: PARQUET-1622
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1622
>             Project: Parquet
>          Issue Type: Wish
>          Components: parquet-cpp, parquet-format, parquet-mr, parquet-thrift
>            Reporter: Martin Radev
>            Assignee: Martin Radev
>            Priority: Minor
>              Labels: features, pull-request-available
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Apache Parquet does not have any encodings suitable for FP data and the
> available text compressors (zstd, gzip, etc) do not handle FP data very well.
> It is possible to apply a simple data transformation named "stream
> splitting". Such could be "byte stream splitting" which creates K streams of
> length N where K is the number of bytes in the data type (4 for floats, 8 for
> doubles) and N is the number of elements in the sequence.
> The transformed data compresses significantly better on average than the
> original data and for some cases there is a performance improvement in
> compression and decompression speed.
> You can read a more detailed report here:
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
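
For illustration, the "byte stream splitting" transformation described in the quoted issue can be sketched as below. This is only a minimal sketch, not the parquet-cpp or parquet-mr implementation: the function names `byte_stream_split` and `byte_stream_merge` are hypothetical, and a little-endian IEEE 754 layout is assumed for the input values.

```python
import struct


def byte_stream_split(values, fmt="f"):
    """Split N fixed-width floats into K byte streams of length N.

    fmt is a struct format character: "f" (4-byte float) or "d" (8-byte double).
    Assumption: values are serialized little-endian.
    """
    width = struct.calcsize("<" + fmt)  # K: 4 for floats, 8 for doubles
    raw = struct.pack("<%d%s" % (len(values), fmt), *values)
    # Stream k collects byte k of every element, in element order.
    return [raw[k::width] for k in range(width)]


def byte_stream_merge(streams, fmt="f"):
    """Inverse transform: interleave the K streams back into N values."""
    width = len(streams)
    count = len(streams[0]) if streams else 0
    raw = bytearray(width * count)
    for k, stream in enumerate(streams):
        raw[k::width] = stream  # put byte k of each element back in place
    return list(struct.unpack("<%d%s" % (count, fmt), bytes(raw)))
```

With this layout, bytes that tend to be similar across neighbouring values (for example the sign and exponent bytes) end up next to each other in one stream, which is the property the issue relies on for better compression. The streams would then be handed to a general-purpose compressor such as zstd or gzip.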