[jira] [Commented] (PARQUET-678) Allow for custom compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195461#comment-16195461 ]

Ryan Blue commented on PARQUET-678:
-----------------------------------

I think custom codecs are a bad idea. Supporting arbitrary codecs will only cause compatibility issues, so I recommend we implement a small set, probably just adding brotli and zstd.

> Allow for custom compression codecs
> -----------------------------------
>
>                 Key: PARQUET-678
>                 URL: https://issues.apache.org/jira/browse/PARQUET-678
>             Project: Parquet
>          Issue Type: Wish
>            Reporter: Steven Anton
>            Priority: Minor
>
> I understand that the list of accepted compression codecs is explicitly
> limited to uncompressed, snappy, gzip, and lzo. (See
> parquet.hadoop.metadata.CompressionCodecName.java) Is there a reason for
> this? Or is there an easy workaround? On the surface it seems like an
> unnecessary restriction.
> I ask because I have written a custom codec to implement encryption and I'm
> unable to use it with Parquet, which is a real shame because it is the main
> storage format I was hoping to use.
> Other thoughts on how to implement encryption in Parquet with this limitation?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (PARQUET-678) Allow for custom compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860944#comment-15860944 ]

Uwe L. Korn commented on PARQUET-678:
-------------------------------------

[~cotton] A patch would be very welcome. I can help with that on the C++ side once we have a Java patch available.
[jira] [Commented] (PARQUET-678) Allow for custom compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860940#comment-15860940 ]

Uwe L. Korn commented on PARQUET-678:
-------------------------------------

Adding them to parquet-cpp and parquet-format is easy; the only thing that looks a bit harder from my side is adding them to Hadoop as codecs so they can be used in parquet-mr. At least for Zstd, this seems to be done already: https://issues.apache.org/jira/browse/HADOOP-13578
[jira] [Commented] (PARQUET-678) Allow for custom compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860404#comment-15860404 ]

Wes McKinney commented on PARQUET-678:
--------------------------------------

The format also provides for Brotli compression:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L331

I am sure that LZ4 and zstd would be welcome additions -- at least on the C++ side, adding these would not cause us much hardship (we have added Brotli support already).
[jira] [Commented] (PARQUET-678) Allow for custom compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860338#comment-15860338 ]

Cotton Seed commented on PARQUET-678:
-------------------------------------

We find lz4 gives similar compression and is about 20% faster for our application. In addition to zstd, I'm sure there is interest in other new compression algorithms, like brotli. It would seem natural for Parquet to work with any Hadoop compression codec. I can work up a patch if there would be interest in accepting it.
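
For readers following the thread, here is a rough, language-neutral sketch of the restriction under discussion: a closed set of codec names, where any name outside the set is rejected. It is written in Python purely for illustration and is not parquet-mr's code. `CodecName`, `_CODECS`, and `lookup` are hypothetical names, and the set is cut down to two codecs that the standard library can back.

```python
import enum
import gzip


class CodecName(enum.Enum):
    """Hypothetical closed set of codec names (not parquet-mr's actual class)."""
    UNCOMPRESSED = "uncompressed"
    GZIP = "gzip"


# Each known name maps to a (compress, decompress) pair of bytes -> bytes functions.
_CODECS = {
    CodecName.UNCOMPRESSED: (lambda b: b, lambda b: b),
    CodecName.GZIP: (gzip.compress, gzip.decompress),
}


def lookup(name: str):
    """Return the (compress, decompress) pair for a codec name, or raise ValueError."""
    try:
        return _CODECS[CodecName(name.lower())]
    except ValueError:
        # Names outside the enum are rejected: there is no hook for custom codecs.
        raise ValueError(f"unsupported codec: {name!r}") from None


# A known codec round-trips the data.
compress, decompress = lookup("gzip")
page = b"parquet page bytes" * 4
assert decompress(compress(page)) == page
```

Under this design, a call such as `lookup("my-encrypting-codec")` fails with `ValueError`, which is the reporter's situation. The same closed set is what lets every reader decode every file, which is the compatibility concern raised against arbitrary codecs.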