[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-10-06 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195461#comment-16195461
 ] 

Ryan Blue commented on PARQUET-678:
---

I think custom codecs are a bad idea. Supporting arbitrary codecs will only 
cause compatibility issues, so I recommend we implement a small set, probably 
just adding brotli and zstd.
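The compatibility argument can be illustrated with a minimal sketch, assuming a reader that resolves the codec name recorded by the writer against a fixed list. The enum and the resolve() helper below are hypothetical and are not the real parquet-mr CompressionCodecName class. Any name outside the agreed set, such as a private encryption codec, makes the file unreadable for that reader.

```java
import java.util.Locale;

public class ClosedCodecSet {
    // Hypothetical closed set: the four codecs named in the issue description,
    // plus brotli and zstd as proposed. Not the actual parquet-mr enum.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, ZSTD }

    // A reader maps the codec name stored by the writer onto a known codec.
    // Names outside the agreed set cannot be decoded by this reader.
    static Codec resolve(String recordedName) {
        try {
            return Codec.valueOf(recordedName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new UnsupportedOperationException(
                "Unknown codec '" + recordedName + "': file is unreadable here", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("gzip")); // prints GZIP
        try {
            resolve("my-encryption-codec");  // hypothetical custom codec name
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the set closed means every conforming reader can decode every conforming file, at the cost of needing a format change to add a codec.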

> Allow for custom compression codecs
> ---
>
> Key: PARQUET-678
> URL: https://issues.apache.org/jira/browse/PARQUET-678
> Project: Parquet
>  Issue Type: Wish
>Reporter: Steven Anton
>Priority: Minor
>
> I understand that the list of accepted compression codecs is explicitly 
> limited to uncompressed, snappy, gzip, and lzo. (See 
> parquet.hadoop.metadata.CompressionCodecName.java) Is there a reason for 
> this? Or is there an easy workaround? On the surface it seems like an 
> unnecessary restriction.
> I ask because I have written a custom codec to implement encryption and I'm 
> unable to use it with Parquet, which is a real shame because it is the main 
> storage format I was hoping to use.
> Other thoughts on how to implement encryption in Parquet with this limitation?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-10 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860944#comment-15860944
 ] 

Uwe L. Korn commented on PARQUET-678:
-

[~cotton] A patch would be very welcome. I can help with that on the C++ side 
once we have a Java patch available.



[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-10 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860940#comment-15860940
 ] 

Uwe L. Korn commented on PARQUET-678:
-

Adding them to parquet-cpp and parquet-format is easy. The only part that 
looks a bit harder from my side is adding them to Hadoop as codecs so they can 
be used in parquet-mr. At least for Zstd, this seems to have been done already: 
https://issues.apache.org/jira/browse/HADOOP-13578



[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-09 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860404#comment-15860404
 ] 

Wes McKinney commented on PARQUET-678:
--

The format also provides for Brotli compression: 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L331

I am sure that LZ4 and zstd would be welcome additions. At least on the C++ 
side, adding these would not cause us much hardship (we have already added 
Brotli support).



[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-09 Thread Cotton Seed (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860338#comment-15860338
 ] 

Cotton Seed commented on PARQUET-678:
-

We find lz4 gives similar compression and is about 20% faster for our 
application.  In addition to zstd, I'm sure there is interest in other new 
compression algorithms, like brotli.  It would seem natural for Parquet to work 
with any Hadoop compression codec.  I can work up a patch if there would be 
interest in accepting it.
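The "any Hadoop codec" proposal can be contrasted with the closed-set approach in a minimal sketch, assuming an open registry where the file records only a codec name and each reader must have registered a matching implementation. The BytesCodec interface and OpenCodecRegistry class are hypothetical stand-ins, not Hadoop's CompressionCodec API; gzip from java.util.zip is used only because it ships with the JDK (readAllBytes needs Java 9 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class OpenCodecRegistry {
    // Hypothetical pluggable codec; NOT Hadoop's CompressionCodec interface.
    interface BytesCodec {
        byte[] compress(byte[] in);
        byte[] decompress(byte[] in);
    }

    // Example implementation backed by the JDK's gzip streams.
    static final BytesCodec GZIP = new BytesCodec() {
        public byte[] compress(byte[] in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return out.toByteArray(); // stream is finished once closed
        }
        public byte[] decompress(byte[] in) {
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
                return gz.readAllBytes();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    };

    // Each process keeps its own registry; the file only carries the name.
    private final Map<String, BytesCodec> codecs = new HashMap<>();

    void register(String name, BytesCodec codec) { codecs.put(name, codec); }

    byte[] read(String recordedName, byte[] payload) {
        BytesCodec codec = codecs.get(recordedName);
        if (codec == null) {
            throw new IllegalStateException(
                "Codec '" + recordedName + "' not registered in this reader");
        }
        return codec.decompress(payload);
    }

    public static void main(String[] args) {
        byte[] stored = GZIP.compress("hello parquet".getBytes(StandardCharsets.UTF_8));

        OpenCodecRegistry readerWithCodec = new OpenCodecRegistry();
        readerWithCodec.register("gzip", GZIP);
        System.out.println(new String(readerWithCodec.read("gzip", stored), StandardCharsets.UTF_8));

        OpenCodecRegistry readerWithoutCodec = new OpenCodecRegistry(); // nothing registered
        try {
            readerWithoutCodec.read("gzip", stored);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The open registry makes adding a codec cheap for the writer, but readability now depends on every reader's deployment, which is the trade-off raised elsewhere in this thread.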
