[jira] [Updated] (SPARK-3131) Allow user to set parquet compression codec for writing ParquetFile in SQLContext

2014-08-19 Thread Teng Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teng Qiu updated SPARK-3131:


Summary: Allow user to set parquet compression codec for writing 
ParquetFile in SQLContext  (was: Allow user to set parquet compression codec)

 Allow user to set parquet compression codec for writing ParquetFile in 
 SQLContext

 Key: SPARK-3131
 URL: https://issues.apache.org/jira/browse/SPARK-3131
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Teng Qiu

 There are 4 different compression codecs available for ParquetOutputFormat;
 currently the codec is set as a hard-coded value in 
 {code}ParquetRelation.defaultCompression{code}.
 Original discussion:
 https://github.com/apache/spark/pull/195#discussion-diff-11002083
 So we need to add a new config property in SQLConf to allow users to change this 
 compression codec. I used a short-name syntax similar to the one described in 
 SPARK-2953.
 BTW, which codec should we use as the default? It was set to GZIP 
 (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should 
 consider changing it to SNAPPY, since SNAPPY is already the default codec for 
 shuffling in spark-core (SPARK-2469), and parquet-mr supports the Snappy codec 
 natively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3131) Allow user to set parquet compression codec for writing ParquetFile in SQLContext

2014-08-19 Thread Teng Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teng Qiu updated SPARK-3131:


Description: 
There are 4 different compression codecs available for ParquetOutputFormat.

In Spark SQL the codec is set as a hard-coded value in 
{code}ParquetRelation.defaultCompression{code}.

Original discussion:
https://github.com/apache/spark/pull/195#discussion-diff-11002083


So we need to add a new config property in SQLConf to allow users to change this 
compression codec. I used a short-name syntax similar to the one described in 
SPARK-2953.
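A minimal sketch of the short-name idea, in Scala and without any Spark dependency. It assumes the four ParquetOutputFormat codecs are UNCOMPRESSED, SNAPPY, GZIP and LZO (the constant names of parquet-mr's CompressionCodecName); the object name ParquetCodecSketch and the key spark.sql.parquet.compression.codec are hypothetical, chosen only for illustration. It accepts case-insensitive short names, similar in spirit to SPARK-2953, and rejects unknown names.

```scala
// Hypothetical sketch: map user-supplied short names to the four Parquet
// codecs (assumed here to be UNCOMPRESSED, SNAPPY, GZIP and LZO).
object ParquetCodecSketch {
  // Assumed config key; named for illustration only.
  val CompressionCodecKey = "spark.sql.parquet.compression.codec"

  // Short name -> upper-case codec name (mirroring CompressionCodecName constants).
  val shortNames: Map[String, String] = Map(
    "uncompressed" -> "UNCOMPRESSED",
    "snappy"       -> "SNAPPY",
    "gzip"         -> "GZIP",
    "lzo"          -> "LZO"
  )

  // Case-insensitive lookup; unknown names produce an error message
  // listing the accepted short names instead of being silently ignored.
  def parse(name: String): Either[String, String] =
    shortNames.get(name.trim.toLowerCase) match {
      case Some(codec) => Right(codec)
      case None =>
        Left(s"Unsupported codec '$name'; expected one of: " +
          shortNames.keys.toSeq.sorted.mkString(", "))
    }
}
```

For example, parse("Snappy") yields Right("SNAPPY"), while parse("brotli") yields a Left listing the accepted names. Failing loudly on a typo seems preferable to silently writing files with an unexpected codec.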


BTW, which codec should we use as the default? It was set to GZIP 
(https://github.com/apache/spark/pull/195/files#diff-4), but I think we should 
consider changing it to SNAPPY, since SNAPPY is already the default codec for 
shuffling in spark-core (SPARK-2469), and parquet-mr supports the Snappy codec 
natively.
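To make the default question concrete, here is a second hypothetical sketch (same assumed key and codec set as above, with its own definitions) that resolves the codec from a plain key/value map and falls back to a default when the user has not set anything. Under this design, switching the default from GZIP to SNAPPY changes a single constant and does not affect users who set the key explicitly.

```scala
// Hypothetical sketch: resolve the Parquet codec from a key/value config,
// falling back to a default when the key is absent.
object ParquetDefaultCodecSketch {
  val CompressionCodecKey = "spark.sql.parquet.compression.codec" // assumed key
  val Supported: Set[String] = Set("uncompressed", "snappy", "gzip", "lzo")

  // Default proposed in this issue; the value hard-coded today is GZIP.
  val ProposedDefault = "snappy"

  // Returns the upper-case codec name; throws IllegalArgumentException
  // (via require) for names outside the supported set.
  def resolve(conf: Map[String, String], default: String = ProposedDefault): String = {
    val name = conf.getOrElse(CompressionCodecKey, default).trim.toLowerCase
    require(Supported.contains(name), s"Unsupported Parquet codec: $name")
    name.toUpperCase
  }
}
```

In a real SQLContext the value would presumably be supplied through SQLConf (for example via sqlContext.setConf or a SQL SET statement); that wiring is not shown here.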

  was:
There are 4 different compression codecs available for ParquetOutputFormat.

Currently the codec is set as a hard-coded value in 
{code}ParquetRelation.defaultCompression{code}.

Original discussion:
https://github.com/apache/spark/pull/195#discussion-diff-11002083


So we need to add a new config property in SQLConf to allow users to change this 
compression codec. I used a short-name syntax similar to the one described in 
SPARK-2953.


BTW, which codec should we use as the default? It was set to GZIP 
(https://github.com/apache/spark/pull/195/files#diff-4), but I think we should 
consider changing it to SNAPPY, since SNAPPY is already the default codec for 
shuffling in spark-core (SPARK-2469), and parquet-mr supports the Snappy codec 
natively.




