I was using a HiveContext and not a SQLContext, therefore ("SET
spark.sql.parquet.compression.codec=gzip") was "ignored".

Michael Armbrust pointed out that "parquet.compression" should be used
instead, which solved the issue.

I am still wondering whether this behavior is "normal"; it would be better
if "spark.sql.parquet.compression.codec" were "translated" to
"parquet.compression" when a HiveContext is used.
Otherwise the documentation should be updated to be more precise.
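The translation I have in mind could look something like this small,
purely illustrative helper (the function and the dict-of-settings shape
are hypothetical, not part of Spark's actual code):

```python
# Hypothetical sketch of the suggested behavior: when running under a
# HiveContext, the Spark SQL codec key would also populate Hive's
# "parquet.compression" property, so the user-facing setting is honored.
SPARK_KEY = "spark.sql.parquet.compression.codec"
HIVE_KEY = "parquet.compression"

def translate_codec_setting(settings):
    """Return a copy of the settings where the Spark codec key,
    if present, is mirrored into the Hive-side key."""
    out = dict(settings)
    if SPARK_KEY in out and HIVE_KEY not in out:
        out[HIVE_KEY] = out[SPARK_KEY]
    return out

conf = translate_codec_setting({SPARK_KEY: "gzip"})
print(conf[HIVE_KEY])  # gzip
```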



2015-02-04 19:13 GMT+01:00 sahanbull <sa...@skimlinks.com>:

> Hi Ayoub,
>
> You could try using the sql format to set the compression type:
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
>
> sc = SparkContext()
> sqc = SQLContext(sc)
> sqc.sql("SET spark.sql.parquet.compression.codec=gzip")
>
> You get a notification on screen while running the Spark job when you set
> the compression codec like this. I haven't compared it with different
> compression methods; please let the mailing list know if this works for
> you.
>
> Best
> Sahan
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-compression-codecs-not-applied-tp21058p21498.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-Parquet-compression-codecs-not-applied-tp21499.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.