This is a little confusing, but that code path is actually going through
Hive, so the Spark SQL configuration does not help.

Perhaps try:
set parquet.compression=GZIP;
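
Untested, but since the insert goes through Hive's Parquet writer rather than
Spark SQL's, setting that Hive-level property before the INSERT should make the
codec stick. A minimal sketch from the spark-shell, assuming the same
hiveContext as in your snippet:

  import org.apache.spark.sql.hive.HiveContext
  val hiveContext = new HiveContext(sc)

  // the Hive write path reads the Hive/Parquet property parquet.compression,
  // not spark.sql.parquet.compression.codec
  hiveContext.sql("SET parquet.compression=GZIP")

  // then run the same INSERT ... PARTITION statement from your original snippet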

On Fri, Jan 9, 2015 at 2:41 AM, Ayoub <benali.ayoub.i...@gmail.com> wrote:

> Hello,
>
> I tried to save a table created via the hive context as a parquet file but
> whatever compression codec (uncompressed, snappy, gzip or lzo) I set via
> setConf like:
>
> setConf("spark.sql.parquet.compression.codec", "gzip")
>
> the size of the generated files is always the same, so it seems like the
> Spark context ignores the compression codec that I set.
>
> Here is a code sample applied via the spark shell:
>
> import org.apache.spark.sql.hive.HiveContext
> val hiveContext = new HiveContext(sc)
>
> hiveContext.sql("SET hive.exec.dynamic.partition = true")
> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true") // required
> to make data compatible with impala
> hiveContext.setConf("spark.sql.parquet.compression.codec", "gzip")
>
> hiveContext.sql("create external table if not exists foo (bar STRING, ts
> INT) Partitioned by (year INT, month INT, day INT) STORED AS PARQUET
> Location 'hdfs://path/data/foo'")
>
> hiveContext.sql("insert into table foo partition(year, month,day) select *,
> year(from_unixtime(ts)) as year, month(from_unixtime(ts)) as month,
> day(from_unixtime(ts)) as day from raw_foo")
>
> I tried this with Spark 1.2 and a 1.3 snapshot against Hive 0.13,
> and I also tried it with Impala on the same cluster, which applied
> the compression codecs correctly.
>
> Does anyone know what the problem could be?
>
> Thanks,
> Ayoub.
>
>
>
>
