Hi Group,

I am not able to get Snappy-compressed Parquet output when loading data into an external Hive table that is partitioned; the same write produces Snappy-compressed files when the table is not partitioned.

Steps to reproduce:

1. Hive DDL (non-partitioned table):

   create external table test (id int, name string)
     stored as parquet
     location 'hdfs://testcluster/user/abc/test'
     tblproperties ('PARQUET.COMPRESS'='SNAPPY');

2. Spark code:

   import org.apache.spark.sql.{SaveMode, SparkSession}

   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .getOrCreate()

   spark.sql("use default").show
   val rdd = spark.sparkContext.parallelize(Seq((1, "one"), (2, "two")))
   val df = spark.createDataFrame(rdd).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")

3. I can see a few *.snappy.parquet files under the table location.
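
   (In case it is useful, a minimal sketch of how the table directory can be
   listed from the same Spark shell; I simply looked at the file names.)

   // sketch only: list the files written under the table location
   import org.apache.hadoop.fs.{FileSystem, Path}

   val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
   fs.listStatus(new Path("hdfs://testcluster/user/abc/test"))
     .foreach(f => println(f.getPath.getName))   // e.g. part-00000-....snappy.parquet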

4. Hive DDL (partitioned table):

   create external table test (id int)
     partitioned by (name string)
     stored as parquet
     location 'hdfs://testcluster/user/abc/test'
     tblproperties ('PARQUET.COMPRESS'='SNAPPY');

5. Spark code (same as in step 2):

   import org.apache.spark.sql.{SaveMode, SparkSession}

   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .getOrCreate()

   spark.sql("use default").show
   val rdd = spark.sparkContext.parallelize(Seq((1, "one"), (2, "two")))
   val df = spark.createDataFrame(rdd).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")

6. This time I see uncompressed files without the .snappy.parquet extension;
parquet-tools.jar also confirms that these are uncompressed Parquet files.
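
   (The same check can also be done from the Spark shell by reading a Parquet
   footer directly; a sketch, with the part-file name below only as an example:)

   // sketch only: print the compression codec recorded in one file's footer
   // (substitute a real part-file from the partition directory)
   import org.apache.hadoop.fs.Path
   import org.apache.parquet.hadoop.ParquetFileReader

   val conf = spark.sparkContext.hadoopConfiguration
   val footer = ParquetFileReader.readFooter(
     conf, new Path("hdfs://testcluster/user/abc/test/name=one/part-00000"))
   println(footer.getBlocks.get(0).getColumns.get(0).getCodec)   // UNCOMPRESSED here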

7. I tried the following option as well, but no luck:

   df.write.mode(SaveMode.Overwrite).format("parquet")
     .option("compression", "snappy").insertInto("test")


Thanks in advance.




