Hi Group, I am not able to get Snappy-compressed Parquet output when loading data into a partitioned external Hive table; the same setup works fine when the table is unpartitioned.
Trace:

1. Created an unpartitioned external table:

   create external table test(id int, name string)
   stored as parquet
   location 'hdfs://testcluster/user/abc/test'
   tblproperties ('PARQUET.COMPRESS'='SNAPPY');

2. Loaded it from Spark:

   import org.apache.spark.sql.{SaveMode, SparkSession}

   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .getOrCreate()
   val sc = spark.sparkContext

   spark.sql("use default").show
   val rdd = sc.parallelize(Seq((1, "one"), (2, "two")))
   val df = spark.createDataFrame(rdd).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")

3. Result: a few *.snappy.parquet files appear in the table location, as expected.

4. Dropped the table and recreated it, this time partitioned by name:

   create external table test(id int)
   partitioned by (name string)
   stored as parquet
   location 'hdfs://testcluster/user/abc/test'
   tblproperties ('PARQUET.COMPRESS'='SNAPPY');

5. Ran the same Spark code as in step 2.

6. Result: the output files are uncompressed and do not carry the .snappy.parquet extension. parquet-tools.jar also confirms they are uncompressed Parquet files.

7. I tried the following variant as well, but no luck:

   df.write.mode(SaveMode.Overwrite).format("parquet")
     .option("compression", "snappy")
     .insertInto("test")

Thanks in advance.
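PS: Two follow-ups I plan to try next, in case they help anyone reproduce this. First, the Hive wiki lists the Parquet compression table property as 'parquet.compression', so my key 'PARQUET.COMPRESS' may simply be the wrong name; I will recreate the partitioned table as:

   create external table test(id int)
   partitioned by (name string)
   stored as parquet
   location 'hdfs://testcluster/user/abc/test'
   tblproperties ('parquet.compression'='SNAPPY');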
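Second, I will try setting Spark's session-level Parquet codec, spark.sql.parquet.compression.codec, instead of the per-write option. I am not sure this setting is honored when insertInto writes through the Hive SerDe path for a partitioned table, but it seems worth ruling out:

   import org.apache.spark.sql.{SaveMode, SparkSession}

   // Same session setup as above, plus Spark's native Parquet codec setting.
   // Unclear (to me) whether this applies when insertInto targets a Hive table.
   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .config("spark.sql.parquet.compression.codec", "snappy")
     .getOrCreate()

   val df = spark.createDataFrame(Seq((1, "one"), (2, "two"))).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")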