Hi,

I've faced one issue:

- Writing a Hive-partitioned table using:

      df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])) \
        .write.partitionBy("partition_date") \
        .saveAsTable("sometable", mode="overwrite")

  The data got written to HDFS fine. I can see the folders with partition names such as
  /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28,
  /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29, and so on.
- The _common_metadata and _metadata files are also written properly.
- I can read the data from Spark fine using read.parquet("/app/somedb/hive/somedb.db/sometable"); printSchema() shows all columns.
- However, I cannot read the table from Hive:
  Problem 1: Hive does not think the table is partitioned.
  Problem 2: Hive sees only one column, array<string> from deserializer.
  Problem 3: MSCK REPAIR TABLE fails, saying the partitions are not in the metastore.

Question: Is this a known issue with Spark writing to Hive-partitioned tables? (A minimal repro sketch follows after my signature.)

--
Best Regards,
Ayan Guha
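
P.S. In case a repro helps, here is a minimal, self-contained sketch of what I'm doing. The app name, sample rows, and the "value" column are simplified stand-ins for my actual job (and I'm assuming Spark 2.0's SparkSession here; on 1.6 the same flow goes through HiveContext):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date

    # Hive support is enabled so saveAsTable goes through the metastore
    spark = (SparkSession.builder
             .appName("hive-partitioned-write-repro")
             .enableHiveSupport()
             .getOrCreate())

    # Stand-in for the real DataFrame; INTERVAL_DATE is a date string
    df = spark.createDataFrame(
        [("2016-09-28", 1), ("2016-09-29", 2)],
        ["INTERVAL_DATE", "value"])

    # Derive the partition column and write it as a partitioned table
    (df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))
       .write
       .partitionBy("partition_date")
       .saveAsTable("sometable", mode="overwrite"))

    # Reading the Parquet files directly back in Spark shows all columns...
    spark.read.parquet("/app/somedb/hive/somedb.db/sometable").printSchema()

    # ...but querying "sometable" from Hive shows a single array<string>
    # column (from deserializer), no partitions, and MSCK REPAIR TABLE
    # reports that the partitions are not in the metastore.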
- Writing Hive Partitioned table using df.withColumn("partition_date",to_date(df["INTERVAL_DATE"])).write.partitionBy('partition_date').saveAsTable("sometable",mode="overwrite") - Data got written to HDFS fine. I can see the folders with partition names such as /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28 /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29 and so on. - Also, _common_metadata & _metadata files are written properly - I can read data from spark fine using read.parquet("/app/somedb/hive/somedb.db/sometable"). Printschema showing all columns. - However, I can not read from hive. Problem 1: Hive does not think the table is partitioned Problem 2: Hive sees only 1 column array<string> from deserializer Problem 3: MSCK repair table failed, saying partitions are not in Metadata. Question: Is it a known issue with Spark to write to Hive partitioned table? -- Best Regards, Ayan Guha