Hi

I have run into an issue:

- Writing a Hive-partitioned table using

df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])).write.partitionBy("partition_date").saveAsTable("sometable", mode="overwrite")
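
In full, that write step looks roughly like this (a minimal sketch; the Spark 2.x session setup and the source-table name are my assumptions for illustration, only the withColumn/partitionBy/saveAsTable call is from the actual job):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

# Hive support is needed so that saveAsTable registers the table in the metastore.
spark = SparkSession.builder.appName("write-partitioned-table").enableHiveSupport().getOrCreate()

# Hypothetical source; in the real job df is built elsewhere and has an INTERVAL_DATE column.
df = spark.table("somedb.source_table")

df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])) \
    .write.partitionBy("partition_date") \
    .saveAsTable("sometable", mode="overwrite")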

- The data was written to HDFS fine; I can see the partition folders, such as

/app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
/app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29

and so on.
- The _common_metadata and _metadata files are also written properly.

- I can read the data back from Spark fine using
read.parquet("/app/somedb/hive/somedb.db/sometable"), and printSchema() shows all columns.
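
Concretely, that check is just the following (sketch; spark is the same session as in the write step above):

# Reading the written files back directly with the Parquet reader works fine.
df_back = spark.read.parquet("/app/somedb/hive/somedb.db/sometable")
df_back.printSchema()   # all columns show up, including partition_date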

- However, I cannot read the table from Hive.

Problem 1: Hive does not think the table is partitioned.
Problem 2: Hive sees only one column, array<string> from deserializer.
Problem 3: MSCK REPAIR TABLE fails, saying the partitions are not in the metastore.
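
For comparison with problem 2, the check I would contrast with Hive's DESCRIBE output is reading the table by name through Spark SQL rather than by path (a sketch, using the same session as above):

# Read the table through Spark's catalog instead of the raw Parquet path;
# the expectation is that the full schema resolves here, unlike in Hive.
spark.table("sometable").printSchema()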

Question: Is this a known issue when writing to a Hive partitioned table from Spark?


-- 
Best Regards,
Ayan Guha