Hi Group,

Sorry to rekindle this thread.
Using Spark 1.6.0 on CDH 5.7. Any idea?

Best
Ayan

On Fri, Oct 7, 2016 at 5:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Ayan,
>
> It depends on the version of Spark you are using.
>
> Have you tried updating the stats in Hive?
>
> ANALYZE TABLE ${DATABASE}.${TABLE} PARTITION (${PARTITION_NAME}) COMPUTE STATISTICS FOR COLUMNS
>
> and then do
>
> show create table ${TABLE}
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
>
> On 7 October 2016 at 03:46, ayan guha <guha.a...@gmail.com> wrote:
>
>> Posting with the correct subject...
>>
>> On Fri, Oct 7, 2016 at 12:37 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I have faced one issue:
>>>
>>> - Writing a Hive partitioned table using
>>>
>>>   df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])) \
>>>     .write.partitionBy('partition_date') \
>>>     .saveAsTable("sometable", mode="overwrite")
>>>
>>> - Data got written to HDFS fine. I can see the folders with partition names such as
>>>
>>>   /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
>>>   /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29
>>>
>>>   and so on.
>>>
>>> - Also, the _common_metadata & _metadata files are written properly.
>>>
>>> - I can read the data from Spark fine using
>>>   read.parquet("/app/somedb/hive/somedb.db/sometable");
>>>   printSchema shows all the columns.
>>>
>>> - However, I cannot read the table from Hive.
>>>
>>> Problem 1: Hive does not think the table is partitioned.
>>> Problem 2: Hive sees only one column: array<string> from deserializer.
>>> Problem 3: MSCK REPAIR TABLE failed, saying the partitions are not in the metadata.
>>>
>>> Question: Is it a known issue with Spark writing to Hive partitioned tables?
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>
>>
>> --
>> Best Regards,
>> Ayan Guha

--
Best Regards,
Ayan Guha
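
For anyone trying to reproduce the behaviour described above, here is a self-contained PySpark sketch of the write and the direct parquet read from the thread. It assumes Spark 1.6 with a HiveContext (as on CDH 5.7); the table, column, and path names are taken from the thread, while "somedb.source_table" is only a hypothetical stand-in for the original source DataFrame.

    # Minimal reproduction sketch of the write/read described in the thread (PySpark 1.6).
    # "somedb.source_table" is a hypothetical source; everything else follows the thread.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import to_date

    sc = SparkContext(appName="partitioned-saveAsTable-repro")
    sqlContext = HiveContext(sc)  # saveAsTable registers the table in the Hive metastore

    df = sqlContext.table("somedb.source_table")

    # The write from the original post: partition by a derived date column.
    df.withColumn("partition_date", to_date(df["INTERVAL_DATE"])) \
      .write.partitionBy("partition_date") \
      .saveAsTable("sometable", mode="overwrite")

    # Reading the files back directly works and shows all columns, as reported in the thread.
    sqlContext.read.parquet("/app/somedb/hive/somedb.db/sometable").printSchema()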
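
The thread itself does not reach a resolution, so the following is only a hedged workaround sketch, not something proposed by the posters: define the table with Hive DDL up front and let Spark insert into it, so the metastore entry is an ordinary partitioned Hive parquet table rather than a Spark-managed one. The schema shown is illustrative; only INTERVAL_DATE and the names somedb/sometable come from the thread, and the source table is again hypothetical.

    # Workaround sketch (not from the thread): create the table via Hive DDL first,
    # then insert with Spark, so Hive sees a normal partitioned parquet table.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import to_date

    sc = SparkContext(appName="hive-partitioned-insert")
    hc = HiveContext(sc)

    # Illustrative schema: only INTERVAL_DATE is known from the thread.
    hc.sql("""
        CREATE TABLE IF NOT EXISTS somedb.sometable (
            interval_date STRING
        )
        PARTITIONED BY (partition_date STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition inserts need these Hive settings.
    hc.setConf("hive.exec.dynamic.partition", "true")
    hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    df = hc.table("somedb.source_table")  # hypothetical source DataFrame
    out = df.withColumn("partition_date",
                        to_date(df["INTERVAL_DATE"]).cast("string"))  # DDL declares a string partition column

    # insertInto matches columns by position, so select explicitly with the partition column last.
    out.select("INTERVAL_DATE", "partition_date") \
       .write.insertInto("somedb.sometable", overwrite=True)

With the table owned by Hive, SHOW CREATE TABLE and MSCK REPAIR TABLE on the Hive side should then report the real columns and partitions instead of the single array<string> column described in the thread.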