Hi Group

Sorry to rekindle this thread.

Using Spark 1.6.0 on CDH 5.7.

Any ideas?


Best
Ayan

On Fri, Oct 7, 2016 at 5:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi Ayan,
>
> Depends on the version of Spark you are using.
>
> Have you tried updating stats in Hive?
>
> ANALYZE TABLE ${DATABASE}.${TABLE} PARTITION (${PARTITION_NAME})
> COMPUTE STATISTICS FOR COLUMNS;
>
> and then do
>
> show create table ${TABLE}
>
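> For reference, a minimal sketch of running those two checks from
> PySpark rather than the Hive shell, assuming a HiveContext named
> sqlContext and the somedb.sometable / partition_date names used further
> down this thread (both statements should be passed through to Hive):
>
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
>
> sc = SparkContext(appName="table-check")
> sqlContext = HiveContext(sc)
>
> # Recompute column statistics across the table's partitions
> sqlContext.sql(
>     "ANALYZE TABLE somedb.sometable PARTITION (partition_date) "
>     "COMPUTE STATISTICS FOR COLUMNS")
>
> # Inspect the DDL Hive has recorded for the table
> for row in sqlContext.sql("SHOW CREATE TABLE somedb.sometable").collect():
>     print(row[0])
>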
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 7 October 2016 at 03:46, ayan guha <guha.a...@gmail.com> wrote:
>
>> Posting with the correct subject...
>>
>> On Fri, Oct 7, 2016 at 12:37 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Faced an issue:
>>>
>>> - Writing a Hive partitioned table using:
>>>
>>> from pyspark.sql.functions import to_date
>>>
>>> (df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))
>>>    .write.partitionBy("partition_date")
>>>    .saveAsTable("sometable", mode="overwrite"))
>>>
>>> - The data was written to HDFS fine. I can see folders with partition
>>> names such as
>>>
>>> /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
>>> /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29
>>>
>>> and so on.
>>> - The _common_metadata and _metadata files were also written properly.
>>>
>>> - I can read the data from Spark fine using
>>> read.parquet("/app/somedb/hive/somedb.db/sometable");
>>> printSchema() shows all columns.
>>>
>>> - However, I cannot read the table from Hive:
>>>
>>> Problem 1: Hive does not think the table is partitioned.
>>> Problem 2: Hive sees only one column:
>>> array<string> from deserializer
>>> Problem 3: MSCK REPAIR TABLE fails, saying the partitions are not in
>>> the metastore.
>>>
>>> Question: Is this a known issue with Spark writing to Hive partitioned
>>> tables?
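>>>
>>> For reference, one workaround often suggested for this Spark version is
>>> to avoid saveAsTable for Hive-visible tables (if I recall correctly, it
>>> stores the real schema in table properties and gives Hive only a
>>> placeholder, hence the single array<string> from deserializer column).
>>> Instead, write plain partitioned parquet, declare the table to Hive
>>> yourself, and then register the partitions. A minimal sketch, assuming
>>> a HiveContext named sqlContext, the df and paths from above, and an
>>> illustrative one-column DDL:
>>>
>>> from pyspark.sql.functions import to_date
>>>
>>> out_path = "/app/somedb/hive/somedb.db/sometable"
>>>
>>> # 1. Write plain partitioned parquet instead of using saveAsTable
>>> (df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))
>>>    .write.mode("overwrite")
>>>    .partitionBy("partition_date")
>>>    .parquet(out_path))
>>>
>>> # 2. Declare an external parquet table (column list shortened to one
>>> # illustrative column; add the real ones). The partition column goes
>>> # in PARTITIONED BY, not in the column list.
>>> sqlContext.sql("""
>>>     CREATE EXTERNAL TABLE IF NOT EXISTS somedb.sometable (
>>>         INTERVAL_DATE string
>>>     )
>>>     PARTITIONED BY (partition_date date)
>>>     STORED AS PARQUET
>>>     LOCATION '/app/somedb/hive/somedb.db/sometable'
>>> """)
>>>
>>> # 3. Ask the metastore to discover the partition directories (this
>>> # can equally be run from the Hive shell)
>>> sqlContext.sql("MSCK REPAIR TABLE somedb.sometable")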
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


-- 
Best Regards,
Ayan Guha
