There is no such thing as a primary key in the Hive metastore, but Spark SQL
does support partitioned Hive tables:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables

DataFrameWriter also has a partitionBy method.

On Thu, Aug 20, 2015 at 7:29 AM, VIJAYAKUMAR JAWAHARLAL <sparkh...@data2o.io
> wrote:

> Hi
>
> I have a question regarding DataFrame partitioning. I read a Hive table from
> Spark, and the following Spark API converts it to a DataFrame:
>
> test_df = sqlContext.sql("select * from hivetable1")
>
> How does Spark decide the partitioning of test_df? Is there a way to
> partition test_df based on some column while reading the Hive table? Second
> question: if that Hive table has a primary key declared, does Spark honor
> the PK and partition based on it?
>
> Thanks
> Vijay
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
