There is no such thing as a primary key in the Hive metastore, but Spark SQL does support partitioned Hive tables: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables
DataFrameWriter also has a partitionBy method.

On Thu, Aug 20, 2015 at 7:29 AM, VIJAYAKUMAR JAWAHARLAL <sparkh...@data2o.io> wrote:
> Hi
>
> I have a question regarding data frame partition. I read a hive table from
> spark and following spark api converts it as DF.
>
> test_df = sqlContext.sql("select * from hivetable1")
>
> How does spark decide partition of test_df? Is there a way to partition
> test_df based on some column while reading hive table? Second question is,
> if that hive table has primary key declared, does spark honor PK in hive
> table and partition based on PKs?
>
> Thanks
> Vijay
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org