Hi,

I am using Spark 1.5.1.

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java

I have slightly modified this example to create a partitioned Parquet file.

Instead of this line:

schemaPeople.write().parquet("people.parquet");

I use this line:

schemaPeople.write().partitionBy("country").parquet("/user/Ananda/people.parquet");

I have also updated the Person class to add a country attribute, and updated my 
input file accordingly.
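For reference, the modified Person bean looks roughly like this (a sketch, not my exact code; Spark's Java API infers the schema from the bean's getters, so country needs a getter/setter like the other fields):

```java
import java.io.Serializable;

// Sketch of the modified Person bean. The field names mirror the Hive table
// columns (name, age, city) plus the new partition column, country.
// In the actual Spark job this would be a public class (or public static
// nested class) so that bean reflection can see it.
public class Person implements Serializable {
    private String name;
    private int age;
    private String city;
    private String country;   // new attribute used by partitionBy("country")

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}
```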

When I run this code in Spark, it seems to work: I can see the partition 
folders, each with a Parquet file inside, at the HDFS location where I store 
the output.
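To show what I mean, the output follows Hive-style key=value directory naming; this sketch recreates the structure locally (the country values and part-file name are just examples, not my actual data):

```shell
# Recreate, locally, the kind of directory structure that
# partitionBy("country") writes under the output path.
mkdir -p people.parquet/country=IN people.parquet/country=US
touch people.parquet/country=IN/part-r-00000.gz.parquet
touch people.parquet/country=US/part-r-00000.gz.parquet

# Each distinct country value gets its own subdirectory of part files
ls -d people.parquet/country=*
```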

But when I create an external table in Hive over this data, it does not work: 
"select * from person5" returns no rows.

This is how I create the table:

CREATE EXTERNAL TABLE person5(name string, age int,city string)
PARTITIONED BY (country string)
STORED AS PARQUET
LOCATION '/user/ananda/people.parquet/';

When I create a non-partitioned table, it works fine.

Please help if you have any ideas.

Regards,
Anand.C
