Re: How to control the number of parquet files getting created under a partition ?

2016-03-02 Thread swetha kasireddy
Thanks. I tried this yesterday and it seems to be working.

On Wed, Mar 2, 2016 at 1:49 AM, James Hammerton wrote:
> Hi,
>
> Based on the behaviour I've seen using parquet, the number of partitions
> in the DataFrame will determine the number of files in each parquet
> partition.

Re: How to control the number of parquet files getting created under a partition ?

2016-03-02 Thread James Hammerton
Hi,

Based on the behaviour I've seen using parquet, the number of partitions in the DataFrame will determine the number of files in each parquet partition. I.e. when you use "PARTITION BY" you're actually partitioning twice, once via the partitions Spark has created internally and then again
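
To illustrate the "partitioning twice" point, here is a rough sketch in plain Python. It is a toy model, not Spark's actual writer: it assumes each in-memory partition (one write task) emits one file into every partition directory whose key value appears among its rows. The helper names (files_per_directory, round_robin, by_key) and the sample data are hypothetical. In Spark itself the corresponding knobs would be DataFrame.repartition or DataFrame.coalesce applied before the write.

```python
from collections import defaultdict


def files_per_directory(partitions, key):
    """Count output files per partition directory under the toy model:
    each in-memory partition writes one file into every directory
    whose key value occurs among its rows."""
    counts = defaultdict(int)
    for rows in partitions:
        for value in {key(row) for row in rows}:
            counts[value] += 1
    return dict(counts)


def round_robin(rows, n):
    """Spread rows over n in-memory partitions, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def by_key(rows, n, key):
    """Put all rows that share a key value into the same in-memory partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def get_day(row):
    return row[0]


# Hypothetical data: three partition values (days), 1000 rows each.
days = ("2016-03-01", "2016-03-02", "2016-03-03")
rows = [(day, i) for day in days for i in range(1000)]

# 250 in-memory partitions, each holding rows from every day.
scattered = files_per_directory(round_robin(rows, 250), get_day)

# Same 250 in-memory partitions, but rows grouped by day first.
grouped = files_per_directory(by_key(rows, 250, get_day), get_day)

print(scattered)
print(grouped)
```

Under this model, 250 in-memory partitions that each contain rows for every day produce 250 files per directory, while grouping rows by the partition key first (or simply reducing the number of in-memory partitions) brings the count down to one file per directory.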

How to control the number of parquet files getting created under a partition ?

2016-03-01 Thread SRK
Hi,

How can I control the number of parquet files getting created under a partition? I have my sqlContext queries to create a table and insert the records as follows. It seems to create around 250 parquet files under each partition, though I was expecting it to create around 2 or 3 files. Due to