Hey Friends,

I am trying to use df.write.parquet() to write a DataFrame to Parquet
files. I have the following questions.

1. number of partitions
The default number of partitions seems to be 200. Is there any way other
than using df.repartition(n) to change this number? I was told repartition
can be very expensive. See the sketch below for what I am doing now.
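
For reference, here is roughly my current approach, plus the alternatives
I have come across but am unsure about (a minimal sketch; df and the
output paths stand in for my real data):

    # current approach: repartition does a full shuffle,
    # which I was told is expensive
    df.repartition(10).write.parquet("/tmp/out")

    # alternative 1: coalesce merges partitions without a
    # full shuffle when reducing the partition count
    df.coalesce(10).write.parquet("/tmp/out")

    # alternative 2: the default of 200 comes from the
    # spark.sql.shuffle.partitions setting, which applies to
    # DataFrames produced by shuffles (joins, aggregations)
    sqlContext.setConf("spark.sql.shuffle.partitions", "10")

Do either of these avoid the cost of repartition in practice?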

2. partition by size
When I use df.write.partitionBy('year'), if the number of entries with
"year=2006" is very small, the files under the partition
"year=2006" can be very small. If we could set a target size for each
partition file, that would be very helpful. A sketch of what I am doing
and a workaround I am considering follows.
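
Sketch (again, df and the paths are placeholders; the workaround assumes
Spark 1.6+, where repartition accepts a column):

    # current write: one directory per year, which can leave
    # many tiny files in sparsely populated years
    df.write.partitionBy("year").parquet("/tmp/by_year")

    # workaround idea: shuffle by the same column first, so all
    # rows for a given year land in one task and therefore in
    # fewer output files per directory
    df.repartition("year").write.partitionBy("year").parquet("/tmp/by_year")

Is there a better way to control output file sizes than this?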


Thank you,
Wei
