Hi, can you please tell me how Parquet partitions the data when saving a dataframe?
I have a dataframe which contains 10 values, like below:

+---------+
|field_num|
+---------+
|      139|
|      140|
|       40|
|       41|
|      148|
|      149|
|      151|
|      152|
|      153|
|      154|
+---------+

df.write.partitionBy("field_num").parquet("/Users/rs/parti/")

This saves the files into one directory per value (field_num=140, ..., field_num=154). But when I try the command below, it gives 5:

scala> spark.read.parquet("file:///Users/rs/parti").rdd.partitions.length
res4: Int = 5

So how does Parquet partition the data in Spark?

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" ("Shun bribery; hold your head high")
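[Editor's sketch] The question mixes two different notions of "partition", which the snippet below tries to separate. It is a minimal sketch, assuming a local SparkSession (e.g. inside spark-shell) and reusing the path from the message above; it is not a definitive answer. partitionBy controls only the directory layout on disk (one directory per distinct value, so 10 directories here), while the RDD partition count seen on read is how Spark packs the resulting files into input splits, which is governed by settings such as spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes, so the two numbers need not match.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: running locally; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitionBy-sketch")
  .getOrCreate()
import spark.implicits._

// 10 distinct values -> partitionBy writes 10 directories,
// one per value: field_num=40/ ... field_num=154/
val df = Seq(139, 140, 40, 41, 148, 149, 151, 152, 153, 154).toDF("field_num")
df.write.mode("overwrite").partitionBy("field_num").parquet("/Users/rs/parti/")

// On read, the RDD partition count is NOT the directory count: Spark bins the
// (tiny) Parquet files into input splits based on file sizes and the configs
// spark.sql.files.maxPartitionBytes / spark.sql.files.openCostInBytes, so
// 10 small files can collapse into fewer read partitions.
val readBack = spark.read.parquet("/Users/rs/parti/")
println(readBack.rdd.partitions.length)
```

In other words, the 10 directories are a *storage* layout (useful for partition pruning on field_num), while the 5 is a *read parallelism* decision Spark makes per query.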