But wouldn’t the partitioning column partition the data only in the Spark RDD? Would it also partition by that column on disk when the data is written (dividing the data into folders)?
From: ayan guha <guha.a...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "Jain, Nishit" <nja...@underarmour.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark Data Frame Writer - Range Partiotioning

How about creating a partition column and using it?

On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit <nja...@underarmour.com> wrote:

Is it possible to have the Spark DataFrame writer write based on range partitioning?

For example, I have 10 distinct values for column_a, say 1 to 10.

    df.write
      .partitionBy("column_a")

The code above will by default create 10 folders: column_a=1, column_a=2, ... column_a=10. I want to see if it is possible to have these partitions based on buckets instead, e.g. col_a=1to5, col_a=5-10, or something like that, and then also have the query engine respect it.

Thanks,
Nishit

--
Best Regards,
Ayan Guha
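For what it's worth, `df.write.partitionBy` does partition the output on disk, one directory per distinct value of the partition column, so Ayan's suggestion amounts to deriving a coarser bucket column and partitioning on that instead. Below is a minimal sketch of that idea; the bucket boundaries (1-5 vs. 6-10), the column name `col_a_bucket`, and the output path are assumptions for illustration, not anything from the thread.

```python
# Sketch of the suggested approach: derive a bucket label from column_a,
# then write partitioned by that label instead of the raw value.
# Bucket boundaries (1..5 vs 6..10) are assumed for illustration.

def bucket_label(value):
    """Map a column_a value in 1..10 to a range-bucket label."""
    return "1to5" if value <= 5 else "6to10"

# In Spark the same logic can be expressed with pyspark.sql.functions.when:
#
#   from pyspark.sql import functions as F
#   bucketed = df.withColumn(
#       "col_a_bucket",
#       F.when(F.col("column_a") <= 5, "1to5").otherwise("6to10"),
#   )
#   bucketed.write.partitionBy("col_a_bucket").parquet("/tmp/out")  # path assumed
#
# which writes two directories, col_a_bucket=1to5 and col_a_bucket=6to10,
# instead of ten column_a=N directories.

if __name__ == "__main__":
    print(sorted({bucket_label(v) for v in range(1, 11)}))
```

One caveat: a query engine can only prune on the derived column (`WHERE col_a_bucket = '1to5'`), not automatically on `column_a` itself, so filters need to be written against the bucket column.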