But wouldn’t the partitioning column partition the data only in the Spark RDD? Would it also partition by that column on disk when the data is written (dividing the data into folders)?
From: ayan guha <guha.a...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "Jain, Nishit" <nja...@underarmour.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark Data Frame Writer - Range Partiotioning

How about creating a partition column and using it?

On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit <nja...@underarmour.com> wrote:

Is it possible to have the Spark DataFrame writer write based on range partitioning?

For example, I have 10 distinct values for column_a, say 1 to 10.

    df.write
      .partitionBy("column_a")

The code above will by default create 10 folders: column_a=1, column_a=2, ... column_a=10. I want to see if it is possible to have these partitions based on buckets instead, e.g. col_a=1to5, col_a=5-10, or something like that, and then also have the query engine respect it.

Thanks,
Nishit

--
Best Regards,
Ayan Guha
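For what it's worth, `df.write.partitionBy` does partition the output on disk, one directory per distinct value of the partition column, so Ayan's suggestion amounts to deriving a coarser bucket column and partitioning on that instead. Below is a minimal sketch of that idea; the bucket boundaries (1-5 vs. 6-10), the column name `col_a_bucket`, and the output path are assumptions for illustration, not anything from the thread.

```python
# Sketch of the suggested approach: derive a bucket label from column_a,
# then write partitioned by that label instead of the raw value.
# Bucket boundaries (1..5 vs 6..10) are assumed for illustration.

def bucket_label(value):
    """Map a column_a value in 1..10 to a range-bucket label."""
    return "1to5" if value <= 5 else "6to10"

# In Spark the same logic can be expressed with pyspark.sql.functions.when:
#
#   from pyspark.sql import functions as F
#   bucketed = df.withColumn(
#       "col_a_bucket",
#       F.when(F.col("column_a") <= 5, "1to5").otherwise("6to10"),
#   )
#   bucketed.write.partitionBy("col_a_bucket").parquet("/tmp/out")  # path assumed
#
# which writes two directories, col_a_bucket=1to5 and col_a_bucket=6to10,
# instead of ten column_a=N directories.

if __name__ == "__main__":
    print(sorted({bucket_label(v) for v in range(1, 11)}))
```

One caveat: a query engine can only prune on the derived column (`WHERE col_a_bucket = '1to5'`), not automatically on `column_a` itself, so filters need to be written against the bucket column.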