Hi,

I'm afraid there is currently no API in DataFrames to define a
RangePartitioner.
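
As a workaround you can drop down to the RDD API, range-partition there,
and convert back. A minimal sketch, assuming a DataFrame `df` with an
integer column "key" (the column name and the partition count of 8 are
just placeholders):

    import org.apache.spark.RangePartitioner

    // Pair each row with its key so RangePartitioner can sample the key range.
    val pairs = df.rdd.map(row => (row.getAs[Int]("key"), row))

    // RangePartitioner samples the keys to compute partition boundaries.
    val partitioner = new RangePartitioner(8, pairs)

    // Repartition by range, drop the keys, and rebuild the DataFrame.
    val partitioned = pairs.partitionBy(partitioner).values
    val rangedDf = sqlContext.createDataFrame(partitioned, df.schema)

Note that the round trip through the RDD API discards the DataFrame's
logical plan, so Catalyst cannot optimize across it.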

// maropu

On Tue, Jun 14, 2016 at 5:04 AM, Peter Halliday <pjh...@cornell.edu> wrote:

> I have two questions
>
> First, I have a failure when I write Parquet from Spark 1.6.1 on Amazon
> EMR to S3.  This is a full batch of over 200GB of source data.  The
> partitioning is based on a geographic identifier we use and the date we
> received the data.  However, because of geographic density, some tiles
> could well be too dense.  I'm trying to determine the size of the file
> it's trying to write out.
>
> Second, we used to use RDDs with a RangePartitioner for task partitioning.
> However, I don't see this available in DataFrames.  How does one achieve
> this now?
>
> Peter Halliday


-- 
---
Takeshi Yamamuro
