Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Takeshi Yamamuro
..@wellsfargo.com" <saif.a.ell...@wellsfargo.com>
Date: Friday, June 3, 2016 at 8:31 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Strategies for properly load-balanced partitioning

Hello everyone!

Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Silvio Fiorito
<saif.a.ell...@wellsfargo.com>
Date: Friday, June 3, 2016 at 8:31 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Strategies for properly load-balanced partitioning

Hello everyone! I was noticing that, when reading parquet files or actually any kind of source data fram

RE: Strategies for properly load-balanced partitioning

2016-06-03 Thread Saif.A.Ellafi
: user; Reynold Xin; mich...@databricks.com
Subject: Re: Strategies for properly load-balanced partitioning

I suppose you are running Spark 1.6. I guess you need a solution based on the features in [1] and [2], which are coming in 2.0. [1] https://issues.apache.org/jira/browse/SPARK-12538 / https

Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Ovidiu-Cristian MARCU
I suppose you are running Spark 1.6. I guess you need a solution based on the features in [1] and [2], which are coming in 2.0.

[1] https://issues.apache.org/jira/browse/SPARK-12538
[2] https://issues.apache.org/jira/browse/SPARK-12394
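The two JIRA issues are not summarized in the thread. As far as we can tell, they track Spark 2.0's bucketing work: writing a table pre-hash-partitioned by a key column so that later joins on that key can skip the shuffle. The sketch below is a plain-Python model of that idea, not Spark code. `bucket_id`, `bucketize` and `bucketed_join` are hypothetical helpers, and `zlib.crc32` is an assumed stand-in for whatever hash Spark uses internally.

```python
import zlib
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def bucket_id(key: str, num_buckets: int) -> int:
    """Deterministic bucket for a key (crc32 is an assumed stand-in for Spark's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows: Sequence[L], key_fn: Callable[[L], str], num_buckets: int) -> List[List[L]]:
    """Place every row into the bucket chosen by hashing its key."""
    buckets: List[List[L]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(key_fn(row), num_buckets)].append(row)
    return buckets


def bucketed_join(left: Sequence[L], right: Sequence[R],
                  key_fn: Callable, num_buckets: int) -> List[Tuple[L, R]]:
    """Inner join done bucket-by-bucket: equal keys share a bucket index,
    so bucket i of `left` only ever needs bucket i of `right`."""
    out: List[Tuple[L, R]] = []
    for l_bucket, r_bucket in zip(bucketize(left, key_fn, num_buckets),
                                  bucketize(right, key_fn, num_buckets)):
        # Hash index over the right-hand bucket, then probe with left rows.
        index: Dict[str, List[R]] = {}
        for r in r_bucket:
            index.setdefault(key_fn(r), []).append(r)
        for l in l_bucket:
            for r in index.get(key_fn(l), []):
                out.append((l, r))
    return out


if __name__ == "__main__":
    orders = [("a", 1), ("b", 2), ("c", 3)]
    users = [("a", "x"), ("c", "z")]
    print(sorted(bucketed_join(orders, users, lambda r: r[0], num_buckets=4)))
    # [(('a', 1), ('a', 'x')), (('c', 3), ('c', 'z'))]
```

Bucketing spreads work only as evenly as the key distribution allows: a single hot key still lands in one bucket. In Spark 2.0 the Scala/Java `DataFrameWriter` exposes the real feature as `bucketBy(numBuckets, colName, ...)`, used together with `saveAsTable`.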

Strategies for properly load-balanced partitioning

2016-06-03 Thread Saif.A.Ellafi
Hello everyone! I have noticed that when reading Parquet files, or indeed any kind of source DataFrame data (spark-csv, etc.), the default partitioning is not evenly balanced. Action tasks usually run very fast on some partitions and very slowly on others, and frequently they are fast on all but the last
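A rough way to quantify the imbalance described above is to compare per-partition row counts. The sketch below models partitions as plain Python lists so it runs without a cluster (an assumption; `partition_sizes`, `skew_ratio` and `round_robin` are hypothetical helpers). It reports the largest-to-mean partition ratio and shows how dealing rows out round-robin, roughly what a keyless `repartition(n)` does, evens out row counts.

```python
from typing import List, Sequence


def partition_sizes(partitions: Sequence[Sequence[object]]) -> List[int]:
    """Row count of each partition."""
    return [len(p) for p in partitions]


def skew_ratio(partitions: Sequence[Sequence[object]]) -> float:
    """Largest partition divided by mean partition size; 1.0 means perfectly balanced."""
    sizes = partition_sizes(partitions)
    if not sizes:
        return 1.0
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 1.0


def round_robin(partitions: Sequence[Sequence[object]], n: int) -> List[List[object]]:
    """Deal every row to n new partitions in turn, ignoring where it came from."""
    out: List[List[object]] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


if __name__ == "__main__":
    # Illustrative data (not from the thread): one oversized split plus three tiny ones.
    skewed = [list(range(1000)), list(range(10)), list(range(10)), list(range(10))]
    balanced = round_robin(skewed, 4)
    print(partition_sizes(skewed), round(skew_ratio(skewed), 2))      # [1000, 10, 10, 10] 3.88
    print(partition_sizes(balanced), round(skew_ratio(balanced), 2))  # [258, 258, 257, 257] 1.0
```

On a real DataFrame, the per-partition counts can be collected in PySpark with `df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`. Round-robin rebalancing evens out row counts only, not per-row cost, and it costs a full shuffle, so it is most likely to help when the skew comes from uneven input splits rather than from a few expensive rows.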