Re: sc.textFile the number of the workers to parallelize

2016-02-04 Thread Takeshi Yamamuro
Hi, ISTM these tasks are just assigned to executors on the preferred nodes, so how about repartitioning the RDD?

s3File.repartition(9).count
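
A minimal sketch of the repartition approach (the S3 path, app name, and surrounding boilerplate are assumptions, not from the thread). repartition(9) shuffles the loaded data into 9 partitions, so the subsequent count runs as 9 parallel tasks even if S3 produced fewer input splits:

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))
        // Hypothetical S3 path; substitute your own bucket and prefix.
        val s3File = sc.textFile("s3n://my-bucket/logs/")
        // Shuffle into 9 partitions so the count stage runs 9 tasks in parallel.
        println(s3File.repartition(9).count())
        sc.stop()
      }
    }

Note that repartition triggers a full shuffle: it raises parallelism for downstream stages, but it does not change how many tasks read the data from S3 in the first place.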

Re: sc.textFile the number of the workers to parallelize

2016-02-04 Thread Koert Kuipers
Increase minPartitions:

sc.textFile(path, minPartitions = 9)
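
A sketch of the minPartitions hint (the path is hypothetical). Unlike repartition, this asks the underlying Hadoop input format for at least 9 splits up front, so the read itself is parallelized and no shuffle is needed:

    // Request at least 9 input splits when reading. This works for splittable
    // formats such as plain text; gzipped files cannot be split.
    val s3File = sc.textFile("s3n://my-bucket/logs/", minPartitions = 9)
    println(s3File.partitions.length)  // typically >= 9 for splittable input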

sc.textFile the number of the workers to parallelize

2016-02-04 Thread Lin, Hao
Hi, I have a question about the number of workers that Spark uses to parallelize the loading of files with sc.textFile. When I use sc.textFile to access multiple files in AWS S3, it seems to enable only 2 workers regardless of how many worker nodes I have in my cluster. So how does Spark …
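
A hedged way to diagnose the behavior described above (path hypothetical): the number of tasks in the load stage equals the RDD's partition count, which for S3 is driven by the number and size of the input files and the split size, not by how many worker nodes the cluster has:

    val s3File = sc.textFile("s3n://my-bucket/logs/")
    // If this prints 2, Spark created only 2 input splits (e.g. 2 files, or
    // unsplittable compressed files), which caps the load stage at 2 tasks.
    println(s3File.partitions.length)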