Hi,

Could you share the code that uses foreachPartition?
Jacek

11.03.2016 7:33 PM "Darin McBeath" <ddmcbe...@yahoo.com> wrote:

> I can verify this by looking at the log file for the workers.
>
> Since I output logging statements in the object called by the
> foreachPartition, I can see the statements being logged. Oddly, these
> output statements only occur in one executor (and not the other). It
> occurs 16 times in this executor since there are 16 partitions. This
> seems odd, as there are only 8 cores on the executor and the other
> executor doesn't appear to be used at all by the foreachPartition call.
> But when I go to do a map function on this same RDD, things start
> blowing up on the other executor as it starts doing work for some
> partitions (even though it would appear that all of the partitions were
> initialized only on the executor that ran the foreachPartition). The
> executor that was used in the foreachPartition call works fine and
> doesn't experience issues. But because the other executor is failing on
> every request, the job dies.
>
> Darin.
>
> ________________________________
> From: Jacek Laskowski <ja...@japila.pl>
> To: Darin McBeath <ddmcbe...@yahoo.com>
> Cc: user <user@spark.apache.org>
> Sent: Friday, March 11, 2016 1:24 PM
> Subject: Re: spark 1.6 foreachPartition only appears to be running on one executor
>
> Hi,
> How do you check which executor is used? Can you include a screenshot of
> the master's webUI with workers?
> Jacek
>
> 11.03.2016 6:57 PM "Darin McBeath" <ddmcbe...@yahoo.com.invalid> wrote:
>
>> I've run into a situation where it would appear that foreachPartition
>> is only running on one of my executors.
>>
>> I have a small cluster (2 executors with 8 cores each).
>>
>> When I run a job with a small file (with 16 partitions), I can see that
>> the 16 partitions are initialized, but they all appear to be
>> initialized on only one executor. All of the work then runs on this one
>> executor (even though the number of partitions is 16). This seems odd,
>> but at least it works. I'm not sure why the other executor was not used.
>>
>> However, when I run a larger file (once again with 16 partitions), I
>> can see that the 16 partitions are initialized once again (but all on
>> the same executor). This time, though, the subsequent work is spread
>> across the 2 executors. This of course results in problems, because the
>> second executor was never initialized: all of the partitions were
>> initialized only on the first one.
>>
>> Does anyone have any suggestions for where I might want to investigate?
>> Has anyone else seen something like this before? Any thoughts/insights
>> would be appreciated. I'm using the Stand Alone Cluster manager, the
>> cluster was started with the spark ec2 scripts, and I'm submitting my
>> job using spark-submit.
>>
>> Thanks.
>>
>> Darin.
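
Since Darin's actual code isn't shown in the thread, here is a minimal Scala sketch of the pattern he describes, which also answers the question of how to check which executor handles each task: a foreachPartition pass that does per-partition initialization and logs the executor id, followed by a map over the same RDD. The object name, input path, partition count and the initialization placeholder are assumptions, not his code.

import java.net.InetAddress
import org.apache.spark.{SparkConf, SparkContext, SparkEnv, TaskContext}

object ForeachPartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("foreachPartition-check"))

    // 16 partitions, matching the case described in the thread.
    // The input path is a placeholder.
    val rdd = sc.textFile("s3://some-bucket/input", 16)

    // Runs once per partition, on whichever executor is assigned that task.
    // Logging the executor id, partition id and host shows whether both
    // executors participate or all 16 tasks land on one of them.
    rdd.foreachPartition { _ =>
      val execId = SparkEnv.get.executorId
      val part   = TaskContext.get().partitionId()
      val host   = InetAddress.getLocalHost.getHostName
      println(s"init: partition=$part executor=$execId host=$host")
      // per-partition initialization (e.g. creating a client) would go here;
      // any state it builds lives only in that executor's JVM
    }

    // A later map over the same RDD schedules its own tasks, so a partition
    // may be processed on a different executor than the one that ran the
    // foreachPartition task for it.
    val lengths = rdd.map(line => line.length).collect()
    println(s"processed ${lengths.length} records")

    sc.stop()
  }
}

Submitted with spark-submit against the standalone master, the println output lands in each executor's stdout log (visible under the worker in the web UI), which is where Darin says he sees all 16 initialization messages on a single executor.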