Re: Spark shuffle: FileNotFound exception

2016-12-04 Thread Evgenii Morozov
Swapnil, what do you think the size of the file that's not found might be? For Spark versions below 2.0.0 there can be issues with shuffle blocks of 2 GB or more. Is the file actually on the file system? I'd try increasing the default parallelism to make the partitions smaller. Hope this helps.
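(A minimal sketch of that suggestion, for illustration only; the parallelism value, the input path, and the name `records` are assumptions, not from the thread:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Raise parallelism so each shuffle block stays well under the 2 GB limit.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-sketch")
      .set("spark.default.parallelism", "2000")     // RDD shuffles
      .set("spark.sql.shuffle.partitions", "2000")  // DataFrame/SQL shuffles
    val sc = new SparkContext(conf)

    // Or repartition explicitly before the wide operation:
    val records = sc.textFile("hdfs:///path/to/input")  // hypothetical input
    val smallerPartitions = records.repartition(2000)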

Spark shuffle: FileNotFound exception

2016-12-03 Thread Swapnil Shinde
Hello all, I am facing a FileNotFoundException for a shuffle index file when running a job with large data. The same job runs fine with smaller datasets. These are my cluster specifications:
- No. of nodes: 19
- Total cores: 380
- Memory per executor: 32G
- Spark 1.6 (MapR version)

Shuffle FileNotFound Exception

2015-11-18 Thread Tom Arnfeld
Hey, I'm wondering if anyone has run into issues with Spark 1.5 and a FileNotFound exception on shuffle .index files? It's been cropping up with very large joins and aggregations, and causing all of our jobs to fail towards the end. The memory limit for the executors (we're running on Mesos)…

Re: Shuffle FileNotFound Exception

2015-11-18 Thread Romi Kuntsman
Take the executor memory times spark.shuffle.memoryFraction, and divide the data so that each partition is smaller than that value. *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com
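(A worked sketch of that rule of thumb, using the 32G executors mentioned in the thread; the total shuffle size is an assumption, and spark.shuffle.memoryFraction defaults to 0.2 in Spark 1.x:)

    // Rule of thumb above, with illustrative numbers.
    val executorMemoryGb   = 32.0
    val shuffleMemFraction = 0.2    // Spark 1.x default
    val totalShuffleDataGb = 500.0  // hypothetical shuffle size

    val maxPartitionGb = executorMemoryGb * shuffleMemFraction  // 6.4 GB per partition
    val minPartitions  = math.ceil(totalShuffleDataGb / maxPartitionGb).toInt
    println(s"Use at least $minPartitions partitions")  // 79 for these numbers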

Re: Shuffle FileNotFound Exception

2015-11-18 Thread Tom Arnfeld
Hi Romi, thanks! Could you give me an indication of how much to increase the partitions by? We'll take a stab in the dark; the input data is around 5M records (though each record is fairly small). We've had trouble with both DataFrames and RDDs. Tom.
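(For reference, a sketch of raising the partition count in both APIs on a Spark 1.5-era cluster; `rdd`, `df`, `otherDf`, and the count 400 are placeholders, not values from the thread:)

    // RDD API: full shuffle into more, smaller partitions.
    val widerRdd = rdd.repartition(400)

    // DataFrame API: joins and aggregations use spark.sql.shuffle.partitions
    // (default 200), set on the SQLContext at runtime.
    sqlContext.setConf("spark.sql.shuffle.partitions", "400")
    val joined = df.join(otherDf, "id")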