So:

1. I reduced my -XX:ThreadStackSize to 5m (instead of 10m; the default is
1m), which is still OK for my needs.
2. I reduced the executor memory to 44GB for a 60GB machine (instead of
49GB).
This seems to have helped. Thanks to Matthew and Sean.

Thomas

On Tue, Mar 24, 2015 at 3:49 PM, Matt Silvey <matt.sil...@videoamp.com> wrote:

> My memory is hazy on this, but aren't there "hidden" limitations to
> Linux-based threads? I ran into some issues a couple of years ago where,
> and here is the fuzzy part, the kernel wants to reserve virtual memory per
> thread equal to the stack size. When the total amount of reserved memory
> (not necessarily resident memory) exceeds the memory of the system, it
> throws an OOM. I'm looking for material to back this up. Sorry for the
> initial vague response.
>
> Matthew
>
> On Tue, Mar 24, 2015 at 12:53 PM, Thomas Gerber <thomas.ger...@radius.com> wrote:
>
>> Additional notes:
>> I did not find anything wrong with the number of threads
>> (ps -u USER -L | wc -l): around 780 on the master and 400 on the
>> executors. I am running on 100 r3.2xlarge.
>>
>> On Tue, Mar 24, 2015 at 12:38 PM, Thomas Gerber <thomas.ger...@radius.com> wrote:
>>
>>> Hello,
>>>
>>> I am seeing various crashes in Spark on large jobs which all share a
>>> similar exception:
>>>
>>> java.lang.OutOfMemoryError: unable to create new native thread
>>>         at java.lang.Thread.start0(Native Method)
>>>         at java.lang.Thread.start(Thread.java:714)
>>>
>>> I increased nproc (i.e. ulimit -u) 10-fold, but it doesn't help.
>>>
>>> Does anyone know how to avoid these kinds of errors?
>>>
>>> Noteworthy: I added -XX:ThreadStackSize=10m to both the driver and
>>> executor extra Java options, which might have amplified the problem.
>>>
>>> Thanks for your help,
>>> Thomas
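Matthew's point about per-thread virtual-memory reservation can be sanity-checked with a rough back-of-envelope script. This is only a sketch under the assumption (from his reply, which he himself hedges) that each thread reserves virtual address space roughly equal to its stack size; the 10m stack and ~60GB machine are the figures from the thread above:

```shell
#!/bin/sh
# Rough ceiling on thread count IF each thread reserves its full stack
# size as virtual memory (Matthew's hypothesis, not a verified kernel fact).
stack_mb=10            # -XX:ThreadStackSize=10m, as in the original setup
mem_gb=60              # approximate RAM on an r3.2xlarge
max_threads=$(( mem_gb * 1024 / stack_mb ))
echo "roughly ${max_threads} threads before reservations exceed ${mem_gb}GB"

# Compare against what the box is actually doing and allows:
ps -u "$USER" -L | wc -l   # current thread count for this user
ulimit -u                  # max user processes/threads (nproc)
ulimit -v                  # max virtual memory per process, in KB
```

With a 10m stack the rough ceiling is only ~6144 threads on a 60GB box, which is the kind of headroom that a large Spark job plus its GC and I/O threads could plausibly exhaust; dropping the stack to 5m doubles that ceiling.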
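For reference, the two mitigations Thomas describes could be applied at submission time along these lines. This is a sketch only: the class name and jar are placeholders, and 44g/5m are simply the values reported in the thread, not recommendations.

```shell
# Sketch: apply the reduced stack size and executor memory via spark-submit.
# com.example.MyApp and my-app.jar are hypothetical placeholders.
spark-submit \
  --conf spark.executor.memory=44g \
  --conf spark.driver.extraJavaOptions="-XX:ThreadStackSize=5m" \
  --conf spark.executor.extraJavaOptions="-XX:ThreadStackSize=5m" \
  --class com.example.MyApp \
  my-app.jar
```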