On 1/9/07, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
Geir Magnusson Jr. wrote:
> I started a new thread because I think this is really important.
>
> I've also added a page in the wiki to track this stuff, because I can't
> keep it in my head:
>
> http://wiki.apache.org/harmony/MegaSpawnThreadingBug
>
> which you can get to from the home page via the "WhiteBoards" section,
> intended to be a place where we can work as a team on a whiteboard, with
> the intention that once the mini-project is over, we erase...
This is a good idea. I still want to put some of the discussion on email so that we have a permanent record of our investigations. I have some thoughts inlined below.
> I think this is a scary scary problem :)

I've tried to analyze the MegaSpawn test on Windows, and here's what I found out. OOME is thrown because the process virtual size easily gets up to 2 GB. This happens at about 1.5k simultaneously running threads. I think it happens because all of the process's virtual memory is mapped for thread stacks. When virtual memory is exhausted, all kinds of problems may occur. In many places there are assertions that malloc returns non-NULL, and these assertions fail. In other places the result of malloc is not checked at all, and the NULL pointer is dereferenced, which also crashes the VM.
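To put rough numbers on the figures above, here is a back-of-envelope sketch in Java. It is only arithmetic, not a measurement: the class and method names are hypothetical, and the 1 MiB per-thread stack reservation and the 512 MiB set aside for heap, code, and everything else are assumptions picked for illustration, since neither value is stated in this thread.

```java
public final class StackBudget {
    /**
     * Upper bound on the number of threads if every thread stack takes
     * stackBytes of address space and otherUseBytes is already taken by
     * the heap, code, and so on.
     */
    static long maxThreads(long addressSpaceBytes, long otherUseBytes, long stackBytes) {
        if (stackBytes <= 0) {
            throw new IllegalArgumentException("stackBytes must be positive");
        }
        long available = Math.max(0L, addressSpaceBytes - otherUseBytes);
        return available / stackBytes;
    }

    /** Average address space per thread if threadCount threads fill the whole space. */
    static long averageBytesPerThread(long addressSpaceBytes, long threadCount) {
        if (threadCount <= 0) {
            throw new IllegalArgumentException("threadCount must be positive");
        }
        return addressSpaceBytes / threadCount;
    }

    public static void main(String[] args) {
        long twoGiB = 2L << 30;          // the process size reported in the analysis
        long assumedStack = 1L << 20;    // ASSUMPTION: 1 MiB mapped per thread stack
        long assumedOther = 512L << 20;  // ASSUMPTION: heap, code, and the rest

        System.out.println("thread ceiling: "
                + maxThreads(twoGiB, assumedOther, assumedStack));
        System.out.println("bytes per thread at 1.5k threads: "
                + averageBytesPerThread(twoGiB, 1500));
        System.out.println("bytes per thread at 6k threads: "
                + averageBytesPerThread(twoGiB, 6000));
    }
}
```

Under those assumptions the ceiling lands in the same range as the observed 1.5k threads, which would fit the theory that the stacks are what exhaust the address space. It does not prove it.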
Good job! I got the same sort of hunch when I looked at the source code but did not have enough time to pin down specifics.

The only guidance I found regarding what happens when too many threads are spawned is the following in the java.lang.Thread reference manual: "...specifying a lower [stacksize] value may allow a greater number of threads to exist concurrently without throwing an OutOfMemoryError (or other internal error)." I think what the above implies is that it is OK for the JVM to error and exit if the app tries to create too many threads. If this is the case, it looks like we need to clean up the handling of malloc() errors so that the JVM can exit gracefully.

Another approach would be to throw something like a "TooManyThreadsAtOnceException" and keep running the app. I can't find any exception like this. It's probably not an option.

Another approach would be to make the Thread.start() method wait until there are enough resources to create a new thread. Most likely the app would hang mysteriously without warning. This is probably not an option either.

Another item we need to discuss: what are the Q1/Q2 goals for the max number of threads supported? It seems we can do lots of useful stuff with a max of 1500 threads. The useful stuff being items like the bringup of enterprise apps, fixing stability problems...

I tried to watch the Sun implementation, and it looks like they map smaller amounts of memory for thread stacks. Maybe they map only the initial stack memory somehow and allow it to grow later (although I don't quite understand how that is possible in a contiguous address space). When the Sun VM executed this test, it created up to ~6k simultaneously running threads, and the process size at that moment was smaller than 2 GB.

I think the same problem may happen on Linux, because it spills out OOMEs on Ubuntu as well. If the test somehow doesn't crash on failed mallocs, it gets to the shutdown stage and hangs with 2 or more deadlocked threads. So far I haven't understood how they lock each other.

-- Gregory
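For anyone who wants to poke at the stack-size hint quoted above from the java.lang.Thread documentation, here is a minimal sketch. It is not the MegaSpawn test: the class name, the method name, and the 256 KiB value are made up for illustration, and it uses lambda syntax from later Java versions for brevity. It uses the four-argument Thread constructor, whose stackSize argument is only a suggestion that a VM is free to ignore, and it treats an OutOfMemoryError from Thread.start() as the signal to stop spawning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public final class SpawnProbe {
    /**
     * Tries to start `wanted` parked threads, each requesting
     * `stackSizeBytes` of stack, and returns how many actually started.
     */
    static int spawn(int wanted, long stackSizeBytes) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>(wanted);

        // Every thread blocks on the latch so that all of them are alive at once.
        Runnable parked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            for (int i = 0; i < wanted; i++) {
                // stackSizeBytes is a hint; the VM may round it or ignore it.
                Thread t = new Thread(null, parked, "probe-" + i, stackSizeBytes);
                t.setDaemon(true);
                t.start();
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            // Out of resources for more threads: stop and report the count so far.
        } finally {
            release.countDown(); // let the parked threads finish
        }

        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return started.size();
    }

    public static void main(String[] args) {
        // 256 KiB is an arbitrary illustrative value, not a recommendation.
        int n = spawn(100, 256 * 1024L);
        System.out.println("started " + n + " threads");
    }
}
```

Running it with a larger `wanted` and different stack sizes would be one way to compare how many simultaneous threads each VM allows before it gives up, assuming the VM reports the failure as an OutOfMemoryError rather than crashing.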
--
Weldon Washburn
Intel Enterprise Solutions Software Division
