Geir Magnusson Jr. wrote:
I started a new thread because I think this is really important.

I've also added a page in the wiki to track this stuff, because I can't keep it in my head:

  http://wiki.apache.org/harmony/MegaSpawnThreadingBug

which you can get to from the home page via the "WhiteBoards" section, intended to be a place where we can work as a team on a whiteboard, with the intention that once the mini-project is over, we erase...

I think this is a scary scary problem :)

I've tried to analyze the MegaSpawn test on Windows, and here's what I found out.

OOME is thrown because the process virtual size easily reaches 2 GB. This happens at about 1.5k simultaneously running threads. I think it happens because all of the process's virtual memory is mapped for thread stacks.
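
To put a rough number on this, here is a minimal sketch of the arithmetic. The helper name is made up for illustration, and it assumes the entire address space goes to thread stacks, so the result is only an upper bound (the heap and code need address space too):

```c
#include <stddef.h>

/* Hypothetical helper: upper bound on the address space available per
 * thread, assuming the whole virtual size is spent on thread stacks. */
unsigned long long per_thread_budget(unsigned long long virtual_size_bytes,
                                     unsigned long long thread_count)
{
    if (thread_count == 0) {
        return 0; /* no threads, nothing to divide */
    }
    return virtual_size_bytes / thread_count;
}
```

Calling per_thread_budget(2ULL << 30, 1500) gives roughly 1.4 MB per thread, which is in the range of a typical default stack reservation.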

When virtual memory is exhausted, all kinds of problems may occur. In many places there are assertions that malloc returns non-NULL, and these assertions fail. In other places the result of malloc is not checked at all and the NULL pointer is dereferenced, which also crashes the VM.
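
As an illustration of the alternative to asserting, here is a minimal sketch of an allocation helper that reports failure to the caller instead. The names try_alloc and alloc_status are hypothetical and not taken from the Harmony sources; the caller would be expected to turn ALLOC_OUT_OF_MEMORY into an OOME or some other orderly failure:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes, for illustration only. */
typedef enum { ALLOC_OK = 0, ALLOC_OUT_OF_MEMORY = 1 } alloc_status;

/* Allocate `size` bytes and report failure through the return value
 * rather than asserting or handing back an unchecked pointer. */
alloc_status try_alloc(size_t size, void **out)
{
    void *p;

    if (size == 0) {
        size = 1; /* malloc(0) may legally return NULL; avoid the ambiguity */
    }
    p = malloc(size);
    if (p == NULL) {
        *out = NULL;
        return ALLOC_OUT_OF_MEMORY;
    }
    *out = p;
    return ALLOC_OK;
}
```

The point of the design is only that every call site has to look at the status before touching the pointer.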

I watched the Sun implementation, and it looks like they map smaller amounts of memory for thread stacks. Maybe they map only the initial stack memory somehow and allow it to grow later (although I don't quite understand how that is possible in a contiguous address space). When the Sun VM executes this test, it creates up to ~6k simultaneously running threads, and the process size at that moment is smaller than 2 GB.
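
On growing within a contiguous range: one general technique is to reserve the address range up front without making it usable, and commit pages only as they are needed. The sketch below shows this with POSIX mmap and mprotect (on Windows the analogous calls are VirtualAlloc with MEM_RESERVE and later MEM_COMMIT). The function names are hypothetical, and this is not a claim about what the Sun VM does. Note that a reservation still consumes address space, so it would not by itself explain a smaller virtual size; only a smaller reservation per thread does that.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of contiguous address space. The pages are
 * mapped with PROT_NONE, so any access faults until they are committed. */
void *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make `size` bytes starting at `addr` readable and writable.
 * `addr` must be page-aligned. Returns 0 on success, -1 on failure. */
int commit_pages(void *addr, size_t size)
{
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}

/* Give the whole reservation back to the system. */
int release_region(void *addr, size_t size)
{
    return munmap(addr, size);
}
```

A real thread stack usually grows downward, so the committed part would sit at the high end of the reservation with a guard page below it; the sketch leaves that out to keep the idea visible.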

I think the same problem may happen on Linux, because the test produces OOMEs on Ubuntu as well.

If the test somehow doesn't crash on failed mallocs and gets to the shutdown stage, it hangs with 2 or more deadlocked threads. So far I haven't figured out how they lock each other.

--
Gregory
