On 1/10/07, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:


On Jan 9, 2007, at 1:14 PM, Rana Dasgupta wrote:

> On 1/9/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
>>
>> On 1/9/07, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
>> >> I've tried to analyze MegaSpawn test on windows and here's what
>> >> I found out.
>> >>
>> >> OOME is thrown because process virtual size easily gets up to
>> >> 2Gb. This happens at about ~1.5k simultaneously running threads.
>> >> I think it happens because all of virtual process memory is
>> >> mapped for thread stacks.
>> >>
>>
>> >Good job!  I got the same sort of hunch when I looked at the
>> >source code but did not have enough time to pin down specifics.
>> >The only guidance I found regarding what happens when too many
>> >threads are spawned is the following in the java.lang.Thread
>> >reference manual, "...specifying a lower [stacksize] value may
>> >allow a greater number of threads to exist concurrently without
>> >throwing an OutOfMemoryError (or other internal error)."
>>
>> >I think what the above implies is that it is OK for the JVM to
>> >error and exit if the app tries to create too many threads.  If
>> >this is the case, it sort of looks like we need to clean up the
>> >handling of malloc() errors so that the JVM can exit gracefully.
>
>
> I am not sure that we need to do something about this. The default
> initial stack size on Windows is 1M,

Yikes!  There's our problem on windows...

> and that is the recommended init size for real applications. The
> fact that our threads start with a larger initial stack mapped
> (default) than the RI is a design issue, it is not a bug. We could
> start with 2K and create many more threads!

That's right.  The fact that the VM crashes and burns is the bug, and
a serious one, IMO.

> Exactly as Gregory points out, ultimately we will hit virtual memory
> limits and fail. The reason the RI seems to fail less is that the
> test ends before running out of virtual memory. On my 32-bit RHEL
> Linux box, the RI fails almost every time with MegaSpawn, with an
> identical OOME error message and stack dump.
>
> We can catch the exception in the test and print a message. But I am
> not very sure what purpose that would serve. A resource exhaustion
> exception is a fatal exception and the process is hosed,

No, it's not.

> no real app would be able to do
> anything more at this point.

That's not true.

> We should not use this test (which is not a real app) as guidance to
> tune the initial stack size. My suggestion is to lower the test
> duration so that we can create about 1000 (or whatever magic number)
> threads at least. That is the stress condition we should test for.

The big thing for me is ensuring that we can drive the VM to the
limit, and it maintains internal integrity, so applications that are
designed to gracefully deal with resource exhaustion can do so w/
confidence that the VM isn't about to crumble out from underneath them.


"VM maintains internal integrity" on OOME situations is a good point. And I
have one very interesting idea how to monitor low memory state in runtime
and change VMs behaviour correspondingly:
the sample code below checks systems memory usage level. So at any moment (
e.g. adding new thread or commiting another portions of Java heap) one can
check that the system memory is almost exhausted. I suppose the number of
such places in drlvm where a lot memory is allocated at once is limited.
void exn_throw_if_exhausted() {
    if (port_vmem_usage_rate() > UPPER_MEMORY_BORDER) {
        ... throw_exception(OOME);
    }
}
This functionality will not guarantee that we get OOME working 100% of the
time, but by varying the UPPER_MEMORY_BORDER value we can get a low fail rate
for stress tests like stress.Mix.

Index: vm/port/include/port_vmem.h
===================================================================
--- vm/port/include/port_vmem.h (revision 495134)
+++ vm/port/include/port_vmem.h (working copy)
@@ -94,7 +94,12 @@
*/
APR_DECLARE(size_t *) port_vmem_page_sizes();

+/**
+* Returns % of system memory usage.
+*/
+APR_DECLARE(size_t) port_vmem_usage_rate();

+
#ifdef __cplusplus
}
#endif
Index: vm/port/src/vmem/linux/port_vmem.c
===================================================================
--- vm/port/src/vmem/linux/port_vmem.c (revision 495134)
+++ vm/port/src/vmem/linux/port_vmem.c (working copy)
@@ -20,6 +20,7 @@
 */

#include <sys/mman.h>
+#include <sys/sysinfo.h>
#include <unistd.h>
#include <errno.h>
#include <malloc.h>
@@ -131,6 +132,12 @@
 return page_sizes;
}

+APR_DECLARE(size_t) port_vmem_usage_rate() {
+    struct sysinfo info;
+    sysinfo(&info);
+    return (info.totalram - info.freeram)*100/info.totalram;
+}
+
#ifdef __cplusplus
}
#endif
Index: vm/port/src/vmem/win/port_vmem.c
===================================================================
--- vm/port/src/vmem/win/port_vmem.c (revision 495134)
+++ vm/port/src/vmem/win/port_vmem.c (working copy)
@@ -215,6 +215,14 @@
 return page_sizes;
}

+APR_DECLARE(size_t) port_vmem_usage_rate(){
+    MEMORYSTATUSEX ms;
+    ms.dwLength = sizeof (ms);
+    GlobalMemoryStatusEx(&ms);
+
+ return ms.dwMemoryLoad;
+}
+
#ifdef __cplusplus
}
#endif

What do you think about the idea?

Best regards,
Aleksey.



geir


> Thanks,
> Rana

