Gregory Shimansky wrote:
Evgueni Brevnov wrote:
Hmm, strange. The patch was tested on a multi-processor system
running SUSE9. I will check whether the patch misses something. Anyway, we
need to hold off on submitting the patch until we are 100% sure how
hythread_monitor_init should behave.

Thanks
Evgueni

On 11/11/06, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> Hi,
>
> While investigating the deadlock scenario described in
> HARMONY-2006, I found one interesting thing. It turns out that the DRL
> implementation of hythread_monitor_init /
> hythread_monitor_init_with_name both initializes and acquires (locks) a
> monitor. The original spec reads: "Acquire and initialize a new monitor from
> the threading library...." AFAIU that doesn't mean locking the monitor, only
> getting it from the threading library. So hythread_monitor_init should
> not lock the monitor.
>
> Could somebody comment on that?

It might be that the semantics differ between platforms, which is
probably even worse. Your patch in HARMONY-2149 breaks nearly all of the
acceptance tests on Linux, while everything works on Windows (OK, I tested on a
laptop with 1 processor while the Linux machine was an HT server; sometimes that
is important for threading).

I've tried to investigate the problem but haven't gotten to the bottom of it yet. The bug seems to be Ubuntu-specific (<joke>shall we maybe call this distribution buggy and move on?</joke>).

There is something odd about it, I'll admit... Remember the EOMEM bugs I found in forking?


I couldn't reproduce it on Gentoo; all tests work just fine there.

The bug looks like this: on the tests gc.Force, gc.LOS, gc.List, gc.NPE, gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest, the VM segfaults. The stack looks like an infinite recursion of 4 stack frames:

#0  0xb6dcb814 in null_java_reference_handler (signum=11, info=0xb71a503c, context=0xb71a50bc)
at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
#1  <signal handler called>
#2  0xb6dcc20a in get_stack_addr ()
at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
#3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
#4  0xb6dcb900 in null_java_reference_handler (signum=11, info=0xb71a546c, context=0xb71a54ec)
at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451

and so on. The stack is very long. When I run the VM with -Xtrace:signals I get a very long log of messages saying "NPE or SOE detected at ...". The first address always varies, but it appears to be inside memcpy. The subsequent addresses are always the same; they point to the get_stack_addr function, so the handler itself faults and gets re-entered.
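
To illustrate the re-entry pattern in the trace above, here is a minimal sketch of a guard that stops a SIGSEGV handler from recursing into itself. This is not the DRLVM code: handler_depth, handler_enter, handler_leave and guarded_segv_handler are hypothetical names, and I am assuming a single process-wide counter (a real VM would more likely track this per thread).

```c
#include <signal.h>

/* Hypothetical process-wide nesting flag; assumed, not taken from DRLVM. */
static volatile sig_atomic_t handler_depth = 0;

/* Returns 1 on the outermost entry, 0 if the handler is already running. */
static int handler_enter(void)
{
    if (handler_depth > 0) {
        return 0;
    }
    handler_depth = 1;
    return 1;
}

/* Marks the handler as finished so the next fault is treated normally. */
static void handler_leave(void)
{
    handler_depth = 0;
}

/* Sketch of a handler that refuses to recurse. */
static void guarded_segv_handler(int signum)
{
    if (!handler_enter()) {
        /* We faulted inside the handler itself: restore the default
         * action and return, so the repeated fault ends the process
         * instead of growing the stack. */
        signal(signum, SIG_DFL);
        return;
    }
    /* ... the usual NPE / stack overflow classification would go here ... */
    handler_leave();
}
```

With something like this in place the second fault would produce a plain crash at get_stack_addr rather than a very long stack, which might make the original memcpy fault easier to see.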

So I tried to find out why memcpy crashes in the first place. It appears to be a struct copy done in jsig_handler in hysig (hysigunix.c). The stack looks like this (if I can trust gdb on Ubuntu):

#0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0) at hysigunix.c:169
#2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
#3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
#4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
#5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
#6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6

In jsig_handler a struct of type sigaction is copied:

act = saved_sigaction[sig];

and it seems gcc replaces this statement with a call to memcpy. But the parameter sig looks quite weird: it is sig=-1215196204, which is not a valid signal number, so the copy reads far outside the saved_sigaction array. Now if I could only find where this sig value comes from... I cannot find it in the depths of the classlib native code this late at night.
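
For illustration, this is roughly the kind of range check that would turn the crash into a detectable error. It is only a sketch under my own assumptions: SAVED_SIGNALS_MAX, saved_sigaction_sketch and lookup_saved_action are hypothetical names, and the real table in hysigunix.c may be sized and declared differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical table size; the real hysig table may differ. */
#define SAVED_SIGNALS_MAX 64

/* Stand-in for the saved_sigaction array mentioned above. */
static struct sigaction saved_sigaction_sketch[SAVED_SIGNALS_MAX];

/* Copies the saved action for sig into *out.
 * Returns 1 on success, 0 if sig is not a usable index. */
static int lookup_saved_action(int sig, struct sigaction *out)
{
    assert(out != NULL);
    if (sig <= 0 || sig >= SAVED_SIGNALS_MAX) {
        return 0; /* e.g. sig=-1215196204 ends up here */
    }
    *out = saved_sigaction_sketch[sig]; /* the struct copy gcc may lower to memcpy */
    return 1;
}
```

A check like this would not explain where the bogus sig comes from, but it would make the failure show up at the caller instead of inside memcpy.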

