Gregory Shimansky wrote:
> Evgueni Brevnov wrote:
>> hmmm.... strange. The patch was tested on a multi-processor system
>> running SUSE9. I will check if the patch misses something. Anyway, we
>> need to hold off on submitting the patch until we are 100% sure how
>> hythread_monitor_init should behave.
>>
>> Thanks
>> Evgueni
>>
>> On 11/11/06, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
>>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>>> > Hi,
>>> >
>>> > While investigating the deadlock scenario described in
>>> > HARMONY-2006 I found out one interesting thing. It turned out that the DRL
>>> > implementation of hythread_monitor_init /
>>> > hythread_monitor_init_with_name initializes and acquires a monitor.
>>> > The original spec reads: "Acquire and initialize a new monitor from the
>>> > threading library...." AFAIU that doesn't mean locking the monitor, but
>>> > obtaining it from the threading library. So hythread_monitor_init should
>>> > not lock the monitor.
>>> >
>>> > Could somebody comment on that?
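To make the "initialize but don't acquire" reading concrete, here is a minimal sketch in plain C over pthreads. The demo_monitor type and the demo_monitor_* functions are hypothetical stand-ins, not the real hythread API; the only point is that init hands back an unlocked monitor and locking is a separate, explicit step.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical monitor type; NOT the real hythread_monitor_t layout. */
typedef struct demo_monitor {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    const char *name;
} demo_monitor;

/* "Acquire" a monitor from the library: allocate and initialize it,
 * but return it UNLOCKED. Returns 0 on success, -1 on failure. */
int demo_monitor_init(demo_monitor **out, const char *name)
{
    demo_monitor *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    if (pthread_mutex_init(&m->mutex, NULL) != 0) {
        free(m);
        return -1;
    }
    if (pthread_cond_init(&m->cond, NULL) != 0) {
        pthread_mutex_destroy(&m->mutex);
        free(m);
        return -1;
    }
    m->name = name;
    *out = m;
    return 0;
}

/* Locking and unlocking are explicit calls made by the user of the monitor. */
int demo_monitor_enter(demo_monitor *m) { return pthread_mutex_lock(&m->mutex); }
int demo_monitor_exit(demo_monitor *m) { return pthread_mutex_unlock(&m->mutex); }

/* Give the monitor back; it must not be held by anyone at this point. */
void demo_monitor_destroy(demo_monitor *m)
{
    pthread_cond_destroy(&m->cond);
    pthread_mutex_destroy(&m->mutex);
    free(m);
}
```

Under this reading a caller that wants the monitor held calls demo_monitor_init() and then demo_monitor_enter(). If init also locks, code written against one reading can leave the monitor held, or unlock one it never entered, under the other.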
>>>
>>> It might be that the semantics differ between platforms, which is
>>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of the
>>> acceptance tests on Linux while everything works on Windows (OK, I
>>> tested on a laptop with 1 processor while Linux was an HT server;
>>> sometimes that matters for threading).
>
> I've tried to investigate the problem but haven't gotten to the bottom of
> it yet. The bug seems to be Ubuntu specific (<joke>shall we maybe call this
> distribution buggy and move on?</joke>).

There is something odd about it, I'll admit... Remember the ENOMEM bugs
I found in forking?

> I didn't reproduce it on
> Gentoo; all tests work just fine there.
>
> The bug looks like this: on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest
> the VM segfaults. The stack looks like an infinite recursion of 4 stack
> frames:
>
> #0 0xb6dcb814 in null_java_reference_handler (signum=11,
>     info=0xb71a503c, context=0xb71a50bc)
>     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> #1 <signal handler called>
> #2 0xb6dcc20a in get_stack_addr ()
>     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> #3 0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
>     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> #4 0xb6dcb900 in null_java_reference_handler (signum=11,
>     info=0xb71a546c, context=0xb71a54ec)
>     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
>
> and so on. The stack is very long. When I run the VM with -Xtrace:signals
> I get a very long log of "NPE or SOE detected at ..." messages. The first
> address always varies, but it appears to be in memcpy. The subsequent
> addresses are always the same; they point to the get_stack_addr function.
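The four-frame cycle above is a fault handler that faults again (in get_stack_addr) and gets re-entered for the nested fault. A common defensive pattern is a per-thread nesting counter that notices the second entry and bails out instead of recursing. This is only a sketch, assuming C11 _Thread_local and using hypothetical demo_* names; it is not DRLVM's null_java_reference_handler, and the traces don't show how the real handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

/* Hypothetical per-thread nesting counter for the fault handler. */
static _Thread_local int demo_fault_depth = 0;

/* Returns nonzero if this thread is already inside the handler. */
static int demo_fault_enter(void)
{
    return ++demo_fault_depth > 1;
}

/* Must run on every exit path, including non-local exits such as
 * jumping out to throw an exception, or the counter stays raised. */
static void demo_fault_leave(void)
{
    --demo_fault_depth;
}

/* Sketch of an SA_SIGINFO-style SIGSEGV handler with a re-entrancy guard. */
void demo_segv_handler(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)info;
    (void)context;
    if (demo_fault_enter()) {
        /* A second fault while handling the first one (e.g. inside a
         * stack-address helper): stop here instead of recursing. */
        abort();
    }
    /* ... classify the fault as NPE or stack overflow here ... */
    demo_fault_leave();
}
```

The guard does not explain why the first fault happens; it only turns an endless stack of handler frames into a single, immediate abort that is easier to read in gdb.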
>
> So I tried to find out why memcpy crashes in the first place. It appears
> to be a struct copy called from jsig_handler in hysig. The stack looks
> like this (if I can trust gdb on Ubuntu):
>
> #0 0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> #1 0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
>     at hysigunix.c:169
> #2 0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> #3 0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
>     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> #4 0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> #5 0xb7b65341 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #6 0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
>
> In jsig_handler a struct of type sigaction is copied
>
> act = saved_sigaction[sig];
>
> and it seems gcc replaces this statement with a call to memcpy. But the
> parameter sig is quite weird if you look at it: it is sig=-1215196204...
> Now if only I could find where this sig value came from... I cannot find
> it in the depths of classlib native code this late at night.
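One quick check on that value: reinterpreted as an unsigned 32-bit word, -1215196204 is 0xB79193D4, which falls in the same 0xb7... range as the libc and classlib addresses in the backtrace above. That suggests, though doesn't prove, that sig is really a stale pointer-sized word rather than a signal number. The helper name below is hypothetical; it just spells out the conversion.

```c
#include <stdint.h>

/* Reinterpret a 32-bit int (as gdb printed it for `sig`) as an unsigned
 * machine word. Conversion to uint32_t is defined modulo 2^32 in C. */
uint32_t demo_as_word(int32_t v)
{
    return (uint32_t)v;
}
```

For example, demo_as_word(-1215196204) yields 0xB79193D4.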
>
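Whatever the origin of the bad value, indexing a signal table with it is what turns it into a crash inside memcpy, since the struct assignment reads far outside the array. A minimal sketch of a range-checked lookup, assuming a saved-handler table indexed by signal number; demo_saved_sigaction, DEMO_MAX_SIGNALS and demo_get_saved_sigaction are hypothetical names, not the hysig code:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical table size; real code would size it for the platform. */
#define DEMO_MAX_SIGNALS 65

/* Hypothetical table of previously installed handlers, indexed by signal. */
static struct sigaction demo_saved_sigaction[DEMO_MAX_SIGNALS];

/* Copy the saved handler for sig into *out. Returns 0 on success and -1
 * for an out-of-range signal number instead of reading past the table. */
int demo_get_saved_sigaction(int sig, struct sigaction *out)
{
    if (out == NULL || sig <= 0 || sig >= DEMO_MAX_SIGNALS)
        return -1;
    /* Plain struct assignment; the compiler may lower this to memcpy. */
    *out = demo_saved_sigaction[sig];
    return 0;
}
```

With the check in place a garbage value like sig=-1215196204 is rejected up front, which would at least move the failure from an anonymous memcpy frame to the caller that passed the bad number.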