You can look at the change here: http://issues.apache.org/jira/browse/HARMONY-2203
On 11/16/06, Evgueni Brevnov <[EMAIL PROTECTED]> wrote:
I haven't published it yet...will file a JIRA soon...

On 11/16/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> ah. whew.
>
> can you point me to that change you made?
>
> geir
>
> Evgueni Brevnov wrote:
> > I'm not aware if classlib uses SIGUSR2. In this particular case
> > classlib (to be more precise it is the portlib module) does sem_wait
> > which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> > with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> >
> > Evgueni
> >
> > On 11/16/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> >> um... classlib uses SIGUSR2 as well? Doesn't our thread manager use it?
> >>
> >> Evgueni Brevnov wrote:
> >> > Hey,
> >> >
> >> > Seems like the pretty old problem shows itself again. I'm talking
> >> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> >> > uses system semaphores for synchronization purposes...and hysem_wait
> >> > is interrupted by the signal:
> >> >
> >> > (gdb) p perror("sym_wait error:")
> >> > sym_wait error:: Interrupted system call
> >> >
> >> > Do we have good (universal) solution for such cases?
> >> >
> >> > Thanks
> >> > Evgueni
> >> >
> >> > On 11/15/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >>
> >> >> Gregory Shimansky wrote:
> >> >> > Evgueni Brevnov wrote:
> >> >> >> hmmm.... strange. The patch was tested on multi-processor system
> >> >> >> running SUSE9. I will check if the patch misses something. Anyway, we
> >> >> >> need to wait with the patch submission until we 100% sure how
> >> >> >> hythread_monitor_init should behave.
> >> >> >>
> >> >> >> Thanks
> >> >> >> Evgueni
> >> >> >>
> >> >> >> On 11/11/06, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
> >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> >> >> >>> > Hi,
> >> >> >>> >
> >> >> >>> > While investigating deadlock scenario which is described in
> >> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> >> >> >>> > implementation of hythread_monitor_init /
> >> >> >>> > hythread_monitor_init_with_name initializes and acquires a monitor.
> >> >> >>> > Original spec reads: "Acquire and initialize a new monitor from the
> >> >> >>> > threading library...." AFAIU that doesn't mean to lock the monitor but
> >> >> >>> > get it from the threading library. So the hythread_monitor_init should
> >> >> >>> > not lock the monitor.
> >> >> >>> >
> >> >> >>> > Could somebody comment on that?
> >> >> >>>
> >> >> >>> It might be that semantic is different on different platforms which is
> >> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> >> >> >>> acceptance tests on Linux while everything on Windows works (ok I
> >> >> >>> tested on laptop with 1 processor while Linux was a HT server,
> >> >> >>> sometimes it is important for threading).
> >> >> >
> >> >> > I've tried to investigate the problem but didn't find the end of it yet.
> >> >> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
> >> >> > distribution buggy and move on?</joke>).
> >> >>
> >> >> There is something odd about it, I'll admit... Remember the EOMEM bugs
> >> >> I found in forking?
> >> >>
> >> >>
> >> >> > I didn't reproduce it on gentoo, all tests work just fine.
> >> >> >
> >> >> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest, stress.WeakHashMapTest VM
> >> >> > segfaults. The stack looks like an infinite recursion of 4 stack frames:
> >> >> >
> >> >> > #0 0xb6dcb814 in null_java_reference_handler (signum=11,
> >> >> > info=0xb71a503c, context=0xb71a50bc) at
> >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> >> >> > #1 <signal handler called>
> >> >> > #2 0xb6dcc20a in get_stack_addr () at
> >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> >> >> > #3 0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> >> >> > at
> >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> >> >> > #4 0xb6dcb900 in null_java_reference_handler (signum=11,
> >> >> > info=0xb71a546c, context=0xb71a54ec) at
> >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
> >> >> >
> >> >> > and so on. The stack is very long. When I run VM with -Xtrace:signals I
> >> >> > get a very long log of messages that "NPE or SOE detected at ...". The
> >> >> > first time address always varies, but it appears to be memcpy. The next
> >> >> > addresses are always the same, they point to get_stack_addr function.
> >> >> >
> >> >> > So I tried to find out why memcpy crashes in the first place. It appears
> >> >> > to be a struct copy called from jsig_handler hysig. The stack looks like
> >> >> > this (if I can trust gdb on ubuntu):
> >> >> >
> >> >> > #0 0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> >> >> > #1 0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> >> >> > at hysigunix.c:169
> >> >> > #2 0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> >> >> > #3 0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> >> >> > at
> >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> >> >> > #4 0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> >> >> > #5 0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> >> >> > #6 0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> >> >> >
> >> >> > In jsig_handler a struct of type sigaction is copied
> >> >> >
> >> >> > act = saved_sigaction[sig];
> >> >> >
> >> >> > and gcc replaces this statement with a call to memcpy it seems. But the
> >> >> > parameter sig is quite weird if you look at it. It is sig=-1215196204...
> >> >> > Now if I could only find where and this sig happened there... I cannot
> >> >> > find it in the depth of classlib native code this late at night.
> >> >> >
> >> >>
> >> >>
> >> >
> >> >
> >