I think that 1.2 is a lost cause in this regard - I thought we were just
looking forward on the trunk.


On 6/11/07 8:17 AM, "Brian Barrett" <bbarr...@lanl.gov> wrote:

> Yes, this is a known issue.  I don't know -- are we trying to make
> threads work on the 1.2 branch, or just the trunk?  I had thought
> just the trunk?
> 
> Brian
> 
> 
> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
> 
>> I had similar problems on the trunk, which were fixed by Brian with
>> r14877.
>> 
>> Perhaps 1.2 needs something similar?
>> 
>> Tim
>> 
>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>> infrastructure to do simplistic thread testing.  Specifically, I'm
>>> building the OMPI trunk and v1.2 branches with
>>> "--with-threads --enable-mpi-threads".
>>> 
>>> I haven't switched this into my production MTT setup yet, but in the
>>> first trial runs, I'm noticing a segv in the
>>> test/threads/opal_condition program.
>>> 
>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>> opal_progress() underneath the condition variable wait, at some point
>>> in there current_base becomes NULL.  The segv then follows because the
>>> passed-in value of "base" is NULL and is dereferenced (event.c):
>>> 
>>> int
>>> opal_event_base_loop(struct event_base *base, int flags)
>>> {
>>>          const struct opal_eventop *evsel = base->evsel;
>>> ...
>>> 
>>> Here's the full call stack:
>>> 
>>> #0  0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5)
>>>      at event.c:520
>>> #1  0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>> #2  0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>> #3  0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
>>>      at ../../opal/threads/condition.h:81
>>> #4  0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>> #5  0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>> #6  0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>> 
>>> This test seems to work fine on the trunk (at least, it didn't segv
>>> in my small number of trial runs).
>>> 
>>> Is this a known problem in the 1.2 branch?  Should I skip the thread
>>> testing on the 1.2 branch and concentrate on the trunk?
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

