Re: [OMPI devel] threaded builds

2007-06-12 Thread Richard Graham
We should not pretend that threads work in the 1.2 code branch.  Thread
safety has been designed in, but we are just kicking off an effort to
complete and verify the thread safety.

Rich


On 6/11/07 2:49 PM, "Paul H. Hargrove"  wrote:

> If Jeff has the resources to run threaded tests against 1.2, *and* to
> examine the results, then it might be valuable to have a summary of the
> known threading issues in 1.2 written down somewhere for the benefit of
> those who don't chase the trunk.
> 
> -Paul
> 
> Graham, Richard L. wrote:
>> > I would second this - thread safety should be a 1.3 item, unless someone
>> > has a lot of spare time.
>> >
>> > Rich
>> >
>> > -Original Message-
>> > From: devel-boun...@open-mpi.org 
>> > To: Open MPI Developers 
>> > Sent: Mon Jun 11 10:44:33 2007
>> > Subject: Re: [OMPI devel] threaded builds
>> >
>> >
>> > On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:
>> >
>> >  
>>> >> I leave it to the thread subgroup to decide...  Should we discuss on
>>> >> the call tomorrow?
>>> >>
>>> >> I don't have a strong opinion; I was just testing both because it was
>>> >> easy to do so.  If we want to concentrate on the trunk, I can adjust
>>> >> my MTT setup.
>>> >>
>>> >>
>> >
>> > I think trying to worry about 1.2 would just be a time sink. We know
>> > that there are architectural issues with threads in some parts of the
>> > code. I don't see us re-architecting 1.2 in this regard.
>> > Seems we should only focus on the trunk.
>> >
>> >
>> > - Galen
>> >
>> >
>> >  
>>> >> On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:
>>> >>
>>> >>
>>>> >>> Yes, this is a known issue.  I don't know -- are we trying to make
>>>> >>> threads work on the 1.2 branch, or just the trunk?  I had thought
>>>> >>> just the trunk?
>>>> >>>
>>>> >>> Brian
>>>> >>>
>>>> >>>
>>>> >>> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>>>> >>>
>>>> >>>  
>>>>> >>>> I had similar problems on the trunk, which was fixed by Brian with
>>>>> >>>> r14877.
>>>>> >>>>
>>>>> >>>> Perhaps 1.2 needs something similar?
>>>>> >>>>
>>>>> >>>> Tim
>>>>> >>>>
>>>>> >>>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>>>> >>>>
>>>>>> >>>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>>>>> >>>>> infrastructure to do simplistic thread testing.  Specifically, I'm
>>>>>> >>>>> building the OMPI trunk and v1.2 branches with
>>>>>> >>>>> "--with-threads --enable-mpi-threads".
>>>>>> >>>>>
>>>>>> >>>>> I haven't switched this into my production MTT setup yet, but in
>>>>>> >>>>> the first trial runs, I'm noticing a segv in the
>>>>>> >>>>> test/threads/opal_condition program.
>>>>>> >>>>>
>>>>>> >>>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>>>>> >>>>> opal_progress() underneath the condition variable wait, at some
>>>>>> >>>>> point in there current_base is getting to be NULL.  Hence, the
>>>>>> >>>>> following segv's because the passed in value of "base" is NULL
>>>>>> >>>>> (event.c):
>>>>>> >>>>>
>>>>>> >>>>> int
>>>>>> >>>>> opal_event_base_loop(struct event_base *base, int flags)
>>>>>> >>>>> {
>>>>>> >>>>>  const struct opal_eventop *evsel = base->evsel;
>>>>>> >>>>> ...
>>>>>> >>>>>
>>>>>> >>>>> Here's the full call stack:
>>>>>> 

Re: [OMPI devel] threaded builds

2007-06-12 Thread Jeff Squyres

Heh.  I don't.  :-)

Well, I should specify: since the group is [pretty strongly] leaning  
towards threading being the issue for 1.3, then it makes sense to  
dedicate my resources elsewhere (rather than 1.2 thread testing).



On Jun 11, 2007, at 2:49 PM, Paul H. Hargrove wrote:


If Jeff has the resources to run threaded tests against 1.2, *and* to
examine the results, then it might be valuable to have a summary of the
known threading issues in 1.2 written down somewhere for the benefit of
those who don't chase the trunk.

-Paul

Graham, Richard L. wrote:
I would second this - thread safety should be a 1.3 item, unless  
someone has a lot of spare time.


Rich

-Original Message-
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds


On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:



I leave it to the thread subgroup to decide...  Should we discuss on
the call tomorrow?

I don't have a strong opinion; I was just testing both because it was
easy to do so.  If we want to concentrate on the trunk, I can adjust
my MTT setup.




I think trying to worry about 1.2 would just be a time sink. We know
that there are architectural issues with threads in some parts of the
code. I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.


- Galen




On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:



Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:



I had similar problems on the trunk, which was fixed by Brian with
r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:

Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
 const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] threaded builds

2007-06-11 Thread Paul H. Hargrove
If Jeff has the resources to run threaded tests against 1.2, *and* to 
examine the results, then it might be valuable to have a summary of the
known threading issues in 1.2 written down somewhere for the benefit of 
those who don't chase the trunk.


-Paul

Graham, Richard L. wrote:

I would second this - thread safety should be a 1.3 item, unless someone has a 
lot of spare time.

Rich

-Original Message-
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds


On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

  

I leave it to the thread subgroup to decide...  Should we discuss on
the call tomorrow?

I don't have a strong opinion; I was just testing both because it was
easy to do so.  If we want to concentrate on the trunk, I can adjust
my MTT setup.




I think trying to worry about 1.2 would just be a time sink. We know  
that there are architectural issues with threads in some parts of the  
code. I don't see us re-architecting 1.2 in this regard.

Seems we should only focus on the trunk.


- Galen


  

On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:



Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:

  

I had similar problems on the trunk, which was fixed by Brian with
r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:


Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
 const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?
  

  

--
Jeff Squyres
Cisco Systems

  



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] threaded builds

2007-06-11 Thread Graham, Richard L.
I would second this - thread safety should be a 1.3 item, unless someone has a 
lot of spare time.

Rich

-Original Message-
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds


On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

> I leave it to the thread subgroup to decide...  Should we discuss on
> the call tomorrow?
>
> I don't have a strong opinion; I was just testing both because it was
> easy to do so.  If we want to concentrate on the trunk, I can adjust
> my MTT setup.
>

I think trying to worry about 1.2 would just be a time sink. We know  
that there are architectural issues with threads in some parts of the  
code. I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.


- Galen


>
> On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:
>
>> Yes, this is a known issue.  I don't know -- are we trying to make
>> threads work on the 1.2 branch, or just the trunk?  I had thought
>> just the trunk?
>>
>> Brian
>>
>>
>> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>>
>>> I had similar problems on the trunk, which was fixed by Brian with
>>> r14877.
>>>
>>> Perhaps 1.2 needs something similar?
>>>
>>> Tim
>>>
>>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>>> infrastructure to do simplistic thread testing.  Specifically, I'm
>>>> building the OMPI trunk and v1.2 branches with
>>>> "--with-threads --enable-mpi-threads".
>>>>
>>>> I haven't switched this into my production MTT setup yet, but in the
>>>> first trial runs, I'm noticing a segv in the
>>>> test/threads/opal_condition program.
>>>>
>>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>>> opal_progress() underneath the condition variable wait, at some point
>>>> in there current_base is getting to be NULL.  Hence, the following
>>>> segv's because the passed in value of "base" is NULL (event.c):
>>>>
>>>> int
>>>> opal_event_base_loop(struct event_base *base, int flags)
>>>> {
>>>>  const struct opal_eventop *evsel = base->evsel;
>>>> ...
>>>>
>>>> Here's the full call stack:
>>>>
>>>> #0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
>>>>     at event.c:520
>>>> #1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>>> #2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>>> #3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
>>>>     at ../../opal/threads/condition.h:81
>>>> #4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>>> #5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>>> #6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>>> #7  0x in ?? ()
>>>>
>>>> This test seems to work fine on the trunk (at least, it didn't segv
>>>> in my small number of trial runs).
>>>>
>>>> Is this a known problem in the 1.2 branch?  Should I skip the thread
>>>> testing on the 1.2 branch and concentrate on the trunk?
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>



Re: [OMPI devel] threaded builds

2007-06-11 Thread Galen Shipman


On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:


I leave it to the thread subgroup to decide...  Should we discuss on
the call tomorrow?

I don't have a strong opinion; I was just testing both because it was
easy to do so.  If we want to concentrate on the trunk, I can adjust
my MTT setup.



I think trying to worry about 1.2 would just be a time sink. We know  
that there are architectural issues with threads in some parts of the  
code. I don't see us re-architecting 1.2 in this regard.

Seems we should only focus on the trunk.


- Galen




On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:


Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:


I had similar problems on the trunk, which was fixed by Brian with
r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:

Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
 const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?




--
Jeff Squyres
Cisco Systems





Re: [OMPI devel] threaded builds

2007-06-11 Thread Jeff Squyres
I leave it to the thread subgroup to decide...  Should we discuss on  
the call tomorrow?


I don't have a strong opinion; I was just testing both because it was  
easy to do so.  If we want to concentrate on the trunk, I can adjust  
my MTT setup.



On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:


Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:


I had similar problems on the trunk, which was fixed by Brian with
r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:

Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
 const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] threaded builds

2007-06-11 Thread Ralph H Castain
I think that 1.2 is a lost cause in this regard - I thought we were just
looking forward on the trunk.


On 6/11/07 8:17 AM, "Brian Barrett"  wrote:

> Yes, this is a known issue.  I don't know -- are we trying to make
> threads work on the 1.2 branch, or just the trunk?  I had thought
> just the trunk?
> 
> Brian
> 
> 
> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
> 
>> I had similar problems on the trunk, which was fixed by Brian with
>> r14877.
>> 
>> Perhaps 1.2 needs something similar?
>> 
>> Tim
>> 
>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>> infrastructure to do simplistic thread testing.  Specifically, I'm
>>> building the OMPI trunk and v1.2 branches with
>>> "--with-threads --enable-mpi-threads".
>>>
>>> I haven't switched this into my production MTT setup yet, but in the
>>> first trial runs, I'm noticing a segv in the
>>> test/threads/opal_condition program.
>>> 
>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>> opal_progress() underneath the condition variable wait, at some point
>>> in there current_base is getting to be NULL.  Hence, the following
>>> segv's because the passed in value of "base" is NULL (event.c):
>>> 
>>> int
>>> opal_event_base_loop(struct event_base *base, int flags)
>>> {
>>>  const struct opal_eventop *evsel = base->evsel;
>>> ...
>>> 
>>> Here's the full call stack:
>>> 
>>> #0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
>>>     at event.c:520
>>> #1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>> #2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>> #3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
>>>     at ../../opal/threads/condition.h:81
>>> #4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>> #5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>> #6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>> #7  0x in ?? ()
>>>
>>> This test seems to work fine on the trunk (at least, it didn't segv
>>> in my small number of trial runs).
>>> 
>>> Is this a known problem in the 1.2 branch?  Should I skip the thread
>>> testing on the 1.2 branch and concentrate on the trunk?




Re: [OMPI devel] threaded builds

2007-06-11 Thread Brian Barrett
Yes, this is a known issue.  I don't know -- are we trying to make  
threads work on the 1.2 branch, or just the trunk?  I had thought  
just the trunk?


Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:

I had similar problems on the trunk, which was fixed by Brian with  
r14877.


Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:

Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
 const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?





Re: [OMPI devel] threaded builds

2007-06-11 Thread Tim Prins
I had similar problems on the trunk, which was fixed by Brian with r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
> Per the teleconf last week, I have started to revamp the Cisco MTT
> infrastructure to do simplistic thread testing.  Specifically, I'm
> building the OMPI trunk and v1.2 branches with
> "--with-threads --enable-mpi-threads".
>
> I haven't switched this into my production MTT setup yet, but in the
> first trial runs, I'm noticing a segv in the
> test/threads/opal_condition program.
>
> It seems that in the thr1 test on the v1.2 branch, when it calls
> opal_progress() underneath the condition variable wait, at some point
> in there current_base is getting to be NULL.  Hence, the following
> segv's because the passed in value of "base" is NULL (event.c):
>
> int
> opal_event_base_loop(struct event_base *base, int flags)
> {
>  const struct opal_eventop *evsel = base->evsel;
> ...
>
> Here's the full call stack:
>
> #0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
>     at event.c:520
> #1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
> #2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
> #3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
>     at ../../opal/threads/condition.h:81
> #4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
> #5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
> #6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
> #7  0x in ?? ()
>
> This test seems to work fine on the trunk (at least, it didn't segv
> in my small number of trial runs).
>
> Is this a known problem in the 1.2 branch?  Should I skip the thread
> testing on the 1.2 branch and concentrate on the trunk?


[OMPI devel] threaded builds

2007-06-11 Thread Jeff Squyres
Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".


I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.


It seems that in the thr1 test on the v1.2 branch, when it calls  
opal_progress() underneath the condition variable wait, at some point  
in there current_base is getting to be NULL.  Hence, the following  
segv's because the passed in value of "base" is NULL (event.c):


int
opal_event_base_loop(struct event_base *base, int flags)
{
const struct opal_eventop *evsel = base->evsel;
...

Here's the full call stack:

#0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).


Is this a known problem in the 1.2 branch?  Should I skip the thread  
testing on the 1.2 branch and concentrate on the trunk?


--
Jeff Squyres
Cisco Systems