Re: [OMPI devel] threaded builds
We should not pretend that threads work in the 1.2 code branch.  Thread
safety has been designed in, but we are just kicking off an effort to
complete and verify the thread safety.

Rich

On 6/11/07 2:49 PM, "Paul H. Hargrove" wrote:

> If Jeff has the resources to run threaded tests against 1.2, *and* to
> examine the results, then it might be valuable to have a summary of the
> known threading issues in 1.2 written down somewhere for the benefit of
> those who don't chase the trunk.
>
> -Paul
Re: [OMPI devel] threaded builds
Heh.  I don't.  :-)

Well, I should specify: since the group is [pretty strongly] leaning
towards threading being the issue for 1.3, then it makes sense to
dedicate my resources elsewhere (rather than 1.2 thread testing).

On Jun 11, 2007, at 2:49 PM, Paul H. Hargrove wrote:

> If Jeff has the resources to run threaded tests against 1.2, *and* to
> examine the results, then it might be valuable to have a summary of the
> known threading issues in 1.2 written down somewhere for the benefit of
> those who don't chase the trunk.
>
> -Paul
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] threaded builds
If Jeff has the resources to run threaded tests against 1.2, *and* to
examine the results, then it might be valuable to have a summary of the
known threading issues in 1.2 written down somewhere for the benefit of
those who don't chase the trunk.

-Paul

Graham, Richard L. wrote:

> I would second this - thread safety should be a 1.3 item, unless someone
> has a lot of spare time.
>
> Rich

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Re: [OMPI devel] threaded builds
I would second this - thread safety should be a 1.3 item, unless someone
has a lot of spare time.

Rich

-----Original Message-----
From: devel-boun...@open-mpi.org
To: Open MPI Developers
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds

I think trying to worry about 1.2 would just be a time sink.  We know
that there are architectural issues with threads in some parts of the
code.  I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.

- Galen
Re: [OMPI devel] threaded builds
On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

> I leave it to the thread subgroup to decide...  Should we discuss on
> the call tomorrow?
>
> I don't have a strong opinion; I was just testing both because it was
> easy to do so.  If we want to concentrate on the trunk, I can adjust
> my MTT setup.

I think trying to worry about 1.2 would just be a time sink.  We know
that there are architectural issues with threads in some parts of the
code.  I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.

- Galen
Re: [OMPI devel] threaded builds
I leave it to the thread subgroup to decide...  Should we discuss on
the call tomorrow?

I don't have a strong opinion; I was just testing both because it was
easy to do so.  If we want to concentrate on the trunk, I can adjust
my MTT setup.

On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:

> Yes, this is a known issue.  I don't know -- are we trying to make
> threads work on the 1.2 branch, or just the trunk?  I had thought
> just the trunk?
>
> Brian

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] threaded builds
I think that 1.2 is a lost cause in this regard - I thought we were just
looking forward on the trunk.

On 6/11/07 8:17 AM, "Brian Barrett" wrote:

> Yes, this is a known issue.  I don't know -- are we trying to make
> threads work on the 1.2 branch, or just the trunk?  I had thought
> just the trunk?
>
> Brian
Re: [OMPI devel] threaded builds
Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian

On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:

> I had similar problems on the trunk, which were fixed by Brian with
> r14877.
>
> Perhaps 1.2 needs something similar?
>
> Tim
Re: [OMPI devel] threaded builds
I had similar problems on the trunk, which were fixed by Brian with
r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:

> This test seems to work fine on the trunk (at least, it didn't segv
> in my small number of trial runs).
>
> Is this a known problem in the 1.2 branch?  Should I skip the thread
> testing on the 1.2 branch and concentrate on the trunk?
[OMPI devel] threaded builds
Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the
test/threads/opal_condition program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, at some point
in there current_base is getting to be NULL.  Hence, the following
segv's because the passed in value of "base" is NULL (event.c):

    int
    opal_event_base_loop(struct event_base *base, int flags)
    {
        const struct opal_eventop *evsel = base->evsel;
        ...

Here's the full call stack:

    #0  0x002a955a020e in opal_event_base_loop (base=0x0, flags=5)
        at event.c:520
    #1  0x002a955a01f9 in opal_event_loop (flags=5) at event.c:514
    #2  0x002a95599111 in opal_progress () at runtime/opal_progress.c:259
    #3  0x004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
        at ../../opal/threads/condition.h:81
    #4  0x00401146 in thr1_run (obj=0x503110) at opal_condition.c:46
    #5  0x0036e290610a in start_thread () from /lib64/tls/libpthread.so.0
    #6  0x0036e1ec68c3 in clone () from /lib64/tls/libc.so.6
    #7  0x in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch?  Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?

--
Jeff Squyres
Cisco Systems
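
For readers following the failure mode in the report above (the event
loop dereferencing a NULL "base" in frame #0), here is a minimal,
self-contained sketch of a defensive guard.  Everything in it is an
assumption made for illustration: the *_sketch types and functions are
hypothetical stand-ins, not the real definitions in event.c, and this
is not the change made in r14877, whose contents are not shown in this
thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the event backend table (not the real
 * struct opal_eventop). */
struct eventop_sketch {
    const char *name;
    int (*dispatch)(void *arg);
};

/* Hypothetical stand-in for struct event_base. */
struct event_base_sketch {
    const struct eventop_sketch *evsel;
    void *evbase;
};

/* Trivial backend used only to exercise the guard. */
int noop_dispatch(void *arg)
{
    (void)arg;
    return 0;
}

const struct eventop_sketch noop_op = { "noop", noop_dispatch };

/*
 * Sketch of an event loop entry point that refuses to run when the
 * base (or its backend table) is NULL, returning -1 instead of
 * dereferencing the pointer as in the reported backtrace.
 */
int event_base_loop_sketch(struct event_base_sketch *base, int flags)
{
    (void)flags; /* flags are ignored in this sketch */

    if (base == NULL || base->evsel == NULL) {
        return -1;
    }
    return base->evsel->dispatch(base->evbase);
}
```

Calling event_base_loop_sketch(NULL, 5) returns -1 rather than
faulting, while a base whose evsel points at noop_op dispatches and
returns 0.  A guard of this kind would only mask the symptom; it says
nothing about why current_base ends up NULL under the condition
variable wait, which is the underlying question in this thread.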