Re: [OMPI devel] Fwd: Open MPI 1.8: link problem when Fortran+C+Platform LSF

2014-10-17 Thread Paul Hargrove
I know of two possibilities:

1) I cannot be certain, but since the message concerns a PC-relative
addressing mode, it is possible that something needs to be compiled with
-fPIC to fix the issue.  See if adding that option to any of the mpicc
commands helps.

2) Try adding ONE of "-ll", "-lfl" or "-lfl_pic" to include the lex/flex
support lib.   This is PROBABLY the wrong solution because that lib defines
its own "main()".

-Paul



On Fri, Oct 17, 2014 at 4:56 PM, Jeff Squyres (jsquyres)  wrote:

> I think the LSF part of this may be a red herring.  Do you really need to
> add "-lbat -llsf" to the command line to make it work?
>
> The error message *sounds* like y.tab.o was compiled differently than
> others...?  It's hard to know without seeing the output of mpicc --showme.

Re: [OMPI devel] Fwd: Open MPI 1.8: link problem when Fortran+C+Platform LSF

2014-10-17 Thread Jeff Squyres (jsquyres)
I think the LSF part of this may be a red herring.  Do you really need to add 
"-lbat -llsf" to the command line to make it work?

The error message *sounds* like y.tab.o was compiled differently than 
others...?  It's hard to know without seeing the output of mpicc --showme.


On Oct 17, 2014, at 7:51 AM, Ralph Castain  wrote:

> Forwarding this for Paul until his email address gets updated on the User 
> list:
> 
>> Begin forwarded message:
>> 
>> Date: October 17, 2014 at 6:35:31 AM PDT
>> From: Paul Kapinos 
>> To: Open MPI Users 
>> Cc: "Kapinos, Paul" , 
>> 
>> Subject: Open MPI 1.8: link problem when Fortran+C+Platform LSF
>> 
>> Dear Open MPI developer,
>> 
>> we have both Open MPI 1.6(.5) and 1.8(.3) in our cluster, configured to be 
>> used with Platform LSF.
>> 
>> One of our users ran into an issue when trying to link his code (a combination 
>> of lex/C and Fortran) with v1.8, whereas with Open MPI 1.6 the same code 
>> links fine.
>> 
>>> $ make
>>> mpif90 -c main.f90
>>> yacc -d example4.y
>>> mpicc -c y.tab.c
>>> mpicc -c mymain.c
>>> lex example4.l
>>> mpicc -c lex.yy.c
>>> mpif90 -o example main.o y.tab.o mymain.o lex.yy.o
>>> ld: y.tab.o(.text+0xd9): unresolvable R_X86_64_PC32 relocation against symbol `yylval'
>>> ld: y.tab.o(.text+0x16f): unresolvable R_X86_64_PC32 relocation against symbol `yyval'
>>> ...
>> 
>> Looking into "mpif90 --show-me", we see that the link line, and possibly the 
>> philosophy behind it, has changed; there is also a note about it:
>> 
>> # Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we
>> # intentionally only link in the MPI libraries (ORTE, OPAL, etc. are
>> # pulled in implicitly) because we intend MPI applications to only use
>> # the MPI API.
>> 
>> 
>> 
>> 
>> Well, by now we know two workarounds:
>> a) add "-lbat -llsf" to the link line
>> b) add "-Wl,--as-needed" to the link line
>> 
>> Which would be better? Should one of these be added to "linker_flags=..." 
>> in the .../share/openmpi/mpif90-wrapper-data.txt file? Given the note above, 
>> would (b) be better?
>> 
>> Best
>> 
>> Paul Kapinos
>> 
>> P.S. $ mpif90 --show-me
>> 
>> 1.6.5
>> ifort -nofor-main -I/opt/MPI/openmpi-1.6.5/linux/intel/include -fexceptions 
>> -I/opt/MPI/openmpi-1.6.5/linux/intel/lib 
>> -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib 
>> -L/opt/MPI/openmpi-1.6.5/linux/intel/lib -lmpi_f90 -lmpi_f77 -lmpi -losmcomp 
>> -lrdmacm -libverbs -lrt -lnsl -lutil -lpsm_infinipath -lbat -llsf -ldl -lm 
>> -lnuma -lrt -lnsl -lutil
>> 
>> 1.8.3
>> ifort -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions 
>> -I/opt/MPI/openmpi-1.8.3/linux/intel/lib 
>> -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath 
>> -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath 
>> -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags 
>> -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 
>> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
>> 
>> P.S.2 $ man ld
>> 
>>   --as-needed
>>   --no-as-needed
>>   This option affects ELF DT_NEEDED tags for dynamic libraries
>>   mentioned on the command line after the --as-needed option.
>>   Normally the linker will add a DT_NEEDED tag for each dynamic
>>   library mentioned on the command line, regardless of whether the
>>   library is actually needed or not.  --as-needed causes a DT_NEEDED
>>   tag to only be emitted for a library that satisfies an undefined
>>   symbol reference from a regular object file or, if the library is
>>   not found in the DT_NEEDED lists of other libraries linked up to
>>   that point, an undefined symbol reference from another dynamic
>>   library.  --no-as-needed restores the default behaviour.
>> 
>> 
>> 
>> -- 
>> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
>> RWTH Aachen University, IT Center
>> Seffenter Weg 23,  D 52074  Aachen (Germany)
>> Tel: +49 241/80-24915
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
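The practical difference between the two wrappers is easiest to see by comparing the `-l` flags in the 1.6.5 and 1.8.3 `mpif90 --show-me` lines quoted in the forwarded message. A minimal C sketch of that comparison follows; `has_token` and `dropped_libs` are hypothetical helpers written for illustration, not Open MPI code, and the sketch deliberately ignores link order, `-L` paths and `-Wl,` options.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `tok` occurs in `line` as a whole whitespace-separated token. */
static bool has_token(const char *line, const char *tok)
{
    size_t n = strlen(tok);
    const char *p = line;
    while ((p = strstr(p, tok)) != NULL) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t';
        if (starts && ends)
            return true;
        p++; /* partial match such as "-lm" inside "-lmpi"; keep looking */
    }
    return false;
}

/* Writes into `out` (space-separated, at most `cap` bytes including the NUL)
 * every distinct "-l<name>" token of `old_line` that does not appear in
 * `new_line`. Returns the number of tokens written. */
static int dropped_libs(const char *old_line, const char *new_line,
                        char *out, size_t cap)
{
    assert(old_line != NULL && new_line != NULL && out != NULL && cap > 0);
    char tok[256];
    int count = 0;
    out[0] = '\0';
    const char *p = old_line;
    while (*p) {
        while (*p == ' ' || *p == '\t')
            p++;
        size_t len = strcspn(p, " \t");
        if (len == 0)
            break; /* only trailing whitespace was left */
        if (len >= 2 && p[0] == '-' && p[1] == 'l' && len < sizeof tok) {
            memcpy(tok, p, len);
            tok[len] = '\0';
            /* Skip flags still present in the new line and duplicates. */
            if (!has_token(new_line, tok) && !has_token(out, tok)) {
                size_t used = strlen(out);
                if (used + len + 2 <= cap) {
                    if (used > 0)
                        strcat(out, " ");
                    strcat(out, tok);
                    count++;
                }
            }
        }
        p += len;
    }
    return count;
}
```

Fed those two lines, the sketch reports 14 dropped flags, including `-lbat` and `-llsf` (the libraries named in workaround a); `-lmpi` is not reported because 1.8.3 still passes it.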



Re: [OMPI devel] Missing f08 binding for Win_allocate

2014-10-17 Thread Jeff Squyres (jsquyres)
Christoph -

Thanks for sending this patch. I've been traveling this week, which always 
makes my inbox a disaster - sorry I didn't reply earlier. 

You did the right thing by filing a PR against master. It allows the use of the 
nice GitHub code review tools. 

I've already replied on that PR; just wanted to close the loop here in the mail 
thread. 

Sent from my phone. No type good. 

> On Oct 15, 2014, at 7:00 AM, "Christoph Niethammer"  
> wrote:
> 
> Hello,
> 
> The f08 binding for Win_allocate is missing in master and the 1.8 series.
> I fixed the problem based on master. The attached patch also works for 1.8.3.
> 
> I found some documentation in the wiki, but I am not sure whether it is 
> intended for small fixes like this as well.
> How shall I proceed to get this into master after the svn->git transition?
> * Open a bug first
> * fork + pull request or 
> * email patch from git format-patch to devel list?
> 
> Best regards
> Christoph Niethammer
> 
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> <0001-Add-missing-Fortran-binding-for-Win_allocate.patch>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/10/16044.php


Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Hey, Lena :).

2014-10-17 22:07 GMT+07:00 Elena Elkina :

> Hi Artem,
>
> Actually some time ago there was a known issue with coll ml. I used to run
> my command lines with -mca coll ^ml to avoid these problems, so I don't
> know if it was fixed or not. It looks like you have the same problem.
>

But mine is with the bcol framework, not coll. And as you can see, the
modules themselves don't break the program; only some combinations of them
do. I am also curious why the basesmuma module is listed twice.



> Best regards,
> Elena

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Elena Elkina
Hi Artem,

Actually some time ago there was a known issue with coll ml. I used to run
my command lines with -mca coll ^ml to avoid these problems, so I don't
know if it was fixed or not. It looks like you have the same problem.

Best regards,
Elena


Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Gilles,

I checked your patch and it doesn't solve the problem I observe. I think
the cause lies somewhere else.

2014-10-17 19:13 GMT+07:00 Gilles Gouaillardet <
gilles.gouaillar...@gmail.com>:

> Artem,
>
> There is a known issue #235 with modex and I made PR #238 with a tentative
> fix.
>
> Could you please give it a try and report whether it solves your problem?
>
> Cheers
>
> Gilles
>
>
> Artem Polyakov  wrote:
> Hello, I have trouble with the latest trunk when I use PMI1.
>
> For example, if I use 2 nodes, the application hangs. See the backtraces
> from both nodes below. From them I can see that the second (non-launching)
> node hangs in bcol component selection. Here is the default setting of the
> bcol_base_string parameter, according to ompi_info:
> bcol_base_string="basesmuma,basesmuma,iboffload,ptpcoll,ugni"
> I don't know whether it is correct that basesmuma is duplicated.
>
> Experiments with this parameter showed that it directly influences the bug:
> export OMPI_MCA_bcol_base_string="" #  [SEGFAULT]
> export OMPI_MCA_bcol_base_string="ptpcoll" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll,iboffload" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll,iboffload,ugni" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,ptpcoll,iboffload,ugni" #  [HANG]
> export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload,ptpcoll" #  [HANG]
> export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload,ugni" #  [OK]
> export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,ptpcoll" #  [HANG]
> export OMPI_MCA_bcol_base_string="ptpcoll,basesmuma" #  [OK]
> export OMPI_MCA_bcol_base_string="ptpcoll,basesmuma,basesmuma" #  [HANG]
>
> I can provide other information if necessary.
>
> cn1:
> (gdb) bt
> 0  0x7fdebd30ac6d in poll () from /lib/x86_64-linux-gnu/libc.so.6
> 1  0x7fdebcca64e0 in poll_dispatch (base=0x1d466b0, tv=0x7fff71aab880)
> at poll.c:165
> 2  0x7fdebcc9b041 in opal_libevent2021_event_base_loop
> (base=0x1d466b0, flags=2) at event.c:1631
> 3  0x7fdebcc35891 in opal_progress () at runtime/opal_progress.c:169
> 4  0x7fdeb32f78cb in opal_condition_wait (c=0x7fdebdb51bc0
> , m=0x7fdebdb51cc0 ) at
> ../../../../opal/threads/condition.h:78
> 5  0x7fdeb32f79b8 in ompi_request_wait_completion (req=0x7fff71aab920)
> at ../../../../ompi/request/request.h:381
> 6  0x7fdeb32f84b8 in mca_pml_ob1_recv (addr=0x7fff71aabd80, count=1,
> datatype=0x6026c0 , src=1, tag=0, comm=0x6020a0
> ,
> status=0x7fff71aabd90) at pml_ob1_irecv.c:109
> 7  0x7fdebd88f54d in PMPI_Recv (buf=0x7fff71aabd80, count=1,
> type=0x6026c0 , source=1, tag=0, comm=0x6020a0
> ,
> status=0x7fff71aabd90) at precv.c:78
> 8  0x00400c44 in main (argc=1, argv=0x7fff71aabe98) at
> hellompi.c:33
>
> cn2:
> (gdb) bt
> 0  0x7fa65aa78c6d in poll () from /lib/x86_64-linux-gnu/libc.so.6
> 1  0x7fa65a4144e0 in poll_dispatch (base=0x20e96b0, tv=0x7fff46f44a80)
> at poll.c:165
> 2  0x7fa65a409041 in opal_libevent2021_event_base_loop
> (base=0x20e96b0, flags=2) at event.c:1631
> 3  0x7fa65a3a3891 in opal_progress () at runtime/opal_progress.c:169
> 4  0x7fa65afbbc25 in opal_condition_wait (c=0x7fa65b2bfbc0
> , m=0x7fa65b2bfcc0 ) at
> ../opal/threads/condition.h:78
> 5  0x7fa65afbc1b5 in ompi_request_default_wait_all (count=2,
> requests=0x7fff46f44c70, statuses=0x0) at request/req_wait.c:287
> 6  0x7fa65afc7906 in comm_allgather_pml (src_buf=0x7fff46f44da0,
> dest_buf=0x233dac0, count=288, dtype=0x7fa65b29fee0 ,
> my_rank_in_group=1,
> n_peers=2, ranks_in_comm=0x210a760, comm=0x6020a0
> ) at patterns/comm/allgather.c:250
> 7  0x7fa64f14ba08 in bcol_basesmuma_smcm_allgather_connection
> (sm_bcol_module=0x7fa64e64d010, module=0x232c800,
> peer_list=0x7fa64f3513e8 ,
> back_files=0x7fa64eae2690, comm=0x6020a0 , input=...,
> base_fname=0x7fa64f14ca8c "sm_ctl_mem_", map_all=false) at
> bcol_basesmuma_smcm.c:205
> 8  0x7fa64f146525 in base_bcol_basesmuma_setup_ctl
> (sm_bcol_module=0x7fa64e64d010, cs=0x7fa64f351220
> ) at bcol_basesmuma_setup.c:344
> 9  0x7fa64f146cbb in base_bcol_basesmuma_setup_library_buffers
> (sm_bcol_module=0x7fa64e64d010, cs=0x7fa64f351220
> )
> at bcol_basesmuma_setup.c:550
> 10 0x7fa64f1418d0 in mca_bcol_basesmuma_comm_query (module=0x232c800,
> num_modules=0x232e570) at bcol_basesmuma_module.c:532
> 11 0x7fa64fd9e5f2 in mca_coll_ml_tree_hierarchy_discovery
> (ml_module=0x232fbe0, topo=0x232fd98, n_hierarchies=3,
> exclude_sbgp_name=0x0, include_sbgp_name=0x0)
> at coll_ml_module.c:1964
> 12 0x7fa64fd9f3a3 in mca_coll_ml_fulltree_hierarchy_discovery
> (ml_module=0x232fbe0, n_hierarchies=3) at coll_ml_module.c:2211
> 13 0x7fa64fd9cbe4 in ml_discover_hierarchy (ml_module=0x232fbe0) at
> 
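Every [HANG] result in the bcol_base_string experiments above names basesmuma at least twice and also includes ptpcoll, and no [OK] result does both. A small C sketch for checking further combinations against that observed pattern follows; `count_component` and `matches_hang_pattern` are hypothetical helpers, and the pattern only summarizes these experiments (the empty-string [SEGFAULT] case is outside it) rather than explaining the hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Counts how many comma-separated entries of `list` equal `name` exactly. */
static int count_component(const char *list, const char *name)
{
    assert(list != NULL && name != NULL);
    size_t n = strlen(name);
    int count = 0;
    const char *p = list;
    for (;;) {
        size_t len = strcspn(p, ",");
        if (len == n && strncmp(p, name, n) == 0)
            count++;
        if (p[len] == '\0')
            break;
        p += len + 1; /* step past the comma */
    }
    return count;
}

/* Observed pattern only, not a diagnosis: every HANG case in the
 * experiments lists basesmuma at least twice AND includes ptpcoll. */
static bool matches_hang_pattern(const char *list)
{
    return count_component(list, "basesmuma") >= 2 &&
           count_component(list, "ptpcoll") >= 1;
}
```

By this check the default value reported by ompi_info ("basesmuma,basesmuma,iboffload,ptpcoll,ugni") falls inside the pattern, which is consistent with the hang seen with default settings.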

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-118-ge3be1fb

2014-10-17 Thread Jeff Squyres (jsquyres)
Oy!

Thanks for removing that debug opal_output...  :-)


On Oct 17, 2014, at 7:39 AM,   wrote:

> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  e3be1fb9a5f8e3c5a5234e86a88f616a40d2cab8 (commit)
>  from  f9d620e3a772cdeddd40b4f0789cf59c75b44868 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/e3be1fb9a5f8e3c5a5234e86a88f616a40d2cab8
> 
> commit e3be1fb9a5f8e3c5a5234e86a88f616a40d2cab8
> Author: Aurélien Bouteiller 
> Date:   Fri Oct 17 10:38:35 2014 -0400
> 
>Quick pass over the sm-knem code, indent fixes
> 
> diff --git a/opal/mca/btl/sm/btl_sm.c b/opal/mca/btl/sm/btl_sm.c
> index 1f61028..d0cc950 100644
> --- a/opal/mca/btl/sm/btl_sm.c
> +++ b/opal/mca/btl/sm/btl_sm.c
> @@ -395,15 +395,15 @@ sm_btl_first_time_init(mca_btl_sm_t *sm_btl,
> return i;
> 
> i = ompi_free_list_init_new(_btl_sm_component.sm_frags_user, 
> - sizeof(mca_btl_sm_user_t),
> - opal_cache_line_size, OBJ_CLASS(mca_btl_sm_user_t),
> - sizeof(mca_btl_sm_hdr_t), opal_cache_line_size,
> - mca_btl_sm_component.sm_free_list_num,
> - mca_btl_sm_component.sm_free_list_max,
> - mca_btl_sm_component.sm_free_list_inc,
> - mca_btl_sm_component.sm_mpool);
> +sizeof(mca_btl_sm_user_t),
> +opal_cache_line_size, OBJ_CLASS(mca_btl_sm_user_t),
> +sizeof(mca_btl_sm_hdr_t), opal_cache_line_size,
> +mca_btl_sm_component.sm_free_list_num,
> +mca_btl_sm_component.sm_free_list_max,
> +mca_btl_sm_component.sm_free_list_inc,
> +mca_btl_sm_component.sm_mpool);
> if ( OPAL_SUCCESS != i )
> - return i;   
> +return i;   
> 
> mca_btl_sm_component.num_outstanding_frags = 0;
> 
> @@ -1000,14 +1000,14 @@ int mca_btl_sm_send( struct mca_btl_base_module_t* 
> btl,
> 
> #if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA
> struct mca_btl_base_descriptor_t* mca_btl_sm_prepare_dst( 
> - struct mca_btl_base_module_t* btl,
> - struct mca_btl_base_endpoint_t* endpoint,
> - struct mca_mpool_base_registration_t* registration,
> - struct opal_convertor_t* convertor,
> - uint8_t order,
> - size_t reserve,
> - size_t* size,
> - uint32_t flags)
> +struct mca_btl_base_module_t* btl,
> +struct mca_btl_base_endpoint_t* endpoint,
> +struct mca_mpool_base_registration_t* registration,
> +struct opal_convertor_t* convertor,
> +uint8_t order,
> +size_t reserve,
> +size_t* size,
> +uint32_t flags)
> {
> void *ptr;
> mca_btl_sm_frag_t* frag;
> diff --git a/opal/mca/btl/sm/btl_sm.h b/opal/mca/btl/sm/btl_sm.h
> index 358bed5..fd7271f 100644
> --- a/opal/mca/btl/sm/btl_sm.h
> +++ b/opal/mca/btl/sm/btl_sm.h
> @@ -505,19 +505,19 @@ extern int mca_btl_sm_send(
>  * Synchronous knem/cma get
>  */
> extern int mca_btl_sm_get_sync(
> - struct mca_btl_base_module_t* btl,
> - struct mca_btl_base_endpoint_t* endpoint,
> - struct mca_btl_base_descriptor_t* des );
> +struct mca_btl_base_module_t* btl,
> +struct mca_btl_base_endpoint_t* endpoint,
> +struct mca_btl_base_descriptor_t* des );
> 
> extern struct mca_btl_base_descriptor_t* mca_btl_sm_prepare_dst(
> - struct mca_btl_base_module_t* btl,
> - struct mca_btl_base_endpoint_t* endpoint,
> - struct mca_mpool_base_registration_t* registration,
> - struct opal_convertor_t* convertor,
> - uint8_t order,
> - size_t reserve,
> - size_t* size,
> - uint32_t flags);
> +struct mca_btl_base_module_t* btl,
> +struct mca_btl_base_endpoint_t* endpoint,
> +struct mca_mpool_base_registration_t* registration,
> +struct opal_convertor_t* convertor,
> +uint8_t order,
> +size_t reserve,
> +size_t* size,
> +uint32_t flags);
> #endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */
> 
> #if OPAL_BTL_SM_HAVE_KNEM
> diff --git a/opal/mca/btl/sm/btl_sm_component.c 
> b/opal/mca/btl/sm/btl_sm_component.c
> index 37bc9f7..4f06742 100644
> --- a/opal/mca/btl/sm/btl_sm_component.c
> +++ b/opal/mca/btl/sm/btl_sm_component.c
> @@ -168,7 +168,7 @@ static int sm_register(void)
>MCA_BASE_VAR_SCOPE_CONSTANT,
>   

Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jed Brown
"Jeff Squyres (jsquyres)"  writes:

> Meaning: we deliberately chose not to change the development style of
> the community to "develop on release branch" when we moved to git.

Understood.  It's your choice, but workflow is a big feature of Git.

>> Seems to me that most people cloning open-mpi/ompi will want to fetch
>> from open-mpi/ompi-release regularly so they can have this context.
>
> Agreed.
>
> If github would implement per-branch push ACLs, then we'd squash down
> to a single repo, and all this would be easier.
>
> But given the relative inexperience with git in our community (which
> is noticeable via some mistakes on the ompi repo already!) and our
> history of only allowing regulated commits to release branches, we
> chose the (admittedly somewhat awkward) 2-repo model.

You could push release tags to open-mpi/ompi without pushing the branches.

>> and it deprives you of context when you
>> have no idea whether "dev-BIGNUMBER" is earlier or later than a given
>> release.  (Does it have those features/bugs or not?)
>
> Even if OMPI was just in one git repo, the number of commits on master
> since dev is unrelated to a given release.

If integration branches were merged upward, "git describe" would yield
names like v1.8.3-84-g51a7c90, which tells you immediately that it's 84
commits "ahead" of v1.8.3.

> Put differently: the dev tag is solely for ordering of nightly snapshot 
> tarballs.

It affects git describe output as a side effect, and when someone writes to
the mailing list about a bug in a year-old nightly snapshot, you'll need
to query the repository (or have a better memory than me) to have any
idea what they're working with.  Perhaps you are blessed with users who
don't do things like this.
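The names Jed describes follow the `git describe` layout `<tag>-<commits since tag>-g<abbreviated hash>`, and the `dev-118-ge3be1fb` name in this thread's commit notification has the same shape with the `dev` tag. A minimal C sketch that splits such a name, assuming that layout (the `struct describe` type and `parse_describe` function are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of "<tag>-<count>-g<abbrev>" as printed by git describe. */
struct describe {
    char tag[64];
    long ahead;    /* commits since <tag> */
    char hash[41]; /* abbreviated commit id, without the leading 'g' */
};

/* Splits from the right, since tags such as "v1.8.3" may contain '-'.
 * Returns false if `s` lacks the suffix (git describe prints a bare tag
 * when the commit is exactly at that tag). */
static bool parse_describe(const char *s, struct describe *out)
{
    assert(s != NULL && out != NULL);
    const char *g = strrchr(s, '-'); /* start of "-g<hash>" */
    if (g == NULL || g == s || g[1] != 'g' || g[2] == '\0')
        return false;
    const char *c = g - 1; /* walk back to the '-' before the count */
    while (c > s && *c != '-')
        c--;
    if (c == s || *c != '-')
        return false;
    char *end;
    long n = strtol(c + 1, &end, 10);
    if (end == c + 1 || end != g || n < 0)
        return false; /* count must be a plain non-negative number */
    size_t tag_len = (size_t)(c - s);
    size_t hash_len = strlen(g + 2);
    if (tag_len >= sizeof out->tag || hash_len >= sizeof out->hash)
        return false;
    memcpy(out->tag, s, tag_len);
    out->tag[tag_len] = '\0';
    out->ahead = n;
    memcpy(out->hash, g + 2, hash_len + 1); /* includes the NUL */
    return true;
}
```

For `v1.8.3-84-g51a7c90` it yields tag `v1.8.3`, 84 commits, and hash `51a7c90`; for `dev-118-ge3be1fb` the count is relative to the `dev` tag, which is why it says nothing about any release.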




[OMPI devel] Fwd: Open MPI 1.8: link problem when Fortran+C+Platform LSF

2014-10-17 Thread Ralph Castain
Forwarding this for Paul until his email address gets updated on the User list:

Begin forwarded message:

Date: October 17, 2014 at 6:35:31 AM PDT
From: Paul Kapinos
To: Open MPI Users
Cc: "Kapinos, Paul" ,
Subject: Open MPI 1.8: link problem when Fortran+C+Platform LSF

Dear Open MPI developer,

we have both Open MPI 1.6(.5) and 1.8(.3) in our cluster, configured to be used with Platform LSF.

One of our users ran into an issue when trying to link his code (a combination of lex/C and Fortran) with v1.8, whereas with Open MPI 1.6 the same code links fine.

> $ make
> mpif90 -c main.f90
> yacc -d example4.y
> mpicc -c y.tab.c
> mpicc -c mymain.c
> lex example4.l
> mpicc -c lex.yy.c
> mpif90 -o example main.o y.tab.o mymain.o lex.yy.o
> ld: y.tab.o(.text+0xd9): unresolvable R_X86_64_PC32 relocation against symbol `yylval'
> ld: y.tab.o(.text+0x16f): unresolvable R_X86_64_PC32 relocation against symbol `yyval'
> ...

Looking into "mpif90 --show-me", we see that the link line, and possibly the philosophy behind it, has changed; there is also a note about it:

# Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we
# intentionally only link in the MPI libraries (ORTE, OPAL, etc. are
# pulled in implicitly) because we intend MPI applications to only use
# the MPI API.

Well, by now we know two workarounds:
a) add "-lbat -llsf" to the link line
b) add "-Wl,--as-needed" to the link line

Which would be better? Should one of these be added to "linker_flags=..." in the .../share/openmpi/mpif90-wrapper-data.txt file? Given the note above, would (b) be better?

Best

Paul Kapinos

P.S. $ mpif90 --show-me

1.6.5
ifort -nofor-main -I/opt/MPI/openmpi-1.6.5/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.6.5/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -L/opt/MPI/openmpi-1.6.5/linux/intel/lib -lmpi_f90 -lmpi_f77 -lmpi -losmcomp -lrdmacm -libverbs -lrt -lnsl -lutil -lpsm_infinipath -lbat -llsf -ldl -lm -lnuma -lrt -lnsl -lutil

1.8.3
ifort -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.8.3/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

P.S.2 $ man ld

  --as-needed
  --no-as-needed
  This option affects ELF DT_NEEDED tags for dynamic libraries
  mentioned on the command line after the --as-needed option.
  Normally the linker will add a DT_NEEDED tag for each dynamic
  library mentioned on the command line, regardless of whether the
  library is actually needed or not.  --as-needed causes a DT_NEEDED
  tag to only be emitted for a library that satisfies an undefined
  symbol reference from a regular object file or, if the library is
  not found in the DT_NEEDED lists of other libraries linked up to
  that point, an undefined symbol reference from another dynamic
  library.  --no-as-needed restores the default behaviour.

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Ralph Castain
I hear what you are saying - we just beg to disagree :-)

As Jeff said, a lot of it is history, but we’ve found this method works well 
for us and we’ve chosen to continue it.


> On Oct 17, 2014, at 7:35 AM, Jed Brown  wrote:
> 
> Ralph Castain  writes:
>> We go the other direction - all code must be committed to master so it
>> can “soak” prior to moving to a release branch. 
> 
> Maybe we're miscommunicating.  Normal lifecycle for a bug fix is
> 
>  # start from oldest maintenance branch to which the fix is relevant
>  git checkout -b jed/bug-fix maint
>  ... fix bug, commit, submit pull request, review, revise if needed
> 
>  # [optional] If using 'next' for eager users/testing prior to 'master'
>  git checkout next
>  git merge jed/bug-fix
>  ... "eager" users test this bug fix in combination with everything else
> 
>  # feature graduates to 'master'
>  git checkout master
>  git merge jed/bug-fix
>  ... users of 'master' interact with bug fix, bringing more confidence
> 
>  # feature merged to release/maintenance branch
>  git checkout maint
>  git merge jed/bug-fix
> 
> 
> At the end of the day, there is only one commit effecting the change and
> we can easily check who has it.  If someone else needs the fix in their
> development branch, they can get it without side-effects by merging
> jed/bug-fix.  The 'next' branch, which is entirely optional, helps
> further stabilize 'master' - bugs in 'master' *disrupt other developers*
> while bugs in 'next' only disrupt testing.  Here's a diagram:
> 
>  
> https://docs.google.com/drawings/d/1hvwyCIw4Wq3NoRrPpWfPTriJn5MM2_QkaacAaql8FQE/edit
> 
> The "merging upward" is about
> 
>  git checkout master
>  git merge maint
>  git checkout next
>  git merge master
> 
> These merges rarely contain non-merge commits (the topic branches were
> already merged to 'next' and then 'master'); they just tie up the graph
> so that subsequent merges only contain the "interesting" content.
> 
> 
> If instead, your workflow has you committing a bug-fix on 'master', you
> have to rebase/cherry-pick it to get it into a release branch without
> side-effects.  Now your repository has two commits that do the same
> thing.  A workflow should make it easy to get bug-fixes and features to
> different audiences (e.g., release users versus developers) without
> side-effects and without duplication.
> 
>> The “upward” methodology works fine for stable situations where the
>> master isn’t changing much relative to the releases, 
> 
> I disagree.  There are many agile projects that merge upward using topic
> branches, as described above.



Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jed Brown
Ralph Castain  writes:
> We go the other direction - all code must be committed to master so it
> can “soak” prior to moving to a release branch. 

Maybe we're miscommunicating.  Normal lifecycle for a bug fix is

  # start from oldest maintenance branch to which the fix is relevant
  git checkout -b jed/bug-fix maint
  ... fix bug, commit, submit pull request, review, revise if needed

  # [optional] If using 'next' for eager users/testing prior to 'master'
  git checkout next
  git merge jed/bug-fix
  ... "eager" users test this bug fix in combination with everything else

  # feature graduates to 'master'
  git checkout master
  git merge jed/bug-fix
  ... users of 'master' interact with bug fix, bringing more confidence

  # feature merged to release/maintenance branch
  git checkout maint
  git merge jed/bug-fix


At the end of the day, there is only one commit effecting the change and
we can easily check who has it.  If someone else needs the fix in their
development branch, they can get it without side-effects by merging
jed/bug-fix.  The 'next' branch, which is entirely optional, helps
further stabilize 'master' - bugs in 'master' *disrupt other developers*
while bugs in 'next' only disrupt testing.  Here's a diagram:

  
https://docs.google.com/drawings/d/1hvwyCIw4Wq3NoRrPpWfPTriJn5MM2_QkaacAaql8FQE/edit

The "merging upward" is about

  git checkout master
  git merge maint
  git checkout next
  git merge master

These merges rarely contain non-merge commits (the topic branches were
already merged to 'next' and then 'master'); they just tie up the graph
so that subsequent merges only contain the "interesting" content.


If instead, your workflow has you committing a bug-fix on 'master', you
have to rebase/cherry-pick it to get it into a release branch without
side-effects.  Now your repository has two commits that do the same
thing.  A workflow should make it easy to get bug-fixes and features to
different audiences (e.g., release users versus developers) without
side-effects and without duplication.

> The “upward” methodology works fine for stable situations where the
> master isn’t changing much relative to the releases, 

I disagree.  There are many agile projects that merge upward using topic
branches, as described above.




Re: [OMPI devel] ORTE headers in OPAL source

2014-10-17 Thread Adrian Reber
Josh,

I had a look at the code (e.g., opal/mca/btl/sm/btl_sm.c) and there are
two uses of orte code:

if (orte_cr_continue_like_restart)

and

/* On restart we need the old file names to exist (not necessarily
 * contain content) so the CRS component does not fail when searching
 * for these old file handles. The restart procedure will make sure
 * these files get cleaned up appropriately.
 */
orte_sstore.set_attr(orte_sstore_handle_current,
                     SSTORE_METADATA_LOCAL_TOUCH,
                     mca_btl_sm_component.sm_seg->shmem_ds.seg_name);


Do you have an idea how to fix those two? The first variable
orte_cr_continue_like_restart could probably be moved but I am not sure
how to handle the sstore call.

Adrian


On Sat, Aug 09, 2014 at 08:46:31AM -0500, Josh Hursey wrote:
> Those calls should be protected with the CR FT #define - If I remember
> correctly. We were using the sstore to track the shared memory file names
> so we could clean them up on restart.
> 
> I'm not sure if the sstore framework is necessary in this location, since
> we should be able to tell opal_crs and it will do the right thing. I can
> try to look at it early next week if someone doesn't get to it before then.
> 
> -- Josh
> 
> 
> 
> On Sat, Aug 9, 2014 at 7:06 AM, Jeff Squyres (jsquyres) 
> wrote:
> 
> > I think you're making a joke, right...?
> >
> > I see direct calls to ORTE sstore functionality in all three.
> >
> >
> >
> >
> > On Aug 8, 2014, at 5:42 PM, George Bosilca  wrote:
> >
> > > These are harmless. They are only used when FT is enabled which should
> > rarely be the case.
> > >
> > >   George.
> > >
> > >
> > >
> > > On Fri, Aug 8, 2014 at 4:36 PM, Jeff Squyres (jsquyres) <
> > jsquy...@cisco.com> wrote:
> > > Here's a few ORTE headers in OPAL source -- can respective owners clean
> > these up?  Thanks.
> > >
> > > -
> > > mca/btl/smcuda/btl_smcuda.c
> > > 63:#include "orte/mca/sstore/sstore.h"
> > >
> > > mca/btl/sm/btl_sm.c
> > > 62:#include "orte/mca/sstore/sstore.h"
> > >
> > > mca/mpool/sm/mpool_sm_module.c
> > > 34:#include "orte/mca/sstore/sstore.h"
> > > -
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15570.php
> > >
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15571.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15587.php
> >
> 
> 
> 
> -- 
> Joshua Hursey
> Assistant Professor of Computer Science
> University of Wisconsin-La Crosse
> http://cs.uwlax.edu/~jjhursey

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15588.php


Adrian

-- 
Adrian Reber http://lisas.de/~adrian/
ink, n.:
A villainous compound of tannogallate of iron, gum-arabic,
and water, chiefly used to facilitate the infection of
idiocy and promote intellectual crime.
-- H.L. Mencken


Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jeff Squyres (jsquyres)
On Oct 17, 2014, at 6:41 AM, Jed Brown  wrote:

>> The ompi repo only contains the master branch.  Releases are not made
>> from master, and therefore it doesn't make sense to tag it with
>> release tags.  master is therefore not (directly) related to any given
>> release.
> 
> You can have tags in the repository without the branches, though I think
> it's useful for a contributor to
> 
>  git checkout -b my/bug-fix maint-1.8
> 
> instead of making a patch off 'master' that needs to be back-ported to
> the release.  The usual model is that one merges "upward" from the
> maintenance branches to 'master'.

We talked about this when we converted to git.  For better or for worse, this 
model is just not how the OMPI community has worked.  We typically develop on 
the trunk/master and then port to the release branches.  Developing directly on 
a release branch usually only occurs when there's a bug that only exists on the 
release branch (and not on trunk/master).

Meaning: we deliberately chose not to change the development style of the 
community to "develop on release branch" when we moved to git.

> But regardless, isn't it valuable to be able to query things like this?
> 
>  git log v1.8.0..master ompi/mpi/c/iallreduce.c
> 
>  git branch -r --contains $bug_fix_commit
> 
> Seems to me that most people cloning open-mpi/ompi will want to fetch
> from open-mpi/ompi-release regularly so they can have this context.

Agreed.

If GitHub implemented per-branch push ACLs, we'd squash down to a 
single repo, and all this would be easier.

But given the relative inexperience with git in our community (which is 
noticeable via some mistakes on the ompi repo already!) and our history of only 
allowing regulated commits to release branches, we chose the (admittedly  
somewhat awkward) 2-repo model.

> That number will get big later

Of course.

> and it deprives you of context when you
> have no idea whether "dev-BIGNUMBER" is earlier or later than a given
> release.  (Does it have those features/bugs or not?)

Even if OMPI were just in one git repo, the number of commits on master since 
dev would be unrelated to a given release.

Put differently: the dev tag is solely for ordering of nightly snapshot 
tarballs.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Ralph Castain

> On Oct 17, 2014, at 6:41 AM, Jed Brown  wrote:
> 
> "Jeff Squyres (jsquyres)"  writes:
> 
>> The ompi repo only contains the master branch.  Releases are not made
>> from master, and therefore it doesn't make sense to tag it with
>> release tags.  master is therefore not (directly) related to any given
>> release.
> 
> You can have tags in the repository without the branches, though I think
> it's useful for a contributor to
> 
>  git checkout -b my/bug-fix maint-1.8
> 
> instead of making a patch off 'master' that needs to be back-ported to
> the release.  The usual model is that one merges "upward" from the
> maintenance branches to 'master'.

We go the other direction - all code must be committed to master so it can 
“soak” prior to moving to a release branch. The “upward” methodology works fine 
for stable situations where the master isn’t changing much relative to the 
releases, but that generally isn’t the case for OMPI.

> 
> But regardless, isn't it valuable to be able to query things like this?
> 
>  git log v1.8.0..master ompi/mpi/c/iallreduce.c
> 
>  git branch -r --contains $bug_fix_commit

Not really much different than we do today - the wiki explains how to make it 
work.

> 
> Seems to me that most people cloning open-mpi/ompi will want to fetch
> from open-mpi/ompi-release regularly so they can have this context.
> 
>> The "dev" tag is there so that we can make nightly tarballs with a
>> logical sequence (via "git describe").  The "dev" tag is basically
>> there as the point at which we converted to git.  We could have put it
>> back at the beginning of time (i.e., equivalent to SVN r1 (i.e., the
>> first CVS commit!)), but it didn't really matter, so we just opted for
>> a dev that resulted in a smaller "git describe" number.
> 
> That number will get big later and it deprives you of context when you
> have no idea whether "dev-BIGNUMBER" is earlier or later than a given
> release.  (Does it have those features/bugs or not?)

Not sure this has ever been an issue before, to be honest.

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/10/16058.php



Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jed Brown
"Jeff Squyres (jsquyres)"  writes:

> The ompi repo only contains the master branch.  Releases are not made
> from master, and therefore it doesn't make sense to tag it with
> release tags.  master is therefore not (directly) related to any given
> release.

You can have tags in the repository without the branches, though I think
it's useful for a contributor to

  git checkout -b my/bug-fix maint-1.8

instead of making a patch off 'master' that needs to be back-ported to
the release.  The usual model is that one merges "upward" from the
maintenance branches to 'master'.

But regardless, isn't it valuable to be able to query things like this?

  git log v1.8.0..master ompi/mpi/c/iallreduce.c

  git branch -r --contains $bug_fix_commit

Seems to me that most people cloning open-mpi/ompi will want to fetch
from open-mpi/ompi-release regularly so they can have this context.

> The "dev" tag is there so that we can make nightly tarballs with a
> logical sequence (via "git describe").  The "dev" tag is basically
> there as the point at which we converted to git.  We could have put it
> back at the beginning of time (i.e., equivalent to SVN r1 (i.e., the
> first CVS commit!)), but it didn't really matter, so we just opted for
> a dev that resulted in a smaller "git describe" number.

That number will get big later and it deprives you of context when you
have no idea whether "dev-BIGNUMBER" is earlier or later than a given
release.  (Does it have those features/bugs or not?)




Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jeff Squyres (jsquyres)
On Oct 17, 2014, at 6:05 AM, Jed Brown  wrote:

> You can get them locally by fetching from open-mpi/ompi-release, but the
> only tag in open-mpi/ompi is called "dev" and on a seemingly arbitrary
> commit.  Isn't that awkward already, and more so with each passing year?
> Release tags in the development repository are useful to determine which
> released versions have a feature or bug.

I'm not sure what you're asking.

Releases are cut from the release branches, which are in the ompi-release repo. 
They are tagged appropriately for each release.

The ompi repo only contains the master branch.  Releases are not made from 
master, and therefore it doesn't make sense to tag it with release tags.  
master is therefore not (directly) related to any given release.

The "dev" tag is there so that we can make nightly tarballs with a logical 
sequence (via "git describe").  The "dev" tag is basically there as the point 
at which we converted to git.  We could have put it back at the beginning of 
time (i.e., equivalent to SVN r1 (i.e., the first CVS commit!)), but it didn't 
really matter, so we just opted for a dev that resulted in a smaller "git 
describe" number.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-17 Thread Jed Brown
You can get them locally by fetching from open-mpi/ompi-release, but the
only tag in open-mpi/ompi is called "dev" and on a seemingly arbitrary
commit.  Isn't that awkward already, and more so with each passing year?
Release tags in the development repository are useful to determine which
released versions have a feature or bug.




Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Gilles Gouaillardet
Artem,

There is a known issue #235 with modex and I made PR #238 with a tentative fix.

Could you please give it a try and report whether it solves your problem?

Cheers

Gilles


[OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Hello, I have trouble with the latest trunk when I use PMI1.

For example, if I use 2 nodes the application hangs. See backtraces from
both nodes below. From them I can see that the second (non-launching) node
hangs in bcol component selection. Here is the default setting of the
bcol_base_string parameter:
bcol_base_string="basesmuma,basesmuma,iboffload,ptpcoll,ugni"
according to ompi_info. I don't know whether it is correct that basesmuma
is duplicated.

Experiments with this parameter showed that it directly influences the bug:
export OMPI_MCA_bcol_base_string="" #  [SEGFAULT]
export OMPI_MCA_bcol_base_string="ptpcoll" #  [OK]
export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll" #  [OK]
export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll,iboffload" #  [OK]
export OMPI_MCA_bcol_base_string="basesmuma,ptpcoll,iboffload,ugni" #  [OK]
export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,ptpcoll,iboffload,ugni" #  [HANG]
export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload,ptpcoll" #  [HANG]
export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload" # [OK]
export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,iboffload,ugni" # [OK]
export OMPI_MCA_bcol_base_string="basesmuma,basesmuma,ptpcoll" #  [HANG]
export OMPI_MCA_bcol_base_string="ptpcoll,basesmuma" #  [OK]
export OMPI_MCA_bcol_base_string="ptpcoll,basesmuma,basesmuma" #  [HANG]

I can provide other information if necessary.

cn1:
(gdb) bt
0  0x7fdebd30ac6d in poll () from /lib/x86_64-linux-gnu/libc.so.6
1  0x7fdebcca64e0 in poll_dispatch (base=0x1d466b0, tv=0x7fff71aab880) at poll.c:165
2  0x7fdebcc9b041 in opal_libevent2021_event_base_loop (base=0x1d466b0, flags=2) at event.c:1631
3  0x7fdebcc35891 in opal_progress () at runtime/opal_progress.c:169
4  0x7fdeb32f78cb in opal_condition_wait (c=0x7fdebdb51bc0 , m=0x7fdebdb51cc0 ) at ../../../../opal/threads/condition.h:78
5  0x7fdeb32f79b8 in ompi_request_wait_completion (req=0x7fff71aab920) at ../../../../ompi/request/request.h:381
6  0x7fdeb32f84b8 in mca_pml_ob1_recv (addr=0x7fff71aabd80, count=1, datatype=0x6026c0 , src=1, tag=0, comm=0x6020a0 ,
    status=0x7fff71aabd90) at pml_ob1_irecv.c:109
7  0x7fdebd88f54d in PMPI_Recv (buf=0x7fff71aabd80, count=1, type=0x6026c0 , source=1, tag=0, comm=0x6020a0 ,
    status=0x7fff71aabd90) at precv.c:78
8  0x00400c44 in main (argc=1, argv=0x7fff71aabe98) at hellompi.c:33

cn2:
(gdb) bt
0  0x7fa65aa78c6d in poll () from /lib/x86_64-linux-gnu/libc.so.6
1  0x7fa65a4144e0 in poll_dispatch (base=0x20e96b0, tv=0x7fff46f44a80) at poll.c:165
2  0x7fa65a409041 in opal_libevent2021_event_base_loop (base=0x20e96b0, flags=2) at event.c:1631
3  0x7fa65a3a3891 in opal_progress () at runtime/opal_progress.c:169
4  0x7fa65afbbc25 in opal_condition_wait (c=0x7fa65b2bfbc0 , m=0x7fa65b2bfcc0 ) at ../opal/threads/condition.h:78
5  0x7fa65afbc1b5 in ompi_request_default_wait_all (count=2, requests=0x7fff46f44c70, statuses=0x0) at request/req_wait.c:287
6  0x7fa65afc7906 in comm_allgather_pml (src_buf=0x7fff46f44da0, dest_buf=0x233dac0, count=288, dtype=0x7fa65b29fee0 , my_rank_in_group=1,
    n_peers=2, ranks_in_comm=0x210a760, comm=0x6020a0 ) at patterns/comm/allgather.c:250
7  0x7fa64f14ba08 in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7fa64e64d010, module=0x232c800,
    peer_list=0x7fa64f3513e8 , back_files=0x7fa64eae2690, comm=0x6020a0 , input=...,
    base_fname=0x7fa64f14ca8c "sm_ctl_mem_", map_all=false) at bcol_basesmuma_smcm.c:205
8  0x7fa64f146525 in base_bcol_basesmuma_setup_ctl (sm_bcol_module=0x7fa64e64d010, cs=0x7fa64f351220 ) at bcol_basesmuma_setup.c:344
9  0x7fa64f146cbb in base_bcol_basesmuma_setup_library_buffers (sm_bcol_module=0x7fa64e64d010, cs=0x7fa64f351220 )
    at bcol_basesmuma_setup.c:550
10 0x7fa64f1418d0 in mca_bcol_basesmuma_comm_query (module=0x232c800, num_modules=0x232e570) at bcol_basesmuma_module.c:532
11 0x7fa64fd9e5f2 in mca_coll_ml_tree_hierarchy_discovery (ml_module=0x232fbe0, topo=0x232fd98, n_hierarchies=3, exclude_sbgp_name=0x0, include_sbgp_name=0x0)
    at coll_ml_module.c:1964
12 0x7fa64fd9f3a3 in mca_coll_ml_fulltree_hierarchy_discovery (ml_module=0x232fbe0, n_hierarchies=3) at coll_ml_module.c:2211
13 0x7fa64fd9cbe4 in ml_discover_hierarchy (ml_module=0x232fbe0) at coll_ml_module.c:1518
14 0x7fa64fda164f in mca_coll_ml_comm_query (comm=0x6020a0 , priority=0x7fff46f45358) at coll_ml_module.c:2970
15 0x7fa65b02f6aa in query_2_0_0 (component=0x7fa64fffe4e0 , comm=0x6020a0 , priority=0x7fff46f45358,
    module=0x7fff46f45390) at base/coll_base_comm_select.c:374
16 0x7fa65b02f66e in query (component=0x7fa64fffe4e0 , comm=0x6020a0 , priority=0x7fff46f45358, module=0x7fff46f45390)
    at base/coll_base_comm_select.c:357
17 0x7fa65b02f581 in check_one_component (comm=0x6020a0 , component=0x7fa64fffe4e0 , module=0x7fff46f45390)
    at base/coll_base_comm_select.c:319
18