Agree. The first thing you should check is what value OPAL_HAVE_LTDL_ADVISE is
set to. If it is zero, this is very probably the same bug as mine.
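A minimal sketch of that check follows. It assumes the value appears as a `#define OPAL_HAVE_LTDL_ADVISE <n>` line in the configure-generated `opal_config.h` of the build tree; the header location is an assumption, and the helper name `ltdl_advise_value` is hypothetical. It returns the macro's value, or -1 if no such line is present:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan the text of a generated config header and
 * return the integer value of OPAL_HAVE_LTDL_ADVISE, or -1 if the
 * macro is not defined in the given text. */
int ltdl_advise_value(const char *header_text)
{
    static const char key[] = "#define OPAL_HAVE_LTDL_ADVISE";
    const char *line = header_text;

    while (line != NULL && *line != '\0') {
        /* Only consider lines that start with the exact #define. */
        if (strncmp(line, key, sizeof(key) - 1) == 0) {
            const char *value = line + sizeof(key) - 1;
            /* Require whitespace after the name so that a longer macro
             * name sharing this prefix is not matched. */
            if (*value == ' ' || *value == '\t') {
                return (int)strtol(value, NULL, 10);
            }
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}
```

Read the header into a string (for example with `fread`) and pass it in; a result of 0 corresponds to the case described above.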

2014-12-02 17:33 GMT+06:00 Ralph Castain <r...@open-mpi.org>:

> It does look similar. The question is: why didn’t this fix the problem? Will
> have to investigate.
>
> Thanks
>
>
> On Dec 2, 2014, at 3:17 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>
>
>
> 2014-12-02 17:13 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
>
>> Hmmm…if that is true, then it didn’t fix this problem as it is being
>> reported in the master.
>>
>
> I had this problem on my laptop installation. You can check my report (it
> was fairly detailed) and see whether you are hitting the same issue. My fix
> was also included in the 1.8 branch. I am not sure that this is the same
> issue, but they look similar.
>
>
>>
>>
>> On Dec 1, 2014, at 9:40 PM, Artem Polyakov <artpo...@gmail.com> wrote:
>>
>> I think this might be related to the configuration problem I was fixing
>> with Jeff a few months ago. Refer here:
>> https://github.com/open-mpi/ompi/pull/240
>>
>> 2014-12-02 10:15 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
>>
>>> If it isn’t too much trouble, it would be good to confirm that it
>>> remains broken. I strongly suspect it is based on Moe’s comments.
>>>
>>> Obviously, other people are making this work. For Intel MPI, all you do
>>> is point it at libpmi and they can run. However, they do explicitly dlopen
>>> it in their code, and I don’t know what flags they might pass when they do
>>> so.
>>>
>>> If necessary, I suppose we could follow that pattern. In other words,
>>> rather than specifically linking the “s1” component to libpmi, instead
>>> require that the user point us to a pmi library via an MCA param, then
>>> explicitly dlopen that library with RTLD_GLOBAL. This avoids the issues
>>> cited by Jeff, but resolves the pmi linkage problem.
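A minimal sketch of that proposal follows. It assumes the library path arrives through a hypothetical MCA parameter `pmix_s1_libpmi`, read here from the environment as `OMPI_MCA_pmix_s1_libpmi` (Open MPI's usual environment spelling for MCA parameters). The function names `open_pmi_library` and `open_user_pmi` are also hypothetical:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical MCA parameter, read from the environment using the
 * OMPI_MCA_<param> naming convention. */
#define PMI_LIB_ENV "OMPI_MCA_pmix_s1_libpmi"

/* Open a PMI library with global symbol scope so that objects dlopen()ed
 * afterwards can resolve symbols against it. RTLD_NOW reports any symbol
 * the library cannot resolve at load time rather than at first call.
 * Returns the handle; on failure returns NULL and fills errbuf. */
void *open_pmi_library(const char *path, char *errbuf, size_t errlen)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        snprintf(errbuf, errlen, "cannot open %s: %s", path,
                 err ? err : "unknown error");
    }
    return handle;
}

/* Look up the user-supplied path and open it; NULL if unset or unloadable. */
void *open_user_pmi(char *errbuf, size_t errlen)
{
    const char *path = getenv(PMI_LIB_ENV);
    if (path == NULL || path[0] == '\0') {
        snprintf(errbuf, errlen, "%s is not set", PMI_LIB_ENV);
        return NULL;
    }
    return open_pmi_library(path, errbuf, errlen);
}
```

The design choice is that the component has no link-time dependency on libpmi at all; whichever library the user names is opened once, with global scope, before any PMI call is made.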
>>>
>>>
>>> On Dec 1, 2014, at 8:09 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@iferc.org> wrote:
>>>
>>> $ srun --version
>>> slurm 2.6.6-VENDOR_PROVIDED
>>>
>>> $ srun --mpi=pmi2 -n 1 ~/hw
>>> I am 0 / 1
>>>
>>> $ srun -n 1 ~/hw
>>> /csc/home1/gouaillardet/hw: symbol lookup error:
>>> /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_verbose
>>> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
>>> srun: error: slurm_receive_msg[10.0.3.15]: Zero Bytes were transmitted
>>> or received
>>> srun: error: soleil: task 0: Exited with exit code 127
>>>
>>> $ ldd /usr/lib64/slurm/auth_munge.so
>>>     linux-vdso.so.1 =>  (0x00007fff54478000)
>>>     libmunge.so.2 => /usr/lib64/libmunge.so.2 (0x00007f744760f000)
>>>     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f74473f1000)
>>>     libc.so.6 => /lib64/libc.so.6 (0x00007f744705d000)
>>>     /lib64/ld-linux-x86-64.so.2 (0x0000003bf5400000)
>>>
>>>
>>> now, if I relink auth_munge.so so that it depends on libslurm:
>>>
>>> $ srun -n 1 ~/hw
>>> srun: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined
>>> symbol: slurm_auth_get_arg_desc
>>>
>>>
>>> I can give the latest slurm a try if needed.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 2014/12/02 12:56, Ralph Castain wrote:
>>>
>>> Out of curiosity - how are you testing these? I have more current versions 
>>> of Slurm and would like to test the observations there.
>>>
>>>
>>> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>> I'd like to take a step back...
>>>
>>> I previously tested with slurm 2.6.0, and it complained about the
>>> slurm_verbose symbol, which is defined in libslurm.so,
>>> so with slurm 2.6.0, RTLD_GLOBAL or relinking is OK.
>>>
>>> Now I tested with slurm 2.6.6, and it complains about the
>>> slurm_auth_get_arg_desc symbol, and this symbol is not
>>> defined in any dynamic library. It is defined internally in the static
>>> libcommon.a library, which is used to build the slurm binaries.
>>>
>>> As far as I understand, auth_munge.so can only be invoked from a slurm
>>> binary, which means it cannot be invoked from an MPI application,
>>> even if the application is linked with libslurm, libpmi, ...
>>>
>>> That looks like a slurm design issue that the slurm folks will take care of.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/12/02 12:33, Ralph Castain wrote:
>>>
>>> Another option is to simply add the -lslurm -lauth flags to the pmix/s1
>>> component, as this is the only place that requires them, and it won’t hurt
>>> anything to do so.
>>>
>>>
>>>
>>> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>> Jeff,
>>>
>>> FWIW, you can read my analysis of what is going wrong
>>> at http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php
>>>
>>> Bottom line, I agree this is a slurm issue (slurm plugins should depend
>>> on libslurm, but they do not yet).
>>>
>>> A possible workaround would be to make the pmi component a "proxy" that
>>> dlopens, with RTLD_GLOBAL, the "real" component in which the work is done.
>>> That being said, the impact is quite limited (no direct launch in slurm
>>> with pmi1, but pmi2 works fine), so it makes sense not to work around
>>> someone else's problem.
>>> Also, configure could detect this broken pmi1 and either not
>>> build pmi1 support or print a user-friendly error message if pmi1 is used.
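The "proxy" idea can be sketched as follows. The proxy has no link-time dependency on libpmi; it only opens the real component with RTLD_GLOBAL and hands back the address of one exported entry symbol. The function name, its parameters, and the idea of resolving a single entry symbol are assumptions for illustration, not existing Open MPI code:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical proxy loader: open the "real" component with global symbol
 * scope and return the address of its exported entry symbol.
 * On success, *handle_out receives the dlopen() handle (the caller
 * dlclose()s it when done). On failure, NULL is returned, *handle_out is
 * NULL, and errbuf holds the loader's error text. */
void *proxy_load_real_component(const char *path, const char *entry,
                                void **handle_out,
                                char *errbuf, size_t errlen)
{
    const char *err;

    *handle_out = NULL;
    errbuf[0] = '\0';

    /* RTLD_GLOBAL: the real component's symbols become visible to
     * objects that are dlopen()ed afterwards. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        err = dlerror();
        snprintf(errbuf, errlen, "%s", err ? err : "dlopen failed");
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    void *sym = dlsym(handle, entry);
    if (sym == NULL) {
        /* Copy the message before dlclose(), which may reset it. */
        err = dlerror();
        snprintf(errbuf, errlen, "%s", err ? err : "symbol not found");
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

Keeping the proxy free of any libpmi linkage means a broken pmi1 installation only surfaces as a readable error string from this call, which could also back the user-friendly message suggested above.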
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/12/02 7:47, Jeff Squyres (jsquyres) wrote:
>>>
>>> Ok, if the problem is moot, great.
>>>
>>> (sidenote: this is moot, so ignore this if you want: with this explanation, 
>>> I'm still not sure how RTLD_GLOBAL fixes the issue)
>>>
>>>
>>> On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>
>>> Easy enough to explain. We link libpmi into the pmix/s1 component. This
>>> library is missing the linkage to libslurm, which in turn links to
>>> libauth, where munge resides. So when we call a PMI function, libpmi
>>> calls into munge for authentication and hits an “unresolved
>>> symbol” error.
>>>
>>> Moe acknowledges the error is in Slurm and is fixing the linkages so this
>>> problem goes away.
>>>
>>>
>>>
>>> On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>
>>> On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>
>>> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its 
>>> dependencies (the pmi-2 one is correct).  Moe is aware of the problem and 
>>> fixing it on their side. This won’t help existing installations until they 
>>> upgrade, but I tend to agree with Jeff about not fixing other people’s 
>>> problems.
>>>
>>> Can you explain what is happening?
>>>
>>> I ask because I'm not sure I understand the problem such that using 
>>> RTLD_GLOBAL would fix it.  I.e., even if libpmi1.so isn't linked against 
>>> its dependencies properly, that shouldn't cause a problem if OMPI 
>>> components A and B are both linked against libpmi1.so, and then A is 
>>> loaded, and then B is loaded.
>>>
>>> ...or perhaps we can just discuss this on the call tomorrow?
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16383.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16384.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16386.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16387.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16388.php
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16389.php
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16390.php
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16391.php
>>>
>>
>>
>>
>> --
>> Best regards, Artem Y. Polyakov
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16393.php
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16395.php
>>
>
>
>
> --
> Best regards, Artem Y. Polyakov
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16396.php
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16397.php
>



-- 
Best regards, Artem Y. Polyakov
