The mailing list refused to let me attach the config.log file because it is too large. I can forward the output to you directly as well (as I did to Jeff).

I honestly have not looked into the configure logic. I can only tell that OPAL_HAVE_LTDL_ADVISE is not set on my Linux system for master, but is set on the 1.8 series (my 1.8 checkout is from Nov. 20, so if something has changed since then, the result might be different).



On 12/2/2014 9:27 AM, Artem Polyakov wrote:

2014-12-02 20:59 GMT+06:00 Edgar Gabriel <gabr...@cs.uh.edu>:

    didn't want to interfere with this thread, although I have a similar
    issue, since I have the solution nearly fully cooked up. But anyway,
    this last email gave the hint on why we have suddenly the problem in
    ompio:

    it looks like OPAL_HAVE_LTDL_ADVISE (at least on my systems) is not
    set anymore, so the entire section is being skipped. I double
    checked that with the 1.8 branch, it goes through the section, but
    not with master.


Hi, Edgar.

Both master and ompi-release (isn't that 1.8?!) are identical with respect to my
fix. Something else!? I'd like to see config.log too, but I will only be able
to look into it tomorrow.

Also, I want to add that SLURM PMI2 communicates with the local slurmstepd's
and doesn't need any authentication. All PMI1 processes, by contrast,
communicate with the srun process and thus need libslurm services for
communication and authentication.


    Thanks
    Edgar




    On 12/2/2014 7:56 AM, Jeff Squyres (jsquyres) wrote:

        Looks like I was totally lying in
        http://www.open-mpi.org/community/lists/devel/2014/12/16381.php (where
        I said we should not use RTLD_GLOBAL).  We *do* use RTLD_GLOBAL:

        https://github.com/open-mpi/ompi/blob/master/opal/mca/base/mca_base_component_repository.c#L124

        This ltdl advice object is passed to lt_dlopen() for all
        components.  My mistake; sorry.

        So the idea that using RTLD_GLOBAL will fix this SLURM bug is
        incorrect.

        I believe someone said earlier in the thread that adding the
        right -llibs to the configure line will solve the issue, and
        that sounds correct to me.  If there's a missing symbol because
        the SLURM libraries are not automatically pulling in the right
        dependent libraries, then *if* we put a workaround in OMPI to
        fix this issue, then the right workaround is to add the relevant
        -llibs when that component is linked.

        *If* you add that workaround (which is a whole separate
        discussion), I would suggest adding a configure.m4 test to see
        if adding the additional -llibs is necessary.  Perhaps
        AC_LINK_IFELSE looking for a symbol, and then if that fails,
        AC_LINK_IFELSE again with the additional -llibs to see if that
        works.

        Or something like that.



        On Dec 2, 2014, at 6:38 AM, Artem Polyakov <artpo...@gmail.com> wrote:

            Agreed. The first thing to check is what value
            OPAL_HAVE_LTDL_ADVISE is set to. If it is zero, this is very
            probably the same bug as mine.

            2014-12-02 17:33 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
            It does look similar - question is: why didn’t this fix the
            problem? Will have to investigate.

            Thanks


                On Dec 2, 2014, at 3:17 AM, Artem Polyakov <artpo...@gmail.com> wrote:



                2014-12-02 17:13 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
                Hmmm…if that is true, then it didn’t fix this problem as
                it is being reported in the master.

                I had this problem on my laptop installation. You can
                check my report, which was detailed enough, and see if you
                are hitting the same issue. My fix was also included in the
                1.8 branch. I am not sure that this is the same issue,
                but they look similar.



                    On Dec 1, 2014, at 9:40 PM, Artem Polyakov <artpo...@gmail.com> wrote:

                    I think this might be related to the configuration
                    problem I was fixing with Jeff a few months ago. See:
                    https://github.com/open-mpi/ompi/pull/240

                    2014-12-02 10:15 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
                    If it isn’t too much trouble, it would be good to
                    confirm that it remains broken. I strongly suspect
                    it is based on Moe’s comments.

                    Obviously, other people are making this work. For
                    Intel MPI, all you do is point it at libpmi and they
                    can run. However, they do explicitly dlopen it in
                    their code, and I don’t know what flags they might
                    pass when they do so.

                    If necessary, I suppose we could follow that
                    pattern. In other words, rather than specifically
                    linking the “s1” component to libpmi, instead
                    require that the user point us to a pmi library via
                    an MCA param, then explicitly dlopen that library
                    with RTLD_GLOBAL. This avoids the issues cited by
                    Jeff, but resolves the pmi linkage problem.
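
A minimal C sketch of the pattern described above: open a user-specified library with RTLD_GLOBAL, then look up an entry point by name. The helper names `open_global` and `lookup` are hypothetical, and how the path would reach the component (for example, through an MCA parameter) is deliberately left out. This is an illustration under those assumptions, not Open MPI's or Intel MPI's actual code.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Open the library at `path` with RTLD_GLOBAL, so that its symbols can be
 * used to resolve references in libraries that are loaded afterwards.
 * Returns the handle, or NULL after printing the dlerror() text.
 * (Hypothetical helper; the name is made up for this sketch.)
 */
void *open_global(const char *path)
{
    assert(path != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}

/*
 * Look up `name` in an already opened handle.
 * Returns the symbol address, or NULL if the symbol is not found.
 */
void *lookup(void *handle, const char *name)
{
    assert(handle != NULL && name != NULL);

    dlerror();                  /* clear any stale error state */
    return dlsym(handle, name);
}
```

A caller might pass the user-supplied path to `open_global()`, then call `lookup(handle, "PMI_Init")` and convert the result to the matching function-pointer type before calling it. On older glibc versions the program has to be linked with `-ldl`.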


                        On Dec 1, 2014, at 8:09 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

                        $ srun --version
                        slurm 2.6.6-VENDOR_PROVIDED

                        $ srun --mpi=pmi2 -n 1 ~/hw
                        I am 0 / 1

                        $ srun -n 1 ~/hw
                        /csc/home1/gouaillardet/hw: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_verbose
                        srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
                        srun: error: slurm_receive_msg[10.0.3.15]: Zero Bytes were transmitted or received
                        srun: error: soleil: task 0: Exited with exit code 127

                        $ ldd /usr/lib64/slurm/auth_munge.so
                              linux-vdso.so.1 =>  (0x00007fff54478000)
                              libmunge.so.2 => /usr/lib64/libmunge.so.2 (0x00007f744760f000)
                              libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f74473f1000)
                              libc.so.6 => /lib64/libc.so.6 (0x00007f744705d000)
                              /lib64/ld-linux-x86-64.so.2 (0x0000003bf5400000)


                        now, if i relink auth_munge.so so that it depends on
                        libslurm:

                        $ srun -n 1 ~/hw
                        srun: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc


                        i can give the latest slurm a try if needed

                        Cheers,

                        Gilles


                        On 2014/12/02 12:56, Ralph Castain wrote:

                            Out of curiosity - how are you testing
                            these? I have more current versions of Slurm
                            and would like to test the observations there.


                                On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

                                I'd like to take a step back ...

                                i previously tested with slurm 2.6.0,
                                and it complained about the
                                slurm_verbose symbol that is defined in
                                libslurm.so
                                so with slurm 2.6.0, RTLD_GLOBAL or
                                relinking is ok

                                now i tested with slurm 2.6.6 and it
                                complains about the
                                slurm_auth_get_arg_desc symbol, and this
                                symbol is not
                                defined in any dynamic library. it is
                                internally defined in the static
                                libcommon.a library, which is used to
                                build the slurm binaries.

                                as far as i understand, auth_munge.so
                                can only be invoked from a slurm binary,
                                which means it cannot be invoked from an
                                mpi application
                                even if it is linked with libslurm,
                                libpmi, ...

                                that looks like a slurm design issue
                                that the slurm folks will take care of.
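
One way to see the difference between a Slurm binary and an MPI application is to ask the running process whether it can resolve a given symbol at all. The C sketch below does that through the handle returned by `dlopen(NULL, ...)`. The function name `symbol_is_visible` is hypothetical, and the sketch makes no claim about how Slurm itself is built; it only illustrates the lookup.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Return 1 if `name` can be resolved through the global symbol scope of the
 * running process (the executable, its startup dependencies, and anything
 * loaded later with RTLD_GLOBAL), and 0 otherwise.
 * (Hypothetical helper; the name is made up for this sketch.)
 */
int symbol_is_visible(const char *name)
{
    assert(name != NULL);

    /* dlopen(NULL, ...) returns a handle for the main program. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL) {
        return 0;
    }

    /* A NULL result is treated as "not found" in this sketch. */
    int found = (dlsym(self, name) != NULL);

    dlclose(self);
    return found;
}
```

Called with `"slurm_auth_get_arg_desc"` from inside an MPI application, a check like this would report whether any library loaded into the global scope exports the symbol the plugin is asking for.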

                                Cheers,

                                Gilles

                                On 2014/12/02 12:33, Ralph Castain wrote:

                                    Another option is to simply add the
                                    -lslurm -lauth flags to the pmix/s1
                                    component as this is the only place
                                    that requires it, and it won’t hurt
                                    anything to do so.



                                        On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

                                        Jeff,

                                        FWIW, you can read my analysis
                                        of what is going wrong at

                                        http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php


                                        bottom line, i agree this is a slurm issue (slurm plugins
                                        should depend on libslurm, but they do not, yet)

                                        a possible workaround would be to make the pmi component a
                                        "proxy" that dlopens, with RTLD_GLOBAL, the "real" component
                                        in which the job is done.
                                        that being said, the impact is quite limited (no direct launch
                                        in slurm with pmi1, but pmi2 works fine), so it makes sense
                                        not to work around someone else's problem.
                                        and that being said, configure could detect this broken pmi1
                                        and either not build pmi1 support or print a user friendly
                                        error message if pmi1 is used.
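
The detection idea can also be sketched at run time rather than in configure. The C function below tries to load a library with RTLD_NOW, which asks the dynamic linker to resolve the library's undefined symbols before `dlopen()` returns, and hands back the linker's message on failure. The name `library_loads_cleanly` is hypothetical, and whether such a probe would catch the pmi1 problem depends on how the library and its plugins are linked and loaded; treat it as an illustration, not a tested fix.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Try to load `path` with RTLD_NOW | RTLD_LOCAL.
 * Returns 1 if the library loads, 0 otherwise. On failure the dynamic
 * linker's message is copied into `err` (at most `errlen` bytes, always
 * NUL-terminated), so the caller can print a friendlier error.
 * (Hypothetical helper; the name is made up for this sketch.)
 */
int library_loads_cleanly(const char *path, char *err, size_t errlen)
{
    assert(path != NULL && err != NULL && errlen > 0);
    err[0] = '\0';

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg != NULL ? msg : "unknown dlopen error");
        return 0;
    }

    dlclose(handle);
    return 1;
}
```

A caller could run this once at startup and, if it returns 0, skip the pmi1 path and print the text in `err` instead of failing later with a raw symbol lookup error.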

                                        any thoughts ?

                                        Cheers,

                                        Gilles

                                        On 2014/12/02 7:47, Jeff Squyres
                                        (jsquyres) wrote:

                                            Ok, if the problem is moot, great.

                                            (sidenote: this is moot, so ignore this if you want:
                                            with this explanation, I'm still not sure how
                                            RTLD_GLOBAL fixes the issue)


                                            On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:


                                                Easy enough to explain. We link libpmi into the pmix/s1
                                                component. This library is missing the linkage to libslurm
                                                that contains the linkage to libauth where munge resides.
                                                So when we call a PMI function, libpmi references a call to
                                                munge for authentication and hits an “unresolved symbol” error.

                                                Moe acknowledges the error is in Slurm and is fixing the
                                                linkages so this problem goes away



                                                    On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

                                                    On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org> wrote:


                                                        FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly
                                                        against its dependencies (the pmi-2 one is correct). Moe is
                                                        aware of the problem and fixing it on their side. This won’t
                                                        help existing installations until they upgrade, but I tend to
                                                        agree with Jeff about not fixing other people’s problems.

                                                    Can you explain what is happening?

                                                    I ask because I'm not sure I understand the problem such that
                                                    using RTLD_GLOBAL would fix it.  I.e., even if libpmi1.so isn't
                                                    linked against its dependencies properly, that shouldn't cause a
                                                    problem if OMPI components A and B are both linked against
                                                    libpmi1.so, and then A is loaded, and then B is loaded.

                                                    ...or perhaps we can just discuss this on the call tomorrow?

                                                    --
                                                    Jeff Squyres
                                                    jsquy...@cisco.com
                                                    For corporate legal information go to:
                                                    http://www.cisco.com/web/about/doing_business/legal/cri/

                                                    _________________________________________________
                                                    devel mailing list
                                                    de...@open-mpi.org
                                                    Subscription:
                                                    http://www.open-mpi.org/mailman/listinfo.cgi/devel
                                                    Link to this post:
                                                    http://www.open-mpi.org/community/lists/devel/2014/12/16383.php

                                                

                                        

                                    

                                











                    --
                    С Уважением, Поляков Артем Юрьевич
                    Best regards, Artem Y. Polyakov
















    --
    Edgar Gabriel
    Associate Professor
    Parallel Software Technologies Lab http://pstl.cs.uh.edu
    Department of Computer Science          University of Houston
    Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
    Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335




--
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/12/16404.php


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
