Yeah, we'll want to move some of it into common - but a lot of that was already 
done, so I think it won't be that hard. Will explore


On May 7, 2014, at 9:00 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> +1 Sounds like a good idea - but decoupling the two and adding all the right 
> selection mojo might be a bit of a pain. There are several places in OMPI 
> where the distinction between PMI1 and PMI2 is made, not only in grpcomm - the 
> DB and ESS frameworks come to mind off the top of my head.
> 
> Josh
> 
> 
> On Wed, May 7, 2014 at 11:48 AM, Artem Polyakov <artpo...@gmail.com> wrote:
> Good idea :)!
> 
> On Wednesday, May 7, 2014, Ralph Castain wrote:
> 
> Jeff actually had a useful suggestion (gasp!). He proposed that we separate 
> the PMI-1 and PMI-2 codes into separate components so you could select them 
> at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs are 
> found), default to PMI-1, but users could select to try PMI-2. If the PMI-2 
> component failed, we would emit a show_help indicating that they probably 
> have a broken PMI-2 version and should try PMI-1.
> 
> Make sense?
> Ralph
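
For concreteness, the fallback behavior being proposed might look roughly like the
sketch below. All of the names, the stub init functions, and the env-var hook are
illustrative stand-ins, not actual OMPI code - in OMPI the request would come in as
an MCA parameter:

    /* Sketch of the proposed selection: build both components, default
     * to PMI-1, try PMI-2 only on request, and fall back with a warning
     * if it fails. Everything here is an illustrative stand-in. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int pmi1_init(void) { return 0; }  /* stand-in: PMI-1 works */
    static int pmi2_init(void) { return -1; } /* stand-in: broken PMI-2 lib */

    int main(void)
    {
        /* in OMPI this would be an MCA parameter; an env var stands in */
        const char *req = getenv("PMI_SELECT");

        if (NULL != req && 0 == strcmp(req, "pmi2")) {
            if (0 == pmi2_init()) {
                puts("selected PMI-2");
                return 0;
            }
            /* roughly what the proposed show_help message would convey */
            fprintf(stderr, "PMI-2 init failed - your PMI-2 library is "
                    "probably broken; falling back to PMI-1\n");
        }
        if (0 == pmi1_init()) {
            puts("selected PMI-1");
            return 0;
        }
        fprintf(stderr, "no usable PMI support found\n");
        return 1;
    }

Defaulting to PMI-1 is the deliberate part of the proposal: it is the older but more
reliably working path, and PMI-2 only gets tried when the user explicitly asks for it.
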
> 
> On May 7, 2014, at 8:00 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> 
>> On May 7, 2014, at 7:56 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>> 
>>> Ah, I see. Sorry for the reactionary comment - but this feature falls 
>>> squarely within my "jurisdiction", and we've invested a lot in improving 
>>> OMPI jobstart under srun. 
>>> 
>>> That being said (now that I've taken some deep breaths and carefully read 
>>> your original email :)), what you're proposing isn't a bad idea. I think it 
>>> would be good to add a "--with-pmi2" flag to configure, since 
>>> "--with-pmi" automagically uses PMI2 if it finds the header and lib. This 
>>> way, we could experiment with PMI1/PMI2 without having to rebuild SLURM or 
>>> hack the installation. 
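
Concretely, the proposed flag would let you pick the code path at configure time,
along these lines (paths illustrative; "--with-pmi2" is the flag being proposed here,
not an existing option):

    # today: --with-pmi automagically picks PMI2 if header and lib are found
    ./configure --with-pmi=/opt/slurm

    # proposed: PMI2 built and used only when explicitly requested
    ./configure --with-pmi=/opt/slurm --with-pmi2
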
>> 
>> That would be a much simpler solution than what Artem proposed (off-list) 
>> where we would try PMI2 and then, if it didn't work, try to figure out how to 
>> fall back to PMI1. I'll add this for now, and if Artem wants to try his more 
>> automagic solution and can make it work, then we can reconsider that option.
>> 
>> Thanks
>> Ralph
>> 
>>> 
>>> Josh  
>>> 
>>> 
>>> On Wed, May 7, 2014 at 10:45 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Okay, then we'll just have to develop a workaround for all those Slurm 
>>> releases where PMI-2 is borked :-(
>>> 
>>> FWIW: I think people misunderstood my statement. I specifically did *not* 
>>> propose to *lose* PMI-2 support. I suggested that we change it to 
>>> "on-by-request" instead of the current "on-by-default" so we wouldn't keep 
>>> getting asked about PMI-2 bugs in Slurm. Once the Slurm implementation 
>>> stabilized, then we could reverse that policy.
>>> 
>>> However, given that both you and Chris appear to prefer to keep it 
>>> "on-by-default", we'll see if we can find a way to detect that PMI-2 is 
>>> broken and then fall back to PMI-1.
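
Presumably "on-by-request" would boil down to an MCA-level switch, something like
the following (the parameter name is hypothetical - only the OMPI_MCA_* environment
mechanism itself is real):

    # hypothetical parameter name, for illustration only
    OMPI_MCA_pmi_version=2 srun -n 128 ./mpi_app
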
>>> 
>>> 
>>> On May 7, 2014, at 7:39 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>> 
>>>> Just saw this thread, and I second Chris' observations: at scale we are 
>>>> seeing huge gains in jobstart performance with PMI2 over PMI1. We CANNOT 
>>>> lose this functionality. For competitive reasons, I cannot provide exact 
>>>> numbers, but let's say the difference is in the ballpark of a full 
>>>> order-of-magnitude on 20K ranks versus PMI1. PMI1 is completely 
>>>> unacceptable/unusable at scale. Certainly PMI2 still has scaling issues, 
>>>> but there is no contest between PMI1 and PMI2.  We (MLNX) are actively 
>>>> working to resolve some of the scalability issues in PMI2. 
>>>> 
>>>> Josh
>>>> 
>>>> Joshua S. Ladd
>>>> Staff Engineer, HPC Software
>>>> Mellanox Technologies
>>>> 
>>>> Email: josh...@mellanox.com
>>>> 
>>>> 
>>>> On Wed, May 7, 2014 at 4:00 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Interesting - how many nodes were involved? As I said, the bad scaling 
>>>> becomes more evident at a fairly high node count.
>>>> 
>>>> On May 7, 2014, at 12:07 AM, Christopher Samuel <sam...@unimelb.edu.au> 
>>>> wrote:
>>>> 
>>>> >
>>>> > Hiya Ralph,
>>>> >
>>>> > On 07/05/14 14:49, Ralph Castain wrote:
>>>> >
>>>> >> I should have looked closer to see the numbers you posted, Chris -
>>>> >> those include time for MPI wireup. So what you are seeing is that
>>>> >> mpirun is much more efficient at exchanging the MPI endpoint info
>>>> >> than PMI. I suspect that PMI2 is not much better, as the primary
>>>> >> reason for the difference is that mpirun sends blobs, while PMI
>>>> >> requires that everything b
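
For anyone following along, the per-key pattern being contrasted with mpirun's blob
exchange looks roughly like this. The KVS calls are the real PMI-1 API; the endpoint
payload is a stand-in, and it needs Slurm's pmi.h on the include path and -lpmi to
build:

    /* Illustration of per-key PMI-1 wireup: every rank publishes its
     * endpoint as one key, then fetches N-1 keys one at a time.
     * Build (roughly): cc -std=c99 wireup.c -lpmi */
    #include <stdio.h>
    #include <pmi.h>

    int main(void)
    {
        int spawned, size, rank;
        char kvs[256], key[64], val[256];

        PMI_Init(&spawned);
        PMI_Get_size(&size);
        PMI_Get_rank(&rank);
        PMI_KVS_Get_my_name(kvs, sizeof(kvs));

        /* publish this rank's endpoint as a single key/value pair */
        snprintf(key, sizeof(key), "ep-%d", rank);
        snprintf(val, sizeof(val), "endpoint-of-rank-%d", rank);
        PMI_KVS_Put(kvs, key, val);
        PMI_KVS_Commit(kvs);
        PMI_Barrier();

        /* fetch everyone else's endpoint one key at a time - this
         * per-key traffic is what a single aggregated blob avoids */
        for (int i = 0; i < size; i++) {
            if (i == rank) continue;
            snprintf(key, sizeof(key), "ep-%d", i);
            PMI_KVS_Get(kvs, key, val, sizeof(val));
        }

        PMI_Finalize();
        return 0;
    }

Each rank does one put plus N-1 individual gets, so the aggregate KVS traffic grows
with the square of the job size - which fits the observation above that the bad
scaling only becomes evident at fairly high node counts.
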
> 
> 
> -- 
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
> 
