Re: [OMPI devel] smcuda higher exclusivity than anything else?

2015-05-20 Thread Rolf vandeVaart
A few observations.
1. The smcuda btl is only built when --with-cuda is part of the configure line, 
so folks who do not configure that way will not even have this btl and will 
never run into this issue.
2. The priority of the smcuda btl has been higher since Open MPI 1.7.5 (March 
2014).  The idea is that if someone configured in CUDA-aware support, then they 
should not explicitly have to adjust the priority of the smcuda btl to get it 
selected.
3. This issue popped up because I made a change in the smcuda btl between 1.8.4 
and 1.8.5: btl_smcuda_max_send_size was bumped from 32 KB to 128 KB.  This had 
a positive effect when sending and receiving GPU buffers.  I knew it would 
somewhat negatively affect host memory transfers, but figured that was a fair 
tradeoff.  Based on this report, that may not have been the right decision.  If 
one runs Open MPI 1.8.5 with --mca btl_smcuda_max_send_size 32768, one sees the 
same performance as 1.8.4, similar to what one gets with the sm btl.
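
The workaround above can be sketched on the command line as follows (a sketch; 
the benchmark binary name is a placeholder, and the OMPI_MCA_ environment 
prefix is the usual alternative to --mca):

```shell
# Restore the 1.8.4 fragment size (32 KB instead of 128 KB) for smcuda,
# recovering sm-like host-memory performance:
mpirun --mca btl_smcuda_max_send_size 32768 -np 2 ./my_benchmark

# Equivalently, export it once so every subsequent mpirun picks it up:
export OMPI_MCA_btl_smcuda_max_send_size=32768
mpirun -np 2 ./my_benchmark
```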

Interesting idea to disqualify this BTL if there are no GPUs on the machine. 

Aurelien, would that seem like a good solution?

Rolf

PS: Unfortunately, the max_send_size value is used for both GPU and CPU 
transfers, and the optimal value for each is different.




Re: [OMPI devel] smcuda higher exclusivity than anything else?

2015-05-20 Thread Ralph Castain
Rolf - this doesn’t sound right to me. I assume that smcuda is only supposed to 
build if CUDA support was found/requested, but if there are no CUDA adapters, 
then I would have thought it should disqualify itself.

Can we do something about this for 1.8.6?
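
Until then, one way to see what smcuda registers and which BTL actually wins 
selection is the following (a sketch; exact output and parameter reporting vary 
across Open MPI versions, and ./my_app is a placeholder application):

```shell
# List the MCA parameters the smcuda component registers,
# including its exclusivity/priority defaults:
ompi_info --param btl smcuda

# Make the BTL framework log its selection decisions at startup:
mpirun --mca btl_base_verbose 100 -np 2 ./my_app
```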




[OMPI devel] smcuda higher exclusivity than anything else?

2015-05-20 Thread Aurélien Bouteiller
I was making basic performance measurements on our machine after installing 
1.8.5, and the performance looked bad. It turns out that the smcuda btl has a 
higher exclusivity than both vader and sm, even on machines with no NVIDIA 
adapters. Is there a strong reason why the default exclusivity is set so high? 
Of course it can easily be fixed with a couple of MCA options, but unsuspecting 
users who “just run” will experience 1/3 overhead across the board for shared 
memory communication, according to my measurements.
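
For reference, the MCA options alluded to above can be sketched like this 
(./my_app is a placeholder; the ^ prefix excludes a component from selection):

```shell
# Exclude smcuda so that vader/sm win the shared-memory selection again:
mpirun --mca btl ^smcuda -np 2 ./my_app

# Or list the allowed BTL components explicitly:
mpirun --mca btl self,vader -np 2 ./my_app
```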


Side note: from my understanding of the smcuda component, performance should be 
identical to the regular sm component (as long as no GPU operations are 
required). This is not the case; there is some performance penalty with smcuda 
compared to sm.

Aurelien

--
Aurélien Bouteiller ~~ https://icl.cs.utk.edu/~bouteill/


