Re: [OMPI users] difference between single and double precision

2010-12-20 Thread Mathieu Gontier

Hi,

I am now fine with the environment variable. It is pretty simple to set, and 
easy to read back in the code that packs the messages.
As for the empirical tests, they depend so much on the cluster, on OpenMPI 
itself and on the model that they are not an industrial way of tuning the 
computation. But the environment variable is a good workaround.
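
For reference, here is a minimal C sketch of what "getting it into the code" 
can look like. It assumes the limit was exported as 
OMPI_MCA_btl_sm_eager_limit; the helper names and the fallback value are 
purely illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Read the sm eager limit from the environment.  If the variable is not set
 * (e.g. the value came from a file instead), fall back to a guessed default. */
static long sm_eager_limit(void)
{
    const char *s = getenv("OMPI_MCA_btl_sm_eager_limit");
    return (s != NULL) ? strtol(s, NULL, 10) : 4096;  /* 4096 is only a guess */
}

/* Warn when a message of 'bytes' bytes will not fit in a single eager send. */
static void check_buffer(size_t bytes)
{
    long limit = sm_eager_limit();
    if (limit > 0 && (long)bytes > limit)
        fprintf(stderr,
                "warning: %zu-byte buffer exceeds btl_sm_eager_limit (%ld)\n",
                bytes, limit);
}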


Thanks again to all of you for the help.
Best regards,
Mathieu.

On 12/16/2010 06:21 PM, Eugene Loh wrote:

Jeff Squyres wrote:

On Dec 16, 2010, at 5:14 AM, Mathieu Gontier wrote:
   

We have run some tests, and the option btl_sm_eager_limit has a positive 
effect on the performance. Eugene, thank you for your links.
 

Good!
Just be aware of the tradeoff you're making: space for time.
   

Now, to offer good support to our users, we would like to get the value of 
this parameter at runtime. I am aware that I can obtain the value by running 
ompi_info, as follows:
ompi_info --param btl all | grep btl_sm_eager_limit

but can I get the value during the computation, when I run mpirun -np 12 --mca 
btl_sm_eager_limit 8192 my_binary? The value could then be compared with the 
buffer sizes in my code and a warning written to the output.
 

We don't currently have a user-exposed method of retrieving MCA parameter 
values.  As you noted in your 2nd email, if the value was set via an 
environment variable, then you can just getenv() it.  But if the value was set 
some other way (e.g., via a file), it won't necessarily be loaded into the 
environment.
   
If you are desperate to get this value, I suppose you could run 
empirical tests within your application.  This would be a little ugly, 
but could work well enough if you are desperate enough.
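
If one does go the empirical route, here is a rough sketch of the idea, not an 
Open MPI API: have the receiver deliberately delay posting its receive and 
time a standard-mode MPI_Send on the sender.  Below the eager limit the send 
typically returns almost immediately; above it, the message goes through a 
rendezvous protocol and the send waits for the receiver.  The sizes, delay and 
threshold below are illustrative, and the test is fragile by nature.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Rough probe: if MPI_Send of 'bytes' returns in far less time than the
 * receiver's deliberate delay, the message was most likely sent eagerly. */
static int sent_eagerly(size_t bytes, int rank)
{
    char *buf = calloc(bytes, 1);
    int eager = 0;

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        double t0 = MPI_Wtime();
        MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        eager = (MPI_Wtime() - t0) < 0.1;   /* returned well before the delay */
    } else if (rank == 1) {
        usleep(500000);                      /* delay posting the receive */
        MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Bcast(&eager, 1, MPI_INT, 0, MPI_COMM_WORLD);
    free(buf);
    return eager;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (size_t n = 1024; n <= 65536; n *= 2) {
        int eager = sent_eagerly(n, rank);
        if (rank == 0)
            printf("%6zu bytes: %s\n", n, eager ? "eager" : "rendezvous");
    }
    MPI_Finalize();
    return 0;
}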





Re: [OMPI users] difference between single and double precision

2010-12-16 Thread Eugene Loh




Jeff Squyres wrote:

  On Dec 16, 2010, at 5:14 AM, Mathieu Gontier wrote:
  
  
We have run some tests, and the option btl_sm_eager_limit has a positive effect on the performance. Eugene, thank you for your links.

  
  Good!
Just be aware of the tradeoff you're making: space for time.
  
  
Now, to offer good support to our users, we would like to get the value of this parameter at runtime. I am aware that I can obtain the value by running ompi_info, as follows:
ompi_info --param btl all | grep btl_sm_eager_limit

but can I get the value during the computation, when I run mpirun -np 12 --mca btl_sm_eager_limit 8192 my_binary? The value could then be compared with the buffer sizes in my code and a warning written to the output.

  
We don't currently have a user-exposed method of retrieving MCA parameter values.  As you noted in your 2nd email, if the value was set via an environment variable, then you can just getenv() it.  But if the value was set some other way (e.g., via a file), it won't necessarily be loaded into the environment.
  

If you are desperate to get this value, I suppose you could run
empirical tests within your application.  This would be a little ugly,
but could work well enough if you are desperate enough.




Re: [OMPI users] difference between single and double precision

2010-12-16 Thread Jeff Squyres
On Dec 16, 2010, at 5:14 AM, Mathieu Gontier wrote:

> We have run some tests, and the option btl_sm_eager_limit has a positive 
> effect on the performance. Eugene, thank you for your links.

Good!

Just be aware of the tradeoff you're making: space for time.

> Now, to offer good support to our users, we would like to get the value of 
> this parameter at runtime. I am aware that I can obtain the value by running 
> ompi_info, as follows:
> ompi_info --param btl all | grep btl_sm_eager_limit
> 
> but can I get the value during the computation, when I run mpirun -np 12 --mca 
> btl_sm_eager_limit 8192 my_binary? The value could then be compared with the 
> buffer sizes in my code and a warning written to the output.

We don't currently have a user-exposed method of retrieving MCA parameter 
values.  As you noted in your 2nd email, if the value was set via an 
environment variable, then you can just getenv() it.  But if the value was set 
some other way (e.g., via a file), it won't necessarily be loaded into the 
environment.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] difference between single and double precision

2010-12-16 Thread Mathieu Gontier

Does the environment variable work to override it?
export OMPI_MCA_btl_sm_eager_limit=40960

If so, I can deal with it.

On 12/16/2010 11:14 AM, Mathieu Gontier wrote:

Hi all,

We have run some tests, and the option btl_sm_eager_limit has a 
positive effect on the performance. Eugene, thank you for your 
links.


Now, to offer good support to our users, we would like to get the 
value of this parameter at runtime. I am aware that I can obtain the 
value by running ompi_info, as follows:

ompi_info --param btl all | grep btl_sm_eager_limit

but can I get the value during the computation, when I run mpirun -np 
12 --mca btl_sm_eager_limit 8192 my_binary? The value could then be 
compared with the buffer sizes in my code and a warning written to 
the output.


Any idea?

On 12/06/2010 04:31 PM, Eugene Loh wrote:

Mathieu Gontier wrote:

Nevertheless, one can observe differences between MPICH and 
OpenMPI of 25% to 100%, depending on the options we are using in 
our software. The tests are run on a single SGI node with 6 or 12 
processes, and thus I am focused on the sm option.


Is it possible to narrow our focus here a little?  E.g., are there 
particular MPI calls that are much more expensive with OMPI than 
MPICH?  Is the performance difference observable with simple 
ping-pong tests?
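
For completeness, a bare-bones ping-pong sketch, illustrative only, that can 
be built against both Open MPI and MPICH to compare the per-message time for 
a given size:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Time NITER round trips of 'bytes' bytes between ranks 0 and 1. */
int main(int argc, char **argv)
{
    enum { NITER = 1000 };
    size_t bytes = (argc > 1) ? (size_t)atol(argv[1]) : 8192;
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = calloc(bytes, 1);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("%zu bytes: %.2f us per round trip\n",
               bytes, 1e6 * (MPI_Wtime() - t0) / NITER);

    free(buf);
    MPI_Finalize();
    return 0;
}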



So, I have two questions:
1/ Can the option --mca mpool_sm_max_size= change something? (I am 
wondering whether the value is too small and, as a consequence, a 
set of small messages is sent instead of one big one.)


There was recent related discussion on this mail list.
http://www.open-mpi.org/community/lists/users/2010/11/14910.php

Check the OMPI FAQ for more info.  E.g.,
http://www.open-mpi.org/faq/?category=sm

This particular parameter disappeared with OMPI 1.3.2.
http://www.open-mpi.org/faq/?category=sm#how-much-use

To move messages in bigger chunks, try btl_sm_eager_limit and 
btl_sm_max_send_size:

http://www.open-mpi.org/faq/?category=sm#more-sm
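
For example (the sizes are purely illustrative), both can be set on the 
mpirun command line:

mpirun -np 12 --mca btl_sm_eager_limit 8192 --mca btl_sm_max_send_size 65536 my_binary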

2/ Is there a difference between --mca btl tcp,sm,self and --mca btl 
self,sm,tcp (or not setting any explicit MCA option)?


I think tcp,sm,self and self,sm,tcp will be the same.  Without an 
explicit MCA btl choice, it depends on what BTL choices are available.



Re: [OMPI users] difference between single and double precision

2010-12-16 Thread Mathieu Gontier

Hi all,

We have run some tests, and the option btl_sm_eager_limit has a positive 
effect on the performance. Eugene, thank you for your links.


Now, to offer good support to our users, we would like to get the 
value of this parameter at runtime. I am aware that I can obtain the 
value by running ompi_info, as follows:

ompi_info --param btl all | grep btl_sm_eager_limit

but can I get the value during the computation, when I run mpirun -np 12 
--mca btl_sm_eager_limit 8192 my_binary? The value could then be compared 
with the buffer sizes in my code and a warning written to the output.


Any idea?

On 12/06/2010 04:31 PM, Eugene Loh wrote:

Mathieu Gontier wrote:

Nevertheless, one can observe differences between MPICH and 
OpenMPI of 25% to 100%, depending on the options we are using in 
our software. The tests are run on a single SGI node with 6 or 12 
processes, and thus I am focused on the sm option.


Is it possible to narrow our focus here a little?  E.g., are there 
particular MPI calls that are much more expensive with OMPI than 
MPICH?  Is the performance difference observable with simple ping-pong 
tests?



So, I have two questions:
1/ Can the option --mca mpool_sm_max_size= change something? (I am 
wondering whether the value is too small and, as a consequence, a 
set of small messages is sent instead of one big one.)


There was recent related discussion on this mail list.
http://www.open-mpi.org/community/lists/users/2010/11/14910.php

Check the OMPI FAQ for more info.  E.g.,
http://www.open-mpi.org/faq/?category=sm

This particular parameter disappeared with OMPI 1.3.2.
http://www.open-mpi.org/faq/?category=sm#how-much-use

To move messages in bigger chunks, try btl_sm_eager_limit and 
btl_sm_max_send_size:

http://www.open-mpi.org/faq/?category=sm#more-sm

2/ Is there a difference between --mca btl tcp,sm,self and --mca btl 
self,sm,tcp (or not setting any explicit MCA option)?


I think tcp,sm,self and self,sm,tcp will be the same.  Without an 
explicit MCA btl choice, it depends on what BTL choices are available.



Re: [OMPI users] difference between single and double precision

2010-12-06 Thread Peter Kjellström
On Monday 06 December 2010 15:03:13 Mathieu Gontier wrote:
> Hi,
> 
> A small update.
> My colleague made a mistake and there is no arithmetic performance
> issue. Sorry for bothering you.
> 
> Nevertheless, one can observe differences between MPICH and
> OpenMPI of 25% to 100%, depending on the options we are using in our
> software. The tests are run on a single SGI node with 6 or 12 processes, and
> thus I am focused on the sm option.

A few previous threads on sm performance have been related to what /tmp is. 
OpenMPI relies on (or at least used to rely on) this being backed by page 
cache (tmpfs, a local ext3 or similar). I'm not sure what the behaviour is in 
the latest version but then again you didn't say which version you've tried.
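
(A quick sanity check is to look at what filesystem backs it, e.g. with 
something like df -T /tmp, and at where the Open MPI session directory 
actually ends up on your nodes.)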

/Peter




Re: [OMPI users] difference between single and double precision

2010-12-06 Thread Mathieu Gontier

Hi,

A small update.
My colleague made a mistake and there is no arithmetic performance 
issue. Sorry for bothering you.


Nevertheless, one can observe differences between MPICH and 
OpenMPI of 25% to 100%, depending on the options we are using in our 
software. The tests are run on a single SGI node with 6 or 12 processes, and 
thus I am focused on the sm option.


So, I have two questions:
1/ Can the option --mca mpool_sm_max_size= change something? (I am 
wondering whether the value is too small and, as a consequence, a set of 
small messages is sent instead of one big one.)
2/ Is there a difference between --mca btl tcp,sm,self and --mca btl 
self,sm,tcp (or not setting any explicit MCA option)?


Best regards,
Mathieu.

On 12/05/2010 06:10 PM, Eugene Loh wrote:

Mathieu Gontier wrote:


  Dear OpenMPI users

I am dealing with an arithmetic problem. In fact, I have two variants 
of my code: one in single precision, one in double precision. When I 
compare the two executables built with MPICH, one can observe an 
expected difference in performance: 115.7 sec in single precision 
against 178.68 sec in double precision (+54%).


The thing is, when I use OpenMPI, the difference is much bigger: 
238.5 sec in single precision against 403.19 sec in double precision 
(+69%).


Our experience has already shown that OpenMPI is less efficient than 
MPICH on Ethernet with a small number of processes. This explains the 
differences between the first set of results with MPICH and the 
second set with OpenMPI. (But if someone has more information about 
that, or even a solution, I am of course interested.)
But using OpenMPI also increases the difference between the two 
arithmetics. Is it an accentuation of the OpenMPI+Ethernet loss of 
performance, is it another issue in OpenMPI, or is there an option 
I can use?


It is also unusual that the performance difference between MPICH and 
OMPI is so large.  You say that OMPI is slower than MPICH even at 
small process counts.  Can you confirm that this is because MPI calls 
are slower?  Some of the biggest performance differences I've seen 
between MPI implementations had nothing to do with the performance of 
MPI calls at all.  It had to do with process binding or other factors 
that impacted the computational (non-MPI) performance of the code.  
The performance of MPI calls was basically irrelevant.


In this particular case, I'm not convinced since neither OMPI nor 
MPICH binds processes by default.


Still, can you do some basic performance profiling to confirm what 
aspect of your application is consuming so much time?  Is it a 
particular MPI call?  If your application is spending almost all of 
its time in MPI calls, do you have some way of judging whether the 
faster performance is acceptable?  That is, is 238 secs acceptable and 
403 secs slow?  Or, are both timings unacceptable -- e.g., the code 
"should" be running in about 30 secs.



Re: [OMPI users] difference between single and double precision

2010-12-05 Thread Eugene Loh

Mathieu Gontier wrote:


  Dear OpenMPI users

I am dealing with an arithmetic problem. In fact, I have two variants 
of my code: one in single precision, one in double precision. When I 
compare the two executables built with MPICH, one can observe an 
expected difference in performance: 115.7 sec in single precision 
against 178.68 sec in double precision (+54%).


The thing is, when I use OpenMPI, the difference is much bigger: 
238.5 sec in single precision against 403.19 sec in double precision (+69%).


Our experience has already shown that OpenMPI is less efficient than 
MPICH on Ethernet with a small number of processes. This explains the 
differences between the first set of results with MPICH and the second 
set with OpenMPI. (But if someone has more information about that, or 
even a solution, I am of course interested.)
But using OpenMPI also increases the difference between the two 
arithmetics. Is it an accentuation of the OpenMPI+Ethernet loss of 
performance, is it another issue in OpenMPI, or is there an option I 
can use?


It is also unusual that the performance difference between MPICH and 
OMPI is so large.  You say that OMPI is slower than MPICH even at small 
process counts.  Can you confirm that this is because MPI calls are 
slower?  Some of the biggest performance differences I've seen between 
MPI implementations had nothing to do with the performance of MPI calls 
at all.  It had to do with process binding or other factors that 
impacted the computational (non-MPI) performance of the code.  The 
performance of MPI calls was basically irrelevant.


In this particular case, I'm not convinced since neither OMPI nor MPICH 
binds processes by default.
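
(As a quick way to rule binding in or out, you could force it explicitly and 
compare timings; with the Open MPI releases of that era, one illustrative way 
is something like: mpirun --mca mpi_paffinity_alone 1 -np 12 my_binary.)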


Still, can you do some basic performance profiling to confirm what 
aspect of your application is consuming so much time?  Is it a 
particular MPI call?  If your application is spending almost all of its 
time in MPI calls, do you have some way of judging whether the faster 
performance is acceptable?  That is, is 238 secs acceptable and 403 secs 
slow?  Or, are both timings unacceptable -- e.g., the code "should" be 
running in about 30 secs.
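
A lightweight way to get that kind of per-call breakdown without external 
tools is the PMPI profiling interface. Below is a minimal sketch, not a 
complete profiler: only MPI_Send is wrapped, the counters are illustrative, 
and the MPI_Send prototype should match the one in your mpi.h (older, 
MPI-2-era headers declare the buffer as void * rather than const void *).

#include <mpi.h>
#include <stdio.h>

/* Time accumulated inside MPI_Send over the whole run (per rank). */
static double send_time  = 0.0;
static long   send_calls = 0;

/* PMPI interposition: the application keeps calling MPI_Send as usual. */
int MPI_Send(void *buf, int count, MPI_Datatype type, int dest,
             int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    send_time += MPI_Wtime() - t0;
    send_calls++;
    return rc;
}

/* Report the totals just before MPI shuts down. */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    fprintf(stderr, "rank %d: %ld MPI_Send calls, %.3f s total\n",
            rank, send_calls, send_time);
    return PMPI_Finalize();
}

Compiling this into the application (or into a library linked ahead of the 
MPI library) gives a per-rank idea of how much time really goes into a given 
MPI call, without touching the application source.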


Re: [OMPI users] difference between single and double precision

2010-12-03 Thread Jonathan Dursi
On 2010-12-03, at 8:46AM, Jeff Squyres (jsquyres) wrote:

> Another option to try is to install the openmx drivers on your system and run 
> open MPI with mx support. This should be much better perf than tcp. 


We've tried this on a big GigE cluster (in fact, Brice Goglin was playing with 
it on our system) -- it's not really an answer.  It didn't work past a small 
number of nodes, and the performance gains were fairly small.  Intel MPI's 
Direct Ethernet Transport did work on larger node counts, but again the effect 
was pretty modest (a few percent decrease in ping-pong latencies, no 
discernible bandwidth improvements).

- Jonathan
-- 
Jonathan Dursi   SciNet, Compute/Calcul Canada









Re: [OMPI users] difference between single and double precision

2010-12-03 Thread Jeff Squyres (jsquyres)
Yes, we have never really optimized Open MPI for TCP. That is changing soon, 
hopefully. 

Regardless, what is the communication pattern of your app?  Are you sending a 
lot of data frequently?  Even the MPICH perf difference is surprising - it 
suggests a lot of data transfer, potentially with small messages...?

Another option to try is to install the Open-MX drivers on your system and run 
Open MPI with MX support. This should give much better performance than TCP. 

Sent from my PDA. No type good. 

On Dec 3, 2010, at 3:11 AM, "Mathieu Gontier"  wrote:

> 
> Dear OpenMPI users
> 
> I am dealing with an arithmetic problem. In fact, I have two variants of my 
> code: one in single precision, one in double precision. When I compare the 
> two executables built with MPICH, one can observe an expected difference in 
> performance: 115.7 sec in single precision against 178.68 sec in double 
> precision (+54%).
> 
> The thing is, when I use OpenMPI, the difference is much bigger: 238.5 sec 
> in single precision against 403.19 sec in double precision (+69%).
> 
> Our experience has already shown that OpenMPI is less efficient than MPICH on 
> Ethernet with a small number of processes. This explains the differences 
> between the first set of results with MPICH and the second set with OpenMPI. 
> (But if someone has more information about that, or even a solution, I am of 
> course interested.)
> But using OpenMPI also increases the difference between the two arithmetics. 
> Is it an accentuation of the OpenMPI+Ethernet loss of performance, is it 
> another issue in OpenMPI, or is there an option I can use?
> 
> Thank you for your help.
> Mathieu.
> 
> -- 
> Mathieu Gontier
> 
> 