Re: [OMPI users] "Hostfile" on Multicore Node?
So, it appears that for a machine of this type (dual quad-core CPUs), this approach would be correct for my tests:

[jpummill@n1 bin]$ more my-hosts
n1 slots=8 max_slots=8

and subsequently, launch two jobs in this configuration:

/home/jpummill/openmpi-1.2.2/bin/mpirun --hostfile my-hosts -np 4 --byslot ./cg.C.4

It appears that this does avoid oversubscribing any particular core, as I am not exceeding my core count by running just the two jobs requiring 4 cores each.

Thanks,
Jeff Pummill

George Bosilca wrote:
> The cleaner way to define such an environment is by using the max_slots and/or slots options in the hostfile. Here is a FAQ entry about how Open MPI deals with these options: http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
>
> george.
>
> On Oct 26, 2007, at 10:52 AM, Jeff Pummill wrote:
>> I am doing some testing on a variety of 8-core nodes in which I just want to execute a couple of executables and have them distributed to the available cores without overlapping. [full message in the original post below]

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] "Hostfile" on Multicore Node?
The cleaner way to define such an environment is by using the max_slots and/or slots options in the hostfile. Here is a FAQ entry about how Open MPI deals with these options: http://www.open-mpi.org/faq/?category=running#mpirun-scheduling

george.

On Oct 26, 2007, at 10:52 AM, Jeff Pummill wrote:
> I am doing some testing on a variety of 8-core nodes in which I just want to execute a couple of executables and have them distributed to the available cores without overlapping. Typically, this would be done with a parameter like -machinefile machines, but I have no idea what names to put into the machines file, as this is a single node with two quad-core CPUs. As I am launching the jobs sans scheduler, I would think I need to specify which cores to run on, to keep from overscheduling some cores while others receive nothing to do at all.
>
> Simple suggestions? Maybe Open MPI takes care of this detail for me?
>
> Thanks!
>
> Jeff Pummill
Re: [OMPI users] "Hostfile" on Multicore Node?
Jeff,

A simple suggestion: put eight (or whatever the number of cores is) identical entries for each node, such as

compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-1
compute-0-1
...

It seems to work for my dual-core nodes.

-Tudor

On Fri, 2007-10-26 at 09:52 -0500, Jeff Pummill wrote:
> I am doing some testing on a variety of 8-core nodes in which I just
> want to execute a couple of executables and have them distributed to
> the available cores without overlapping. Typically, this would be done
> with a parameter like -machinefile machines, but I have no idea what
> names to put into the machines file as this is a single node with two
> quad core cpu's. As I am launching the jobs sans scheduler, I need to
> specify what cores to run on I would think to keep from overscheduling
> some cores while others receive nothing to do at all.
>
> Simple suggestions? Maybe Open MPI takes care of this detail for me?
>
> Thanks!
>
> Jeff Pummill

--
Tudor Buican
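[Editor's note: for reference, Tudor's list of eight repeated hostnames is equivalent to the slots syntax George points to in his reply. A minimal sketch of such a hostfile, with illustrative hostnames taken from the examples in this thread:]

```
# my-hosts: one line per node.
# slots      = number of processes Open MPI will schedule on the node
#              (typically set to the core count)
# max_slots  = hard upper limit before mpirun refuses to start more
compute-0-0 slots=8 max_slots=8
compute-0-1 slots=8 max_slots=8
```

As Jeff reports elsewhere in the thread, launching with `mpirun --hostfile my-hosts -np 4 --byslot ./prog` fills four of a node's eight slots, so two such 4-process jobs together stay within the node's eight cores.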
[OMPI users] "Hostfile" on Multicore Node?
I am doing some testing on a variety of 8-core nodes in which I just want to execute a couple of executables and have them distributed to the available cores without overlapping. Typically, this would be done with a parameter like /-machinefile machines/, but I have no idea what names to put into the /machines/ file, as this is a single node with two quad-core CPUs. As I am launching the jobs sans scheduler, I would think I need to specify which cores to run on, to keep from overscheduling some cores while others receive nothing to do at all.

Simple suggestions? Maybe Open MPI takes care of this detail for me?

Thanks!

Jeff Pummill
[OMPI users] MPI_Send issues with openib btl
hi,

We are facing some problems when calling MPI_Send over IB. The problem looks similar to ticket https://svn.open-mpi.org/trac/ompi/ticket/232, but this time it is for the IB interface. When forcefully running the program using --mca btl tcp,self, it runs fine. On IB, it gives error messages like "local protocol error", "flush error", "invalid request error", and "local length error". Any help would be appreciated.

-Neeraj
Re: [OMPI users] Process 0 with different time executing the same code
This is not an MPI problem. Without looking at your code in detail, I'm guessing that you're accessing memory without any regard to memory layout and/or caching. Such an access pattern will thrash your L1 and L2 caches and access memory in a truly horrible pattern that guarantees abysmal performance. Google around for cache effects or check out an operating systems textbook; there's lots of material around about this kind of effect.

Good luck.

On Oct 26, 2007, at 5:10 AM, 42af...@niit.edu.pk wrote:
> Thanks. The array bounds are the same on all the nodes, and the compute nodes are identical, i.e. SunFire V890 nodes. I have also changed the root process to be on different nodes, but the problem remains the same. I still don't understand the problem very well and my progress is at a standstill.
>
> regards, aftab hussain
>
>> Hi, please ensure the following things are correct:
>> 1) The array bounds are equal, i.e. "my_x" and "size_y" have the same value on all nodes.
>> 2) The nodes are homogeneous. To check that, you could make the root a different node and run the program.
>> -Neeraj
>
> [earlier messages in the thread are quoted in full below]

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Process 0 with different time executing the same code
Thanks. The array bounds are the same on all the nodes, and the compute nodes are identical, i.e. SunFire V890 nodes. I have also changed the root process to be on different nodes, but the problem remains the same. I still don't understand the problem very well and my progress is at a standstill.

regards,
aftab hussain

> Hi, please ensure the following things are correct:
> 1) The array bounds are equal, i.e. "my_x" and "size_y" have the same value on all nodes.
> 2) The nodes are homogeneous. To check that, you could make the root a different node and run the program.
> -Neeraj
>
> [earlier messages in the thread are quoted in full below]
Re: [OMPI users] Process 0 with different time executing the same code
Hi,

Please ensure the following things are correct:
1) The array bounds are equal, i.e. "my_x" and "size_y" have the same value on all nodes.
2) The nodes are homogeneous. To check that, you could make the root a different node and run the program.

-Neeraj

On Fri, 26 Oct 2007 10:13:15 +0500 (PKT), Open MPI Users wrote:
> Thanks for your reply. I used MPI_Wtime for my application, but even then process 0 took longer executing the mentioned code segment. I might be wrong, but what I see is that process 0 takes more time to access the array elements than the other processes. Now I don't see what to do, because the mentioned code segment is creating a bottleneck for the timing of my application.
>
> Can anyone suggest something in this regard? I will be very thankful.
>
> regards, Aftab Hussain
>
> [earlier messages in the thread are quoted in full below]
Re: [OMPI users] Process 0 with different time executing the same code
Thanks for your reply.

I used MPI_Wtime for my application, but even then process 0 took longer executing the mentioned code segment. I might be wrong, but what I see is that process 0 takes more time to access the array elements than the other processes. Now I don't see what to do, because the mentioned code segment is creating a bottleneck for the timing of my application.

Can anyone suggest something in this regard? I will be very thankful.

regards,
Aftab Hussain

On Thu, October 25, 2007 9:38 pm, jody wrote:
> HI
> I'm not sure if that is the problem, but in MPI applications you should use MPI_Wtime() for time measurements.
>
> Jody
>
> On 10/25/07, 42af...@niit.edu.pk <42af...@niit.edu.pk> wrote:
>
>> Hi all,
>> I am a research assistant (RA) at NUST Pakistan in the High Performance Scientific Computing Lab. I am working on a parallel implementation of the Finite Difference Time Domain (FDTD) method using MPI. I am using the Open MPI environment on a cluster of 4 SunFire V890 nodes connected through Myrinet. The problem is that when I run my code with, say, 4 processes, process 0 takes about 3 times longer than the other three processes to execute a for loop, which is the main cause of load imbalance in my code. Below is the code that is causing the problem. It is run by all the processes simultaneously and independently, and I have timed it independently of other segments of the code.
>>
>> start = gethrtime();
>> for (m = 1; m < my_x; m++) {
>>     for (n = 1; n < size_y-1; n++) {
>>         Ez(m,n) = Ez(m,n) + cezh*((Hy(m,n) - Hy(m-1,n)) - (Hx(m,n) - Hx(m,n-1)));
>>     }
>> }
>> stop = gethrtime();
>> time = (stop-start);
>>
>> In my implementation I used 1-D arrays to realize 2-D arrays. I have used the following macros for accessing the array elements:
>>
>> #define Hx(I,J) hx[(I)*(size_y) + (J)]
>> #define Hy(I,J) hy[(I)*(size_y) + (J)]
>> #define Ez(I,J) ez[(I)*(size_y) + (J)]
>>
>> Can anyone tell me what I am doing wrong here, whether the macros are creating the problems, or whether it could be related to an OS issue? I will be looking forward to help, because this problem has stopped my progress for the last two weeks.
>>
>> regards, aftab hussain
>>
>> RA, High Performance Scientific Computing Lab
>> NUST Institute of Information Technology
>> National University of Sciences and Technology, Pakistan