Re: [OMPI users] Bad performance when scattering big size of data?
Storm Zhang wrote: Here is what I meant: the 500-proc results in fact show that with 272-304 (< 500) real cores the program's running time is good, almost five times the 100-proc time, so that case is handled very well. Therefore I guess Open MPI or the Rocks OS does make use of hyperthreading to do the job. But with 600 procs, the running time is more than double that of 500 procs, and I don't know why. This is my problem. BTW, how do I use -bind-to-core? I added it as an mpirun option, but it always gives me the error "the executable 'bind-to-core' can't be found." Isn't it like: mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest

Thanks for sending the mpirun command and error message. That helps. It's not recognizing the --bind-to-core option. (A single hyphen, as you had, should also be okay.) Skimming through the e-mail, it looks like you are using OMPI 1.3.2 and 1.4.2. Did you try --bind-to-core with both? If I remember my version numbers, --bind-to-core is not recognized by 1.3.2, but should be by 1.4.2. Could it be that you only tried 1.3.2? Another option is to run "mpirun --help" and make sure that it reports --bind-to-core.
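For reference, the check and the binding-enabled invocation suggested above could look like this (a sketch, assuming the mpirun on your PATH is the 1.4.x one; 1.3.2 will not recognize the flag):

```shell
# First confirm that this mpirun actually knows the option:
mpirun --help | grep bind

# Then run with binding enabled; --report-bindings prints what OMPI did:
mpirun --mca btl_tcp_if_include eth0 -np 600 \
       --bind-to-core --report-bindings scatttest
```

If the grep prints nothing, the mpirun being picked up is probably the 1.3.2 installation.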
Re: [OMPI users] Bad performance when scattering big size of data?
Here is what I meant: the 500-proc results in fact show that with 272-304 (< 500) real cores the program's running time is good, almost five times the 100-proc time, so that case is handled very well. Therefore I guess Open MPI or the Rocks OS does make use of hyperthreading to do the job. But with 600 procs, the running time is more than double that of 500 procs, and I don't know why. This is my problem. BTW, how do I use -bind-to-core? I added it as an mpirun option, but it always gives me the error "the executable 'bind-to-core' can't be found." Isn't it like: mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest

Thank you very much.

Linbao

On Mon, Oct 4, 2010 at 4:42 PM, Ralph Castain wrote:
> On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
>> Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows 1024 slots available for parallel tasks); it assigned only 34-38 compute nodes, which have only 272-304 real cores, for the 500-proc run. The running time is consistent with 100 procs, without large fluctuations as the number of machines changes.
>
> Afraid I don't understand your statement. If you have 500 procs running on < 500 cores, then the performance relative to a high-performance job (#procs <= #cores) will be worse. We deliberately dial down the performance when oversubscribed to ensure that procs "play nice" in situations where the node is oversubscribed.
>
>> So I guess it is not related to hyperthreading. Correct me if I'm wrong.
>
> Has nothing to do with hyperthreading - OMPI has no knowledge of hyperthreads at this time.
>
>> BTW, how do I bind a proc to a core? I tried --bind-to-core and -bind-to-core but neither works. Is that for OpenMP, not Open MPI?
>
> Those should work. You might try --report-bindings to see what OMPI thought it did.
>
> [earlier quoted messages trimmed]
[OMPI users] location of ompi libraries
Hi, In Open MPI 1.4.1, the directory lib/openmpi contains about 130 entries, including such things as mca_btl_openib.so. In my build of Open MPI 1.4.2, lib/openmpi contains exactly three items: libompi_dbg_msgq.a, libompi_dbg_msgq.la, and libompi_dbg_msgq.so. I have searched my 1.4.2 installation for mca_btl_openib.so, to no avail. And yet, 1.4.2 seems to work "fine". Is my installation broken, or is the organization significantly different between the two versions? A quick scan of the release notes didn't help. Thanks!

-- 
Best regards,
David Turner
User Services Group     email: dptur...@lbl.gov
NERSC Division          phone: (510) 486-4027
Lawrence Berkeley Lab   fax:   (510) 486-4316
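One way to answer this yourself is ompi_info, which lists every MCA component the installation can use, whether it was built as a standalone .so in lib/openmpi or compiled statically into the main libraries (a configure-time choice, e.g. --enable-static or --disable-dlopen, which would explain an almost-empty lib/openmpi). A sketch of the check:

```shell
# List all BTL components this installation knows about:
ompi_info | grep btl

# Ask about the openib component specifically:
ompi_info --param btl openib
```

If openib shows up in this output, the support is present even though no mca_btl_openib.so file exists on disk.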
Re: [OMPI users] Bad performance when scattering big size of data?
On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
> Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows 1024 slots available for parallel tasks); it assigned only 34-38 compute nodes, which have only 272-304 real cores, for the 500-proc run. The running time is consistent with 100 procs, without large fluctuations as the number of machines changes.

Afraid I don't understand your statement. If you have 500 procs running on < 500 cores, then the performance relative to a high-performance job (#procs <= #cores) will be worse. We deliberately dial down the performance when oversubscribed to ensure that procs "play nice" in situations where the node is oversubscribed.

> So I guess it is not related to hyperthreading. Correct me if I'm wrong.

Has nothing to do with hyperthreading - OMPI has no knowledge of hyperthreads at this time.

> BTW, how do I bind a proc to a core? I tried --bind-to-core and -bind-to-core but neither works. Is that for OpenMP, not Open MPI?

Those should work. You might try --report-bindings to see what OMPI thought it did.

> [earlier quoted messages trimmed]
Re: [OMPI users] Bad performance when scattering big size of data?
Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows 1024 slots available for parallel tasks); it assigned only 34-38 compute nodes, which have only 272-304 real cores, for the 500-proc run. The running time is consistent with 100 procs, without large fluctuations as the number of machines changes. So I guess it is not related to hyperthreading. Correct me if I'm wrong.

BTW, how do I bind a proc to a core? I tried --bind-to-core and -bind-to-core but neither works. Is that for OpenMP, not Open MPI?

Linbao

On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain wrote:
> Some of what you are seeing is the natural result of context switching. Some thoughts regarding the results:
>
> 1. You didn't bind your procs to cores when running with #procs < #cores, so your performance in those scenarios will also be less than max.
>
> 2. Once the number of procs exceeds the number of cores, you guarantee a lot of context switching, so performance will definitely take a hit.
>
> 3. Sometime in the not-too-distant future, OMPI will (hopefully) become hyperthread aware. For now, we don't see hyperthreads as separate processing units. So as far as OMPI is concerned, you only have 512 computing units to work with, not 1024.
>
> Bottom line is that you are running oversubscribed, so OMPI turns down your performance so that the machine doesn't hemorrhage as it context switches.
>
> [earlier quoted messages and mailing-list footers trimmed]
Re: [OMPI users] mpi_comm_spawn have problems with group communicators
> "Ralph" == Ralph Castain writes: Ralph> On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote: >>> "Ralph" == Ralph Castain writes: >> Ralph> I'm not sure why the group communicator would make a Ralph> difference - the code area in question knows nothing about Ralph> the mpi aspects of the job. It looks like you are hitting a Ralph> race condition that causes a particular internal recv to Ralph> not exist when we subsequently try to cancel it, which Ralph> generates that error message. How did you configure OMPI? >> >> Thank you for the reply! >> >> Must be some race problem, but I have no control of it, or do >> I? Ralph> Not really. What I don't understand is why your code would Ralph> work fine when using comm_world, but encounter a race Ralph> condition when using comm groups. There shouldn't be any Ralph> timing difference between the two cases.

Fixing a race condition is sometimes easy, e.g. by putting some variables into arrays. I just did that for one of them, but it didn't help. I'll do some more testing in this direction, but I am running out of ideas. When you set ngrp=1 and uncomment the other mpi_comm_spawn line in the program, you basically get only one spawn, so there is no opportunity for a race condition. But in my real project I usually work with many spawn calls, all using mpi_comm_world but running different programs, etc., and that always works. This time I want to localize the mpi_comm_spawns by a trick similar to the one in the program I sent, so this small test case is a good model of what I would like to have. I studied the MPI-2 standard and I think I got it right, but one never knows...

Ralph> I'll have to take a look and see if I can spot something in Ralph> the code...

Thanks a lot -- Milan
Re: [OMPI users] Bad performance when scattering big size of data?
Some of what you are seeing is the natural result of context switching. Some thoughts regarding the results:

1. You didn't bind your procs to cores when running with #procs < #cores, so your performance in those scenarios will also be less than max.

2. Once the number of procs exceeds the number of cores, you guarantee a lot of context switching, so performance will definitely take a hit.

3. Sometime in the not-too-distant future, OMPI will (hopefully) become hyperthread aware. For now, we don't see hyperthreads as separate processing units. So as far as OMPI is concerned, you only have 512 computing units to work with, not 1024.

Bottom line is that you are running oversubscribed, so OMPI turns down your performance so that the machine doesn't hemorrhage as it context switches.

On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
> In my experience hyperthreading can't really deliver two cores' worth of processing simultaneously for processes expecting sole use of a core. Since you really have 512 cores, I'm not surprised that you see a performance hit when requesting > 512 compute units. We should really get input from a hyperthreading expert, preferably from Intel.
>
> Doug Reeder
>
> [earlier quoted messages and mailing-list footers trimmed]
Re: [OMPI users] Bad performance when scattering big size of data?
Thanks a lot for your reply, Doug. There is one more thing I forgot to mention: for the 500-proc test, I observe that if I use SGE, the job runs on only about half of our cluster, 35-38 nodes, not uniformly distributed over the whole cluster, but the running time is still good. So I guess it is not a hyperthreading problem.

Linbao

On Mon, Oct 4, 2010 at 12:06 PM, Doug Reeder wrote:
> In my experience hyperthreading can't really deliver two cores' worth of processing simultaneously for processes expecting sole use of a core. Since you really have 512 cores, I'm not surprised that you see a performance hit when requesting > 512 compute units. We should really get input from a hyperthreading expert, preferably from Intel.
>
> Doug Reeder
>
> [earlier quoted messages and mailing-list footers trimmed]
Re: [OMPI users] Bad performance when scattering big size of data?
In my experience hyperthreading can't really deliver two cores' worth of processing simultaneously for processes expecting sole use of a core. Since you really have 512 cores, I'm not surprised that you see a performance hit when requesting > 512 compute units. We should really get input from a hyperthreading expert, preferably from Intel.

Doug Reeder

On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs, so 1024 compute units are shown in the ROCKS 5.3 system. I'm trying to scatter an array from the master node to the compute nodes using mpiCC and mpirun, in C++.
>
> [rest of the original message and mailing-list footers trimmed]
Re: [OMPI users] mpi_comm_spawn have problems with group communicators
On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote: >> "Ralph" == Ralph Castain writes: > >Ralph> I'm not sure why the group communicator would make a >Ralph> difference - the code area in question knows nothing about >Ralph> the mpi aspects of the job. It looks like you are hitting a >Ralph> race condition that causes a particular internal recv to >Ralph> not exist when we subsequently try to cancel it, which >Ralph> generates that error message. How did you configure OMPI? > > Thank you for the reply! > > Must be some race problem, but I have no control of it, or do I? Not really. What I don't understand is why your code would work fine when using comm_world, but encounter a race condition when using comm groups. There shouldn't be any timing difference between the two cases. > > These are the configure options that gentoo compiles openmpi-1.4.2 with: > > ./configure --prefix=/usr --build=x86_64-pc-linux-gnu > --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info > --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib > --libdir=/usr/lib64 --sysconfdir=/etc/openmpi --without-xgrid > --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default > --without-slurm --enable-contrib-no-build=vt --enable-mpi-cxx > --disable-io-romio --disable-heterogeneous --without-tm --enable-ipv6 > This looks okay. I'll have to take a look and see if I can spot something in the code...
[OMPI users] Bad performance when scattering big size of data?
We have 64 compute nodes with dual quad-core, hyperthreaded CPUs, so 1024 compute units are shown in the ROCKS 5.3 system. I'm trying to scatter an array from the master node to the compute nodes using mpiCC and mpirun, in C++.

Here is my test: the array size is 18 KB * number of procs, and it is scattered to the compute nodes 5000 times repeatedly.

The average running time (seconds):

100 procs: 170
400 procs: 690
500 procs: 855
600 procs: 2550
700 procs: 2720
800 procs: 2900

There is a big jump in running time from 500 procs to 600 procs, and I don't know what the problem is.
Tried both OMPI 1.3.2 and OMPI 1.4.2: running time is a little faster for all the tests in 1.4.2, but the jump still exists.
Tried either the Bcast function or simple Send/Recv, which give very close results.
Tried running directly and via SGE, and got the same results.

The code and ompi_info output are attached to this email. The direct running command is:

/opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile ../machines -np 600 scatttest

The ifconfig of the head node for eth0 is:

eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
      inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
      inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
      TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 GiB)
      Interrupt:106 Memory:d600-d6012800

A typical ifconfig of a compute node is:

eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC
      inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
      inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
      TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 GiB)
      Interrupt:82 Memory:d600-d6012800

Can anyone help me out with this? It bothers me a lot.

Thank you very much.

Linbao

scatttest.cpp Description: Binary data
ompi_info Description: Binary data
Re: [OMPI users] mpi_comm_spawn have problems with group communicators
> "Ralph" == Ralph Castain writes: Ralph> I'm not sure why the group communicator would make a Ralph> difference - the code area in question knows nothing about Ralph> the mpi aspects of the job. It looks like you are hitting a Ralph> race condition that causes a particular internal recv to Ralph> not exist when we subsequently try to cancel it, which Ralph> generates that error message. How did you configure OMPI? Thank you for the reply! Must be some race problem, but I have no control of it, or do I? These are the configure options that gentoo compiles openmpi-1.4.2 with: ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --sysconfdir=/etc/openmpi --without-xgrid --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default --without-slurm --enable-contrib-no-build=vt --enable-mpi-cxx --disable-io-romio --disable-heterogeneous --without-tm --enable-ipv6
Re: [OMPI users] new open mpi user questions
Ed Peddycoart wrote: The machines are RHEL 5.4 machines and an older version of Open MPI was already installed. I understand from the FAQ that I should not simply overwrite the old installation, but would it be better to remove the old version or to install the new one in a different location? I have no need for the old version. Remove it, if you don't need it. If you'll compile from source, you can add the argument --prefix=/home/username/openmpi (or similar) to the ./configure command to install Open MPI in that location. Just set your PATH and LD_LIBRARY_PATH variables accordingly to compile and run your applications. Are there test apps with the Open MPI distribution that I can use to produce some benchmarks? One thing I want is to get an idea of data transfer rates as a function of the size of the data transferred. A nice bandwidth benchmark is netpipe from http://www.scl.ameslab.gov/netpipe/. Compile with "make mpi". Nico
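[To make Nico's suggestion concrete: NetPIPE's MPI mode is typically built and run as below. This is a sketch; "NPmpi" is the binary name produced by recent NetPIPE releases, but check the README of the version you download:]

```text
make mpi
mpirun -np 2 -machinefile machines NPmpi
```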
Re: [OMPI users] Shared memory
Does OMPI have shared memory capabilities (as mentioned in MPI-2)? How can I use them? Andrei On Sat, Sep 25, 2010 at 23:19, Andrei Fokau wrote: > Here are some more details about our problem. We use a dozen 4-processor > nodes with 8 GB memory on each node. The code we run needs about 3 GB per > processor, so we can load only 2 processors out of 4. The vast majority of > those 3 GB is the same for each processor and is accessed continuously > during calculation. In my original question I wasn't very clear asking about > a possibility to use shared memory with Open MPI - in our case we do not > need remote access to the data, and it would be sufficient to > share memory within each node only. > > Of course, the possibility to access the data remotely (via mmap) is > attractive because it would allow storing much larger arrays (up to 10 GB) > at one remote place, meaning higher accuracy for our calculations. However, > I believe that the access time would be too long for data read so > frequently, and therefore the performance would be lost. > > I still hope that some of the subscribers to this mailing list have > experience using Global Arrays. This library seems to be fine for our > case, however I feel that there should be a simpler solution. Open MPI > conforms with the MPI-2 standard, and the latter has a description of shared- > memory applications. Do you see any other way for us to use shared memory > (within a node) apart from using Global Arrays? > > Andrei > > > On Fri, Sep 24, 2010 at 19:03, Durga Choudhury wrote: > >> I think the 'middle ground' approach can be simplified even further if >> the data file is in a shared device (e.g. NFS/Samba mount) that can be >> mounted at the same location of the file system tree on all nodes. I >> have never tried it, though, and mmap()'ing a non-POSIX compliant file >> system such as Samba might have issues I am unaware of. 
>> >> However, I do not see why you should not be able to do this even if >> the file is being written to as long as you call msync() before using >> the mapped pages. >> >> Durga >> >> >> On Fri, Sep 24, 2010 at 12:31 PM, Eugene Loh >> wrote: >> > It seems to me there are two extremes. >> > >> > One is that you replicate the data for each process. This has the >> > disadvantage of consuming lots of memory "unnecessarily." >> > >> > Another extreme is that shared data is distributed over all processes. >> This >> > has the disadvantage of making at least some of the data less >> accessible, >> > whether in programming complexity and/or run-time performance. >> > >> > I'm not familiar with Global Arrays. I was somewhat familiar with HPF. >> I >> > think the natural thing to do with those programming models is to >> distribute >> > data over all processes, which may relieve the excessive memory >> consumption >> > you're trying to address but which may also just put you at a different >> > "extreme" of this spectrum. >> > >> > The middle ground I think might make most sense would be to share data >> only >> > within a node, but to replicate the data for each node. There are >> probably >> > multiple ways of doing this -- possibly even GA, I don't know. One way >> > might be to use one MPI process per node, with OMP multithreading within >> > each process|node. Or (and I thought this was the solution you were >> looking >> > for), have some idea which processes are collocal. Have one process per >> > node create and initialize some shared memory -- mmap, perhaps, or SysV >> > shared memory. Then, have its peers map the same shared memory into >> their >> > address spaces. >> > >> > You asked what source code changes would be required. It depends. If >> > you're going to mmap shared memory in on each node, you need to know >> which >> > processes are collocal. If you're willing to constrain how processes >> are >> > mapped to nodes, this could be easy. 
(E.g., "every 4 processes are >> > collocal".) If you want to discover dynamically at run time which are >> > collocal, it would be harder. The mmap stuff could be in a stand-alone >> > function of about a dozen lines. If the shared area is allocated as one >> > piece, substituting the single malloc() call with a call to your mmap >> > function should be simple. If you have many malloc()s you're trying to >> > replace, it's harder. >> > >> > Andrei Fokau wrote: >> > >> > The data are read from a file and processed before calculations begin, >> so I >> > think that mapping will not work in our case. >> > Global Arrays look promising indeed. As I said, we need to put just a >> part >> > of data to the shared section. John, do you (or may be other users) have >> an >> > experience of working with GA? >> > http://www.emsl.pnl.gov/docs/global/um/build.html >> > When GA runs with MPI: >> > MPI_Init(..) ! start MPI >> > GA_Initialize() ! start global arrays >> > MA_Init(..) ! start memory allocator >> > do work >> > GA_Terminate()! tidy
[OMPI users] new open mpi user questions
I would like to give Open MPI a test drive on some machines I have in my lab and I have a few questions... The machines are RHEL 5.4 machines and an older version of Open MPI was already installed. I understand from the FAQ that I should not simply overwrite the old installation, but would it be better to remove the old version or to install the new one in a different location? I have no need for the old version. Are there test apps with the Open MPI distribution that I can use to produce some benchmarks? One thing I want is to get an idea of data transfer rates as a function of the size of the data transferred. Thanks, Ed
Re: [OMPI users] Granular locks?
On Oct 2, 2010, at 2:54 AM, Gijsbert Wiesenekker wrote: > On Oct 1, 2010, at 23:24, Gijsbert Wiesenekker wrote: > >> I have a large array that is shared between two processes. One process >> updates array elements randomly, the other process reads array elements >> randomly. Most of the time these writes and reads do not overlap. >> The current version of the code uses Linux shared memory with NSEMS >> semaphores. When array element i has to be read or updated, semaphore (i % >> NSEMS) is used. If NSEMS = 1, the entire array will be locked, which leads to >> unnecessary waits because reads and writes do not overlap most of the time. >> Performance increases as NSEMS increases, and flattens out at NSEMS = 32, at >> which point the code runs twice as fast when compared to NSEMS = 1. >> I want to change the code to use OpenMPI RMA, but MPI_Win_lock locks the >> entire array, which is similar to NSEMS = 1. Is there a way to have more >> granular locks? >> >> Gijsbert >> > > Also, is there an MPI_Win_lock equivalent for IPC_NOWAIT? No. Every call to MPI_Win_lock will (eventually) result in a locking of the window. Note, however, that MPI_WIN_LOCK returning does not guarantee the remote window has been locked. It only guarantees that it is now safe to call data transfer operations targeting that window. An implementation could (and Open MPI frequently does) return immediately, queue up all data transfers until some ACK is received from the target, and then begin data movement operations. Confusing, but flexible for the wide variety of platforms MPI must target. Brian -- Brian W. Barrett Dept. 1423: Scalable System Software Sandia National Laboratories
Re: [OMPI users] Granular locks?
On Oct 1, 2010, at 3:24 PM, Gijsbert Wiesenekker wrote: > I have a large array that is shared between two processes. One process > updates array elements randomly, the other process reads array elements > randomly. Most of the time these writes and reads do not overlap. > The current version of the code uses Linux shared memory with NSEMS > semaphores. When array element i has to be read or updated, semaphore (i % > NSEMS) is used. If NSEMS = 1, the entire array will be locked, which leads to > unnecessary waits because reads and writes do not overlap most of the time. > Performance increases as NSEMS increases, and flattens out at NSEMS = 32, at > which point the code runs twice as fast when compared to NSEMS = 1. > I want to change the code to use OpenMPI RMA, but MPI_Win_lock locks the > entire array, which is similar to NSEMS = 1. Is there a way to have more > granular locks? The MPI standard defines MPI_WIN_LOCK as protecting the entire window, so the short answer to your question is no. Depending on your application, it may be possible to have multiple windows over independent pieces of the data to get the behavior you want, but that does seem icky. Brian -- Brian W. Barrett Dept. 1423: Scalable System Software Sandia National Laboratories
Re: [OMPI users] mpi_comm_spawn have problems with group communicators
I'm not sure why the group communicator would make a difference - the code area in question knows nothing about the mpi aspects of the job. It looks like you are hitting a race condition that causes a particular internal recv to not exist when we subsequently try to cancel it, which generates that error message. How did you configure OMPI? On Oct 3, 2010, at 6:40 PM, Milan Hodoscek wrote: > Hi, > > I am a long time happy user of mpi_comm_spawn() routine. But so far I > used it only with the MPI_COMM_WORLD communicator. Now I want to > execute more mpi_comm_spawn() routines, by creating and using group > communicators. However this seems to have some problems. I can get it > to run about 50% times on my laptop, but on some more "speedy" > machines it just produces the following message: > > $ mpirun -n 4 a.out > [ala:31406] [[45304,0],0] ORTE_ERROR_LOG: Not found in file > base/plm_base_launch_support.c at line 758 > -- > mpirun was unable to start the specified application as it encountered an > error. > More information may be available above. > -- > > I am attaching the 2 programs needed to test the behavior. Compile: > $ mpif90 -o sps sps.f08 # spawned program > $ mpif90 mspbug.f08 # program with problems > $ mpirun -n 4 a.out > > The compiler is gfortran-4.4.4, and openmpi is 1.4.2. > > Needless to say it runs with mpich2, but mpich2 doesn't know how to > deal with stdin on a spawned process, so it's useless for my project :-( > > Any ideas? > > - > program sps > use mpi > implicit none > integer :: ier,nproc,me,pcomm,meroot,mi,on > integer, dimension(1:10) :: num > > call mpi_init(ier) > > mi=mpi_integer > call mpi_comm_rank(mpi_comm_world,me,ier) > meroot=0 > > on=1 > > call mpi_comm_get_parent(pcomm,ier) > > call mpi_bcast(num,on,mi,meroot,pcomm,ier) > write(*,*)'sps>me,num=',me,num(on) > > call mpi_finalize(ier) > > end program sps > - > > program groupspawn > > use mpi > > implicit none > ! 
in the case use mpi does not work (eg Ubuntu) use the include below > ! include 'mpif.h' > integer :: ier,intercom,nproc,meroot,info,mpierrs(1),mcw > integer :: i,myrepsiz,me,np,mcg,repdgrp,repdcom,on,mi,op > integer, dimension(1:10) :: myrepgrp > character(len=5) :: sarg(1),prog > integer, dimension(1:10) :: num,sm > integer :: newme,ngrp,igrp > > call mpi_init(ier) > > prog='sps' > sarg(1) = '' > nproc=2 > on=1 > meroot=0 > mcw=mpi_comm_world > info=mpi_info_null > mi=mpi_integer > op=mpi_sum > mpierrs(1)=mpi_errcodes_ignore(1) > > call mpi_comm_rank(mcw,me,ier) > call mpi_comm_size(mcw,np,ier) > > ngrp=2 ! lets have some groups > myrepsiz=np/ngrp > igrp=me/myrepsiz > do i = 1, myrepsiz >myrepgrp(i)=i+me-mod(me,myrepsiz)-1 > enddo > > call mpi_comm_group(mcw,mcg,ier) > call mpi_group_incl(mcg,myrepsiz,myrepgrp,repdgrp,ier) > call mpi_comm_create(mcw,repdgrp,repdcom,ier) > > ! call mpi_comm_spawn(prog,sarg,nproc,info,meroot,mcw,intercom,mpierrs,ier) > call mpi_comm_spawn(prog,sarg,nproc,info,meroot,repdcom,intercom,mpierrs,ier) > > ! send a number to spawned ones... > > call mpi_comm_rank(intercom,newme,ier) > write(*,*)'me,intercom,newme=',me,intercom,newme > num(1)=111*(igrp+1) > > meroot=mpi_proc_null > if(newme == 0) meroot=mpi_root ! to send data > > call mpi_bcast(num,on,mi,meroot,intercom,ier) > ! sometimes there is no output from sps programs, so we wait here: WEIRD :-( > !call sleep(1) > > call mpi_finalize(ier) > > end program groupspawn > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
It looks to me like your remote nodes aren't finding the orted executable. I suspect the problem is that you need to forward the PATH and LD_LIBRARY_PATH to the remote nodes. Use the mpirun -x option to do so. On Oct 4, 2010, at 5:08 AM, Chris Jewell wrote: > Hi all, > > Firstly, hello to the mailing list for the first time! Secondly, sorry for > the non-descript subject line, but I couldn't really think how to be more > specific! > > Anyway, I am currently having a problem getting OpenMPI to work within my > installation of SGE 6.2u5. I compiled OpenMPI 1.4.2 from source, and > installed under /usr/local/packages/openmpi-1.4.2. Software on my system is > controlled by the Modules framework which adds the bin and lib directories to > PATH and LD_LIBRARY_PATH respectively when a user is connected to an > execution node. I configured a parallel environment in which OpenMPI is to > be used: > > pe_name mpi > slots 16 > user_lists NONE > xuser_lists NONE > start_proc_args /bin/true > stop_proc_args /bin/true > allocation_rule $round_robin > control_slaves TRUE > job_is_first_task FALSE > urgency_slots min > accounting_summary FALSE > > I then tried a simple job submission script: > > #!/bin/bash > # > #$ -S /bin/bash > . /etc/profile > module add ompi gcc > mpirun hostname > > If the parallel environment runs within one execution host (8 slots per > host), then all is fine. However, if scheduled across several nodes, I get > an error: > > execv: No such file or directory > execv: No such file or directory > execv: No such file or directory > -- > A daemon (pid 1629) died unexpectedly with status 1 while attempting > to launch so we are aborting. > > There may be more information reported by the environment (see above). > > This may be because the daemon was unable to find all the needed shared > libraries on the remote node. 
You may set your LD_LIBRARY_PATH to have the > location of the shared libraries on the remote nodes and this will > automatically be forwarded to the remote nodes. > -- > -- > mpirun noticed that the job aborted, but has no info as to the process > that caused that situation. > -- > mpirun: clean termination accomplished > > > I'm at a loss on how to start debugging this, and I don't seem to be getting > anything useful using the mpirun '-d' and '-v' switches. SGE logs don't note > anything. Can anyone suggest either what is wrong, or how I might progress > with getting more information? > > Many thanks, > > > Chris > > > > -- > Dr Chris Jewell > Department of Statistics > University of Warwick > Coventry > CV4 7AL > UK > Tel: +44 (0)24 7615 0778
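[Ralph's suggestion can be written out as below. Command sketch only: the install path comes from Chris's message, and --prefix is the related Open MPI option that points the remote orted at the right installation; building with --enable-orterun-prefix-by-default has the same effect permanently.]

```text
# forward the environment that the Modules framework set up
mpirun -x PATH -x LD_LIBRARY_PATH hostname

# or name the Open MPI installation directly for the remote nodes
mpirun --prefix /usr/local/packages/openmpi-1.4.2 hostname
```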
[OMPI users] Error when using OpenMPI with SGE multiple hosts
Hi all,

Firstly, hello to the mailing list for the first time! Secondly, sorry for the non-descript subject line, but I couldn't really think how to be more specific!

Anyway, I am currently having a problem getting OpenMPI to work within my installation of SGE 6.2u5. I compiled OpenMPI 1.4.2 from source, and installed under /usr/local/packages/openmpi-1.4.2. Software on my system is controlled by the Modules framework which adds the bin and lib directories to PATH and LD_LIBRARY_PATH respectively when a user is connected to an execution node. I configured a parallel environment in which OpenMPI is to be used:

pe_name            mpi
slots              16
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

I then tried a simple job submission script:

#!/bin/bash
#
#$ -S /bin/bash
. /etc/profile
module add ompi gcc
mpirun hostname

If the parallel environment runs within one execution host (8 slots per host), then all is fine. However, if scheduled across several nodes, I get an error:

execv: No such file or directory
execv: No such file or directory
execv: No such file or directory
--
A daemon (pid 1629) died unexpectedly with status 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
--
mpirun: clean termination accomplished

I'm at a loss on how to start debugging this, and I don't seem to be getting anything useful using the mpirun '-d' and '-v' switches. SGE logs don't note anything. Can anyone suggest either what is wrong, or how I might progress with getting more information?

Many thanks,

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry, CV4 7AL, UK
Tel: +44 (0)24 7615 0778