Re: [OMPI users] single CPU vs four CPU result differences, is it normal?
Dear Diego,

I will suggest you read the following two. They will give you some good understanding of what is happening:

https://en.wikipedia.org/wiki/Butterfly_effect
http://www.amazon.com/The-End-Error-Computing-Computational/dp/1482239868

--Bibrak

On Wed, Oct 28, 2015 at 6:58 PM, Diego Avesani wrote:
> Dear Damien,
> I wrote the solver myself. I have not understood your answer.
>
> Diego
>
> On 28 October 2015 at 23:09, Damien wrote:
>
>> Diego,
>>
>> There aren't many linear solvers that are bit-consistent, where the
>> answer is the same no matter how many cores or processes you use. Intel's
>> version of Pardiso is bit-consistent, and I think MUMPS 5.0 might be, but
>> that's all. You should assume your answer will not be exactly the same as
>> you change the number of cores or processes, although you should reach the
>> same overall error tolerance in approximately the same number of iterations.
>>
>> Damien
>>
>> On 2015-10-28 3:51 PM, Diego Avesani wrote:
>>
>> Dear Andreas, dear all,
>> The code is quite long. It is a conjugate gradient algorithm to solve a
>> complex system.
>>
>> I have noticed that when a do cycle is small, let's say
>>   do i=1,3
>>   ...
>>   enddo
>> the results are identical. If the cycle is big, let's say do i=1,20, the
>> results are different, and the difference increases with the number of
>> iterations.
>>
>> What do you think?
>>
>> Diego
>>
>> On 28 October 2015 at 22:32, Andreas Schäfer wrote:
>>
>>> On 22:03 Wed 28 Oct, Diego Avesani wrote:
>>> > When I use a single CPU I get one result; when I use 4 CPUs I get
>>> > another one. I do not think that there is a bug.
>>>
>>> Sounds like a bug to me, most likely in your code.
>>>
>>> > Do you think that these small differences are normal?
>>>
>>> It depends on what "small" means. Floating-point addition in a
>>> computer is generally not associative, so parallelization may indeed
>>> lead to different results.
>>>
>>> > Is there any way to get the same results? Is it some alignment problem?
>>>
>>> Impossible to say without knowing your code.
>>>
>>> Cheers
>>> -Andreas
>>>
>>> --
>>> ==
>>> Andreas Schäfer
>>> HPC and Grid Computing
>>> Department of Computer Science 3
>>> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
>>> +49 9131 85-27910
>>> PGP/GPG key via keyserver
>>> http://www.libgeodecomp.org
>>> ==
>>>
>>> (\___/)
>>> (+'.'+)
>>> (")_(")
>>> This is Bunny. Copy and paste Bunny into your
>>> signature to help him gain world domination!
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/10/27933.php
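[Editorial note] The bit-consistency discussion above comes down to floating-point addition being commutative but not associative: combining the same partial sums in a different order (as happens when a reduction runs over a different number of ranks) can round differently. A minimal illustration in pure Python, no MPI needed:

```python
# Floating-point addition is NOT associative: the grouping of operands
# changes rounding, so a parallel reduction that combines partial sums
# in a different order can legitimately give a (slightly) different answer.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (0.0) + 1.0 -> 1.0
right = a + (b + c)  # 1.0 is below the rounding granularity at 1e16,
                     # so b + c rounds back to -1e16 and the sum is 0.0

print(left, right)   # 1.0 0.0
assert left != right
```

The differences Diego sees growing with the iteration count are these rounding discrepancies being amplified by the conjugate gradient recurrence, which is the butterfly-effect point of the reply above.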
Re: [OMPI users] How to run Open MPI over TCP (Ethernet)
Dear Jeff,

Thanks for the information and for helping me out. I too delayed replying; I wanted to test this, but the cluster here is down. I will check it and let you know in case it doesn't work.

Thanks
Bibrak Qamar

On Sat, May 24, 2014 at 5:23 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> I am sorry for the delay in replying; this week got a bit crazy on me.
>
> I'm guessing that Open MPI is striping across both your eth0 and ib0
> interfaces.
>
> You can limit which interfaces it uses with the btl_tcp_if_include MCA
> param. For example:
>
> # Just use eth0
> mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 ...
>
> # Just use ib0
> mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ib0 ...
>
> Note that IPoIB is nowhere near as efficient as native verbs, so you won't
> get nearly as good performance as you do with OMPI's openib transport.
>
> Note, too, that I specifically included "--mca btl tcp,sm,self" in the
> above examples to force the use of the TCP MPI transport. Otherwise, OMPI
> may well automatically choose the native IB (openib) transport. I see you
> mentioned this in your first mail, too, but I am listing it here just to be
> specific/pedantic.
>
> On May 22, 2014, at 3:30 AM, Bibrak Qamar <bibr...@gmail.com> wrote:
>
> > Hi,
> >
> > I am facing a problem running Open MPI over TCP (on 1G Ethernet). In
> > practice the bandwidth must not exceed 1000 Mbps, but for some data points
> > (for a point-to-point ping-pong) it exceeds this limit. I checked with
> > MPICH; it works as desired.
> >
> > Following is the command I issue to run my program over TCP. Am I missing
> > something?
> >
> > -bash-3.2$ mpirun -np 2 -machinefile machines -N 1 --mca btl tcp,self ./bandwidth.ompi
> > --
> > The following command line options and corresponding MCA parameter have
> > been deprecated and replaced as follows:
> >
> > Command line options:
> >   Deprecated:  --npernode, -npernode
> >   Replacement: --map-by ppr:N:node
> >
> > Equivalent MCA parameter:
> >   Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
> >   Replacement: rmaps_base_mapping_policy=ppr:N:node
> >
> > The deprecated forms *will* disappear in a future version of Open MPI.
> > Please update to the new syntax.
> > --
> > Hello, world. I am 1 on compute-0-16.local
> > Hello, world. I am 0 on compute-0-15.local
> > 1        25.66     0.30
> > 2        25.54     0.60
> > 4        25.34     1.20
> > 8        25.27     2.42
> > 16       25.24     4.84
> > 32       25.49     9.58
> > 64       26.44     18.47
> > 128      26.85     36.37
> > 256      29.43     66.37
> > 512      36.02     108.44
> > 1024     42.03     185.86
> > 2048     194.30    80.42
> > 4096     255.21    122.45
> > 8192     258.85    241.45
> > 16384    307.96    405.90
> > 32768    422.78    591.32
> > 65536    790.11    632.83
> > 131072   1054.08   948.70
> > 262144   1618.20   1235.94
> > 524288   3126.65   1279.33
> >
> > -Bibrak
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] How to run Open MPI over TCP (Ethernet)
Here is the output of ifconfig:

*-bash-3.2$ ssh compute-0-15 /sbin/ifconfig*
eth0      Link encap:Ethernet  HWaddr 78:E7:D1:61:C6:F4
          inet addr:10.1.255.239  Bcast:10.1.255.255  Mask:255.255.0.0
          inet6 addr: fe80::7ae7:d1ff:fe61:c6f4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:63715944 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66225083 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:85950530550 (80.0 GiB)  TX bytes:88970954416 (82.8 GiB)
          Memory:fbe6-fbe8

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.1.15  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a:6f81/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85388965 errors:0 dropped:0 overruns:0 frame:0
          TX packets:94530341 errors:0 dropped:72 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:52140667469 (48.5 GiB)  TX bytes:72573030755 (67.5 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:394785 errors:0 dropped:0 overruns:0 frame:0
          TX packets:394785 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:23757752 (22.6 MiB)  TX bytes:23757752 (22.6 MiB)

*-bash-3.2$ ssh compute-0-16 /sbin/ifconfig*
eth0      Link encap:Ethernet  HWaddr 78:E7:D1:61:D6:72
          inet addr:10.1.255.238  Bcast:10.1.255.255  Mask:255.255.0.0
          inet6 addr: fe80::7ae7:d1ff:fe61:d672/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85494220 errors:0 dropped:0 overruns:0 frame:0
          TX packets:84183073 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:90136414384 (83.9 GiB)  TX bytes:87205444848 (81.2 GiB)
          Memory:fbe6-fbe8

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.1.16  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a:6f91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:116291959 errors:0 dropped:0 overruns:0 frame:0
          TX packets:130137130 errors:0 dropped:107 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:54348901701 (50.6 GiB)  TX bytes:80828495293 (75.2 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:394518 errors:0 dropped:0 overruns:0 frame:0
          TX packets:394518 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:21661017 (20.6 MiB)  TX bytes:21661017 (20.6 MiB)

Bibrak Qamar

On Thu, May 22, 2014 at 3:30 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Can you send the output of ifconfig on both compute-0-15.local and
> compute-0-16.local?
>
> On May 22, 2014, at 3:30 AM, Bibrak Qamar <bibr...@gmail.com> wrote:
>
> > Hi,
> >
> > I am facing a problem running Open MPI over TCP (on 1G Ethernet). In
> > practice the bandwidth must not exceed 1000 Mbps, but for some data points
> > (for a point-to-point ping-pong) it exceeds this limit. I checked with
> > MPICH; it works as desired.
> >
> > Following is the command I issue to run my program over TCP. Am I missing
> > something?
> >
> > -bash-3.2$ mpirun -np 2 -machinefile machines -N 1 --mca btl tcp,self ./bandwidth.ompi
> > --
> > The following command line options and corresponding MCA parameter have
> > been deprecated and replaced as follows:
> >
> > Command line options:
> >   Deprecated:  --npernode, -npernode
> >   Replacement: --map-by ppr:N:node
> >
> > Equivalent MCA parameter:
> >   Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
> >   Replacement: rmaps_base_mapping_policy=ppr:N:node
> >
> > The deprecated forms *will* disappear in a future version of Open MPI.
> > Please update to the new syntax.
> > --
> > Hello, world. I am 1 on compute-0-16.local
> > Hello, world. I am 0 on compute-0-15.local
> > 1    25.66  0.30
> > 2    25.54  0.60
> > 4    25.34  1.20
> > 8    25.27  2.42
> > 16   25.24  4.84
> > 32   25.49  9.58
> &
[OMPI users] How to run Open MPI over TCP (Ethernet)
Hi,

I am facing a problem running Open MPI over TCP (on 1G Ethernet). In practice the bandwidth must not exceed 1000 Mbps, but for some data points (for a point-to-point ping-pong) it exceeds this limit. I checked with MPICH; it works as desired.

Following is the command I issue to run my program over TCP. Am I missing something?

*-bash-3.2$ mpirun -np 2 -machinefile machines -N 1 --mca btl tcp,self ./bandwidth.ompi*
--
The following command line options and corresponding MCA parameter have
been deprecated and replaced as follows:

Command line options:
  Deprecated:  --npernode, -npernode
  Replacement: --map-by ppr:N:node

Equivalent MCA parameter:
  Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
  Replacement: rmaps_base_mapping_policy=ppr:N:node

The deprecated forms *will* disappear in a future version of Open MPI.
Please update to the new syntax.
--
Hello, world. I am 1 on compute-0-16.local
Hello, world. I am 0 on compute-0-15.local
1        25.66     0.30
2        25.54     0.60
4        25.34     1.20
8        25.27     2.42
16       25.24     4.84
32       25.49     9.58
64       26.44     18.47
128      26.85     36.37
256      29.43     66.37
512      36.02     108.44
1024     42.03     185.86
2048     194.30    80.42
4096     255.21    122.45
8192     258.85    241.45
16384    307.96    405.90
32768    422.78    591.32
65536    790.11    632.83
131072   1054.08   948.70
*262144   1618.20   1235.94
524288   3126.65   1279.33*

-Bibrak
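[Editorial note] A quick sanity check on the flagged data points. The column meaning is an assumption from context (message size in bytes, time in microseconds, bandwidth in Mbps), and the exact bandwidth depends on how the benchmark accounts time (e.g. whether round-trip time is halved), so this only checks the order of magnitude:

```python
# Back-of-the-envelope check of the last two ping-pong data points.
# Assumed column meaning: size (bytes), time (us), bandwidth (Mbps).
def mbps(size_bytes, time_us):
    """Bandwidth in megabits/s for one message of size_bytes in time_us."""
    return size_bytes * 8 / time_us  # bits per microsecond == megabits per second

LINE_RATE = 1000.0  # theoretical limit of a single 1G Ethernet link, Mbps

for size, t in [(262144, 1618.20), (524288, 3126.65)]:
    bw = mbps(size, t)
    print(f"{size:7d} bytes in {t:8.2f} us -> {bw:7.2f} Mbps")
    # Both points land above what one 1GbE NIC can deliver, consistent
    # with Open MPI striping traffic across eth0 and ib0 (Jeff's reply).
    assert bw > LINE_RATE
```

This is why limiting Open MPI to one interface with `--mca btl_tcp_if_include eth0`, as suggested in the thread, should bring the numbers back under 1000 Mbps.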
[OMPI users] Collective communication API
I want to know if there is any other implementation of the collective communication operations (reduce and Bcast) available apart from what Open MPI provides.

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.
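[Editorial note] Besides swapping in another MPI library's collectives, one can always build collectives from point-to-point calls. A common scheme for broadcast is the binomial tree, which reaches n ranks in ceil(log2 n) rounds. A sketch in pure Python that only computes the send schedule (no MPI involved; the function name is illustrative):

```python
def binomial_bcast_schedule(n):
    """Return the (round, src, dst) sends of a binomial-tree broadcast
    from rank 0 to n ranks: in round k, every rank r < 2**k that already
    holds the data forwards it to rank r + 2**k."""
    have = {0}           # ranks that hold the data so far
    sends = []
    k = 0
    while len(have) < n:
        for src in sorted(have):
            dst = src + (1 << k)
            if dst < n and dst not in have:
                sends.append((k, src, dst))
        have |= {d for (r, _, d) in sends if r == k}  # round k receivers
        k += 1
    return sends

sched = binomial_bcast_schedule(8)
print(max(r for r, _, _ in sched) + 1)  # 3 rounds for 8 ranks (log2 8)
```

In real code each (src, dst) pair would become an MPI_Send/MPI_Recv, which is essentially how MPI libraries implement MPI_Bcast for small messages.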
Re: [OMPI users] Calculate time spent on non blocking communication?
Thanks all,

As to why I want to measure non-blocking communication this way: I want to find out what percentage of the program's time is spent on communication alone, on computation alone, and on the overlap of the two.

Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.

On Thu, Feb 3, 2011 at 5:08 AM, Eugene Loh <eugene@oracle.com> wrote:
> Again, you can try the Peruse instrumentation. Configure OMPI with
> --enable-peruse. The instrumentation points might help you decide how you
> want to define the time you want to measure. Again, you really have to
> spend a bunch of your own time deciding what is meaningful to measure.
>
> Gustavo Correa wrote:
>> However, OpenMPI may give this info, with non-MPI (hence non-portable)
>> functions, I'd guess.
>>
>>> From: Eugene Loh <eugene@oracle.com>
>>>
>>> Anyhow, the Peruse instrumentation in OMPI might help.
Re: [OMPI users] Calculate time spent on non blocking communication?
Gus Correa,

But that will include the time of the computation that took place before MPI_Waitall().

Date: Tue, 1 Feb 2011 10:09:03 +0400
From: Bibrak Qamar <bibr...@gmail.com>
Subject: [OMPI users] Calculate time spent on non blocking communication?
To: us...@open-mpi.org

Hello All,

I am using non-blocking send and receive, and I want to calculate the time the communication took. Is there any method or way to do this using Open MPI?

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.

--
Message: 4
Date: Mon, 31 Jan 2011 22:14:53 -0800
From: Eugene Loh <eugene@oracle.com>
Subject: Re: [OMPI users] Calculate time spent on non blocking communication?
To: Open MPI Users <us...@open-mpi.org>

Bibrak Qamar wrote:
> Hello All,
>
> I am using non-blocking send and receive, and I want to calculate the
> time it took for the communication. Is there any method or a way to do
> this using Open MPI?

You probably have to start by defining what you mean by "the time it
took for the communication". Anyhow, the Peruse instrumentation in OMPI
might help.

--
Message: 5
Date: Tue, 1 Feb 2011 01:20:36 -0500
From: Gustavo Correa <g...@ldeo.columbia.edu>
Subject: Re: [OMPI users] Calculate time spent on non blocking communication?
To: Open MPI Users <us...@open-mpi.org>

On Feb 1, 2011, at 1:09 AM, Bibrak Qamar wrote:
> Hello All,
>
> I am using non-blocking send and receive, and I want to calculate the
> time it took for the communication. Is there any method or a way to do
> this using Open MPI?

About the same as with blocking communication, I guess.
Would this work for you?

start = MPI_Wtime()
MPI_Isend(...)
...
MPI_Irecv(...)
...
MPI_Wait[all](...)
end = MPI_Wtime()
print *, 'walltime = ', end - start

My two cents,
Gus Correa
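[Editorial note] As Bibrak points out, Gus's pattern measures the whole span, including the computation done between posting the requests and MPI_Waitall. One refinement is to timestamp each segment separately, so the wait that was not hidden behind computation ("exposed" communication time) can be isolated. A runnable sketch in plain Python, where hypothetical stand-in functions play the MPI calls and time.perf_counter() plays MPI_Wtime():

```python
import time

# Hypothetical stand-ins so the timing pattern runs anywhere:
# post() plays MPI_Isend/MPI_Irecv, complete() plays MPI_Waitall.
def post():
    pass                 # a non-blocking call returns immediately

def compute():
    time.sleep(0.05)     # the work overlapped with communication

def complete():
    time.sleep(0.02)     # pretend 20 ms of the transfer was NOT overlapped

t0 = time.perf_counter()
post()
t1 = time.perf_counter()
compute()
t2 = time.perf_counter()
complete()
t3 = time.perf_counter()

post_time    = t1 - t0   # cost of posting the requests
compute_time = t2 - t1   # computation alone
exposed_wait = t3 - t2   # communication time not hidden by computation
total        = t3 - t0

print(f"compute {compute_time:.3f}s, exposed wait {exposed_wait:.3f}s "
      f"of {total:.3f}s total")
```

Note this still only gives the *exposed* communication time; as Eugene says above, measuring the full cost of a non-blocking transfer (including the part progressed during compute()) needs instrumentation such as Peruse.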
[OMPI users] Check whether non-blocking communication has finished?
Hello All,

Is there any way to find out whether a non-blocking communication has finished without calling the MPI_Wait() function?

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.
[OMPI users] Calculate time spent on non blocking communication?
Hello All,

I am using non-blocking send and receive, and I want to calculate the time the communication took. Is there any method or way to do this using Open MPI?

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.
[OMPI users] MPICH2 is working, Open MPI not
Hello,

I have developed a code which I tested with MPICH2, where it works fine. But when I compile and run it with Open MPI, it does not work. The output of the program, with the errors reported by Open MPI, is below:

--
bibrak@barq:~/XXX> mpirun -np 4 ./exec 98
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
Send count -- >> 25
Send count -- >> 25
Send count -- >> 24
Send count -- >> 24
Dis -- >> 0
Dis -- >> 25
Dis -- >> 50
Dis -- >> 74
0 d[0] = -14.025975
1 d[0] = -14.025975
-- 1 --
2 d[0] = -14.025975
-- 2 --
-- 0 --
3 d[0] = -14.025975
-- 3 --
[barq:27118] *** Process received signal ***
[barq:27118] Signal: Segmentation fault (11)
[barq:27118] Signal code: Address not mapped (1)
[barq:27118] Failing at address: 0x51681f96
[barq:27121] *** Process received signal ***
[barq:27121] Signal: Segmentation fault (11)
[barq:27121] Signal code: Address not mapped (1)
[barq:27121] Failing at address: 0x77b5685
[barq:27118] [ 0] [0xe410]
[barq:27118] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7d20f3c]
[barq:27118] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27118] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7cc9705]
[barq:27121] [ 0] [0xe410]
[barq:27121] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7d0ef3c]
[barq:27121] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27121] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7cb7705]
[barq:27121] [ 4] ./exec [0x8048b01]
[barq:27121] *** End of error message ***
[barq:27118] [ 4] ./exec [0x8048b01]
[barq:27118] *** End of error message ***
--
mpirun noticed that process rank 3 with PID 27121 on node barq exited on signal 11 (Segmentation fault).
--
[barq:27120] *** Process received signal ***
[barq:27120] Signal: Segmentation fault (11)
[barq:27120] Signal code: Address not mapped (1)
[barq:27120] Failing at address: 0x4bd1ca3e
[barq:27120] [ 0] [0xe410]
[barq:27120] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7c97f3c]
[barq:27120] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27120] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7c40705]
[barq:27120] [ 4] ./exec [0x8048b01]
[barq:27120] *** End of error message ***

Because of the "warning:regcache incompatible with malloc" warning I did:

bibrak@barq:~/XXX> export MX_RCACHE=2

which silences the warning, but the error still remains. I shall appreciate any help.

Bibrak Qamar
NUST-SEECS
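[Editorial note] The "Send count" / "Dis" lines above look like an MPI_Scatterv-style decomposition of 98 elements over 4 ranks, and the crash inside free() (cfree) in main on every rank after the data arrives is a classic sign of heap corruption, e.g. a receive buffer allocated with fewer elements than the receive count; MPICH2 happening to survive such an overrun does not make it correct (a run under valgrind would likely pinpoint it). A sketch of that decomposition arithmetic, reproducing the printed counts and displacements:

```python
# Decompose n elements over nprocs ranks, earlier ranks taking one extra
# element when it does not divide evenly -- matching the printed values
# (counts 25, 25, 24, 24 and displacements 0, 25, 50, 74 for n = 98).
def scatterv_layout(n, nprocs):
    base, rem = divmod(n, nprocs)                      # 98 // 4 = 24 rem 2
    counts = [base + (1 if r < rem else 0) for r in range(nprocs)]
    displs = [sum(counts[:r]) for r in range(nprocs)]  # running offsets
    return counts, displs

counts, displs = scatterv_layout(98, 4)
print(counts, displs)  # [25, 25, 24, 24] [0, 25, 50, 74]
assert sum(counts) == 98
```

The key check in the C code would be that each rank's receive buffer is allocated for at least counts[rank] elements of the right type, and that the sum of counts equals the total element count passed to the program.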