Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Bibrak Qamar
Dear Diego,

I suggest you read the following two references. They will give you a good
understanding of what is happening:

https://en.wikipedia.org/wiki/Butterfly_effect

http://www.amazon.com/The-End-Error-Computing-Computational/dp/1482239868
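
Both references make the same basic point: tiny rounding differences are
amplified over many iterations. Floating-point addition is not associative,
so summing the same partial results in a different order (which is exactly
what changing the number of processes does to a parallel reduction) gives
slightly different answers. A minimal, self-contained C illustration (the
values are made up for effect, not taken from any solver):

#include <stdio.h>

int main(void)
{
    /* Three partial sums of very different magnitudes. */
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    /* The same mathematical sum, grouped two ways. */
    double left  = (a + b) + c;  /* (0.0) + 1.0 = 1.0          */
    double right = a + (b + c);  /* b + c rounds to b, so 0.0  */

    printf("left  = %.17g\n", left);
    printf("right = %.17g\n", right);
    return 0;
}

In an iterative solver such differences feed back into the next iteration
and grow, which is the butterfly effect the first link describes.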


--Bibrak

On Wed, Oct 28, 2015 at 6:58 PM, Diego Avesani wrote:

> Dear Damien,
> I wrote the solver myself. I have not understood your answer.
>
> Diego
>
>
> On 28 October 2015 at 23:09, Damien  wrote:
>
>> Diego,
>>
>> There aren't many linear solvers that are bit-consistent, where the
>> answer is the same no matter how many cores or processes you use.  Intel's
>> version of Pardiso is bit-consistent and I think MUMPS 5.0 might be, but
>> that's all.  You should assume your answer will not be exactly the same as
>> you change the number of cores or processes, although you should reach the
>> same overall error tolerance in approximately the same number of iterations.
>>
>> Damien
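
To make that concrete: bit-consistency is usually bought by fixing the order
of the floating-point operations. A dot product, for instance, can be made
independent of the MPI library's reduction tree by gathering all partial sums
and adding them in rank order on every process. A minimal sketch in C
(illustrative only; it is not how Pardiso or MUMPS achieve this, and it
trades extra traffic for reproducibility):

#include <stdlib.h>
#include <mpi.h>

/* Sum one partial value per rank in a fixed (rank) order, so the
   result does not depend on how the library would reduce them. */
static double fixed_order_sum(double partial, MPI_Comm comm)
{
    int i, nprocs;
    MPI_Comm_size(comm, &nprocs);

    double *parts = malloc(nprocs * sizeof(double));
    MPI_Allgather(&partial, 1, MPI_DOUBLE, parts, 1, MPI_DOUBLE, comm);

    double sum = 0.0;
    for (i = 0; i < nprocs; i++)   /* identical order on every rank */
        sum += parts[i];

    free(parts);
    return sum;
}

Calling fixed_order_sum(local_dot, MPI_COMM_WORLD) in place of an
MPI_Allreduce(..., MPI_SUM, ...) gives run-to-run identical results for a
fixed decomposition; the answer still changes with the process count, because
the partial sums themselves change.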
>>
>>
>> On 2015-10-28 3:51 PM, Diego Avesani wrote:
>>
>> Dear Andreas, dear all,
>> The code is quite long. It is a conjugate gradient algorithm to solve a
>> complex system.
>>
>> I have noticed that when a do loop is small, let's say
>> do i=1,3
>>
>> enddo
>>
>> the results are identical. If the loop is big, let's say do i=1,20, the
>> results are different and the differences increase with the number of
>> iterations.
>>
>> What do you think?
>>
>>
>>
>> Diego
>>
>>
>> On 28 October 2015 at 22:32, Andreas Schäfer  wrote:
>>
>>> On 22:03 Wed 28 Oct , Diego Avesani wrote:
>>> > When I use a single CPU I get one result; when I use 4 CPUs I get
>>> > another one. I do not think that it is a bug.
>>>
>>> Sounds like a bug to me, most likely in your code.
>>>
>>> > Do you think that these small differences are normal?
>>>
>>> It depends on what "small" means. Floating-point operations in a
>>> computer are generally not associative, so parallelization may indeed
>>> lead to different results.
>>>
>>> > Is there any way to get the same results? is some align problem?
>>>
>>> Impossible to say without knowing your code.
>>>
>>> Cheers
>>> -Andreas
>>>
>>>
>>> --
>>> ==
>>> Andreas Schäfer
>>> HPC and Grid Computing
>>> Department of Computer Science 3
>>> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
>>> +49 9131 85-27910
>>> PGP/GPG key via keyserver
>>> http://www.libgeodecomp.org
>>> ==
>>>
>>> (\___/)
>>> (+'.'+)
>>> (")_(")
>>> This is Bunny. Copy and paste Bunny into your
>>> signature to help him gain world domination!
>>>


Re: [OMPI users] How to run Open MPI over TCP (Ethernet)

2014-05-28 Thread Bibrak Qamar
Dear Jeff,

Thanks for the information and for helping me out. I delayed replying as
well; I wanted to test this, but the cluster here is down. I will check it
and let you know if it doesn't work.

Thanks
Bibrak Qamar



On Sat, May 24, 2014 at 5:23 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I am sorry for the delay in replying; this week got a bit crazy on me.
>
> I'm guessing that Open MPI is striping across both your eth0 and ib0
> interfaces.
>
> You can limit which interfaces it uses with the btl_tcp_if_include MCA
> param.  For example:
>
> # Just use eth0
> mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 ...
>
> # Just use ib0
> mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ib0 ...
>
> Note that IPoIB is nowhere near as efficient as native verbs, so you won't
> get nearly as good performance as you do with OMPI's openib transport.
>
> Note, too, that I specifically included "--mca btl tcp,sm,self" in the
> above examples to force the use of the TCP MPI transport.  Otherwise, OMPI
> may well automatically choose the native IB (openib) transport.  I see you
> mentioned this in your first mail, too, but I am listing it here just to be
> specific/pedantic.
>
>
>
> On May 22, 2014, at 3:30 AM, Bibrak Qamar <bibr...@gmail.com> wrote:
>
> > Hi,
> >
> > I am facing a problem running Open MPI over TCP (on 1G Ethernet). In
> > practice the bandwidth should not exceed 1000 Mbps, but for some data
> > points (point-to-point ping-pong) it exceeds this limit. I checked with
> > MPICH and it works as expected.
> >
> > Following is the command I issue to run my program on TCP. Am I missing
> > something?
> >
> > -bash-3.2$ mpirun -np 2  -machinefile machines -N 1 --mca btl tcp,self
> > ./bandwidth.ompi
> >
> --
> > The following command line options and corresponding MCA parameter have
> > been deprecated and replaced as follows:
> >
> >   Command line options:
> > Deprecated:  --npernode, -npernode
> > Replacement: --map-by ppr:N:node
> >
> >   Equivalent MCA parameter:
> > Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
> > Replacement: rmaps_base_mapping_policy=ppr:N:node
> >
> > The deprecated forms *will* disappear in a future version of Open MPI.
> > Please update to the new syntax.
> >
> --
> > Hello, world.  I am 1 on compute-0-16.local
> > Hello, world.  I am 0 on compute-0-15.local
> > 1        25.66      0.30
> > 2        25.54      0.60
> > 4        25.34      1.20
> > 8        25.27      2.42
> > 16       25.24      4.84
> > 32       25.49      9.58
> > 64       26.44     18.47
> > 128      26.85     36.37
> > 256      29.43     66.37
> > 512      36.02    108.44
> > 1024     42.03    185.86
> > 2048    194.30     80.42
> > 4096    255.21    122.45
> > 8192    258.85    241.45
> > 16384   307.96    405.90
> > 32768   422.78    591.32
> > 65536   790.11    632.83
> > 131072 1054.08    948.70
> > 262144 1618.20   1235.94
> > 524288 3126.65   1279.33
> >
> > -Bibrak
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>


Re: [OMPI users] How to run Open MPI over TCP (Ethernet)

2014-05-23 Thread Bibrak Qamar
Here the output of ifconfig

-bash-3.2$ ssh compute-0-15 /sbin/ifconfig
eth0  Link encap:Ethernet  HWaddr 78:E7:D1:61:C6:F4
  inet addr:10.1.255.239  Bcast:10.1.255.255  Mask:255.255.0.0
  inet6 addr: fe80::7ae7:d1ff:fe61:c6f4/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:63715944 errors:0 dropped:0 overruns:0 frame:0
  TX packets:66225083 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:85950530550 (80.0 GiB)  TX bytes:88970954416 (82.8 GiB)
  Memory:fbe6-fbe8

ib0   Link encap:InfiniBand  HWaddr
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet addr:192.168.1.15  Bcast:192.168.1.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c903:a:6f81/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:85388965 errors:0 dropped:0 overruns:0 frame:0
  TX packets:94530341 errors:0 dropped:72 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:52140667469 (48.5 GiB)  TX bytes:72573030755 (67.5 GiB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:394785 errors:0 dropped:0 overruns:0 frame:0
  TX packets:394785 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:23757752 (22.6 MiB)  TX bytes:23757752 (22.6 MiB)



-bash-3.2$ ssh compute-0-16 /sbin/ifconfig
eth0  Link encap:Ethernet  HWaddr 78:E7:D1:61:D6:72
  inet addr:10.1.255.238  Bcast:10.1.255.255  Mask:255.255.0.0
  inet6 addr: fe80::7ae7:d1ff:fe61:d672/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:85494220 errors:0 dropped:0 overruns:0 frame:0
  TX packets:84183073 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:90136414384 (83.9 GiB)  TX bytes:87205444848 (81.2 GiB)
  Memory:fbe6-fbe8

ib0   Link encap:InfiniBand  HWaddr
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet addr:192.168.1.16  Bcast:192.168.1.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c903:a:6f91/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:116291959 errors:0 dropped:0 overruns:0 frame:0
  TX packets:130137130 errors:0 dropped:107 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:54348901701 (50.6 GiB)  TX bytes:80828495293 (75.2 GiB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:394518 errors:0 dropped:0 overruns:0 frame:0
  TX packets:394518 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:21661017 (20.6 MiB)  TX bytes:21661017 (20.6 MiB)


Bibrak Qamar



On Thu, May 22, 2014 at 3:30 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Can you send the output of ifconfig on both compute-0-15.local and
> compute-0-16.local?
>
>
> On May 22, 2014, at 3:30 AM, Bibrak Qamar <bibr...@gmail.com> wrote:
>
> > Hi,
> >
> > I am facing a problem running Open MPI over TCP (on 1G Ethernet). In
> > practice the bandwidth should not exceed 1000 Mbps, but for some data
> > points (point-to-point ping-pong) it exceeds this limit. I checked with
> > MPICH and it works as expected.
> >
> > Following is the command I issue to run my program on TCP. Am I missing
> > something?
> >
> > -bash-3.2$ mpirun -np 2  -machinefile machines -N 1 --mca btl tcp,self
> > ./bandwidth.ompi
> >
> --
> > The following command line options and corresponding MCA parameter have
> > been deprecated and replaced as follows:
> >
> >   Command line options:
> > Deprecated:  --npernode, -npernode
> > Replacement: --map-by ppr:N:node
> >
> >   Equivalent MCA parameter:
> > Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
> > Replacement: rmaps_base_mapping_policy=ppr:N:node
> >
> > The deprecated forms *will* disappear in a future version of Open MPI.
> > Please update to the new syntax.
> >
> --
> > Hello, world.  I am 1 on compute-0-16.local
> > Hello, world.  I am 0 on compute-0-15.local
> > 1        25.66      0.30
> > 2        25.54      0.60
> > 4        25.34      1.20
> > 8        25.27      2.42
> > 16       25.24      4.84
> > 32       25.49      9.58

[OMPI users] How to run Open MPI over TCP (Ethernet)

2014-05-22 Thread Bibrak Qamar
Hi,

I am facing a problem running Open MPI over TCP (on 1G Ethernet). In
practice the bandwidth should not exceed 1000 Mbps, but for some data
points (point-to-point ping-pong) it exceeds this limit. I checked with
MPICH and it works as expected.

Following is the command I issue to run my program on TCP. Am I missing
something?

-bash-3.2$ mpirun -np 2  -machinefile machines -N 1 --mca btl tcp,self
./bandwidth.ompi
--
The following command line options and corresponding MCA parameter have
been deprecated and replaced as follows:

  Command line options:
Deprecated:  --npernode, -npernode
Replacement: --map-by ppr:N:node

  Equivalent MCA parameter:
Deprecated:  rmaps_base_n_pernode, rmaps_ppr_n_pernode
Replacement: rmaps_base_mapping_policy=ppr:N:node

The deprecated forms *will* disappear in a future version of Open MPI.
Please update to the new syntax.
--
Hello, world.  I am 1 on compute-0-16.local
Hello, world.  I am 0 on compute-0-15.local
1        25.66      0.30
2        25.54      0.60
4        25.34      1.20
8        25.27      2.42
16       25.24      4.84
32       25.49      9.58
64       26.44     18.47
128      26.85     36.37
256      29.43     66.37
512      36.02    108.44
1024     42.03    185.86
2048    194.30     80.42
4096    255.21    122.45
8192    258.85    241.45
16384   307.96    405.90
32768   422.78    591.32
65536   790.11    632.83
131072 1054.08    948.70
262144 1618.20   1235.94
524288 3126.65   1279.33

-Bibrak
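
For reference, the three columns above are roughly consistent with message
size in bytes, one-way time in microseconds, and bandwidth in Mbps
(size * 8 / time). A minimal ping-pong kernel of the kind bandwidth.ompi
presumably implements, as a sketch (the actual program was not posted; the
repetition count and buffer handling are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define MAX_SIZE 524288
#define REPS     100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = calloc(MAX_SIZE, 1);
    for (size = 1; size <= MAX_SIZE; size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < REPS; r++) {
            if (rank == 0) {         /* ping */
                MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {  /* pong */
                MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* Half the round-trip time, in microseconds. */
        double us = (MPI_Wtime() - t0) * 1e6 / (2.0 * REPS);
        if (rank == 0)  /* bits per microsecond == Mbps */
            printf("%d %.2f %.2f\n", size, us, size * 8.0 / us);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

The large-message rows above 1000 Mbps are what prompted the thread: a
single 1G Ethernet link cannot carry that, which is consistent with Open MPI
striping across eth0 and ib0 as suggested earlier in the thread.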


[OMPI users] Collective communication API

2011-02-11 Thread Bibrak Qamar
I want to know if there is any other implementation of collective
communication (reduce and Bcast) available, apart from what Open MPI
provides.
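
For background, any collective can be built from point-to-point calls, which
is also the simplest way to experiment with alternative algorithms before
looking at other libraries. A minimal flat broadcast, equivalent in effect
to MPI_Bcast from a given root (a sketch only; real implementations use
binomial-tree or pipeline algorithms precisely because this one serializes
O(P) sends at the root):

#include <mpi.h>

/* Flat (linear) broadcast: the root sends directly to every other rank. */
static void flat_bcast(void *buf, int count, MPI_Datatype type,
                       int root, MPI_Comm comm)
{
    int i, rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    if (rank == root) {
        for (i = 0; i < nprocs; i++)
            if (i != root)
                MPI_Send(buf, count, type, i, 0, comm);
    } else {
        MPI_Recv(buf, count, type, root, 0, comm, MPI_STATUS_IGNORE);
    }
}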


Thanks

Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.


Re: [OMPI users] Calculate time spent on non blocking communication?

2011-02-03 Thread Bibrak Qamar
Thanks all,

As for why I want to measure non-blocking communication: the main reason is
that I want to find out what percentage of the program's time is spent on
communication alone, on computation alone, and on the overlap of both.
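
One way to get those three numbers with nothing more than MPI_Wtime is to
timestamp the post, the independent computation, and the final wait
separately: the computation interval is work overlapped with communication,
and whatever the wait still costs afterwards is communication that did not
overlap. A sketch under these assumptions (compute() stands for the
application's independent work; buffer and peer arguments are placeholders):

#include <mpi.h>

void compute(void);  /* placeholder: the application's independent work */

double t_post, t_comp, t_wait;  /* accumulated over the run */

void timed_exchange(double *sendbuf, double *recvbuf, int n,
                    int peer, MPI_Comm comm)
{
    MPI_Request reqs[2];

    double t0 = MPI_Wtime();
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
    double t1 = MPI_Wtime();

    compute();
    double t2 = MPI_Wtime();

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    double t3 = MPI_Wtime();

    t_post += t1 - t0;  /* cost of posting the operations        */
    t_comp += t2 - t1;  /* computation (overlapped with traffic) */
    t_wait += t3 - t2;  /* communication not hidden by compute() */
}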

Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.


On Thu, Feb 3, 2011 at 5:08 AM, Eugene Loh <eugene@oracle.com> wrote:

> Again, you can try the Peruse instrumentation.  Configure OMPI with
> --enable-peruse.  The instrumentation points might help you decide how you
> want to define the time you want to measure.  Again, you really have to
> spend a bunch of your own time deciding what is meaningful to measure.
>
> Gustavo Correa wrote:
>
>> However, OpenMPI may give this info, with non-MPI (hence non-portable)
>> functions, I'd guess.
>>
>>  From: Eugene Loh <eugene@oracle.com>
>>>
>>> Anyhow, the Peruse instrumentation in OMPI might help.
>>>
>>>


Re: [OMPI users] Calculate time spent on non blocking communication?

2011-02-02 Thread Bibrak Qamar
Gus Correa: but that will include the time of the computation that took
place before MPI_Waitall().


Date: Tue, 1 Feb 2011 10:09:03 +0400
From: Bibrak Qamar <bibr...@gmail.com>
Subject: [OMPI users] Calculate time spent on non blocking
   communication?
To: us...@open-mpi.org

Hello All,

I am using non-blocking send and receive, and I want to calculate the time
it took for the communication. Is there any method or way to do this using
Open MPI?

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.

--

Message: 4
Date: Mon, 31 Jan 2011 22:14:53 -0800
From: Eugene Loh <eugene@oracle.com>
Subject: Re: [OMPI users] Calculate time spent on non blocking
   communication?
To: Open MPI Users <us...@open-mpi.org>

Bibrak Qamar wrote:

> Hello All,
>
> I am using non-blocking send and receive, and I want to calculate the
> time it took for the communication. Is there any method or way to do
> this using Open MPI?

You probably have to start by defining what you mean by "the time it
took for the communication".  Anyhow, the Peruse instrumentation in OMPI
might help.


--

Message: 5
Date: Tue, 1 Feb 2011 01:20:36 -0500
From: Gustavo Correa <g...@ldeo.columbia.edu>
Subject: Re: [OMPI users] Calculate time spent on non blocking
   communication?
To: Open MPI Users <us...@open-mpi.org>


On Feb 1, 2011, at 1:09 AM, Bibrak Qamar wrote:




> Hello All,
>
> I am using non-blocking send and receive, and I want to calculate the time
> it took for the communication. Is there any method or way to do this using
> Open MPI?
>
> Thanks
> Bibrak Qamar
> Undergraduate Student BIT-9
> Member Center for High Performance Scientific Computing
> NUST-School of Electrical Engineering and Computer Science.

About the same as with blocking communication, I guess.

Would this work for you?

start = MPI_Wtime()
call MPI_Isend(...)
! ...
call MPI_Irecv(...)
! ...
call MPI_Waitall(...)
finish = MPI_Wtime()
print *, 'walltime = ', finish - start

My two cents,
Gus Correa


[OMPI users] Check whether non-blocking communication has finished?

2011-02-02 Thread Bibrak Qamar
Hello All,

Is there any way to find out whether a non-blocking communication has
finished without calling the wait() function?


Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.
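
For reference, MPI_Test does exactly this: it returns immediately and sets a
flag indicating whether the request has completed. A minimal sketch in C
(the work performed between polls is a placeholder):

#include <mpi.h>

/* Poll a pending request without blocking, doing other work meanwhile. */
void poll_until_done(MPI_Request *req)
{
    int done = 0;
    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        if (!done) {
            /* ... useful computation between polls ... */
        }
    }
}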


[OMPI users] Calculate time spent on non blocking communication?

2011-02-01 Thread Bibrak Qamar
Hello All,

I am using non-blocking send and receive, and I want to calculate the time
it took for the communication. Is there any method or way to do this using
Open MPI?

Thanks
Bibrak Qamar
Undergraduate Student BIT-9
Member Center for High Performance Scientific Computing
NUST-School of Electrical Engineering and Computer Science.


[OMPI users] MPICH2 is working OpenMPI Not

2010-07-18 Thread Bibrak Qamar
Hello,

I have developed a code which I tested with MPICH2, where it works fine.

But when I compile and run it with Open MPI, it does not work.

The output of the program, with the errors reported by Open MPI, is below:

--


bibrak@barq:~/XXX> mpirun -np 4 ./exec 98


warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
Send count -- >> 25
Send count -- >> 25
Send count -- >> 24
Send count -- >> 24
Dis -- >> 0
Dis -- >> 25
Dis -- >> 50
Dis -- >> 74




 0 d[0] = -14.025975
 1 d[0] = -14.025975
-- 1 --
 2 d[0] = -14.025975
-- 2 --
-- 0 --
 3 d[0] = -14.025975
 --3 --
[barq:27118] *** Process received signal ***
[barq:27118] Signal: Segmentation fault (11)
[barq:27118] Signal code: Address not mapped (1)
[barq:27118] Failing at address: 0x51681f96
[barq:27121] *** Process received signal ***
[barq:27121] Signal: Segmentation fault (11)
[barq:27121] Signal code: Address not mapped (1)
[barq:27121] Failing at address: 0x77b5685
[barq:27118] [ 0] [0xe410]
[barq:27118] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7d20f3c]
[barq:27118] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27118] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7cc9705]
[barq:27121] [ 0] [0xe410]
[barq:27121] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7d0ef3c]
[barq:27121] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27121] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7cb7705]
[barq:27121] [ 4] ./exec [0x8048b01]
[barq:27121] *** End of error message ***
[barq:27118] [ 4] ./exec [0x8048b01]
[barq:27118] *** End of error message ***
--
mpirun noticed that process rank 3 with PID 27121 on node barq exited on
signal 11 (Segmentation fault).
--
[barq:27120] *** Process received signal ***
[barq:27120] Signal: Segmentation fault (11)
[barq:27120] Signal code: Address not mapped (1)
[barq:27120] Failing at address: 0x4bd1ca3e
[barq:27120] [ 0] [0xe410]
[barq:27120] [ 1] /lib/libc.so.6(cfree+0x9c) [0xb7c97f3c]
[barq:27120] [ 2] ./exec(main+0x2214) [0x804ad8d]
[barq:27120] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7c40705]
[barq:27120] [ 4] ./exec [0x8048b01]
[barq:27120] *** End of error message ***




Because of the "warning:regcache incompatible with malloc" warning, I did
>  bibrak@barq:~/XXX> export MX_RCACHE=2

and then ignored the warning, but the error still remains.

I would appreciate any help.
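
For what it is worth, the "Send count" and "Dis" lines above match an
MPI_Scatterv of 98 elements over 4 ranks with counts {25, 25, 24, 24} and
displacements {0, 25, 50, 74}, and a crash inside free() at the end of
main() usually means some heap buffer was overrun earlier (MPICH2's
allocator happening to tolerate it proves nothing). A minimal correct
allocation/scatter pattern for that decomposition, assuming the data are
doubles (a sketch, not the actual code from the program):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int i, rank, nprocs, n = 98;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (argc > 1)
        n = atoi(argv[1]);

    /* counts = {25,25,24,24}, displs = {0,25,50,74} for n=98, np=4 */
    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));
    for (i = 0; i < nprocs; i++) {
        counts[i] = n / nprocs + (i < n % nprocs ? 1 : 0);
        displs[i] = (i == 0) ? 0 : displs[i - 1] + counts[i - 1];
    }

    double *d = NULL;
    if (rank == 0) {
        d = malloc(n * sizeof(double));
        for (i = 0; i < n; i++)
            d[i] = -14.025975;  /* arbitrary fill, echoing the output above */
    }

    /* The receive buffer must hold exactly counts[rank] elements;
       undersizing it corrupts the heap and free() crashes later. */
    double *local = malloc(counts[rank] * sizeof(double));
    MPI_Scatterv(d, counts, displs, MPI_DOUBLE,
                 local, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("%d d[0] = %f\n", rank, local[0]);

    free(local);
    free(counts);
    free(displs);
    if (rank == 0)
        free(d);
    MPI_Finalize();
    return 0;
}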

Bibrak Qamar
NUST-SEECS