Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Joshua Mora
Diego,
Let's assume the solver itself is not coded properly.
Set up a problem for which you know the exact solution.
That is, choose A (a very simple non-singular SPD matrix) and x_, where x_ != 0. Make
x_ a linear function or a constant so it is very easy to spot where the bad
x values appear.
I assume the boundary conditions are built into A and you are not handling them
outside of A; if not, you will have to check those too.
Then compute b = A*x_.
Given A and b, initialize x to zero.
Then use your CG solver to find x.
Display/plot x, or the error ( abs(x - x_) ), as the iterations proceed.
Display them in a map where you can easily identify/locate your cores and the
distribution of A, x and b.
You will soon see where your data x is not being updated properly (e.g. data not
being exchanged correctly between your cores).
I assume you are also using an easy 1D or 2D partitioning of your data so you
can easily spot the issues.
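For instance, a toy serial version of that check could look like the sketch
below (just an illustration with made-up sizes, not your code): A is a 1D
Laplacian (simple and SPD), x_ is a linear function, b = A*x_, and a plain CG
starts from x = 0. In the parallel version you would distribute the data and
add the halo exchange for the matrix-vector product.

! Minimal serial sketch of the manufactured-solution check described above
program cg_manufactured
  implicit none
  integer, parameter :: n = 50
  double precision :: A(n,n), x_exact(n), x(n), b(n), r(n), p(n), Ap(n)
  double precision :: rsold, rsnew, alpha
  integer :: i, it

  ! Simple SPD matrix: 1D Laplacian (2 on the diagonal, -1 off-diagonal)
  A = 0.d0
  do i = 1, n
     A(i,i) = 2.d0
     if (i > 1) A(i,i-1) = -1.d0
     if (i < n) A(i,i+1) = -1.d0
  end do

  ! Known solution (a linear function) and the matching right-hand side
  do i = 1, n
     x_exact(i) = dble(i)
  end do
  b = matmul(A, x_exact)

  ! Plain CG starting from x = 0
  x = 0.d0
  r = b
  p = r
  rsold = dot_product(r, r)
  do it = 1, 10*n
     Ap = matmul(A, p)
     alpha = rsold / dot_product(p, Ap)
     x = x + alpha*p
     r = r - alpha*Ap
     rsnew = dot_product(r, r)
     if (sqrt(rsnew) < 1.d-12) exit
     p = r + (rsnew/rsold)*p
     rsold = rsnew
  end do

  ! With x_exact known, it is easy to see whether (and where) x goes wrong
  print '(a,i6,a,es12.4)', 'iterations =', it, &
        '   max error =', maxval(abs(x - x_exact))
end program cg_manufactured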

Good luck.
Joshua

PS. Usually you have to build a communication library for your algebra that
you trust (thoroughly tested). Then you build the data types of your algebra
bit by bit: scalar, vector, matrix. Then the operators (addition, product), and
finally your solver: CG, BiCGSTAB, GMRESR, ...


Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Gilles Gouaillardet

Diego,

your problem might be numerically unstable; that is why results might
differ between one run and another.
Floating-point numbers have their own limitations (rounding errors,
absorption, ...).


Are you running single or double precision?
If you are running single precision, you might give double precision a try
(if your code is written in Fortran, you can use the -r8 flag to promote
single-precision reals to double).


Let me give you a (theoretical) example:

1 / (1.e+100 + 1 - 1.e+100) = ?

If you do this by hand, the answer is 1.
Now if you ask a computer using floating-point numbers to do it, it
might do


1.e+100 + 1 ~= 1.e+100
1.e+100 - 1.e+100 = 0
1 / 0 = division by zero

Another classic example is

sum = 0
do i=1,n
   sum = sum + 1.0/i
enddo

That might look trivial, but it is very hard to get accurate results
with a computer:

a naive approach will give you an inaccurate result.
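For instance, a tiny self-contained Fortran program (just an illustration)
shows both effects at once: the same harmonic sum evaluated in single
precision gives different answers depending on the summation order, and
double precision gives yet another (more accurate) value.

program fp_order
  implicit none
  integer, parameter :: n = 10000000
  real             :: s4_fwd, s4_bwd
  double precision :: s8_fwd
  integer :: i

  s4_fwd = 0.0;  s4_bwd = 0.0;  s8_fwd = 0.d0
  do i = 1, n
     s4_fwd = s4_fwd + 1.0/real(i)     ! single precision, ascending order
     s8_fwd = s8_fwd + 1.d0/dble(i)    ! double precision, ascending order
  end do
  do i = n, 1, -1
     s4_bwd = s4_bwd + 1.0/real(i)     ! single precision, descending order
  end do

  print *, 'single, forward :', s4_fwd
  print *, 'single, backward:', s4_bwd  ! differs from the forward sum
  print *, 'double, forward :', s8_fwd
end program fp_order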

Bottom line: you notice differences, and that is normal.
The question is how you compare your results and how much they differ.
If you do a binary comparison of the results, it is very likely they
will differ.
If you compare a and b, and abs(a-b) / abs(a) is very low (the threshold
depends on whether you are using single or double precision),

then this is likely the normal behaviour.
Now if this number is high, that could be a bug in your code (never say
never ...) or your algorithm might be numerically unstable (at least for
your test case).


Cheers,

Gilles


Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Noam Bernstein
> On Oct 28, 2015, at 6:58 PM, Diego Avesani  wrote:
> 
> Dear Damien,
> I wrote the solver myself. I have not understood your answer.

Floating-point addition is not associative.  Doing a long sum in different 
orders, as can happen when different numbers of ranks compute local sums that are 
then summed globally, will in general not give bit-identical results.  Even an MPI 
global reduce with MPI_SUM is not guaranteed by the standard to always return 
exactly the same answer with the same arguments and the same number of tasks; 
bitwise reproducibility is only recommended.
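A toy example of this (mine, not your solver): each rank sums its own slice of
1/i, the partial sums are combined with MPI_Allreduce, and rank 0 compares the
result against a plain left-to-right serial sum. The two totals typically agree
only to within a few units in the last place.

! Order-dependent parallel sum vs. serial sum (illustration only)
program reduce_order
  use mpi
  implicit none
  integer, parameter :: n = 1000000
  integer :: rank, nprocs, ierr, i
  double precision :: local, global, serial

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Block partition of 1..n: each rank sums its own contiguous chunk
  local = 0.d0
  do i = rank*(n/nprocs) + 1, merge(n, (rank+1)*(n/nprocs), rank == nprocs-1)
     local = local + 1.d0/dble(i)
  end do

  call MPI_Allreduce(local, global, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 0) then
     serial = 0.d0
     do i = 1, n
        serial = serial + 1.d0/dble(i)
     end do
     print *, 'parallel sum:', global
     print *, 'serial sum  :', serial
     print *, 'difference  :', abs(global - serial)
  end if

  call MPI_Finalize(ierr)
end program reduce_order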

Noam


Noam Bernstein
Center for Materials Physics and Technology
NRL Code 6390
noam.bernst...@nrl.navy.mil
phone: 703 683 2783



Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Bibrak Qamar
Dear Diego,

I suggest you read the following two; they will give you a good
understanding of what is happening:

https://en.wikipedia.org/wiki/Butterfly_effect

http://www.amazon.com/The-End-Error-Computing-Computational/dp/1482239868


--Bibrak



Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Diego Avesani
Dear Damien,
I wrote the solver myself. I have not understood your answer.

Diego




Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Damien

Diego,

There aren't many linear solvers that are bit-consistent, where the 
answer is the same no matter how many cores or processes you use. 
Intel's version of Pardiso is bit-consistent and I think MUMPS 5.0 might 
be, but that's all.  You should assume your answer will not be exactly 
the same as you change the number of cores or processes, although you 
should reach the same overall error tolerance in approximately the same 
number of iterations.


Damien





Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Diego Avesani
Dear Andreas, dear all,
The code is quite long. It is a conjugate gradient algorithm to solve a
complex system.

I have noticed that when a do cycle is small, let's say

do i=1,3
enddo

the results are identical. If the cycle is big, let's say do i=1,20, the
results are different and the difference increases with the number of
iterations.

What do you think?



Diego




Re: [OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Andreas Schäfer
On 22:03 Wed 28 Oct , Diego Avesani wrote:
> When I use a single CPU I get one result; when I use 4 CPUs I get another
> one. I do not think that it is a bug.

Sounds like a bug to me, most likely in your code.

> Do you think that these small differences are normal?

It depends on what "small" means. Floating-point operations in a
computer are generally not associative, so parallelization may indeed
lead to different results.

> Is there any way to get the same results? Is it some alignment problem?

Impossible to say without knowing your code.

Cheers
-Andreas


-- 
==
Andreas Schäfer
HPC and Grid Computing
Department of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!


signature.asc
Description: Digital signature


[OMPI users] single CPU vs four CPU result differences, is it normal?

2015-10-28 Thread Diego Avesani
Dear all,

I have a problem with my code.

When I use a single CPU I get one result; when I use 4 CPUs I get another
one. I do not think that it is a bug.

Do you think that these small differences are normal?

Is there any way to get the same results? Is it some alignment problem?

Really, many thanks

Diego


Re: [OMPI users] How to multiply two matrices?

2015-10-28 Thread George Bosilca
If you want to stay with the traditional methods (complexity n^3), what you
need is a GEMM (general matrix-matrix multiplication), which is provided in
C, for dense matrices, by ScaLAPACK. The implementation on the blog you
linked is indeed a rough cut; there are better solutions (matrices divided
into blocks and distributed block-cyclically) proposed by the SUMMA and
PUMMA algorithms.
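Just to illustrate the simple 1-D decomposition, here is a rough sketch under
simplifying assumptions (dense square matrices, n divisible by the number of
ranks). It is written in Fortran, but the Java bindings expose the same
Bcast/Scatter/Gather collectives; note that with Fortran's column-major
storage the contiguous 1-D split is over columns of B and C rather than rows
of A.

program colblock_matmul
  use mpi
  implicit none
  integer, parameter :: n = 8
  integer :: rank, nprocs, ierr, ncols, i, j
  double precision, allocatable :: A(:,:), B(:,:), C(:,:), Bloc(:,:), Cloc(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  ncols = n / nprocs                      ! assumes nprocs divides n

  allocate(A(n,n), Bloc(n,ncols), Cloc(n,ncols))
  if (rank == 0) then
     allocate(B(n,n), C(n,n))
     do j = 1, n                          ! arbitrary test data on the root
        do i = 1, n
           A(i,j) = dble(i + j)
           B(i,j) = dble(i * j)
        end do
     end do
  else
     allocate(B(1,1), C(1,1))             ! dummies; ignored on non-root ranks
  end if

  ! Everyone needs all of A; each rank gets its own column block of B
  call MPI_Bcast(A, n*n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  call MPI_Scatter(B, n*ncols, MPI_DOUBLE_PRECISION, &
                   Bloc, n*ncols, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  Cloc = matmul(A, Bloc)                  ! local work: an n x ncols block of C

  call MPI_Gather(Cloc, n*ncols, MPI_DOUBLE_PRECISION, &
                  C, n*ncols, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'max |C - A*B| =', maxval(abs(C - matmul(A, B)))

  call MPI_Finalize(ierr)
end program colblock_matmul

For real workloads, ScaLAPACK's PDGEMM (or a SUMMA-style block-cyclic layout)
will perform and scale far better than this kind of 1-D split.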

  George.




[OMPI users] How to multiply two matrices?

2015-10-28 Thread Ibrahim Ikhlawi
Hi,

What is the best way to multiply two matrices with Java Open MPI?
Is the approach in the link below the right way to do it? That is, split the first matrix
row-wise and multiply each part with the second matrix (each row on a processor),
then collect the results.

Link: 
https://anjanavk.wordpress.com/2011/01/08/matrix-multiplication-in-parallel-using-open-mpi/

regards 
Ibrahim