Re: [OMPI users] Configuration problem or network problem?

2009-07-06 Thread Doug Reeder

Lin,

Try -np 16 and not running on the head node.

Doug Reeder
On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:


Hi all,
The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as a
head node, connected by a high-speed switch.
There are point-to-point communication functions (MPI_Send and
MPI_Recv) with a data size of about 40 KB, and a lot of computation
(about 1 sec per iteration) in a loop. The co-processor in the PS3
takes care of the computation and the main processor takes care of the
point-to-point communication, so computation and communication can
overlap. The communication functions should return much faster than
the computation.
My question is that after some iterations, the time consumed by the
communication functions on a PS3 increases heavily, and the whole
cluster loses synchronization. When I decrease the computing time, this
situation just disappears. I am very confused about this.
Is there a mechanism in Open MPI that could cause this? Has anyone run
into this situation before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...", is there
something I should add?

Lin




[OMPI users] Configuration problem or network problem?

2009-07-06 Thread Zou, Lin (GE, Research, Consultant)
Hi all,
The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as a
head node, connected by a high-speed switch.
There are point-to-point communication functions (MPI_Send and
MPI_Recv) with a data size of about 40 KB, and a lot of computation
(about 1 sec per iteration) in a loop. The co-processor in the PS3
takes care of the computation and the main processor takes care of the
point-to-point communication, so computation and communication can
overlap. The communication functions should return much faster than
the computation.
My question is that after some iterations, the time consumed by the
communication functions on a PS3 increases heavily, and the whole
cluster loses synchronization. When I decrease the computing time, this
situation just disappears. I am very confused about this.
Is there a mechanism in Open MPI that could cause this? Has anyone run
into this situation before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...", is there
something I should add?
Lin
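
A minimal sketch of the loop structure described above (hypothetical
names; a combined MPI_Sendrecv stands in for the separate MPI_Send and
MPI_Recv calls, and the SPE offload is reduced to a placeholder):

#include <mpi.h>

/* Placeholder for the work that is offloaded to the Cell co-processors. */
void compute_on_spe(char *buf, int nbytes);

void main_loop(int rank, int iterations)
{
    enum { N = 40 * 1024 };              /* ~40 KB per message, as described */
    char sendbuf[40 * 1024], recvbuf[40 * 1024];
    int peer = rank ^ 1;                 /* assumed pairing, purely for illustration */

    for (int it = 0; it < iterations; ++it) {
        /* Blocking point-to-point exchange handled by the PPE ... */
        MPI_Sendrecv(sendbuf, N, MPI_BYTE, peer, 0,
                     recvbuf, N, MPI_BYTE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... followed by roughly one second of computation on the SPEs. */
        compute_on_spe(recvbuf, N);
    }
}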


Re: [OMPI users] MPI and C++ (Boost)

2009-07-06 Thread Luis Vitorio Cargnini

Hi Raymond, thanks for your answer
On 2009-07-06, at 21:16, Raymond Wan wrote:


I've used Boost MPI before and it really isn't that bad and
shouldn't be seen as "just another library".  Many parts of Boost
are on their way to becoming part of the standard and are actively
discussed and debated.  So it isn't the same as going to some random
person's web page and downloading their library/template. Of course,
it takes time to make it into the standard and I'm not entirely sure
if everything will (probably not).


(One "annoying" thing about Boost MPI is that you have to compile  
it...if you are distributing your code, end-users might find that  
bothersome...oh, and serialization as well.)




We have a common concern: I'm not exactly distributing my code, but I
would be adding a dependency to it, which is something that bothers me.


One suggestion might be to make use of Boost and, once you've got your
code working, start changing it back.  At least you will have a
working program to compare against.  Kind of like writing a
prototype first...




Your suggestion is a great and interesting idea. My only fear is that
I would get used to Boost and then could not get rid of it anymore,
because one thing is sure: the abstraction added by Boost is
impressive. It makes things like using MPI from C++ much less painful,
the serialization that Boost.MPI already provides on top of MPI is
astonishingly attractive, and so is the possibility of adding new
types, such as classes, so that objects can be sent through Boost's
send calls. This is certainly attractive, but again, I do not want to
become dependent on a library; as I said, that is my major concern.
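
For what it's worth, a minimal sketch of the kind of thing being
discussed (hypothetical Particle type; assumes Boost.MPI and
Boost.Serialization are built and linked, e.g. with -lboost_mpi
-lboost_serialization):

#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <string>

namespace mpi = boost::mpi;

// Hypothetical user-defined type: made sendable by providing serialize().
struct Particle {
    double x, y, z;
    std::string label;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & x & y & z & label;
    }
};

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        Particle p = { 1.0, 2.0, 3.0, "probe" };
        world.send(1, 0, p);     // the object is serialized behind the scenes
    } else if (world.rank() == 1) {
        Particle p;
        world.recv(0, 0, p);     // and deserialized on the receiving side
    }
    return 0;
}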





Re: [OMPI users] MPI and C++ (Boost)

2009-07-06 Thread Raymond Wan


Hi Luis,


Luis Vitorio Cargnini wrote:

Thanks, but I really do not want to use Boost.
Is it easier? It certainly is, but I want to do it using only MPI itself
and not be dependent on a library, or on templates like most of Boost,
which is a huge set of templates and wrappers for different libraries
implemented in C, supplying a wrapper for C++.
I admit Boost is a valuable tool, but in my case, the more independent
I can be from additional libraries, the better.



I've used Boost MPI before and it really isn't that bad and shouldn't be seen as
"just another library".  Many parts of Boost are on their way to becoming part of
the standard and are actively discussed and debated.  So it isn't the same as
going to some random person's web page and downloading their library/template.
Of course, it takes time to make it into the standard and I'm not entirely sure
if everything will (probably not).


(One "annoying" thing about Boost MPI is that you have to compile it...if you 
are distributing your code, end-users might find that bothersome...oh, and 
serialization as well.)


One suggestion might be to make use of Boost and, once you've got your code
working, start changing it back.  At least you will have a working program to
compare against.  Kind of like writing a prototype first...


Ray



[OMPI users] Segfault when using valgrind

2009-07-06 Thread Justin Luitjens
Hi,  I am attempting to debug a memory corruption in an MPI program using
valgrind.  However, when I run with valgrind I get semi-random segfaults and
valgrind messages within the Open MPI library.  Here is an example of such a
segfault:

==6153==
==6153== Invalid read of size 8
==6153==at 0x19102EA0: (within
/usr/lib/openmpi/lib/openmpi/mca_btl_sm.so)
==6153==by 0x182ABACB: (within
/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so)
==6153==by 0x182A3040: (within
/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so)
==6153==by 0xB425DD3: PMPI_Isend (in
/usr/lib/openmpi/lib/libmpi.so.0.0.0)
==6153==by 0x7B83DA8: int Uintah::SFC::MergeExchange(int, std::vector >&,
std::vector >&,
std::vector >&) (SFC.h:2989)
==6153==by 0x7B84A8F: void Uintah::SFC::Batchers(std::vector >&,
std::vector >&,
std::vector >&) (SFC.h:3730)
==6153==by 0x7B8857B: void Uintah::SFC::Cleanup(std::vector >&,
std::vector >&,
std::vector >&) (SFC.h:3695)
==6153==by 0x7B88CC6: void Uintah::SFC::Parallel0<3, unsigned
char>() (SFC.h:2928)
==6153==by 0x7C00AAB: void Uintah::SFC::Parallel<3, unsigned
char>() (SFC.h:1108)
==6153==by 0x7C0EF39: void Uintah::SFC::GenerateDim<3>(int)
(SFC.h:694)
==6153==by 0x7C0F0F2: Uintah::SFC::GenerateCurve(int)
(SFC.h:670)
==6153==by 0x7B30CAC:
Uintah::DynamicLoadBalancer::useSFC(Uintah::Handle const&,
int*) (DynamicLoadBalancer.cc:429)
==6153==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
^G^G^GThread "main"(pid 6153) caught signal SIGSEGV at address (nil)
(segmentation violation)

Looking at the code for our Isend at SFC.h:2989, it does not seem to have any
errors:

=========================================================
  MergeInfo myinfo, theirinfo;

  MPI_Request srequest, rrequest;
  MPI_Status status;

  myinfo.n = n;
  if (n != 0)
  {
    myinfo.min = sendbuf[0].bits;
    myinfo.max = sendbuf[n-1].bits;
  }
  //cout << rank << " n:" << n << " min:" << (int)myinfo.min << " max:" << (int)myinfo.max << endl;

  MPI_Isend(&myinfo, sizeof(MergeInfo), MPI_BYTE, to, 0, Comm, &srequest);
=========================================================

myinfo is a struct located on the stack, to is the rank of the processor
that the message is being sent to, and srequest is also on the stack.  When
I don't run with valgrind my program runs past this point just fine.

I am currently using openmpi 1.3 from the debian unstable branch.  I also
see the same type of segfault in a different portion of the code involving
an MPI_Allgather which can be seen below:

==
==22736== Use of uninitialised value of size 8
==22736==at 0x19104775: mca_btl_sm_component_progress (opal_list.h:322)
==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==by 0xB404264: ompi_request_default_wait_all (condition.h:99)
==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
(coll_tuned_util.c:55)
==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
(coll_tuned_util.h:60)
==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
==22736==by 0x6465457:
Uintah::Grid::problemSetup(Uintah::Handle const&,
Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
(SimulationController.cc:243)
==22736==by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)
==22736==
==22736== Invalid read of size 8
==22736==at 0x19104775: mca_btl_sm_component_progress (opal_list.h:322)
==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==by 0xB404264: ompi_request_default_wait_all (condition.h:99)
==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
(coll_tuned_util.c:55)
==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
(coll_tuned_util.h:60)
==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
==22736==by 0x6465457:
Uintah::Grid::problemSetup(Uintah::Handle const&,
Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
(SimulationController.cc:243)
==22736==by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)
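
A hedged aside, in case it helps: recent Open MPI versions ship a
Valgrind suppression file (usually installed as
share/openmpi/openmpi-valgrind.supp under the installation prefix) that
silences known, benign reports originating inside the library, which
can make real application errors easier to spot. One possible
invocation, with the prefix and the binary name as placeholders:

  mpirun -np 4 valgrind \
      --suppressions=<ompi-prefix>/share/openmpi/openmpi-valgrind.supp \
      --track-origins=yes ./your_app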

Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers

2009-07-06 Thread Luis Vitorio Cargnini

Thanks, but I really do not want to use Boost.
Is it easier? It certainly is, but I want to do it using only MPI itself
and not be dependent on a library, or on templates like most of Boost,
which is a huge set of templates and wrappers for different libraries
implemented in C, supplying a wrapper for C++.
I admit Boost is a valuable tool, but in my case, the more independent
I can be from additional libraries, the better.


On 2009-07-06, at 04:49, Number Cruncher wrote:


I strongly suggest you take a look at boost::mpi, 
http://www.boost.org/doc/libs/1_39_0/doc/html/mpi.html

It handles serialization transparently and has some great natural  
extensions to the MPI C interface for C++, e.g.


bool global = all_reduce(comm, local, std::logical_and<bool>());

This sets "global" to "local_0 && local_1 && ... && local_N-1"


Luis Vitorio Cargnini wrote:
Thank you very much, John; the explanation of &v[0] was the kind of
thing I was looking for, thank you very much.

This kind of approach solves my problems.
On 2009-07-05, at 22:20, John Phillips wrote:

Luis Vitorio Cargnini wrote:

Hi,
So, after some explanations I started to use the C bindings inside
my C++ code, and now comes my new doubt:
how do I send an object through MPI's Send and Recv? Because the
available types are CHAR, int, double, long double, and so on.

Does someone have any suggestion?
Thanks.
Vitorio.


Vitorio,

If you are sending collections of built-in data types (ints,
doubles, that sort of thing), then it may be easy, and it isn't
awful. You want the data in a single stretch of contiguous memory.
If you are using an STL vector, this is already true. If you are
using some other container, then no guarantees are provided for
whether the memory is contiguous.


Imagine you are using a vector, and you know the number of entries  
in that vector. You want to send that vector to processor 2 on the  
world communicator with tag 0. Then, the code snippet would be;


std::vector<double> v;

... code that fills v with something ...

int send_error;

send_error = MPI_Send(&v[0], v.size(), MPI_DOUBLE, 2,
0,  MPI_COMM_WORLD);


The &v[0] part provides a pointer to the first member of the array
that holds the data for the vector. If you know how long it will  
be, you could use that constant instead of using the v.size()  
function. Knowing the length also simplifies the send, since the  
remote process also knows the length and doesn't need a separate  
send to provide that information.


It is also possible to provide a pointer to the start of storage  
for the character array that makes up a string. Both of these  
legacy friendly interfaces are part of the standard, and should be  
available on any reasonable implementation of the STL.


If you are using a container that is not held in contiguous
memory, and the data is all of a single built-in data type, then
you need to first serialize the data into a block of contiguous
memory before sending it. (If the data block is large, then you
may actually have to divide it into pieces and send them
separately.)


If the data is not a block of all a single built in type, (It may  
include several built in types, or it may be a custom data class  
with complex internal structure, for example.) then the  
serialization problem gets harder. In this case, look at the MPI  
provided facilities for dealing with complex data types and  
compare to the boost provided facilities. There is an initial  
learning curve for the boost facilities, but in the long run it  
may provide a substantial development time savings if you need to  
transmit and receive several complex types. In most cases, the run  
time cost is small for using the boost facilities. (according to  
the tests run during library development and documented with the  
library)


   John Phillips
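
To make that last point a bit more concrete, here is a small, hedged
sketch (hypothetical Particle struct and layout) of the MPI facilities
mentioned above: describing a fixed-layout type with
MPI_Type_create_struct and then sending one instance of it directly.

// Minimal sketch (hypothetical struct): describe a fixed-layout type to MPI
// with MPI_Type_create_struct, commit it, and send one instance.
#include <mpi.h>
#include <cstddef>

struct Particle {
    double pos[3];
    int    id;
};

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // One block of 3 doubles and one block of 1 int, at their real offsets.
    int          blocklengths[2] = { 3, 1 };
    MPI_Datatype types[2]        = { MPI_DOUBLE, MPI_INT };
    MPI_Aint     offsets[2]      = { offsetof(Particle, pos),
                                     offsetof(Particle, id) };

    MPI_Datatype particle_type;
    MPI_Type_create_struct(2, blocklengths, offsets, types, &particle_type);
    MPI_Type_commit(&particle_type);

    Particle p = { { 1.0, 2.0, 3.0 }, 42 };
    if (rank == 0)
        MPI_Send(&p, 1, particle_type, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&p, 1, particle_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}

For arrays of such structs, MPI_Type_create_resized may additionally be
needed so the extent of the datatype matches sizeof(Particle),
including any padding.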






Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers

2009-07-06 Thread Luis Vitorio Cargnini

Just one additional question: if I have

vector< vector<double> > x;

how do I use MPI_Send? Is

MPI_Send(&x[0][0], x[0].size(), MPI_DOUBLE, 2, 0, MPI_COMM_WORLD);

the right way?
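
A short, hedged sketch (not from the thread itself) of the two usual
options: the inner vectors of a vector< vector<double> > are separate
allocations, so a single MPI_Send from &x[0][0] only covers the first
row.

#include <mpi.h>
#include <vector>
#include <cstddef>

// Option 1: each inner vector is contiguous on its own, so send row by row.
void send_rows(const std::vector< std::vector<double> >& x, int dest, MPI_Comm comm)
{
    for (std::size_t i = 0; i < x.size(); ++i)
        if (!x[i].empty())
            MPI_Send(const_cast<double*>(&x[i][0]),
                     static_cast<int>(x[i].size()), MPI_DOUBLE,
                     dest, static_cast<int>(i), comm);
}

// Option 2: copy everything into one contiguous buffer and send one message.
void send_flat(const std::vector< std::vector<double> >& x, int dest, MPI_Comm comm)
{
    std::vector<double> flat;
    for (std::size_t i = 0; i < x.size(); ++i)
        flat.insert(flat.end(), x[i].begin(), x[i].end());
    if (!flat.empty())
        MPI_Send(&flat[0], static_cast<int>(flat.size()), MPI_DOUBLE,
                 dest, 0, comm);
}

In both cases the receiver needs to know (or be told) the row lengths,
for example by sending a vector of sizes first.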

On 2009-07-05, at 22:20, John Phillips wrote:


Luis Vitorio Cargnini wrote:

Hi,
So, after some explanations I started to use the C bindings inside
my C++ code, and now comes my new doubt:
how do I send an object through MPI's Send and Recv? Because the
available types are CHAR, int, double, long double, and so on.

Does someone have any suggestion?
Thanks.
Vitorio.


 Vitorio,

 If you are sending collections of built-in data types (ints,
doubles, that sort of thing), then it may be easy, and it isn't
awful. You want the data in a single stretch of contiguous memory.
If you are using an STL vector, this is already true. If you are
using some other container, then no guarantees are provided for
whether the memory is contiguous.


 Imagine you are using a vector, and you know the number of entries  
in that vector. You want to send that vector to processor 2 on the  
world communicator with tag 0. Then, the code snippet would be;


std::vector<double> v;

... code that fills v with something ...

int send_error;

send_error = MPI_Send(&v[0], v.size(), MPI_DOUBLE, 2, 0,
  MPI_COMM_WORLD);

 The &v[0] part provides a pointer to the first member of the array
that holds the data for the vector. If you know how long it will be,  
you could use that constant instead of using the v.size() function.  
Knowing the length also simplifies the send, since the remote  
process also knows the length and doesn't need a separate send to  
provide that information.


 It is also possible to provide a pointer to the start of storage  
for the character array that makes up a string. Both of these legacy  
friendly interfaces are part of the standard, and should be  
available on any reasonable implementation of the STL.


 If you are using a container that is not held in contiguous memory,
and the data is all of a single built-in data type, then you need to
first serialize the data into a block of contiguous memory before
sending it. (If the data block is large, then you may actually have
to divide it into pieces and send them separately.)


 If the data is not a block of all a single built in type, (It may  
include several built in types, or it may be a custom data class  
with complex internal structure, for example.) then the  
serialization problem gets harder. In this case, look at the MPI  
provided facilities for dealing with complex data types and compare  
to the boost provided facilities. There is an initial learning curve  
for the boost facilities, but in the long run it may provide a  
substantial development time savings if you need to transmit and  
receive several complex types. In most cases, the run time cost is  
small for using the boost facilities. (according to the tests run  
during library development and documented with the library)


John Phillips







[OMPI users] any way to get serial time on head node?

2009-07-06 Thread Ross Boylan
Let total time on my slot 0 process be S+C+B+I
= serial computations + communication + busy wait + idle
Is there a way to find out S?
S+C would probably also be useful, since I assume C is low.

The problem is that I = 0, roughly, and B is big.  Since B is big, the
usual process timing methods don't work.

If B all went to "system" as opposed to "user" time I could use the
latter, but I don't think that's the case.  Can anyone confirm that?

If S is big, I might be able to gain by parallelizing in a different
way.  By S I mean to refer to serial computation that is part of my
algorithm, rather than the technical fact that all the computation is
serial on a given slot.

I'm running R/Rmpi.

Thanks.
Ross
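
One hedged way to get at S directly, sketched here in C-style MPI
rather than R (the same idea applies in R by wrapping sections with
proc.time()): bracket only the serial compute sections with MPI_Wtime
and accumulate, so time spent busy-waiting inside MPI calls is
excluded.

#include <mpi.h>
#include <stdio.h>

/* Hedged sketch: accumulate wall time spent only in the serial compute
   sections, so time spent busy-waiting inside MPI calls is excluded. */
static double serial_seconds = 0.0;

static void serial_work(void)
{
    double t0 = MPI_Wtime();
    /* ... the algorithm's serial computation goes here ... */
    serial_seconds += MPI_Wtime() - t0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int iter = 0; iter < 100; ++iter) {
        serial_work();
        /* ... communication / waiting on workers goes here ... */
    }

    if (rank == 0)
        printf("S (serial compute on rank 0): %.3f s\n", serial_seconds);
    MPI_Finalize();
    return 0;
}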



Re: [OMPI users] Segmentation fault - Address not mapped

2009-07-06 Thread Catalin David
On Mon, Jul 6, 2009 at 3:26 PM, jody wrote:
> Hi
> Are you also sure that you have the same version of Open-MPI
> on every machine of your cluster, and that it is the mpicxx of this
> version that is called when you run your program?
> I ask because you mentioned that there was an old version of Open-MPI
> present... did you remove this?
>
> Jody

Hi

I have just logged in to a few other boxes and they all mount my home
folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
what I expect to get, but this might be because I have set these
variables in the .bashrc file. So, I tried compiling/running with
~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
but I get the same errors.

As for the previous version, I don't have root access, therefore I was
not able to remove it. I was just trying to work around it by setting
the $PATH variable to point first at my local installation.


Catalin


-- 

**
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
**


Re: [OMPI users] Segmentation fault - Address not mapped

2009-07-06 Thread jody
Hi
Are you also sure that you have the same version of Open-MPI
on every machine of your cluster, and that it is the mpicxx of this
version that is called when you run your program?
I ask because you mentioned that there was an old version of Open-MPI
present... did you remove this?

Jody

On Mon, Jul 6, 2009 at 3:24 PM, Catalin David wrote:
> On Mon, Jul 6, 2009 at 2:14 PM, Dorian Krause wrote:
>> Hi,
>>
>>>
>>> //Initialize step
>>> MPI_Init(&argc, &argv);
>>> //Here it breaks!!! Memory allocation issue!
>>> MPI_Comm_size(MPI_COMM_WORLD, &size);
>>> std::cout<<"I'm here"<<std::endl;
>>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>> When trying to debug via gdb, the problem seems to be:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base
>>> for "ompi_comm_invalid".) at communicator.h:261
>>> 261     communicator.h: No such file or directory.
>>>        in communicator.h
>>>
>>> which might indicate a problem with paths. For now, my LD_LIBRARY_PATH
>>> is set to "/users/cluster/cdavid/local/lib/" (the local folder in my
>>> home folder emulates the directory structure of the / folder).
>>>
>>
>> and your PATH is also okay? (I see that you use plain mpicxx in the build)
>> ...
>>
>
> Hi again!
>
> This is the output of some commands in the terminal:
>
> cdavid@denali:~$ which mpicxx
> ~/local/bin/mpicxx
> cdavid@denali:~$ echo $PATH
> /users/cluster/cdavid/local/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/scali/bin:/opt/scali/sbin:/opt/scali/contrib/pbs/bin:/users/cluster/cdavid/bin
> cdavid@denali:~$ echo $LD_LIBRARY_PATH
> /users/cluster/cdavid/local/lib/
> cdavid@denali:~$ locate communicator.h
> cdavid@denali:~$
>
> I don't see anything wrong with the path (I added the first part in
> order to make it look there first). I even tried adding
> "-L/users/cluster/cdavid/local/lib -lmpi
> -I/users/cluster/cdavid/local/include" to the compiler invocation, in
> hope of an improvement. So far, nothing.
>
> Regards,
>
> Catalin
>
>
> --
>
> **
> Catalin David
> B.Sc. Computer Science 2010
> Jacobs University Bremen
>
> Phone: +49-(0)1577-49-38-667
>
> College Ring 4, #343
> Bremen, 28759
> Germany
> **
>
>



Re: [OMPI users] Segmentation fault - Address not mapped

2009-07-06 Thread Catalin David
On Mon, Jul 6, 2009 at 2:14 PM, Dorian Krause wrote:
> Hi,
>
>>
>> //Initialize step
>> MPI_Init(&argc, &argv);
>> //Here it breaks!!! Memory allocation issue!
>> MPI_Comm_size(MPI_COMM_WORLD, &size);
>> std::cout<<"I'm here"<<std::endl;
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>> When trying to debug via gdb, the problem seems to be:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base
>> for "ompi_comm_invalid".) at communicator.h:261
>> 261     communicator.h: No such file or directory.
>>        in communicator.h
>>
>> which might indicate a problem with paths. For now, my LD_LIBRARY_PATH
>> is set to "/users/cluster/cdavid/local/lib/" (the local folder in my
>> home folder emulates the directory structure of the / folder).
>>
>
> and your PATH is also okay? (I see that you use plain mpicxx in the build)
> ...
>

Hi again!

This is the output of some commands in the terminal:

cdavid@denali:~$ which mpicxx
~/local/bin/mpicxx
cdavid@denali:~$ echo $PATH
/users/cluster/cdavid/local/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/scali/bin:/opt/scali/sbin:/opt/scali/contrib/pbs/bin:/users/cluster/cdavid/bin
cdavid@denali:~$ echo $LD_LIBRARY_PATH
/users/cluster/cdavid/local/lib/
cdavid@denali:~$ locate communicator.h
cdavid@denali:~$

I don't see anything wrong with the path (I added the first part in
order to make it look there first). I even tried adding
"-L/users/cluster/cdavid/local/lib -lmpi
-I/users/cluster/cdavid/local/include" to the compiler invocation, in
hope of an improvement. So far, nothing.

Regards,

Catalin


-- 

**
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
**



Re: [OMPI users] Segmentation fault - Address not mapped

2009-07-06 Thread Dorian Krause

Hi,



//Initialize step
MPI_Init(&argc, &argv);
//Here it breaks!!! Memory allocation issue!
MPI_Comm_size(MPI_COMM_WORLD, &size);
std::cout<<"I'm here"<<std::endl;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

and your PATH is also okay? (I see that you use plain mpicxx in the build)
...

Re: [OMPI users] (no subject)

2009-07-06 Thread Josh Hursey
The MPI standard does not define any functions for taking checkpoints  
from the application.


The checkpoint/restart work in Open MPI is a command-line-driven,
transparent solution. So the application does not have to change in
any way, and the user (or scheduler) must initiate the checkpoint from
the command line (on the same node as the mpirun process).


We have experimented with adding Open MPI specific checkpoint/restart  
interfaces in the context of the MPI Forum. These prototypes have not  
made it to the Open MPI trunk. Some information about that particular  
development is at the link below:

  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Quiescence

Best,
Josh
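
For reference, a hedged sketch of that command-line flow with Open MPI
1.3 built with checkpoint/restart support (e.g. with BLCR); names and
options may differ between versions:

  # launch with the fault-tolerance AMCA parameter set
  mpirun -am ft-enable-cr -np 4 ./my_app

  # from another shell on the node running mpirun, request a checkpoint
  ompi-checkpoint <PID of mpirun>

  # later, restart from the global snapshot handle that ompi-checkpoint printed
  ompi-restart ompi_global_snapshot_<PID>.ckpt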

On Jul 6, 2009, at 12:07 AM, Mallikarjuna Shastry wrote:



Dear Sir/Madam,

What are the MPI functions used for taking a checkpoint and restarting
within the application in MPI programs, and where do I get these
functions from?


with regards

mallikarjuna shastry







[OMPI users] Segmentation fault - Address not mapped

2009-07-06 Thread Catalin David
Dear all,

I have recently started working on a project using OpenMPI. Basically,
I have been given some c++ code, a cluster to play with and a deadline
in order to make the c++ code run faster. The cluster was a bit
crowded, so I started working on my laptop (g++ 4.3.3 -- Ubuntu repos,
OpenMPI 1.3.2 -- compiled with no options ) and after one week I
actually had something that was running on my computer, so I decided to
move to the cluster. Since the cluster is very old and it
was using g++ 3.2 and an old version of OpenMPI, I decided to install
both of them from source in my home folder (g++ 4.4, OpenMPI 1.3.2).
The issue is that when I run the program (after it compiled flawlessly
on the machine), I get these error messages:

[denali:30134] *** Process received signal ***
[denali:30134] Signal: Segmentation fault (11)
[denali:30134] Signal code: Address not mapped (1)
[denali:30134] Failing at address: 0x18

(more in the attached file -- mpirun -np 4 ray-trace)

All this morning, I have gone through the mailing lists and found
people experiencing my problem, but their solutions did not work for
me. By using simple debugging (cout), I was able to determine where
the error comes from:

//Initialize step
MPI_Init(&argc, &argv);
//Here it breaks!!! Memory allocation issue!
MPI_Comm_size(MPI_COMM_WORLD, &size);
std::cout<<"I'm here"<<std::endl;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

[OMPI users] Parallel I/O Usage

2009-07-06 Thread Manuel Holtgrewe
Hi,

do I understand the MPI-2 Parallel I/O correctly (C++)?

After opening a file with MPI::File::Open, I can use Read_at on the
returned file object. I give offsets in bytes and I can perform random
access reads from any process at any point of the file without
violating correctness (although the performance might/should/will be
better using views):

MPI::File f = MPI::File::Open(MPI::COMM_WORLD, filename, MPI::MODE_RDONLY,
  MPI::INFO_NULL);
// ...
MPI::Offset pos_in_file = ...;
// ...
f.Read_at(pos_in_file, buffer, local_n + 1, MPI::INTEGER);
// ...
f.Close();

I have some problems with the program reading invalid data and want to
make sure I am actually using parallel I/O the right way.

Thanks,
-- Manuel Holtgrewe
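
For illustration, a hedged sketch along the same lines (hypothetical
file name; assumes the element count divides evenly among ranks),
where each rank computes its own byte offset into a file of ints and
reads its slice with Read_at under the default view (byte offsets):

#include <mpi.h>

int main(int argc, char** argv)
{
    MPI::Init(argc, argv);
    const int rank = MPI::COMM_WORLD.Get_rank();
    const int size = MPI::COMM_WORLD.Get_size();

    MPI::File f = MPI::File::Open(MPI::COMM_WORLD, "data.bin",
                                  MPI::MODE_RDONLY, MPI::INFO_NULL);

    // Split a file of ints evenly; each rank reads its own contiguous slice.
    const MPI::Offset n_ints  = f.Get_size() / sizeof(int);
    const int         local_n = static_cast<int>(n_ints / size);
    const MPI::Offset offset  = static_cast<MPI::Offset>(rank) * local_n * sizeof(int);

    int* buffer = new int[local_n];
    f.Read_at(offset, buffer, local_n, MPI::INT);

    f.Close();
    delete[] buffer;
    MPI::Finalize();
    return 0;
}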


Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers

2009-07-06 Thread Number Cruncher
I strongly suggest you take a look at boost::mpi, 
http://www.boost.org/doc/libs/1_39_0/doc/html/mpi.html


It handles serialization transparently and has some great natural 
extensions to the MPI C interface for C++, e.g.


bool global = all_reduce(comm, local, std::logical_and<bool>());

This sets "global" to "local_0 && local_1 && ... && local_N-1"


Luis Vitorio Cargnini wrote:
Thank you very much, John; the explanation of &v[0] was the kind of
thing I was looking for, thank you very much.

This kind of approach solves my problems.

On 2009-07-05, at 22:20, John Phillips wrote:


Luis Vitorio Cargnini wrote:

Hi,
So, after some explanations I started to use the C bindings inside my
C++ code, and now comes my new doubt:
how do I send an object through MPI's Send and Recv? Because the
available types are CHAR, int, double, long double, and so on.

Does someone have any suggestion?
Thanks.
Vitorio.


 Vitorio,

 If you are sending collections of built-in data types (ints, doubles,
that sort of thing), then it may be easy, and it isn't awful. You want
the data in a single stretch of contiguous memory. If you are using an
STL vector, this is already true. If you are using some other
container, then no guarantees are provided for whether the memory is
contiguous.


 Imagine you are using a vector, and you know the number of entries in 
that vector. You want to send that vector to processor 2 on the world 
communicator with tag 0. Then, the code snippet would be;


std::vector<double> v;

... code that fills v with something ...

int send_error;

send_error = MPI_Send(&v[0], v.size(), MPI_DOUBLE, 2, 0,
  MPI_COMM_WORLD);


 The &v[0] part provides a pointer to the first member of the array
that holds the data for the vector. If you know how long it will be, 
you could use that constant instead of using the v.size() function. 
Knowing the length also simplifies the send, since the remote process 
also knows the length and doesn't need a separate send to provide that 
information.


 It is also possible to provide a pointer to the start of storage for 
the character array that makes up a string. Both of these legacy 
friendly interfaces are part of the standard, and should be available 
on any reasonable implementation of the STL.


 If you are using a container that is not held in contiguous memory,
and the data is all of a single built-in data type, then you need to
first serialize the data into a block of contiguous memory before
sending it. (If the data block is large, then you may actually have to
divide it into pieces and send them separately.)


 If the data is not a block of all a single built in type, (It may 
include several built in types, or it may be a custom data class with 
complex internal structure, for example.) then the serialization 
problem gets harder. In this case, look at the MPI provided facilities 
for dealing with complex data types and compare to the boost provided 
facilities. There is an initial learning curve for the boost 
facilities, but in the long run it may provide a substantial 
development time savings if you need to transmit and receive several 
complex types. In most cases, the run time cost is small for using the 
boost facilities. (according to the tests run during library 
development and documented with the library)


John Phillips






[OMPI users] (no subject)

2009-07-06 Thread Mallikarjuna Shastry

Dear Sir/Madam,

What are the MPI functions used for taking a checkpoint and restarting
within the application in MPI programs, and where do I get these
functions from?

with regards

mallikarjuna shastry