Re: [OMPI users] Open MPI data transfer error

2010-11-06 Thread Jed Brown
On Sat, Nov 6, 2010 at 18:00, Jack Bryan wrote:

>  Thanks,
>
> About my MPI program bugs:
>
> I used GDB and got the error:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0:  0x003a31c62184 in fwrite () from /lib64/libc.so.6
>

Clearly fwrite was called with invalid parameters, but you don't give enough
information for anyone to explain why.  Compile your program with debugging
symbols and print the whole stack trace, e.g. with "backtrace full".  Also
try valgrind.
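
The thread does not show what was actually passed to fwrite, so the following is only a minimal sketch of one common cause: a stream that is null because an earlier fopen failed, or a null buffer. The helper name `checked_write` is hypothetical and not part of the original program.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (an assumption, not from the original program).
// It refuses to call fwrite with a null stream or a null buffer and
// returns true only if all `count` bytes were written.
bool checked_write(const void* data, std::size_t count, std::FILE* stream) {
    if (stream == nullptr) {
        std::fprintf(stderr, "checked_write: stream is null (did fopen fail?)\n");
        return false;
    }
    if (count == 0) {
        return true;  // nothing to write
    }
    if (data == nullptr) {
        std::fprintf(stderr, "checked_write: buffer is null\n");
        return false;
    }
    return std::fwrite(data, 1, count, stream) == count;
}
```

Routing every write through a check like this turns a crash inside libc into a message that names the bad argument, which narrows the search before reaching for the debugger.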


> class CNSGA2
> {
> allocate mem for var;
> some deallocate statement;
> some pointers;
> evaluate(); // it is a function
> }
>

This isn't even close to valid code since you can't have statements in the
suggested scope.
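
For comparison, a valid class puts allocation in a constructor, cleanup in a destructor (or an owning member), and gives member functions a return type. This is a rough sketch only: the constructor argument, the `vars_` member, the `size()` accessor, and the placeholder objective are all assumptions, not the poster's code.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: member names and the objective are assumptions.
class CNSGA2 {
public:
    // Allocation happens in the constructor.
    explicit CNSGA2(std::size_t numVars) : vars_(numVars, 0.0) {}

    // std::vector releases its memory automatically when the object is
    // destroyed, so no hand-written deallocation statement is needed.

    // A member function needs a return type and a parameter list.
    double evaluate(const std::vector<double>& x) const {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;  // placeholder objective: sum of squares
        }
        return sum;
    }

    std::size_t size() const { return vars_.size(); }

private:
    std::vector<double> vars_;  // owned storage instead of raw pointers
};
```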

> main()
> {
> CNSGA2* nsga2a = new CNSGA2(true); // true or false are only for different constructors
> CNSGA2* nsga2b = new CNSGA2(false);
>  if (myRank == 0) // scope1
> {
> initialize the objects of nsga2a or nsga2b;
>  }
>  broadcast some parameters, which are got from scope1.
>
> According to the parameters, define a datatype (myData) so that all workers
> use that to do recv and send.
>
>  if (myRank == 0) // scope2
> {
> send out myData to workers by the datatype defined above;
>  }
>  if (myRank != 0)
> {
> newCNSGA2 myNsga2;
> recv data from master and work on the recved data;
> myNsga2.evaluate(recv data);
> send back results;
> }
>
> }
>

According to the above, rank 0 never receives the results that the workers
send back.  You should paste valid code.

Jed


Re: [OMPI users] Open MPI data transfer error

2010-11-06 Thread Jack Bryan

Thanks,
About my MPI program bugs: 
I used GDB and got the error:
Program received signal SIGSEGV, Segmentation fault.
0:  0x003a31c62184 in fwrite () from /lib64/libc.so.6
also error :
1:  Program received signal SIGABRT, Aborted.
0:  I am rank 0, I have sent 4tasks out of total tasks
1:  0x003a31c30265 in raise () from /lib64/libc.so.6

It may be caused by a class usage.
My program uses a master-worker MPI framework:
class CNSGA2
{
    allocate mem for var;
    some deallocate statement;
    some pointers;
    evaluate(); // it is a function
}
CNSGA2::CNSGA2(){}
class newCNSGA2 : public CNSGA2
{
public:
    newCNSGA2() { cout << " constructor for newCNSGA2 \n\n" << endl; };
    ~newCNSGA2() { cout << " destructor for newCNSGA2 \n\n" << endl; };
};

main()
{
    CNSGA2* nsga2a = new CNSGA2(true); // true or false are only for different constructors
    CNSGA2* nsga2b = new CNSGA2(false);
    if (myRank == 0) // scope1
    {
        initialize the objects of nsga2a or nsga2b;
    }
    broadcast some parameters, which are got from scope1.

    According to the parameters, define a datatype (myData) so that all
    workers use that to do recv and send.

    if (myRank == 0) // scope2
    {
        send out myData to workers by the datatype defined above;
    }
    if (myRank != 0)
    {
        newCNSGA2 myNsga2;
        recv data from master and work on the recved data;
        myNsga2.evaluate(recv data);
        send back results;
    }
}

If I declare the objects (nsga2a, nsga2b) in scope 1, they are not visible in
scope 2. But the two objects are actually used only in the master, not in the workers.
Workers only need to call evaluate() from the class CNSGA2.
This is why I used inheritance to define a new class, newCNSGA2.
But the problem is that there is some memory allocation and deallocation inside class
CNSGA2.
The new class newCNSGA2 does not need this memory allocation and deallocation.
If I put the declaration of CNSGA2* nsga2a or CNSGA2* nsga2b in scope 1, they are
not visible in scope 2.

I cannot combine the two scopes because the datatype needs to be defined so
that all workers can see it and use it to do send and recv.

Any help is appreciated. 
Jack
Nov. 6 2010
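
On the scoping question above, one possible arrangement is to declare a single owning pointer before the first rank-0 block, construct the object only on rank 0, and give workers a separate class that does no master-side allocation. This is a sketch under assumptions: `MasterState`, `Evaluator`, and `run` are hypothetical stand-ins for the real classes, the objective is a placeholder, and `myRank` is passed in as a plain argument where the real program would get it from MPI_Comm_rank.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the poster's classes; names and members are assumed.
struct Evaluator {  // what workers need: no master-side allocation
    double evaluate(const std::vector<double>& x) const {
        double sum = 0.0;
        for (double v : x) {
            sum += v;  // placeholder objective
        }
        return sum;
    }
};

class MasterState {  // what only rank 0 needs
public:
    explicit MasterState(std::size_t n) : population_(n, 1.0) {}
    const std::vector<double>& population() const { return population_; }

private:
    std::vector<double> population_;
};

// In the real program `myRank` would come from MPI_Comm_rank.
double run(int myRank) {
    std::unique_ptr<MasterState> master;  // declared once, visible in both blocks

    if (myRank == 0) {  // scope1: only rank 0 allocates
        master = std::make_unique<MasterState>(4);
    }

    // ... broadcast parameters and define the datatype on all ranks ...

    if (myRank == 0) {  // scope2: `master` is still in scope here
        return static_cast<double>(master->population().size());
    }

    Evaluator worker;  // workers never construct MasterState
    return worker.evaluate({1.0, 2.0, 3.0});
}
```

With this layout the pointer stays empty on the workers, so nothing is allocated or freed there, and inheritance is not needed just to reach evaluate().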

> Date: Fri, 5 Nov 2010 14:55:32 -0800
> From: eugene@oracle.com
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI data transfer error
> 
> Debugging is not a straightforward task.  Even posting the code doesn't 
> necessarily help (since no one may be motivated to help or they can't 
> reproduce the problem or...).  You'll just have to try different things 
> and see what works for you.  Another option is to trace the MPI calls.  
> If a process sends a message, dump out the MPI_Send() arguments.  When a 
> receiver receives, correspondingly dump those arguments.  Etc.  This 
> might be a way of seeing what the program is doing in terms of MPI and 
> thereby getting to suggestion B below.
> 
> How do you trace and sort through the resulting data?  That's another 
> tough question.  Among other things, if you can't find a tool that fits 
> your needs, you can use the PMPI layer to write wrappers.  Writing 
> wrappers is like inserting printf() statements, but doesn't quite have 
> the same amount of moral shame associated with it!
> 
> Prentice Bisbal wrote:
> 
> >Choose one
> >
> >A) Post only the relevant sections of the code. If you have a syntax
> >error, it should be in the Send and Receive calls, or one of the lines
> >where the data is copied or read from the array/buffer/whatever that
> >you're sending or receiving.
> >
> >B) Try reproducing your problem in a toy program that has only enough
> >code to reproduce your problem. For example, create an array, populate
> >it with data, send it, and then on the receiving end, receive it, and
> >print it out. Something simple like that. I find when I do that, I
> >usually find the error in my code.
> >
> >Jack Bryan wrote:
> >  
> >
> >>But, my code is too long to be posted. 
> >>dozens of files, thousands of lines. 
> >>Do you have better ideas ? 
> >>Any help is appreciated. 
> >>
> >>Nov. 5 2010
> >>
> >>From: solarbik...@gmail.com
> >>Date: Fri, 5 Nov 2010 11:20:57 -0700
> >>To: us...@open-mpi.org
> >>Subject: Re: [OMPI users] Open MPI data transfer error
> >>
> >>As Prentice said, we can't help you without seeing your code.  openMPI
> >>has stood many trials from many programmers, with many bugs ironed out.
> >>So typically it is unlikely openMPI is the source of your error. 
> >>Without seeing your code the only logical conclusion is that something
> >>is wrong with your programming.
> >>
> >>On Fri, Nov 5, 2010 at 10:52 AM, Prentice Bisbal wrote:
> >>
> >>We can't help you with your coding problem without seeing your code.
> >>
> >>
> >>   

[OMPI users] Open MPI 1.5 is not detecting oversubscription

2010-11-06 Thread Jed Brown
Previous versions would set mpi_yield_when_idle automatically when
oversubscribing a node.  I assume this behavior was not intentionally
changed, but the parameter is not being set in cases of oversubscription,
with or without an explicit hostfile.

Jed