Re: [OMPI users] Open MPI program cannot complete

2010-10-27 Thread Jack Bryan

thanks
I got :
-bash-3.2$ padb -Ormgr=pbs -Q 48516.cystorm2
$VAR1 = {};
Job 48516.cluster is not active
Actually, the job is running. 
Any help is appreciated. 
thanks
Jinxu Ding
Oct. 27 2010
> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ash...@pittman.co.uk
> Date: Tue, 26 Oct 2010 23:18:57 +0100
> To: dtustud...@hotmail.com
> 
> 
> The "^M: bad interpreter" tells me that you've downloaded the file in Windows 
> and have got dos-based new-lines in the file.
> 
> Assuming it's installed on your machine run "dos2unix padb" and it'll remove 
> them, if that doesn't work save the file using a unix based email program.  I 
> hope this helps you when we finally get it working!
> 
> Ashley.
> 
> On 26 Oct 2010, at 22:14, Jack Bryan wrote:
> 
> > Hi, 
> > 
> > I put your attached padb in mypath and also set it up in env variable.
> > I got this: 
> > 
> > -bash-3.2$ padb -Ormgr=pbs -Q 48494.cystorm2
> > -bash: /mypath/padb_patch_2010_10_26/padb: /usr/bin/perl^M: bad 
> > interpreter: No such file or directory
> > 
> > Any help is appreciated. 
> > 
> > thanks
> > 
> > Jack 
> > 
> > Oct. 26 2010
> > 
> > 
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> > From: ash...@pittman.co.uk
> > Date: Tue, 26 Oct 2010 08:39:56 +0100
> > CC: tomview...@yahoo.com
> > To: dtustud...@hotmail.com
> > 
> >  
> > Sorry, I forgot to attach it last night.
> >  
> > 
> > 
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
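The "^M: bad interpreter" fix discussed above can also be done without dos2unix. The following is only a minimal sketch, assuming a POSIX shell with grep, tr and mktemp available; the function names has_crlf and strip_cr are made up for this illustration and are not part of padb or dos2unix.

```shell
#!/bin/sh
# Hypothetical helpers, not part of padb or dos2unix.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Remove every carriage return from the file.
strip_cr() {
    tmp=$(mktemp) || return 1
    # Write back with cat instead of mv so the original file keeps its
    # permissions (for example the execute bit on a script).
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Typical use would be "has_crlf padb && strip_cr padb", run in the directory that holds the script.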
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-26 Thread Jack Bryan

thanks
I got :
-bash-3.2$ padb -Ormgr=pbs -Q 48516.cystorm2
$VAR1 = {};
Job 48516.cluster is not active
Actually, the job is running. 
Any help is appreciated. 
thanks
Jinxu Ding
Oct. 26 2010
> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ash...@pittman.co.uk
> Date: Tue, 26 Oct 2010 23:18:57 +0100
> To: dtustud...@hotmail.com
> 
> 
> The "^M: bad interpreter" tells me that you've downloaded the file in Windows 
> and have got dos-based new-lines in the file.
> 
> Assuming it's installed on your machine run "dos2unix padb" and it'll remove 
> them, if that doesn't work save the file using a unix based email program.  I 
> hope this helps you when we finally get it working!
> 
> Ashley.
> 
> On 26 Oct 2010, at 22:14, Jack Bryan wrote:
> 
> > Hi, 
> > 
> > I put your attached padb in mypath and also set it up in env variable.
> > I got this: 
> > 
> > -bash-3.2$ padb -Ormgr=pbs -Q 48494.cystorm2
> > -bash: /mypath/padb_patch_2010_10_26/padb: /usr/bin/perl^M: bad 
> > interpreter: No such file or directory
> > 
> > Any help is appreciated. 
> > 
> > thanks
> > 
> > Jack 
> > 
> > Oct. 26 2010
> > 
> > 
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> > From: ash...@pittman.co.uk
> > Date: Tue, 26 Oct 2010 08:39:56 +0100
> > CC: tomview...@yahoo.com
> > To: dtustud...@hotmail.com
> > 
> >  
> > Sorry, I forgot to attach it last night.
> >  
> > 
> > 
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-26 Thread Jack Bryan


thanks
But, I cannot see the attachment in the email. 
Would you please send me again ? 
and also copy to another my email:
tomview...@yahoo.com
thanks
Oct. 25 2010
From: dtustud...@hotmail.com
To: ash...@pittman.co.uk
Subject: RE: [OMPI users] Open MPI program cannot complete
List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 16:53:32 -0600












thanks
But, I cannot see the attachment in the email. 

Would you please send me again ? 
and also copy to another my email:
tomview...@yahoo.com
thanks
Oct. 25 2010

> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ash...@pittman.co.uk
> Date: Mon, 25 Oct 2010 23:41:32 +0100
> To: dtustud...@hotmail.com
> 
> 
> Thanks, that tells me a lot.
> 
> Try the attached padb, I've added the patch for you and removed the -w
> option.  Can you run it and send me back the output please.
> 
> Ashley.
> 
> On 25 Oct 2010, at 23:29, Jack Bryan wrote:
> 
> > Thanks
> > 
> > Here is the 
> > 
> > -bash-3.2$ qstat -fB
> > Server: clusterName
> > server_state = Active
> > scheduling = True
> > total_jobs = 26
> > state_count = Transit:0 Queued:7 Held:0 Waiting:0 Running:18 Exiting:0
> > acl_hosts = clustername
> > default_queue = normal
> > log_events = 511
> > mail_from = adm
> > query_other_jobs = True
> > resources_assigned.nodect = 246
> > scheduler_iteration = 600
> > node_check_rate = 150
> > tcp_timeout = 6
> > mom_job_sync = True
> > pbs_version = 2.4.2
> > keep_completed = 300
> > submit_hosts = clusterName
> > next_job_number = 48293
> > net_counter = 2 9 6
> > 
> > -bash-3.2$ qstat -w -n
> > qstat: invalid option -- w
> > 
> > 
> > Which line should I put the 
> > -
> > --- padb (revision 401)
> > +++ padb (working copy)
> > @@ -2824,6 +2824,7 @@
> > foreach my $server (@servers) {
> > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > }
> > + print Dumper \%pbs_tabjobs;
> > return \%pbs_tabjobs;
> > }
> > 
> > 
> > in the bin file   padb
> > 
> > Any help is appreciated.
> > 
> > thanks
> > 
> > Jack 
> > 
> > Oct. 25 2010
> > 
> > 
> > 
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > From: ash...@pittman.co.uk
> > > Date: Mon, 25 Oct 2010 22:54:21 +0100
> > > To: dtustud...@hotmail.com
> > > 
> > > 
> > > [off list]
> > > 
> > > The PBS support was added by a third-party so I've not used it in anger 
> > > myself, it appears you are doing the correct thing as far as I can tell.
> > > 
> > > Can you send me the output of the following two commands and also apply 
> > > the patch below to padb (you can do this just in the bin dir - it's a 
> > > perl script) and send me the output when you run that as well?
> > > 
> > > qstat -fB
> > > qstat -w -n
> > > 
> > > --- padb (revision 401)
> > > +++ padb (working copy)
> > > @@ -2824,6 +2824,7 @@
> > > foreach my $server (@servers) {
> > > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > > }
> > > + print Dumper \%pbs_tabjobs;
> > > return \%pbs_tabjobs;
> > > }
> > > 
> > > On 25 Oct 2010, at 22:30, Jack Bryan wrote:
> > > 
> > > > Thanks
> > > > 
> > > > I have downloaded 
> > > > http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
> > > > 
> > > > and followed the instructions of INSTALL file and installed it at 
> > > > /mypath/padb32 
> > > > 
> > > > But, I got:
> > > > 
> > > > -bash-3.2$ padb -Ormgr=pbs -Q 48279.cluster
> > > > Job 48279.cluster is not active
> > > > 
> > > > Actually, the job was running. 
> > > > 
> > > > I have installed 
> > > > bin at 
> > > > 
> > > > /mypath/padb32/bin
> > > > 
> > > > 
> > > > libexec at
> > > > /lustre/jxding/padb32/libexec
> > > > 
> > > > When I installed it, I used 
> > > > 
> > > > ./configure --prefix=/mypath/padb32
> > > > 
> > > > I got 
> > > > -
> > > > 
> > > > checking for a BSD-c

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks
I have downloaded http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
and followed the instructions of INSTALL file and installed it at 
/mypath/padb32 
But, I got:
-bash-3.2$ padb -Ormgr=pbs -Q 48279.cluster
Job 48279.cluster is not active
Actually, the job was running. 
I have installed bin at
/mypath/padb32/bin

libexec at
/lustre/jxding/padb32/libexec
When I installed it, I used 
./configure --prefix=/mypath/padb32
I got -
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking whether gcc and cc understand -c and -o together... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: executing depfiles commands
---
-bash-3.2$ make
Making all in src
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"padb\" -DVERSION=\"3.2-beta1\" -I. -Wall -g -O2 -MT minfo-minfo.o -MD -MP -MF .deps/minfo-minfo.Tpo -c -o minfo-minfo.o `test -f 'minfo.c' || echo './'`minfo.c
minfo.c: In function ‘find_sym’:
minfo.c:158: warning: dereferencing type-punned pointer will break strict-aliasing rules
minfo.c: In function ‘main’:
minfo.c:649: warning: type-punning to incomplete type might break strict-aliasing rules
minfo.c:650: warning: type-punning to incomplete type might break strict-aliasing rules
mv -f .deps/minfo-minfo.Tpo .deps/minfo-minfo.Po
gcc -Wall -g -O2 -ldl  -o minfo minfo-minfo.o
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
make[1]: Nothing to be done for `all-am'.
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
---
-bash-3.2$ make install
Making install in src
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
test -z "/lustre/jxding/padb32/bin" || /bin/mkdir -p "/mypath/padb32/bin"
 /usr/bin/install -c padb '/lustre/jxding/padb32/bin'
test -z "/lustre/jxding/padb32/libexec" || /bin/mkdir -p "/mypath/padb32/libexec"
 /usr/bin/install -c minfo '/lustre/jxding/padb32/libexec'
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
-bash-3.2$ make installcheck
Making installcheck in src
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
make[1]: Nothing to be done for `installcheck-am'.
make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
---
Is there something wrong with what I have done ?
Any help is appreciated. 
thanks
Jack
Oct. 25 2010

> From: ash...@pittman.co.uk
> Date: Mon, 25 Oct 2010 20:40:18 +0100
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> 
> On 25 Oct 2010, at 20:18, Jack Bryan wrote:
> 
> > Thanks
> > I have downloaded 
> > http://padb.googlecode.com/files/padb-3.0.tgz
> > 
> > and compile it.
> > 
> > But, no user manual, I can not use it by padb -aQ.
> 
> The -a flag is a shortcut to all jobs, if you are providing a jobid (which is 
> normally numeric) then don't set the -a flag.
> 
> > Do you have use manual about how to use it ? 
> 
> In my previous mail I was assuming you were using orte to launch the jobs but 
> if you are using PBS then you'll need to use the 3.2 beta as the PBS code 

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread David Zhang
can you install MPI on your local machine? As someone said earlier, you
don't need a cluster to run MPI.  You can run MPI with multiple processes on
a single computer.

On Mon, Oct 25, 2010 at 12:40 PM, Ashley Pittman wrote:

>
> On 25 Oct 2010, at 20:18, Jack Bryan wrote:
>
> > Thanks
> > I have downloaded
> > http://padb.googlecode.com/files/padb-3.0.tgz
> >
> > and compile it.
> >
> > But, no user manual, I can not use it by padb -aQ.
>
> The -a flag is a shortcut to all jobs, if you are providing a jobid (which
> is normally numeric) then don't set the -a flag.
>
> > Do you have use manual about how to use it ?
>
> In my previous mail I was assuming you were using orte to launch the jobs
> but if you are using PBS then you'll need to use the 3.2 beta as the PBS
> code is new, alternatively you could find the host where the PBS script
> itself runs and check if the "ompi-ps" command gives you any output, if it
> does then you could run it from there giving it the orte jobid.
>
> A bit of background about resource managers (in which I'm including orte
> and PBS), padb supports many resource managers and tries to automatically
> detect which ones you have installed on your system.  If you don't specify
> one then it'll see what is installed, if there is more than one resource
> manager installed then it'll see which of them claim to have active jobs -
> if only one resource manager meets this criteria then it'll pick that one -
> hence 99% of the time it should just work.  If more than one resource
> manager claims to have active jobs then padb will refuse to run but ask the
> user to specify one explicitly.
>
> You should try the following in order once you have 3.2 installed.
>
> padb -Ormgr=pbs -Q 
>
> Or - find the node where the PBS script is being executed, check that the
> ompi-ps command is returning the jobid and then run
>
> padb -Ormgr=orte -Q 
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
David Zhang
University of California, San Diego


Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Ashley Pittman

On 25 Oct 2010, at 20:18, Jack Bryan wrote:

> Thanks
> I have downloaded 
> http://padb.googlecode.com/files/padb-3.0.tgz
> 
> and compile it.
> 
> But, no user manual, I can not use it by padb -aQ.

The -a flag is a shortcut to all jobs, if you are providing a jobid (which is 
normally numeric) then don't set the -a flag.

> Do you have use manual about how to use it ? 

In my previous mail I was assuming you were using orte to launch the jobs but 
if you are using PBS then you'll need to use the 3.2 beta as the PBS code is 
new, alternatively you could find the host where the PBS script itself runs and 
check if the "ompi-ps" command gives you any output, if it does then you could 
run it from there giving it the orte jobid.

A bit of background about resource managers (in which I'm including orte and 
PBS), padb supports many resource managers and tries to automatically detect 
which ones you have installed on your system.  If you don't specify one then 
it'll see what is installed, if there is more than one resource manager 
installed then it'll see which of them claim to have active jobs - if only one 
resource manager meets this criteria then it'll pick that one - hence 99% of 
the time it should just work.  If more than one resource manager claims to have 
active jobs then padb will refuse to run but ask the user to specify one 
explicitly.

You should try the following in order once you have 3.2 installed.

padb -Ormgr=pbs -Q 

Or - find the node where the PBS script is being executed, check that the 
ompi-ps command is returning the jobid and then run

padb -Ormgr=orte -Q 

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
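
The selection rule described in the mail above can be paraphrased as a small shell function. This is only a sketch of the rule as stated in the text, not padb's actual code (padb is a perl script and its real logic may differ); the name pick_rmgr and its three arguments are invented for this example, and treating "no resource manager has active jobs" as an error is an assumption, since the mail does not cover that case.

```shell
#!/bin/sh
# Illustrative paraphrase of the rule in the mail, NOT padb's real code.
#
# usage: pick_rmgr REQUESTED INSTALLED ACTIVE
#   REQUESTED  value given by the user (as with -Ormgr=...), or "" if none
#   INSTALLED  space-separated resource managers found on the system
#   ACTIVE     space-separated resource managers that report active jobs
pick_rmgr() {
    requested=$1
    installed=$2
    active=$3

    # An explicit choice always wins.
    if [ -n "$requested" ]; then
        echo "$requested"
        return 0
    fi

    # Exactly one installed: nothing to decide.
    set -- $installed
    if [ "$#" -eq 1 ]; then
        echo "$1"
        return 0
    fi

    # Several installed: pick the one with active jobs, if it is unique.
    set -- $active
    if [ "$#" -eq 1 ]; then
        echo "$1"
        return 0
    fi

    # Zero or several candidates: refuse and ask for an explicit choice.
    echo "cannot choose a resource manager, please specify one" >&2
    return 1
}
```

For example, pick_rmgr "" "orte pbs" "pbs" prints pbs, while pick_rmgr "" "orte pbs" "orte pbs" fails, which corresponds to the case where -Ormgr has to be given by hand.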




Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks
I have downloaded http://padb.googlecode.com/files/padb-3.0.tgz
and compile it.
But, no user manual, I can not use it by padb -aQ.

./padb -aQ myjob
padb: Error: --all incompatible with specific ids
Actually, myjob is running in the queue. 
Do you have use manual about how to use it ? 
thanks

> From: ash...@pittman.co.uk
> Date: Mon, 25 Oct 2010 18:08:32 +0100
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> 
> On 25 Oct 2010, at 17:26, Jack Bryan wrote:
> 
> > Thanks, the problem is still there. 
> > 
> > I used: 
> > 
> > Only process 0 returns. Other processes are still stuck in
> > MPI_Finalize(). 
> > 
> > Any help is appreciated. 
> 
> You can use the command "padb -aQ" to show you the message queues for your 
> application, you'll need to download and install padb then simply run your 
> job, allow it to hang and then run padb - it'll show you the message queues 
> for each rank that it can find processes for (the ones that haven't exited).  
> If this isn't any help run "padb -axt" for the stack traces and send the 
> output to this list.
> 
> The web-site is in my signature or there is a new beta release out this week 
> at http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
> 
> Ashley.
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

thanks
But, the code is too long.

Jack
Oct. 25 2010
> Date: Mon, 25 Oct 2010 14:08:54 -0400
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> Your job may be queued, not executing, because there are no
> resources available, all nodes are busy.
> Try qstat -a.
> 
> Posting a code snippet with all your MPI calls may prove effective.
> You might get a trove of advice for a thrift of effort.
> 
> Jeff Squyres wrote:
> > Check the man page for qsub for proper use.
> > 
> > 
> > On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:
> > 
> >> thanks
> >>
> >> I use 
> >> qsub -I nsga2_job.sh
> >> qsub: waiting for job 48270.clusterName to start
> >>
> >> By qstat
> >> I found the job name is none and no results show up. 
> >>
> >> No shell prompt appear, the command line is hang there , no response. 
> >>
> >> Any help is appreciated. 
> >>
> >> Thanks
> >>
> >> Jack 
> >>
> >> Oct. 25 2010
> >>
> >>> From: jsquy...@cisco.com
> >>> Date: Mon, 25 Oct 2010 13:39:30 -0400
> >>> To: us...@open-mpi.org
> >>> Subject: Re: [OMPI users] Open MPI program cannot complete
> >>>
> >>> Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC, 
> >>> "qsub -I ..." ?
> >>>
> >>> Then you get a shell prompt with your allocated cores and can run stuff 
> >>> interactively. I don't know if your site allows this, but interactive 
> >>> debugging here might be *significantly* easier than try to automate some 
> >>> debugging.
> >>>
> >>>
> >>> On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
> >>>
> >>>> thanks
> >>>>
> >>>> I have to use #PBS to submit any jobs in my cluster. 
> >>>> I cannot use command line to hang a job on my cluster. 
> >>>>
> >>>> this is my script: 
> >>>> --
> >>>> #!/bin/bash
> >>>> #PBS -N jobname
> >>>> #PBS -l walltime=00:08:00,nodes=1
> >>>> #PBS -q queuename
> >>>> COMMAND=/mypath/myprog
> >>>> NCORES=5
> >>>>
> >>>> cd $PBS_O_WORKDIR
> >>>> NODES=`cat $PBS_NODEFILE | wc -l`
> >>>> NPROC=$(( $NCORES * $NODES ))
> >>>>
> >>>> mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
> >>>>
> >>>> ---
> >>>>
> >>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> >>>> ZOMBIE_PID) in the script ? 
> >>>> And how to get ZOMBIE_PID from the script ? 
> >>>>
> >>>> Any help is appreciated. 
> >>>>
> >>>> thanks
> >>>>
> >>>> Oct. 25 2010
> >>>>
> >>>> Date: Mon, 25 Oct 2010 19:24:35 +0200
> >>>> From: j...@59a2.org
> >>>> To: us...@open-mpi.org
> >>>> Subject: Re: [OMPI users] Open MPI program cannot complete
> >>>>
> >>>> On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
> >>>> I need to use #PBS parallel job script to submit a job on MPI cluster. 
> >>>>
> >>>> Is it not possible to reproduce locally? Most clusters have a way to 
> >>>> submit an interactive job (which would let you start this thing and then 
> >>>> inspect individual processes). Ashley's Padb suggestion will certainly 
> >>>> be better in a non-interactive environment.
> >>>>
> >>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> >>>> ZOMBIE_PID) in the script ? 
> >>>>
> >>>> Is control returning to your script after rank 0 has exited? In that 
> >>>> case, you can just put this on the next line.
> >>>>
> >>>> How to get the ZOMBIE_PID ? 
> >>>>
> >>>> "ps" from the command line, or getpid() from C code.
> >>>>
> >>>> Jed
> >>>>
> >>>> ___
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>> -- 
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>>
> >>>
> >>> ___
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> > 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Gus Correa

Your job may be queued, not executing, because there are no
resources available, all nodes are busy.
Try qstat -a.

Posting a code snippet with all your MPI calls may prove effective.
You might get a trove of advice for a thrift of effort.

Jeff Squyres wrote:
> Check the man page for qsub for proper use.
>
>
> On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:
>
>> thanks
>>
>> I use
>> qsub -I nsga2_job.sh
>> qsub: waiting for job 48270.clusterName to start
>>
>> By qstat
>> I found the job name is none and no results show up.
>>
>> No shell prompt appear, the command line is hang there , no response.
>>
>> Any help is appreciated.
>>
>> Thanks
>>
>> Jack
>>
>> Oct. 25 2010
>>
>>> From: jsquy...@cisco.com
>>> Date: Mon, 25 Oct 2010 13:39:30 -0400
>>> To: us...@open-mpi.org
>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>
>>> Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC,
>>> "qsub -I ..." ?
>>>
>>> Then you get a shell prompt with your allocated cores and can run stuff
>>> interactively. I don't know if your site allows this, but interactive
>>> debugging here might be *significantly* easier than try to automate some
>>> debugging.
>>>
>>>
>>> On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
>>>
>>>> thanks
>>>>
>>>> I have to use #PBS to submit any jobs in my cluster.
>>>> I cannot use command line to hang a job on my cluster.
>>>>
>>>> this is my script:
>>>> --
>>>> #!/bin/bash
>>>> #PBS -N jobname
>>>> #PBS -l walltime=00:08:00,nodes=1
>>>> #PBS -q queuename
>>>> COMMAND=/mypath/myprog
>>>> NCORES=5
>>>>
>>>> cd $PBS_O_WORKDIR
>>>> NODES=`cat $PBS_NODEFILE | wc -l`
>>>> NPROC=$(( $NCORES * $NODES ))
>>>>
>>>> mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
>>>>
>>>> ---
>>>>
>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid
>>>> ZOMBIE_PID) in the script ?
>>>> And how to get ZOMBIE_PID from the script ?
>>>>
>>>> Any help is appreciated.
>>>>
>>>> thanks
>>>>
>>>> Oct. 25 2010
>>>>
>>>> Date: Mon, 25 Oct 2010 19:24:35 +0200
>>>> From: j...@59a2.org
>>>> To: us...@open-mpi.org
>>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>>
>>>> On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
>>>> I need to use #PBS parallel job script to submit a job on MPI cluster.
>>>>
>>>> Is it not possible to reproduce locally? Most clusters have a way to
>>>> submit an interactive job (which would let you start this thing and then
>>>> inspect individual processes). Ashley's Padb suggestion will certainly
>>>> be better in a non-interactive environment.
>>>>
>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid
>>>> ZOMBIE_PID) in the script ?
>>>>
>>>> Is control returning to your script after rank 0 has exited? In that
>>>> case, you can just put this on the next line.
>>>>
>>>> How to get the ZOMBIE_PID ?
>>>>
>>>> "ps" from the command line, or getpid() from C code.
>>>>
>>>> Jed
>>>>
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jeff Squyres
Check the man page for qsub for proper use.


On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:

> thanks
> 
> I use 
> qsub -I nsga2_job.sh
> qsub: waiting for job 48270.clusterName to start
> 
> By qstat
> I found the job name is none and no results show up. 
> 
> No shell prompt appear, the command line is hang there , no response. 
> 
> Any help is appreciated. 
> 
> Thanks
> 
> Jack 
> 
> Oct. 25 2010
> 
> > From: jsquy...@cisco.com
> > Date: Mon, 25 Oct 2010 13:39:30 -0400
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> > 
> > Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC, 
> > "qsub -I ..." ?
> > 
> > Then you get a shell prompt with your allocated cores and can run stuff 
> > interactively. I don't know if your site allows this, but interactive 
> > debugging here might be *significantly* easier than try to automate some 
> > debugging.
> > 
> > 
> > On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
> > 
> > > thanks
> > > 
> > > I have to use #PBS to submit any jobs in my cluster. 
> > > I cannot use command line to hang a job on my cluster. 
> > > 
> > > this is my script: 
> > > --
> > > #!/bin/bash
> > > #PBS -N jobname
> > > #PBS -l walltime=00:08:00,nodes=1
> > > #PBS -q queuename
> > > COMMAND=/mypath/myprog
> > > NCORES=5
> > > 
> > > cd $PBS_O_WORKDIR
> > > NODES=`cat $PBS_NODEFILE | wc -l`
> > > NPROC=$(( $NCORES * $NODES ))
> > > 
> > > mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
> > > 
> > > ---
> > > 
> > > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> > > ZOMBIE_PID) in the script ? 
> > > And how to get ZOMBIE_PID from the script ? 
> > > 
> > > Any help is appreciated. 
> > > 
> > > thanks
> > > 
> > > Oct. 25 2010
> > > 
> > > Date: Mon, 25 Oct 2010 19:24:35 +0200
> > > From: j...@59a2.org
> > > To: us...@open-mpi.org
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > 
> > > On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
> > > I need to use #PBS parallel job script to submit a job on MPI cluster. 
> > > 
> > > Is it not possible to reproduce locally? Most clusters have a way to 
> > > submit an interactive job (which would let you start this thing and then 
> > > inspect individual processes). Ashley's Padb suggestion will certainly be 
> > > better in a non-interactive environment.
> > > 
> > > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> > > ZOMBIE_PID) in the script ? 
> > > 
> > > Is control returning to your script after rank 0 has exited? In that 
> > > case, you can just put this on the next line.
> > > 
> > > How to get the ZOMBIE_PID ? 
> > > 
> > > "ps" from the command line, or getpid() from C code.
> > > 
> > > Jed
> > > 
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> > 
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

thanks
I use 
qsub -I nsga2_job.sh
qsub: waiting for job 48270.clusterName to start
By qstat
I found the job name is none and no results show up. 
No shell prompt appear, the command line is hang there , no response. 
Any help is appreciated. 
Thanks
Jack 
Oct. 25 2010
> From: jsquy...@cisco.com
> Date: Mon, 25 Oct 2010 13:39:30 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> Can you use the interactive mode of PBS to get 5 cores on 1 node?  IIRC, 
> "qsub -I ..." ?
> 
> Then you get a shell prompt with your allocated cores and can run stuff 
> interactively.  I don't know if your site allows this, but interactive 
> debugging here might be *significantly* easier than try to automate some 
> debugging.
> 
> 
> On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
> 
> > thanks
> > 
> > I have to use #PBS to submit any jobs in my cluster. 
> > I cannot use command line to hang a job on my cluster. 
> > 
> > this is my script: 
> > --
> > #!/bin/bash
> > #PBS -N jobname
> > #PBS -l walltime=00:08:00,nodes=1
> > #PBS -q queuename
> > COMMAND=/mypath/myprog
> > NCORES=5
> > 
> > cd $PBS_O_WORKDIR
> > NODES=`cat $PBS_NODEFILE | wc -l`
> > NPROC=$(( $NCORES * $NODES ))
> > 
> > mpirun -np $NPROC --mca btl self,sm,openib  $COMMAND
> > 
> > ---
> > 
> > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> > ZOMBIE_PID) in the script ? 
> > And how to get ZOMBIE_PID from the script ? 
> > 
> > Any help is appreciated. 
> > 
> > thanks
> > 
> > Oct. 25 2010
> > 
> > Date: Mon, 25 Oct 2010 19:24:35 +0200
> > From: j...@59a2.org
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> > 
> > On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
> > I need to use #PBS parallel job script to submit a job on MPI cluster. 
> > 
> > Is it not possible to reproduce locally?  Most clusters have a way to 
> > submit an interactive job (which would let you start this thing and then 
> > inspect individual processes).  Ashley's Padb suggestion will certainly be 
> > better in a non-interactive environment.
> >  
> > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> > ZOMBIE_PID) in the script ? 
> > 
> > Is control returning to your script after rank 0 has exited?  In that case, 
> > you can just put this on the next line.
> >  
> > How to get the ZOMBIE_PID ? 
> > 
> > "ps" from the command line, or getpid() from C code.
> > 
> > Jed
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jed Brown
On Mon, Oct 25, 2010 at 19:35, Jack Bryan  wrote:

> I have to use #PBS to submit any jobs in my cluster.
> I cannot use command line to hang a job on my cluster.
>

You don't need a cluster to run MPI jobs, can you run the job on whatever
your development machine is?  Does it hang there?

PBS interactive jobs are started with qsub -I.

>
> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid
> ZOMBIE_PID) in the script ?
>

On the line after "mpirun ...", assuming that control returns to there after
the hang.  You didn't answer whether that was the case.


> And how to get ZOMBIE_PID from the script ?
>

Simplest is "pgrep $COMMAND", or use ps.
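
A minimal sketch of the two steps above, assuming bash and that gdb and pgrep
are installed on the compute node.  The function name dump_stacks and the GDB
override variable are hypothetical, introduced only for illustration.  Note
that pgrep matches only the process name by default, so with COMMAND set to a
full path such as /mypath/myprog the -f option (match against the full command
line) is probably needed:

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI or PBS): attach gdb in batch mode
# to every PID passed as an argument and print a full backtrace plus registers.
# Setting GDB=echo turns this into a dry run that only prints the arguments.
dump_stacks() {
    local pid
    for pid in "$@"; do
        echo "=== backtrace for PID $pid ==="
        "${GDB:-gdb}" --batch -ex 'bt full' -ex 'info reg' -pid "$pid"
    done
}

# Possible use on the line after "mpirun ..." in the PBS script
# (-u limits the match to your own processes, -f matches the full command line):
#   dump_stacks $(pgrep -u "$USER" -f "$COMMAND")
```

This only helps if the script actually reaches that line while the stuck ranks
are still alive; otherwise the same call can be made from a second shell on
the node.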

Jed


Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jeff Squyres
Can you use the interactive mode of PBS to get 5 cores on 1 node?  IIRC, "qsub 
-I ..." ?

Then you get a shell prompt with your allocated cores and can run stuff 
interactively.  I don't know if your site allows this, but interactive 
debugging here might be *significantly* easier than trying to automate some 
debugging.


On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:

> thanks
> 
> I have to use #PBS to submit any jobs in my cluster. 
> I cannot use command line to hang a job on my cluster. 
> 
> this is my script: 
> --
> #!/bin/bash
> #PBS -N jobname
> #PBS -l walltime=00:08:00,nodes=1
> #PBS -q queuename
> COMMAND=/mypath/myprog
> NCORES=5
> 
> cd $PBS_O_WORKDIR
> NODES=`cat $PBS_NODEFILE | wc -l`
> NPROC=$(( $NCORES * $NODES ))
> 
> mpirun -np $NPROC --mca btl self,sm,openib  $COMMAND
> 
> ---
> 
> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> ZOMBIE_PID) in the script ? 
> And how to get ZOMBIE_PID from the script ? 
> 
> Any help is appreciated. 
> 
> thanks
> 
> Oct. 25 2010
> 
> Date: Mon, 25 Oct 2010 19:24:35 +0200
> From: j...@59a2.org
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
> I need to use #PBS parallel job script to submit a job on MPI cluster. 
> 
> Is it not possible to reproduce locally?  Most clusters have a way to submit 
> an interactive job (which would let you start this thing and then inspect 
> individual processes).  Ashley's Padb suggestion will certainly be better in 
> a non-interactive environment.
>  
> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
> ZOMBIE_PID) in the script ? 
> 
> Is control returning to your script after rank 0 has exited?  In that case, 
> you can just put this on the next line.
>  
> How to get the ZOMBIE_PID ? 
> 
> "ps" from the command line, or getpid() from C code.
> 
> Jed
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jed Brown
On Mon, Oct 25, 2010 at 19:07, Jack Bryan  wrote:

> I need to use #PBS parallel job script to submit a job on MPI cluster.
>

Is it not possible to reproduce locally?  Most clusters have a way to submit
an interactive job (which would let you start this thing and then inspect
individual processes).  Ashley's Padb suggestion will certainly be better in
a non-interactive environment.


> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid
> ZOMBIE_PID) in the script ?
>

Is control returning to your script after rank 0 has exited?  In that case,
you can just put this on the next line.
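
If control does not return because mpirun itself never exits, one option is to
start it in the background and inspect the job after a time limit.  The
following is a rough sketch under that assumption; run_with_watchdog and
on_hang are hypothetical names, and what on_hang does (for example calling gdb
on the stuck ranks) is left to the script:

```shell
#!/bin/bash
# Hypothetical sketch: run a command in the background and, if it is still
# alive after LIMIT seconds, call on_hang with its PID and return 124.
# on_hang must be defined by the caller.
run_with_watchdog() {
    local limit=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            on_hang "$pid"
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished in time: pass its exit status through.
    wait "$pid"
}

# Possible use in the job script (the time limit is a placeholder):
#   on_hang() { echo "mpirun (PID $1) still running, collecting stack traces"; }
#   run_with_watchdog 300 mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
```

The time limit should stay well below the PBS walltime so that the inspection
step still has time to run before the scheduler kills the job.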


> How to get the ZOMBIE_PID ?
>

"ps" from the command line, or getpid() from C code.

Jed


Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Ashley Pittman

On 25 Oct 2010, at 17:26, Jack Bryan wrote:

> Thanks, the problem is still there. 
> 
> I used: 
> 
> Only process 0 returns. Other processes are still stuck in
> MPI_Finalize(). 
> 
> Any help is appreciated. 

You can use the command "padb -aQ" to show the message queues for your 
application. You'll need to download and install padb, then simply run your job, 
allow it to hang and then run padb. It'll show you the message queues for each 
rank that it can find processes for (the ones that haven't exited).  If this 
isn't any help, run "padb -axt" for the stack traces and send the output to this 
list.

The web-site is in my signature or there is a new beta release out this week at 
http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks.
Could you tell me how to use
(gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID)
with MPI?
I need to use a #PBS parallel job script to submit a job on the MPI cluster.
Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid
ZOMBIE_PID) in the script?
How do I get the ZOMBIE_PID?
Thanks.
Any help is appreciated.
Jack
Oct. 25 2010
List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 19:01:38 +0200
From: j...@59a2.org
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

On Mon, Oct 25, 2010 at 18:26, Jack Bryan <dtustud...@hotmail.com> wrote:

Thanks, the problem is still there.
This really doesn't prove that there are no outstanding asynchronous requests, 
but perhaps you know that there are not, despite not being able to post a 
complete test case here.  I suggest attaching a debugger and getting a stack 
trace from the zombies (gdb --batch -ex 'bt full' -ex 'info reg' -pid 
ZOMBIE_PID).

Jed

  

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks, the problem is still there.
I used:

cout << "In main(), I am rank " << myRank
     << " , I am before MPI_Barrier(MPI_COMM_WORLD). \n\n" << endl;
MPI_Barrier(MPI_COMM_WORLD);
cout << "In main(), I am rank " << myRank
     << " , I am before MPI_Finalize() and after MPI_Barrier(MPI_COMM_WORLD). \n\n" << endl;
MPI_Finalize();
cout << "In main(), I am rank " << myRank
     << " , I am after MPI_Finalize(), then return 0 . \n\n" << endl;
return 0;

Only process 0 returns. Other processes are still stuck in MPI_Finalize().
Any help is appreciated.
JACK
Oct. 25 2010

From: solarbik...@gmail.com
List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 08:27:19 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

I think I got this problem before.  Put a mpi_barrier(mpi_comm_world) before 
mpi_finalize for all processes.  For me, mpi terminates nicely only when all 
process are calling mpi_finalize the same time.  So I do it for all my programs.



On Mon, Oct 25, 2010 at 7:13 AM, Jack Bryan <dtustud...@hotmail.com> wrote:

Thanks, But, I have put a mpi_waitall(request) before
cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;

If the above sentence has been printed out, it means that all requests have 
been checked and finished. right ?  

What may be the possible reasons for that stuck ? 
Any help is appreciated. 
Jack
Oct. 25 2010 



List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 05:32:44 -0400
From: terry.don...@oracle.com
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

So what you are saying is *all* the ranks have entered MPI_Finalize
and only a subset has exited per placing prints before and after
MPI_Finalize.  Good.  So my guess is that the processes stuck in
MPI_Finalize have a prior MPI request outstanding that for whatever
reason is unable to complete.  So I would first look at all the MPI
requests and make sure they completed.



--td



On 10/25/2010 02:38 AM, Jack Bryan wrote:

  thanks
  I found a problem:

  I used:

    cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
    MPI_Finalize();
    cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
    return 0;

  I can get the output " I am rank 0 (1, 2, ) I am before MPI_Finalize() ".

  and
    " I am rank 0 I am after MPI_Finalize() "
  But, other processes do not printed out "I am rank ... I am after
  MPI_Finalize()" .

  It is weird. The process has reached the point just before
  MPI_Finalize(), why they are hanged there ?

  Are there other better ways to check this ?

  Any help is appreciated.

  thanks

  Jack

  Oct. 25 2010

  From: solarbik...@gmail.com
  Date: Sun, 24 Oct 2010 19:47:54 -0700
  To: us...@open-mpi.org
  Subject: Re: [OMPI users] Open MPI program cannot complete

  how do you know all process call mpi_finalize?  did you have all of them
  print out something before they call mpi_finalize? I think what Gustavo is
  getting at is maybe you had some MPI calls within your snippets that hangs
  your program, thus some of your processes never called mpi_finalize.

  On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com> wrote:

  Thanks,

  But, my code is too long to be posted.

  What are the common reasons of this kind of problems ?

  Any help is appreciated.

  Jack

  Oct. 24 2010

  > From: g...@ldeo.columbia.edu
  > Date: Sun, 24 Oct 2010 18:09:52 -0400
  > To: us...@open-mpi.org
  > Subject: Re: [OMPI users] Open MPI program cannot complete
  >
  > Hi Jack
  >
  > Your co

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread David Zhang
I think I have had this problem before.  Put an mpi_barrier(mpi_comm_world)
before mpi_finalize for all processes.  For me, MPI terminates nicely only
when all processes call mpi_finalize at the same time, so I do this in all my
programs.

On Mon, Oct 25, 2010 at 7:13 AM, Jack Bryan <dtustud...@hotmail.com> wrote:

>  Thanks,
> But, I have put a mpi_waitall(request) before
>
> cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
>
> If the above sentence has been printed out, it means that all requests have
> been checked and finished. right ?
>
> What may be the possible reasons for that stuck ?
>
> Any help is appreciated.
>
> Jack
>
> Oct. 25 2010
> *
> *
> --
> Date: Mon, 25 Oct 2010 05:32:44 -0400
> From: terry.don...@oracle.com
>
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> So what you are saying is *all* the ranks have entered MPI_Finalize and
> only a subset has exited per placing prints before and after MPI_Finalize.
> Good.  So my guess is that the processes stuck in MPI_Finalize have a prior
> MPI request outstanding that for whatever reason is unable to complete.  So
> I would first look at all the MPI requests and make sure they completed.
>
> --td
>
> On 10/25/2010 02:38 AM, Jack Bryan wrote:
>
> thanks
> I found a problem:
>
>  I used:
>
>  cout << " I am rank " << rank << " I am before MPI_Finalize()" <<
> endl;
>  MPI_Finalize();
>  cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
>  return 0;
>
>  I can get the output " I am rank 0 (1, 2, ) I am before
> MPI_Finalize() ".
>
>  and
> " I am rank 0 I am after MPI_Finalize() "
> But, other processes do not printed out "I am rank ... I am after
> MPI_Finalize()" .
>
>  It is weird. The process has reached the point just before
> MPI_Finalize(), why they are hanged there ?
>
>  Are there other better ways to check this ?
>
>  Any help is appreciated.
>
>  thanks
>
>  Jack
>
>  Oct. 25 2010
>
> --
> From: solarbik...@gmail.com
> Date: Sun, 24 Oct 2010 19:47:54 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> how do you know all process call mpi_finalize?  did you have all of them
> print out something before they call mpi_finalize? I think what Gustavo is
> getting at is maybe you had some MPI calls within your snippets that hangs
> your program, thus some of your processes never called mpi_finalize.
>
> On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com>wrote:
>
>  Thanks,
>
>  But, my code is too long to be posted.
>
>  What are the common reasons of this kind of problems ?
>
>  Any help is appreciated.
>
>  Jack
>
> Oct. 24 2010
>
> > From: g...@ldeo.columbia.edu
>  > Date: Sun, 24 Oct 2010 18:09:52 -0400
>
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> >
> > Hi Jack
> >
> > Your code snippet is too terse, doesn't show the MPI calls.
> > It is hard to guess what is the problem this way.
> >
> > Gus Correa
> > On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
> >
> > > Thanks for the reply.
> > > But, I use mpi_waitall() to make sure that all MPI communications have
> been done before a process call MPI_Finalize() and returns.
> > >
> > > Any help is appreciated.
> > >
> > > thanks
> > >
> > > Jack
> > >
> > > Oct. 24 2010
> > >
> > > > From: g...@ldeo.columbia.edu
> > > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > > To: us...@open-mpi.org
> > > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > >
> > > > Hi Jack
> > > >
> > > > It may depend on "do some things".
> > > > Does it involve MPI communication?
> > > >
> > > > Also, why not put MPI_Finalize();return 0 outside the ifs?
> > > >
> > > > Gus Correa
> > > >
> > > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I got a problem of open MPI.
> > > > >
> > > > > My program has 5 processes.
> > > > >
> > > > > All of them can run MPI_Fina

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks. But I have put an mpi_waitall(request) before
cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
If the above line has been printed out, it means that all requests have
been checked and finished, right?
What are the possible reasons for the hang?
Any help is appreciated.
Jack
Oct. 25 2010

List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 05:32:44 -0400
From: terry.don...@oracle.com
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

So what you are saying is *all* the ranks have entered MPI_Finalize
and only a subset has exited per placing prints before and after
MPI_Finalize.  Good.  So my guess is that the processes stuck in
MPI_Finalize have a prior MPI request outstanding that for whatever
reason is unable to complete.  So I would first look at all the MPI
requests and make sure they completed.



--td



On 10/25/2010 02:38 AM, Jack Bryan wrote:

  thanks
  I found a problem:

  I used:

    cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
    MPI_Finalize();
    cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
    return 0;

  I can get the output " I am rank 0 (1, 2, ) I am before MPI_Finalize() ".

  and
    " I am rank 0 I am after MPI_Finalize() "
  But, other processes do not printed out "I am rank ... I am after
  MPI_Finalize()" .

  It is weird. The process has reached the point just before
  MPI_Finalize(), why they are hanged there ?

  Are there other better ways to check this ?

  Any help is appreciated.

  thanks

  Jack

  Oct. 25 2010

  From: solarbik...@gmail.com
  Date: Sun, 24 Oct 2010 19:47:54 -0700
  To: us...@open-mpi.org
  Subject: Re: [OMPI users] Open MPI program cannot complete

  how do you know all process call mpi_finalize?  did you have all of them
  print out something before they call mpi_finalize? I think what Gustavo is
  getting at is maybe you had some MPI calls within your snippets that hangs
  your program, thus some of your processes never called mpi_finalize.

  On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com> wrote:

  Thanks,

  But, my code is too long to be posted.

  What are the common reasons of this kind of problems ?

  Any help is appreciated.

  Jack

  Oct. 24 2010

  > From: g...@ldeo.columbia.edu
  > Date: Sun, 24 Oct 2010 18:09:52 -0400
  > To: us...@open-mpi.org
  > Subject: Re: [OMPI users] Open MPI program cannot complete
  >
  > Hi Jack
  >
  > Your code snippet is too terse, doesn't show the MPI calls.
  > It is hard to guess what is the problem this way.
  >
  > Gus Correa
  > On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
  >
  > > Thanks for the reply.
  > > But, I use mpi_waitall() to make sure that all MPI communications have
  > > been done before a process call MPI_Finalize() and returns.
  > >
  > > Any help is appreciated.
  > >
  > > thanks
  > >
  > > Jack
  > >
  > > Oct. 24 2010
  > >
  > > > From: g...@ldeo.columbia.edu
  > > > Date: Sun, 24 Oct 2010 17:31:11 -0400
  > > > To: us...@open-mpi.org
  > > > Subject: Re: [OMPI users] Open MPI program cannot complete
  > > >
  > > > Hi Jack


Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Terry Dontje
So what you are saying is that *all* the ranks have entered MPI_Finalize 
and only a subset has exited, judging by the prints placed before and after 
MPI_Finalize.  Good.  So my guess is that the processes stuck in 
MPI_Finalize have a prior MPI request outstanding that for whatever 
reason is unable to complete.  So I would first look at all the MPI 
requests and make sure they completed.


--td

On 10/25/2010 02:38 AM, Jack Bryan wrote:

thanks
I found a problem:

I used:

 cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
 MPI_Finalize();
 cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
 return 0;

I can get the output " I am rank 0 (1, 2, ) I am before 
MPI_Finalize() ".


and
   " I am rank 0 I am after MPI_Finalize() "
But, other processes do not printed out "I am rank ... I am after 
MPI_Finalize()" .


It is weird. The process has reached the point just before 
MPI_Finalize(), why they are hanged there ?


Are there other better ways to check this ?

Any help is appreciated.

thanks

Jack

Oct. 25 2010

--------
From: solarbik...@gmail.com
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

how do you know all process call mpi_finalize?  did you have all of 
them print out something before they call mpi_finalize? I think what 
Gustavo is getting at is maybe you had some MPI calls within your 
snippets that hangs your program, thus some of your processes never 
called mpi_finalize.


On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com 
<mailto:dtustud...@hotmail.com>> wrote:


Thanks,

But, my code is too long to be posted.

What are the common reasons of this kind of problems ?

Any help is appreciated.

Jack

Oct. 24 2010

> From: g...@ldeo.columbia.edu <mailto:g...@ldeo.columbia.edu>
    > Date: Sun, 24 Oct 2010 18:09:52 -0400

> To: us...@open-mpi.org <mailto:us...@open-mpi.org>
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> Hi Jack
>
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
>
> Gus Correa
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
>
> > Thanks for the reply.
> > But, I use mpi_waitall() to make sure that all MPI
communications have been done before a process call MPI_Finalize()
and returns.
> >
> > Any help is appreciated.
> >
> > thanks
> >
> > Jack
> >
    > > Oct. 24 2010
> >
    > > > From: g...@ldeo.columbia.edu <mailto:g...@ldeo.columbia.edu>
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org <mailto:us...@open-mpi.org>
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > >
> > > Hi Jack
> > >
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > >
> > > Also, why not put MPI_Finalize();return 0 outside the ifs?
> > >
> > > Gus Correa
> > >
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > >
> > > > Hi
> > > >
> > > > I got a problem of open MPI.
> > > >
> > > > My program has 5 processes.
> > > >
> > > > All of them can run MPI_Finalize() and return 0.
> > > >
> > > > But, the whole program cannot be completed.
> > > >
> > > > In the MPI cluster job queue, it is still in running status.
> > > >
> > > > If I use 1 process to run it, no problem.
> > > >
> > > > Why ?
> > > >
> > > > My program:
> > > >
> > > > int main (int argc, char **argv)
> > > > {
> > > >
> > > > MPI_Init(, );
> > > > MPI_Comm_rank(MPI_COMM_WORLD, );
> > > > MPI_Comm_size(MPI_COMM_WORLD, );
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > >
> > > > if (myRank == 0)
> > > > {
> > > > do some things.
> > > > }
> > > >
> > > > if (myRank

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks.
I found a problem:
I used:

cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
MPI_Finalize();
cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
return 0;

I can get the output " I am rank 0 (1, 2, ) I am before MPI_Finalize() "
and " I am rank 0 I am after MPI_Finalize() ".
But the other processes do not print "I am rank ... I am after MPI_Finalize()".
It is weird. The processes have reached the point just before MPI_Finalize(),
so why are they hanging there?
Are there better ways to check this? Any help is appreciated.
Thanks
Jack
Oct. 25 2010
From: solarbik...@gmail.com
List-Post: users@lists.open-mpi.org
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

how do you know all process call mpi_finalize?  did you have all of them print 
out something before they call mpi_finalize? I think what Gustavo is getting at 
is maybe you had some MPI calls within your snippets that hangs your program, 
thus some of your processes never called mpi_finalize.

On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com> wrote:

Thanks, 
But, my code is too long to be posted. 
What are the common reasons of this kind of problems ? 
Any help is appreciated. 


Jack
Oct. 24 2010
> From: g...@ldeo.columbia.edu
> Date: Sun, 24 Oct 2010 18:09:52 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> Hi Jack
> 
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
> 
> Gus Correa 
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
> 
> > Thanks for the reply. 
> > But, I use mpi_waitall() to make sure that all MPI communications have been 
> > done before a process call MPI_Finalize() and returns. 
> > 
> > Any help is appreciated.
> > 
> > thanks
> > 
> > Jack
> > 
> > Oct. 24 2010
> > 
> > > From: g...@ldeo.columbia.edu
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > 
> > > Hi Jack
> > > 
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > > 
> > > Also, why not put MPI_Finalize();return 0 outside the ifs? 
> > > 
> > > Gus Correa
> > > 
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > > 
> > > > Hi 
> > > > 
> > > > I got a problem of open MPI.
> > > > 
> > > > My program has 5 processes. 
> > > > 
> > > > All of them can run MPI_Finalize() and return 0. 
> > > > 
> > > > But, the whole program cannot be completed. 
> > > > 
> > > > In the MPI cluster job queue, it is still in running status. 
> > > > 
> > > > If I use 1 process to run it, no problem.
> > > > 
> > > > Why ? 
> > > > 
> > > > My program:
> > > > 
> > > > int main (int argc, char **argv) 
> > > > {
> > > > 
> > > > MPI_Init(, );
> > > > MPI_Comm_rank(MPI_COMM_WORLD, );
> > > > MPI_Comm_size(MPI_COMM_WORLD, );
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > > 
> > > > if (myRank == 0)
> > > > {
> > > > do some things. 
> > > > }
> > > > 
> > > > if (myRank != 0)
> > > > {
> > > > do some things. 
> > > > MPI_Finalize();
> > > > return 0 ;
> > > > }
> > > > if (myRank == 0)
> > > > {
> > > > MPI_Finalize();
> > > > return 0;
> > > > }
> > > > 
> > > > }
> > > > 
> > > > And, some output files get wrong codes, which can not be readable. 
> > > > In 1-process case, the program can print correct results to these 
> > > > output files . 
> > > > 
> > > > Any help is appreciated. 
> > > > 
> > > > thanks
> > > > 
> > > > Jack
> > > > 
> > > > Oct. 24 2010
> > > > 
> > > > ___
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > 
> > > 
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
David Zhang
University of California, San Diego




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users  
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Jack Bryan

Thanks.
I used:

cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
MPI_Finalize();
return 0;

I can get the output " I am rank 0 (1, 2, ) I am before MPI_Finalize() ".
Are there better ways to check this?
Any help is appreciated. 
thanks
Jack
Oct. 25 2010
From: solarbik...@gmail.com
List-Post: users@lists.open-mpi.org
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

how do you know all process call mpi_finalize?  did you have all of them print 
out something before they call mpi_finalize? I think what Gustavo is getting at 
is maybe you had some MPI calls within your snippets that hangs your program, 
thus some of your processes never called mpi_finalize.

On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com> wrote:

Thanks, 
But, my code is too long to be posted. 
What are the common reasons of this kind of problems ? 
Any help is appreciated. 


Jack
Oct. 24 2010
> From: g...@ldeo.columbia.edu
> Date: Sun, 24 Oct 2010 18:09:52 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> Hi Jack
> 
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
> 
> Gus Correa 
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
> 
> > Thanks for the reply. 
> > But, I use mpi_waitall() to make sure that all MPI communications have been 
> > done before a process call MPI_Finalize() and returns. 
> > 
> > Any help is appreciated.
> > 
> > thanks
> > 
> > Jack
> > 
> > Oct. 24 2010
> > 
> > > From: g...@ldeo.columbia.edu
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > 
> > > Hi Jack
> > > 
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > > 
> > > Also, why not put MPI_Finalize();return 0 outside the ifs? 
> > > 
> > > Gus Correa
> > > 
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > > 
> > > > Hi 
> > > > 
> > > > I got a problem of open MPI.
> > > > 
> > > > My program has 5 processes. 
> > > > 
> > > > All of them can run MPI_Finalize() and return 0. 
> > > > 
> > > > But, the whole program cannot be completed. 
> > > > 
> > > > In the MPI cluster job queue, it is still in running status. 
> > > > 
> > > > If I use 1 process to run it, no problem.
> > > > 
> > > > Why ? 
> > > > 
> > > > My program:
> > > > 
> > > > int main (int argc, char **argv) 
> > > > {
> > > > 
> > > > MPI_Init(, );
> > > > MPI_Comm_rank(MPI_COMM_WORLD, );
> > > > MPI_Comm_size(MPI_COMM_WORLD, );
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > > 
> > > > if (myRank == 0)
> > > > {
> > > > do some things. 
> > > > }
> > > > 
> > > > if (myRank != 0)
> > > > {
> > > > do some things. 
> > > > MPI_Finalize();
> > > > return 0 ;
> > > > }
> > > > if (myRank == 0)
> > > > {
> > > > MPI_Finalize();
> > > > return 0;
> > > > }
> > > > 
> > > > }
> > > > 
> > > > And, some output files get wrong codes, which can not be readable. 
> > > > In 1-process case, the program can print correct results to these 
> > > > output files . 
> > > > 
> > > > Any help is appreciated. 
> > > > 
> > > > thanks
> > > > 
> > > > Jack
> > > > 
> > > > Oct. 24 2010
> > > > 
> > > > ___
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > 
> > > 
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
David Zhang
University of California, San Diego




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users  
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-24 Thread Jack Bryan

Thanks.
But my code is too long to post.
What are the common causes of this kind of problem?
Any help is appreciated.
Jack
Oct. 24 2010
> From: g...@ldeo.columbia.edu
> Date: Sun, 24 Oct 2010 18:09:52 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI program cannot complete
> 
> Hi Jack
> 
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
> 
> Gus Correa 
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
> 
> > Thanks for the reply. 
> > But, I use mpi_waitall() to make sure that all MPI communications have been 
> > done before a process call MPI_Finalize() and returns. 
> > 
> > Any help is appreciated.
> > 
> > thanks
> > 
> > Jack
> > 
> > Oct. 24 2010
> > 
> > > From: g...@ldeo.columbia.edu
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > 
> > > Hi Jack
> > > 
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > > 
> > > Also, why not put MPI_Finalize();return 0 outside the ifs? 
> > > 
> > > Gus Correa
> > > 
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > > 
> > > > Hi 
> > > > 
> > > > I got a problem of open MPI.
> > > > 
> > > > My program has 5 processes. 
> > > > 
> > > > All of them can run MPI_Finalize() and return 0. 
> > > > 
> > > > But, the whole program cannot be completed. 
> > > > 
> > > > In the MPI cluster job queue, it is still in running status. 
> > > > 
> > > > If I use 1 process to run it, no problem.
> > > > 
> > > > Why ? 
> > > > 
> > > > My program:
> > > > 
> > > > int main (int argc, char **argv) 
> > > > {
> > > > 
> > > > MPI_Init(, );
> > > > MPI_Comm_rank(MPI_COMM_WORLD, );
> > > > MPI_Comm_size(MPI_COMM_WORLD, );
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > > 
> > > > if (myRank == 0)
> > > > {
> > > > do some things. 
> > > > }
> > > > 
> > > > if (myRank != 0)
> > > > {
> > > > do some things. 
> > > > MPI_Finalize();
> > > > return 0 ;
> > > > }
> > > > if (myRank == 0)
> > > > {
> > > > MPI_Finalize();
> > > > return 0;
> > > > }
> > > > 
> > > > }
> > > > 
> > > > And, some output files get wrong codes, which can not be readable. 
> > > > In 1-process case, the program can print correct results to these 
> > > > output files . 
> > > > 
> > > > Any help is appreciated. 
> > > > 
> > > > thanks
> > > > 
> > > > Jack
> > > > 
> > > > Oct. 24 2010
> > > > 
> > > > ___
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > 
> > > 
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
  

Re: [OMPI users] Open MPI program cannot complete

2010-10-24 Thread Gustavo Correa
Hi Jack

Your code snippet is too terse; it doesn't show the MPI calls.
It is hard to guess what the problem is this way.

Gus Correa 




Re: [OMPI users] Open MPI program cannot complete

2010-10-24 Thread Jack Bryan

Thanks for the reply.

But I use MPI_Waitall() to make sure that all MPI communications have
completed before a process calls MPI_Finalize() and returns.

Any help is appreciated.

Thanks

Jack

Oct. 24 2010


Re: [OMPI users] Open MPI program cannot complete

2010-10-24 Thread Gustavo Correa
Hi Jack

It may depend on "do some things".
Does it involve MPI communication?

Also, why not put MPI_Finalize();return 0 outside the ifs? 

Gus Correa
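
A minimal sketch of what the suggestion above could look like, assuming the "do some things" blocks are ordinary rank-dependent work. The printf calls are placeholders, and numProcs is a hypothetical name, since the variable passed to MPI_Comm_size did not survive in the archived post. MPI_Finalize() is collective over all connected processes, so giving every rank one shared exit path makes it easier to see that each rank reaches it. This is not a confirmed fix for the hang reported in this thread, and it would not help if a rank is blocked in a send or receive that is never matched.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int myRank;
    int numProcs;   /* hypothetical name; the original variable is unknown */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    if (myRank == 0) {
        /* placeholder for rank 0's "do some things" */
        printf("rank 0 of %d: root work\n", numProcs);
    } else {
        /* placeholder for the other ranks' "do some things" */
        printf("rank %d of %d: worker work\n", myRank, numProcs);
    }

    /* Single exit path: every rank falls through to here,
     * so MPI_Finalize() is called exactly once per process. */
    MPI_Finalize();
    return 0;
}
```

With Open MPI this would typically be built with mpicc and started with mpirun -np 5, matching the 5 processes mentioned in the original post.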





[OMPI users] Open MPI program cannot complete

2010-10-24 Thread Jack Bryan

Hi

I have a problem with Open MPI.

My program has 5 processes.

All of them can run MPI_Finalize() and return 0.

But the whole program cannot complete.

In the MPI cluster job queue, it is still in running status.

If I use 1 process to run it, there is no problem.

Why?

My program:

int main (int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, );
    MPI_Comm world;
    world = MPI_COMM_WORLD;

    if (myRank == 0)
    {
        do some things.
    }

    if (myRank != 0)
    {
        do some things.
        MPI_Finalize();
        return 0;
    }
    if (myRank == 0)
    {
        MPI_Finalize();
        return 0;
    }
}

Also, some output files contain garbled, unreadable content.
In the 1-process case, the program prints correct results to these
output files.

Any help is appreciated.

Thanks

Jack

Oct. 24 2010
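
The thread does not establish why the output files are garbled when more than one process runs. One possibility, stated here purely as an assumption, is that several ranks write to the same file at the same time. If that were the case, giving each rank its own file name is a simple way to rule it out. The helper below is a hypothetical sketch; rank_filename and the "base.rank.txt" naming scheme are not from the original program.

```c
#include <stdio.h>
#include <stddef.h>

/* Build a per-rank file name such as "result.3.txt" in buf.
 * Returns 0 on success, -1 if formatting failed or the name did not fit.
 * The function name and naming scheme are illustrative assumptions. */
static int rank_filename(char *buf, size_t len, const char *base, int rank)
{
    int n = snprintf(buf, len, "%s.%d.txt", base, rank);

    /* snprintf returns the length it wanted to write; a value >= len
     * means the result was truncated. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Each rank would call this with the value it got from MPI_Comm_rank() and pass the resulting name to fopen(), so no two processes share an output file.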