Re: [OMPI users] run openMPI jobs with SGE,

2010-04-12 Thread Reuti
Hi,

On 09.04.2010 at 23:48, Cristobal Navarro wrote:

> Thanks,
> now I get mixed results, and everything seems to be working OK with mixed MPI
> execution.
>
> Is it normal that after receiving the results, the hosts remain busy for about 15
> seconds?
> Example:

Yes. This is the time SGE needs for housekeeping; it can even take some
minutes (especially if you kill a parallel job).

-- Reuti
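
Editor's note: to watch the housekeeping delay described above, one option is to read the "resv/used/tot." column of `qstat -f` until the used count drops back to zero. The following is only a minimal sketch. It assumes the three-number slot column shown in this thread, and the names `used_slots` and `all_free` are hypothetical helpers, not part of SGE.

```python
import re

# Matches a queue-instance line of `qstat -f` as shown in this thread, e.g.
#   all.q@master.local BIP   0/5/16 0.02 darwin-x86
# Assumption: the slot column has the form resv/used/tot.
QUEUE_LINE = re.compile(r"^(\S+@\S+)\s+\S+\s+(\d+)/(\d+)/(\d+)\s")


def used_slots(qstat_output):
    """Return {queue_instance: used_slots} parsed from `qstat -f` text."""
    usage = {}
    for line in qstat_output.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match:
            usage[match.group(1)] = int(match.group(3))
    return usage


def all_free(qstat_output):
    """True when no parsed queue instance reports used slots."""
    return all(used == 0 for used in used_slots(qstat_output).values())


# Sample text copied from the output quoted in this thread.
sample = """\
all.q@master.local BIP   0/5/16 0.02 darwin-x86
 65 0.55500 mpirun master   r 04/09/2010 17:44:36 5
all.q@worker00.local   BIP   0/5/16 0.01 darwin-x86
"""
print(used_slots(sample))
```

A polling loop could feed the text of `qstat -f` to `all_free` every few seconds and stop once it returns True. Job lines are skipped because their first field contains no "@".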


> master:common master$ qrsh -verbose -pe orte 10 /opt/openmpi-1.4.1/bin/mpirun -np 10 hostname
> Your job 65 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 65 has been successfully scheduled.
> Establishing builtin session to host worker00.local ...
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> master.local
> master.local
> master.local
> master.local
> master.local
> # after some seconds, I query the host status and the slots are still in use
> master:common master$ qstat -f
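> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@master.local             BIP   0/5/16         0.02     darwin-x86
>      65 0.55500 mpirun     master       r     04/09/2010 17:44:36     5
> ---------------------------------------------------------------------------------
> all.q@worker00.local           BIP   0/5/16         0.01     darwin-x86
>      65 0.55500 mpirun     master       r     04/09/2010 17:44:36     5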
> master:common master$ 
> 
> But after waiting a while longer, they become free again:
> master:common master$ qstat -f
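> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@master.local             BIP   0/0/16         0.01     darwin-x86
> ---------------------------------------------------------------------------------
> all.q@worker00.local           BIP   0/0/16         0.01     darwin-x86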
> 
> Anyway, these are just details; thanks to your help, the important aspects are
> working.
> Cristobal
> 
> 
> 
> 
> On Fri, Apr 9, 2010 at 1:34 PM, Reuti wrote:
> On 09.04.2010 at 18:57, Cristobal Navarro wrote:
> 
> > Sorry, the command was missing a number.
> >
> > As you said, it should be:
> >
> > qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> > ---
> > This is my parallel environment:
> > qconf -sp pempi
> > pe_name            pempi
> > slots              210
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /usr/bin/true
> > stop_proc_args     /usr/bin/true
> > allocation_rule    $pe_slots
> 
> $pe_slots means that all slots must come from one and the same machine (e.g. 
> for smp jobs). You can try $round_robin.
> 
> -- Reuti
> 
> 
> > control_slaves TRUE
> > job_is_first_task  FALSE
> > urgency_slots  min
> > accounting_summary TRUE
> >
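
Editor's note: the difference between `$pe_slots` and `$round_robin` that Reuti describes can be illustrated with a toy model. This is a simplified sketch, not SGE's scheduler: the function `allocate` and the host dictionaries are hypothetical, and real scheduling also depends on load, queue order, and other settings.

```python
def allocate(rule, free_slots, requested):
    """Toy model of an SGE allocation_rule.

    free_slots maps host name -> free slots on that host.
    Returns {host: granted_slots}, or None if the request cannot be met.
    """
    if rule == "$pe_slots":
        # All requested slots must come from one single host.
        for host, free in free_slots.items():
            if free >= requested:
                return {host: requested}
        return None
    if rule == "$round_robin":
        # Hand out one slot per host in turn until the request is filled.
        if sum(free_slots.values()) < requested:
            return None
        granted = {host: 0 for host in free_slots}
        remaining = requested
        while remaining > 0:
            for host, free in free_slots.items():
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
        return {host: n for host, n in granted.items() if n > 0}
    raise ValueError("unsupported rule: " + rule)


# Two hosts with 2 slots each, as in the cola.q queue of this thread.
small = {"neoideo": 2, "ijorge.local": 2}
print(allocate("$pe_slots", small, 6))      # no single host has 6 slots
print(allocate("$pe_slots", small, 2))      # both slots on one host
print(allocate("$round_robin", small, 4))   # slots spread over both hosts
```

Under this model, a request for 6 slots with `$pe_slots` cannot be met when each host offers only 2, which is consistent with the "could not be scheduled" message. A request for 2 slots lands entirely on one host, so `mpirun -np 6` presumably starts all six processes there, which would explain the single hostname in the output.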
> > this is the queue
> > qconf -sq cola.q
> > qname                 cola.q
> > hostlist              @allhosts
> > seq_no                0
> > load_thresholds       np_load_avg=1.75
> > suspend_thresholds    NONE
> > nsuspend              1
> > suspend_interval      00:05:00
> > priority              0
> > min_cpu_interval      00:05:00
> > processors            UNDEFINED
> > qtype                 BATCH INTERACTIVE
> > ckpt_list             NONE
> > pe_list               make pempi
> > rerun                 FALSE
> > slots                 2
> > tmpdir                /tmp
> > shell                 /bin/csh
> >
> > I noticed that if I request 2 slots (since the queue has 2 slots) in the -pe
> > pempi N argument and also give the full path to mpirun, as you pointed out, it
> > works!
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 125 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 125 has been successfully scheduled.
> > Establishing builtin session to host ijorge.local ...
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 126 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 126 has been successfully scheduled.
> > Establishing builtin session to host neoideo ...
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > cristobal@neoideo:~$
> >
> > I just wonder why I didn't get mixed hostnames, like:
> > neoideo
> > neoideo
> > ijorge.local
> > ijorge.local
> > neoideo
> > ijorge.local
> >
> > ??
> >
> > Thanks for the help already!
> >
> > Cristobal
> >
> >
> >
> >
> > On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc C

[OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Hideyuki Jitsumoto
Hi Members,

I tried to use checkpoint/restart with Open MPI,
but I cannot collect checkpoint data.
I prepared the execution environment as follows. The strings in () are the
names of the output files attached to the next e-mail (because of the mail
size limit):

1. Installed BLCR and checked that it works correctly with "make check"
2. Ran ./configure with some parameters in the Open MPI source directory
(config.output / config.log)
3. Ran make and make install (make.output.2 / install.output.2)
4. Confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
/${INSTALL_DIR}/lib/openmpi
5. Created ~/.openmpi/mca-params.conf (mca-params.conf)
6. Compiled NPB and ran it with -am ft-enable-cr
7. Invoked ompi-checkpoint 

As a result, I got the message "Checkpoint failed: no processes checkpointed."
(cr_test_cg)

In addition, when I checked the ompi_info output as in your demo movie, I got
"MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)" (open_info.output)

What should I do to get checkpointing to work?
Any guidance in this regard would be highly appreciated.

Thank you,
Hideyuki

--
Sincerely Yours,
Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
Tokyo Institute of Technology
Global Scientific Information and Computing center (Matsuoka Lab.)
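
Editor's note: the "MCA crs:" line quoted in this message can be checked mechanically. The sketch below lists the components reported for one MCA framework in text of that form. The function name `components` is hypothetical, and the second sample line (with `blcr`) is an invented illustration of what a build with the BLCR component might print, not output from this thread.

```python
import re

# Matches component lines in the form quoted in this message, e.g.
#   MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)
MCA_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+(\S+)\s+\(")


def components(info_output, framework):
    """Return the component names listed for one MCA framework."""
    names = []
    for line in info_output.splitlines():
        match = MCA_LINE.match(line)
        if match and match.group(1) == framework:
            names.append(match.group(2))
    return names


# First line is from the thread; the blcr line is a hypothetical example.
reported = "MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)"
hypothetical = reported + "\n" + "MCA crs: blcr (MCA v2.0, API v2.0, Component v1.4.1)"

print(components(reported, "crs"))
print("blcr" in components(hypothetical, "crs"))
```

If `blcr` is missing from the list, that would suggest the BLCR component was not built or not found at run time. This reading is an assumption and should be checked against config.log.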


Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Hideyuki Jitsumoto
I am attaching file (1/2) to this email, as mentioned in the previous one.
I'm very sorry for sending such a large log file.

Thank you,
Hideyuki


*
** **
** WARNING:  This email contains an attachment of a very suspicious type.  **
** You are urged NOT to open this attachment unless you are absolutely **
** sure it is legitimate.  Opening this attachment may cause irreparable   **
** damage to your computer and your files.  If you have any questions  **
** about the validity of this message, PLEASE SEEK HELP BEFORE OPENING IT. **
** **
** This warning was added by the IU Computer Science Dept. mail scanner.   **
*




Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Hideyuki Jitsumoto
I am attaching file (2/2) to this email, as mentioned in the previous one.

Thank you,
Hideyuki




Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Hideyuki Jitsumoto
I am resending this mail because of a sending error (I used the wrong address
in the From field).
Sorry if you receive multiple copies of this email.

I am attaching file (1/2) to this email, as mentioned in the previous one.


openmpi_config_log.tar.gz
Description: GNU Zip compressed data


Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Hideyuki Jitsumoto
I am resending this mail because of a sending error (I used the wrong address
in the From field).
Sorry if you receive multiple copies of this email.

I am attaching file (2/2) to this email, as mentioned in the previous one.

Thank you,
Hideyuki


openmpi_others_log.tar.gz
Description: GNU Zip compressed data


Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-12 Thread Fernando Lemos
On Mon, Apr 12, 2010 at 7:36 AM, Hideyuki Jitsumoto wrote:
> Hi Members,
>
> I tried to use checkpoint/restart with Open MPI,
> but I cannot collect checkpoint data.
> I prepared the execution environment as follows. The strings in () are the
> names of the output files attached to the next e-mail (because of the mail
> size limit):
>
> 1. Installed BLCR and checked that it works correctly with "make check"
> 2. Ran ./configure with some parameters in the Open MPI source directory
> (config.output / config.log)
> 3. Ran make and make install (make.output.2 / install.output.2)
> 4. Confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
> /${INSTALL_DIR}/lib/openmpi
> 5. Created ~/.openmpi/mca-params.conf (mca-params.conf)
> 6. Compiled NPB and ran it with -am ft-enable-cr
> 7. Invoked ompi-checkpoint 
>
> As a result, I got the message "Checkpoint failed: no processes checkpointed."
> (cr_test_cg)

Are you using a shared file system? You need to use a shared file
system for checkpointing with 1.4.1:

https://svn.open-mpi.org/trac/ompi/ticket/2139

Regards,


Re: [OMPI users] Installing MPE on existing Open-MPI installation for C++ programs

2010-04-12 Thread chan

Try using "mpecc -mpicc=" to compile your C++ program.
"mpicc -mpilog" is only available in MPICH (not in MPICH2, which provides
"mpicc -mpe=mpilog"). Non-MPICH(2)-based implementations need to use
mpecc instead to enable MPE.

A.Chan

"Ridhi Dua" wrote:

> Hello,
> I have successfully installed MPE for my existing Open-MPI installation and
> have been able to compile using the compiler wrapper 'mpecc'.
> But I have some C++ MPI programs which cannot be compiled using mpecc. How
> do I achieve this, or do I need to make changes to my MPE installation
> procedure? I used the following command for my current installation.
> 
> ./configure --prefix=/gpfs/fs3/home/xxx/mybin \
> MPI_CC=/sw/openmpi/bin/mpicc \
> --disable-f77 \
> --with-java=/usr/java/jdk1.6.0_13
> 
> (Also, I have managed to use mpecc, but not 'mpicc -mpilog hello.c'. Is this
> assumption even correct for Open-MPI, or is it an option only for MPICH?)
> 
> Thank you.
> ~ ridZ
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users