> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: RE: [petsc-users] Scalability of PETSc on vesta.alcf
> Date: Mon, 20 Jan 2014 10:32:32 -0700
>
> Roc Wang <[email protected]> writes:
> > I tried c16 for 1024 ranks and 2048 ranks, but the job cannot run
> > successfully. It seems the job was started but the program didn't
> > execute. Please take a look at the attached log file for 1024 with
> > c16 mode. Is this because some environment parameters I didn't set
> > right? Actually, the same program is only able to run with 1024
> > ranks in c1, c2 and c32, c64 modes and 2048 ranks in c64 mode.
>
> You have non-scalable "Generate Vector" and VecView (the latter maybe
> because you don't use MPI-IO?). It is probably failing at this step.
>
> | qsub -A SUGAR -t 00:10:00 -n 512 --proccount 2048 --mode script ./vesta.job
>
> I thought you said you were trying c16?

Yes, I did. But I tried both ways: submitting the executable directly with
qsub, and submitting a script. The direct command looks like this:
qsub -n 64 -t 10 --mode c16 -O p1024_c16 --env "F00=a:BAR=b" ./x.r -ksp_type
bcgsl -ksp_bcgsl_ell 1 -sub_pc_type ilu -sub_pc_factor_levels 3 -sub_ksp_type
preonly -my_ksp_monitor true -ksp_view -log_summary
The submission script:
#!/bin/bash
proN=1024
preName=p$proN
echo "Script JOB with Jobid COBALT_JOBID="$preName
qsub -A SUGAR -t 00:10:00 -n 64 --proccount $proN --mode script ./vesta.job
and vesta.job:
#!/bin/sh
Nrank=1024
echo Starting Cobalt job script
LOCARGS="--block $COBALT_PARTNAME ${COBALT_CORNER:+--corner} $COBALT_CORNER \
${COBALT_SHAPE:+--shape} $COBALT_SHAPE"
runjob $LOCARGS -n $Nrank -p 16 : x.r -ksp_type bcgsl -ksp_bcgsl_ell 1 \
-sub_pc_type ilu -sub_pc_factor_levels 3 -sub_ksp_type preonly -my_ksp_monitor \
true -ksp_view -log_summary
echo End of jobscript.sh
exit 0
Neither approach ran the program successfully. In both cases, the runtime log
showed that the job started, but nothing was written to the stdout file.
I just ran the same program with:
qsub -n 16 -t 10 --mode c64 -O n1024_c64 --env "F00=a:BAR=b" ./x.r -ksp_type
bcgsl -ksp_bcgsl_ell 1 -sub_pc_type ilu -sub_pc_factor_levels 3 -sub_ksp_type
preonly -my_ksp_monitor true -ksp_view -log_summary
The job ran, and the stdout file showed all the runtime output. If "Generate
Vector" and VecView are non-scalable (the latter maybe because I don't use
MPI-IO?), why is c64 mode able to run? That seems strange to me. Thanks.
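
For reference, here is a minimal sketch of the node arithmetic behind the
commands above, assuming that the number after "c" in the mode name (or after
-p in runjob) is the number of ranks per node, and that -n in qsub is the
number of nodes. The helper name nodes_needed is hypothetical; it is not part
of Cobalt or runjob.

```bash
#!/bin/bash
# nodes_needed: hypothetical helper, not a Cobalt or runjob command.
# Usage: nodes_needed <total_ranks> <ranks_per_node>
nodes_needed() {
    local ranks=$1 per_node=$2
    # Round up so that a partially filled node still counts as a whole node.
    echo $(( (ranks + per_node - 1) / per_node ))
}

nodes_needed 1024 16   # c16 mode, 1024 ranks
nodes_needed 1024 64   # c64 mode, 1024 ranks
nodes_needed 2048 16   # c16 mode, 2048 ranks
```

Under these assumptions it prints 64, 16 and 128, which matches -n 64 for the
c16 runs and -n 16 for the c64 run. By the same arithmetic, -n 512 with
--proccount 2048 works out to 4 ranks per node rather than 16.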
n1024_c16_mode.cobaltlog
Description: Binary data
n1024_c16_mode.error
Description: Binary data
