Here is the qstat -f output. Note that node28 physically has 4 procs, but at
the moment it runs only these two jobs (1 proc each):

/==============================/
$ qstat -f 191768 191769
Job Id: 191768.cluster
    Job_Name = vpu.GEP_1
    Job_Owner = [EMAIL PROTECTED]
    resources_used.cput = 16:34:45
    resources_used.mem = 572892kb
    resources_used.vmem = 689168kb
    resources_used.walltime = 19:37:00
    job_state = R
    queue = heavy
    server = cluster
    Checkpoint = u
    ctime = Mon Jan 21 15:54:06 2008
    Error_Path = ...(deleted)
    exec_host = node28/1
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Jan 22 09:17:58 2008
    Output_Path = ...(deleted)
    Priority = 0
    qtime = Mon Jan 21 15:54:06 2008
    Rerunable = True
    Resource_List.cput = 240:00:00
    Resource_List.mem = 512mb
    Resource_List.ncpus = 1
    Resource_List.neednodes = 1
    Resource_List.nice = 13
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.walltime = 100:00:00
    session_id = 27220
    substate = 42
    Variable_List = PBS_O_HOME=/groups/Pu_group/ad_user,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=ad_user,
        PBS_O_PATH=~/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/
        mpich-1.2.6/bin:.:/usr/local.cc/bin:/usr/local/maui/bin:/db1/Local/src
        /Jackal/bin:/db1/Local/src/blast-2.2.14/bin:/d/bioinfo/users/meme//bin
        :/usr/X11R6/bin,PBS_O_MAIL=/var/spool/mail/ad_user,PBS_O_SHELL=/bin/tcsh,
        PBS_O_HOST=cluster,
        PBS_O_WORKDIR=/groups/Pu_group/ad_user/projects/nu4siteRuns/hiv-nonrecombi
        nants/sampling,PBS_O_QUEUE=heavy
    euser = ad_user
    egroup = Pu_group
    hashname = 191768.bioc
    queue_rank = 132881
    queue_type = E
    etime = Mon Jan 21 15:54:06 2008
    submit_args = -q heavy -N vpu.GEP_1 -l walltime=100:00:00

Job Id: 191769.cluster
    Job_Name = env.GF5_1
    Job_Owner = [EMAIL PROTECTED]
    resources_used.cput = 08:38:47
    resources_used.mem = 3946548kb
    resources_used.vmem = 4513768kb
    resources_used.walltime = 19:37:00
    job_state = R
    queue = heavy
    server = cluster
    Checkpoint = u
    ctime = Mon Jan 21 15:54:06 2008
    Error_Path = ...(deleted)
    exec_host = node28/2
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Jan 22 09:17:58 2008
    Output_Path = ...(deleted)
    Priority = 0
    qtime = Mon Jan 21 15:54:06 2008
    Rerunable = True
    Resource_List.cput = 240:00:00
    Resource_List.mem = 512mb
    Resource_List.ncpus = 1
    Resource_List.neednodes = 1
    Resource_List.nice = 13
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.walltime = 100:00:00
    session_id = 27230
    substate = 42
    Variable_List = PBS_O_HOME=/groups/Pu_group/ad_user,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=ad_user,
        PBS_O_PATH=~/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/
        mpich-1.2.6/bin:.:/usr/local.cc/bin:/usr/local/maui/bin:/db1/Local/src
        /Jackal/bin:/db1/Local/src/blast-2.2.14/bin:/d/bioinfo/users/meme//bin
        :/usr/X11R6/bin,PBS_O_MAIL=/var/spool/mail/ad_user,PBS_O_SHELL=/bin/tcsh,
        PBS_O_HOST=cluster,
        PBS_O_WORKDIR=/groups/Pu_group/ad_user/projects/nu4siteRuns/hiv-nonrecombi
        nants/sampling,PBS_O_QUEUE=heavy
    euser = ad_user
    egroup = Pu_group
    hashname = 191769.bioc
    queue_rank = 132882
    queue_type = E
    etime = Mon Jan 21 15:54:06 2008
    submit_args = -q heavy -N env.GF5_1 -l walltime=100:00:00

/===========================/
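To compare the two records above side by side, here is a minimal Python sketch that parses plain qstat -f text and, for each job, reports its exec_host slot, memory used versus the Resource_List.mem request, and CPU time per wall-clock time. It assumes the raw qstat -f layout ("Job Id:" headers, "name = value" attributes, indented continuation lines for wrapped values, a blank line between records) and PBS's 1024-based kb/mb units. The helper names parse_qstat_f, to_kb, to_seconds and summarize are hypothetical, not part of TORQUE or Maui.

```python
import re

# Assumption: PBS size suffixes are 1024-based (1mb = 1024kb, 1gb = 1024mb).
UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2}


def parse_qstat_f(text):
    """Parse plain `qstat -f` text into {job_id: {attribute: value}}."""
    jobs, current, last_key = {}, None, None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            current = jobs.setdefault(line.split(":", 1)[1].strip(), {})
            last_key = None
        elif not line.strip():
            current, last_key = None, None  # a blank line ends the record
        elif current is None:
            continue  # text outside a job record, e.g. the "$ qstat" line
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
            last_key = key
        elif last_key and line[0].isspace():
            # Indented continuation of a wrapped value such as Variable_List.
            current[last_key] += line.strip()
    return jobs


def to_kb(size):
    """Convert a PBS size such as '572892kb' or '512mb' to kilobytes."""
    match = re.fullmatch(r"(\d+)([kmg]b)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * UNITS_KB[match.group(2)]


def to_seconds(hms):
    """Convert an HH:MM:SS duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(job):
    """Compare what a job requested with what it has used so far."""
    cput = to_seconds(job["resources_used.cput"])
    walltime = to_seconds(job["resources_used.walltime"])
    return {
        "exec_host": job["exec_host"],
        "mem_used_kb": to_kb(job["resources_used.mem"]),
        "mem_requested_kb": to_kb(job["Resource_List.mem"]),
        # Average number of CPUs kept busy: CPU seconds per wall-clock second.
        "cput_per_walltime": round(cput / walltime, 2) if walltime else 0.0,
    }
```

Fed the captured output of "qstat -f 191768 191769", this would report about 0.85 (191768) and 0.44 (191769) CPU-seconds per wall-clock second, and memory use of 572892kb and 3946548kb against a 512mb (524288kb) request for each job.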




On 1/22/08, Jan Ploski <[EMAIL PROTECTED]> wrote:
> "Itay M" <[EMAIL PROTECTED]> wrote on 01/22/2008 01:20:10 PM:
>
> > pbsnodes -a says:
> > state = busy
> > np = 4   --- which is correct, this machine has 4 processors, but at
> > the moment only 2 processors (= jobs) are running on it. And this is
> > where I think the problem is: while the node should allow up to 4
> > procs to be used on it, only 2 are in use. The other 2 are
> > doing nothing.
> > And yes, this is consistent with what diagnose -n shows: each node
> > that has the (for example) "WARNING:  node 'node17' has more
> > processors utilized than dedicated (4 > 2)" problem also uses fewer
> > processors than it should at the moment.
> >
> > How can I make sure the node allows all of its processors to be
> > used?
>
> Post the qstat -f output for the jobs running on the node which should
> have free processors.
>
> Regards,
> Jan Ploski
>
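The check Jan asks for amounts to counting how many of node28's np virtual processors the server has handed out. A minimal sketch, assuming TORQUE-style exec_host values (node/slot index, with "+" between entries for multi-slot jobs) and the np value from pbsnodes -a; slot_usage is a hypothetical helper, not a TORQUE or Maui command.

```python
from collections import defaultdict


def slot_usage(exec_hosts, np_by_node):
    """Return {node: (used_slots, free_slots)} for each node in np_by_node.

    exec_hosts: exec_host strings as printed by `qstat -f`, e.g. "node28/1";
    multi-slot jobs list several node/slot pairs joined by "+".
    np_by_node: the np value of each node as reported by `pbsnodes -a`.
    """
    used = defaultdict(set)
    for exec_host in exec_hosts:
        for entry in exec_host.split("+"):
            node, _, slot = entry.partition("/")
            used[node].add(int(slot))
    report = {}
    for node, np in np_by_node.items():
        taken = sorted(used[node])
        free = [slot for slot in range(np) if slot not in used[node]]
        report[node] = (taken, free)
    return report


# node28 as posted: np = 4, exec_host = node28/1 and node28/2.
print(slot_usage(["node28/1", "node28/2"], {"node28": 4}))
# {'node28': ([1, 2], [0, 3])}
```

This reflects only the allocation recorded in exec_host (slots 0 and 3 unassigned); it does not reproduce the "utilized" figure in the diagnose -n warning.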
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers
