Hmmm...

06/21 12:19:58 INFO:     PBS node atarnode16.atar set to state Busy (job-exclusive)
06/21 12:19:58 INFO:     node 'atarnode16.atar' changed states from Idle to Busy
06/21 12:19:58 ALERT:    unexpected node transition on node 'atarnode16.atar'  Idle -> Busy
06/21 12:19:58 MPBSNodeUpdate(atarnode16.atar,atarnode16.atar,Busy,RHEL4U2-X86.BCGSC.CA)
06/21 12:19:58 INFO:     node atarnode16.atar has joblist '0/19.master'
06/21 12:19:58 INFO:     job 19 adds 1 processors per task to node atarnode16.atar (1)
06/21 12:19:58 MPBSLoadQueueInfo(RHEL4U2-X86.BCGSC.CA,atarnode16.atar,SC)

This seems weird: why is it referencing RHEL4U2-X86.BCGSC.CA?  That's
the build host...

Anything from the TORQUE logs?
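
Since maui.log is growing so quickly, a small filter may make it easier to
spot what matters. The following is only a rough sketch, not part of Maui or
TORQUE: the function name interesting_entries is made up for illustration,
and it assumes each log entry sits on a single line in the real file, in the
"MM/DD HH:MM:SS LEVEL:  message" layout seen in the excerpt above. It keeps
every entry whose level is not INFO (such as the ALERT above) plus any INFO
entry that mentions the given job, and skips the function-trace lines.

```python
import re

# Assumed layout, taken from the excerpt in this thread:
#   "06/21 12:19:58 ALERT:    unexpected node transition on node ..."
# Function-trace lines such as "06/21 12:19:47 MJobStart(19)" have no
# "LEVEL:" field and therefore do not match.
ENTRY = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]+):\s+(.*)$")


def interesting_entries(lines, job_id):
    """Return (timestamp, level, message) tuples worth a closer look.

    Hypothetical helper: keeps all non-INFO entries, and INFO entries
    that mention the job as "job 19" or "job '19'".
    """
    job = re.compile(r"\bjob '?%s\b" % re.escape(str(job_id)))
    hits = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match is None:
            continue  # trace line or anything else without a level field
        stamp, level, message = match.groups()
        if level != "INFO" or job.search(message):
            hits.append((stamp, level, message))
    return hits
```

To try it against the live log, something like
interesting_entries(open("/opt/maui/maui.log"), 19) should work, assuming
the file is readable by the user running the script.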

Cheers,

Bernard 

> -----Original Message-----
> From: John Meskes [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, June 21, 2006 10:40
> To: Bernard Li
> Cc: [email protected]
> Subject: RE: [Oscar-devel] job status
> 
> On 21 Jun 2006 at 8:55, Bernard Li wrote:
> 
> > Have you looked through the TORQUE/Maui logs to check if anything is out
> > of the ordinary?
> 
> maui.log grows at 1M per minute. I don't see any different entries at the
> supposed job finishing time.
> Here's what I see when the job starts... /opt/maui/maui.log
> 06/21 12:19:47 INFO:     48 feasible tasks found for job 19:0 in partition DEFAULT (10 Needed)
> 06/21 12:19:47 INFO:     tasks located for job 19:  10 of 10 required (16 feasible)
> 06/21 12:19:47 MJobStart(19)
> 06/21 12:19:47 MJobDistributeTasks(19,RHEL4U2-X86.BCGSC.CA,NodeList,TaskMap)
> 06/21 12:19:47 MAMAllocJReserve(19,RIndex,ErrMsg)
> 06/21 12:19:47 MRMJobStart(19,Msg,SC)
> 06/21 12:19:47 MPBSJobStart(19,RHEL4U2-X86.BCGSC.CA,Msg,SC)
> 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,atarnode25.atar+atarnode24.atar+atarnode23.atar+atarnode22.atar+atarnode21.atar+atarnode20.atar+atarnode19.atar+atarnode18.atar+atarnode17.atar+atarnode16.atar)
> 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,10:ppn=1)
> 06/21 12:19:47 INFO:     job '19' successfully started
> 06/21 12:19:47 MStatUpdateActiveJobUsage(19)
> 06/21 12:19:47 MResJCreate(19,MNodeList,00:00:00,ActiveJob,Res)
> 06/21 12:19:47 INFO:     starting job '19'
> 06/21 12:19:47 INFO:     1 jobs started on iteration 288
> Active Jobs------
> ------------------
> 06/21 12:19:47 INFO:     resources available after scheduling: N: 6  P: 6
> ...skipping
> 06/21 12:19:58 INFO:     PBS node atarnode16.atar set to state Busy (job-exclusive)
> 06/21 12:19:58 INFO:     node 'atarnode16.atar' changed states from Idle to Busy
> 06/21 12:19:58 ALERT:    unexpected node transition on node 'atarnode16.atar'  Idle -> Busy
> 06/21 12:19:58 MPBSNodeUpdate(atarnode16.atar,atarnode16.atar,Busy,RHEL4U2-X86.BCGSC.CA)
> 06/21 12:19:58 INFO:     node atarnode16.atar has joblist '0/19.master'
> 06/21 12:19:58 INFO:     job 19 adds 1 processors per task to node atarnode16.atar (1)
> 06/21 12:19:58 MPBSLoadQueueInfo(RHEL4U2-X86.BCGSC.CA,atarnode16.atar,SC)
> 
> 
> > With trunk, I believe that the dnsdomainname of client nodes is not
> > correctly set (i.e. if you run "hostname" on your client, it does not
> > show the FQDN).
> 
> I can ping by either name.
> 
> > Not sure if this is related though...
> > 
> > Cheers,
> > 
> > Bernard
> > 
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] 
> > > [mailto:[EMAIL PROTECTED] On Behalf 
> > > Of John Meskes
> > > Sent: Wednesday, June 21, 2006 8:51
> > > To: [email protected]
> > > Subject: Re: [Oscar-devel] job status
> > > 
> > > Using Oscar5(r5000) on CentOS4.3
> > > I still have a problem with qstat.
> > > 
> > > 1-When a job completes successfully, it is not removed from the queue
> > > (although the walltime does stop accumulating)
> > > If it dies due to a restriction such as memory limit exceeded, it is removed.
> > > 
> > > 2-Cannot qstat from a node
> > > pbs_iff: Access from host not allowed, or unknown host
> > > No Permission.
> > > qstat: cannot connect to server pbs_oscar (errno=15007)
> > > 
> > > On 6 Jun 2006 at 15:16, John Meskes wrote:
> > > 
> > > > I need help with another few problems:
> > > > using CentOS, and nightly 4.2.1r4598-20060417, with upgraded lam-7.1.2
> > > > and torque-2.0.0p8-2
> > > > 
> > > > 1-after a job finishes, it stays in the qstat listing
> > > >  - I don't see an ending entry in the pbs/server_priv/accounting/ file
> > > > 
> > > > 2-ganglia has gaps in the graphs. (See attached if it works)
> > > > 
> > > > Is there a nightly tarball for OSCAR-5 that's close to production status?
> > > > ...John.
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Oscar-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/oscar-devel
> > > 
> 
> 
> 

