On 21 Jun 2006 at 8:55, Bernard Li wrote:
> Have you looked through the TORQUE/Maui logs to check if anything is out
> of the ordinary?

maui.log grows at 1M per minute. I don't see any different entries at the supposed job finishing time. Here's what I see when the job starts...

/opt/maui/maui.log
06/21 12:19:47 INFO: 48 feasible tasks found for job 19:0 in partition DEFAULT (10 Needed)
06/21 12:19:47 INFO: tasks located for job 19: 10 of 10 required (16 feasible)
06/21 12:19:47 MJobStart(19)
06/21 12:19:47 MJobDistributeTasks(19,RHEL4U2-X86.BCGSC.CA,NodeList,TaskMap)
06/21 12:19:47 MAMAllocJReserve(19,RIndex,ErrMsg)
06/21 12:19:47 MRMJobStart(19,Msg,SC)
06/21 12:19:47 MPBSJobStart(19,RHEL4U2-X86.BCGSC.CA,Msg,SC)
06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,atarnode25.atar+atarnode24.atar+atarnode23.atar+atarnode22.atar+atarnode21.atar+atarnode20.atar+atarnode19.atar+atarnode18.atar+atarnode17.atar+atarnode16.atar)
06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,10:ppn=1)
06/21 12:19:47 INFO: job '19' successfully started
06/21 12:19:47 MStatUpdateActiveJobUsage(19)
06/21 12:19:47 MResJCreate(19,MNodeList,00:00:00,ActiveJob,Res)
06/21 12:19:47 INFO: starting job '19'
06/21 12:19:47 INFO: 1 jobs started on iteration 288

Active Jobs------ ------------------

06/21 12:19:47 INFO: resources available after scheduling: N: 6 P: 6
...skipping
06/21 12:19:58 INFO: PBS node atarnode16.atar set to state Busy (job-exclusive)
06/21 12:19:58 INFO: node 'atarnode16.atar' changed states from Idle to Busy
06/21 12:19:58 ALERT: unexpected node transition on node 'atarnode16.atar' Idle -> Busy
06/21 12:19:58 MPBSNodeUpdate(atarnode16.atar,atarnode16.atar,Busy,RHEL4U2-X86.BCGSC.CA)
06/21 12:19:58 INFO: node atarnode16.atar has joblist '0/19.master'
06/21 12:19:58 INFO: job 19 adds 1 processors per task to node atarnode16.atar (1)
06/21 12:19:58 MPBSLoadQueueInfo(RHEL4U2-X86.BCGSC.CA,atarnode16.atar,SC)

> With trunk, I believe that the dnsdomainname of client nodes are not
> correctly set (i.e. if you run "hostname" on your client, it does not
> show FQDN).

I can ping by either name.

> Not sure if this is related though...
>
> Cheers,
>
> Bernard
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf
> > Of John Meskes
> > Sent: Wednesday, June 21, 2006 8:51
> > To: [email protected]
> > Subject: Re: [Oscar-devel] job status
> >
> > Using Oscar5(r5000) on CentOS4.3
> > I still have a problem with qstat.
> >
> > 1-When a job completes successfully, it is not removed from the queue
> > (although the walltime does stop accumulating)
> > If it dies due to a restriction such as memory limit
> > exceeded, it is removed.
> >
> > 2-Cannot qstat from a node
> > pbs_iff: Access from host not allowed, or unknown host
> > No Permission.
> > qstat: cannot connect to server pbs_oscar (errno=15007)
> >
> > On 6 Jun 2006 at 15:16, John Meskes wrote:
> >
> > > I need help with another few problems:
> > > using CentOS, and nightly 4.2.1r4598-20060417, with upgraded lam-7.1.2
> > > and torque-2.0.0p8-2
> > >
> > > 1-after a job finishes, it stays in the qstat listing
> > > - I don't see an ending entry in the pbs/server_priv/accounting/ file
> > >
> > > 2-ganglia has gaps in the graphs. (See attached if it works)
> > >
> > > Is there a nightly tarball for OSCAR-5 that's close to production status?
> > > ...John.
> >
> > _______________________________________________
> > Oscar-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/oscar-devel
