> Perhaps you can post the relevant bits from your mom log.

The job ran for 1 minute. I did a qdel the following day.
node: /var/spool/pbs/mom_logs/20060606
06/06/2006 17:00:29;0001; pbs_mom;Job;TMomFinalizeJob3;job 49.master started, pid = 20628
06/06/2006 17:00:29;0008; pbs_mom;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:01:32;0008; pbs_mom;Job;49.master;Terminated
06/06/2006 17:01:32;0001; pbs_mom;Job;49.master;server rejected job obit - 15008

master: /var/spool/pbs/server_logs/pbs_server.log
06/06/2006 17:00:27;0008;PBS_Server;Job;49.master;Job Queued at request of [EMAIL PROTECTED], owner = [EMAIL PROTECTED], job name = STDIN, queue = parallel
06/06/2006 17:00:27;0040;PBS_Server;Svr;master;Scheduler sent command new
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Run at request of [EMAIL PROTECTED]
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:01:27;0040;PBS_Server;Svr;master;Scheduler sent command time
06/06/2006 17:02:27;0040;PBS_Server;Svr;master;Scheduler sent command time
06/06/2006 17:03:27;0040;PBS_Server;Svr;master;Scheduler sent command time
. . .
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;Job deleted at request of [EMAIL PROTECTED]
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;Job sent signal SIGTERM on delete
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;MOM rejected signal during delete

> The gaps in the Ganglia graph looks really strange - any errors in http error
> logs?

Only PHP "Notice" reports. The graphs are generated, just some of the data is missing. Would you know if the data is coming from a database or from a log file?
/var/log/httpd/error_log
[client 65.x.x.x] PHP Notice: Undefined index: G in /var/www/html/ganglia/get_context.php on line 9, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: s in /var/www/html/ganglia/get_context.php on line 13, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: cr in /var/www/html/ganglia/get_context.php on line 14, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: hc in /var/www/html/ganglia/get_context.php on line 15, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: sh in /var/www/html/ganglia/get_context.php on line 16, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: p in /var/www/html/ganglia/get_context.php on line 18, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: t in /var/www/html/ganglia/get_context.php on line 19, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: jr in /var/www/html/ganglia/get_context.php on line 21, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: js in /var/www/html/ganglia/get_context.php on line 23, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: gw in /var/www/html/ganglia/get_context.php on line 25, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: gs in /var/www/html/ganglia/get_context.php on line 27, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: g in /var/www/html/ganglia/graph.php on line 8, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: G in /var/www/html/ganglia/graph.php on line 9, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: me in /var/www/html/ganglia/graph.php on line 10, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined index: vl in /var/www/html/ganglia/graph.php on line 15, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined variable: command in /var/www/html/ganglia/graph.php on line 40, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice: Undefined variable: extras in /var/www/html/ganglia/graph.php on line 288, referer: http://142.x.x.x/ganglia/

> You can also try updating to the newer version of Ganglia and see if that
> helps:

I'll try that in the morning before I destroy the cluster (4.2.1 -> 5.)
http://svn.oscar.openclustergroup.org/oscar/trunk/packages/ganglia/distro/rhel4-i386/
From: John Meskes [mailto:[EMAIL PROTECTED]]
Sent: Wed 07/06/2006 18:48
To: Bernard Li
Cc: [email protected]
Subject: Re: RE: [Oscar-devel] job status
> > 1-after a job finishes, it stays in the qstat listing
> > - I don't see an ending entry in the pbs/server_priv/accounting/ file

> What happens if you qdel it?

It goes away!

Checking the mom log, I see error 15008 which looks like no access to host. Would this be a file permissions problem? I'm quite sure pfilter was unselected.
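
For anyone who wants to pull every entry for one job out of the mom and server logs, here is a minimal Python sketch. It assumes the semicolon-delimited layout seen in the log excerpts in this thread. The field names (timestamp, code, daemon, kind, name, message) are labels chosen for illustration, not official PBS terminology, and `parse_line` and `job_entries` are hypothetical helpers.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    # Field names are illustrative labels, not official PBS names.
    timestamp: str
    code: str
    daemon: str
    kind: str
    name: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split at most 5 times so any ';' inside the message is preserved.
    parts = [p.strip() for p in line.rstrip("\n").split(";", 5)]
    if len(parts) != 6:
        return None
    return LogEntry(*parts)


def job_entries(lines: Iterable[str], job_id: str) -> Iterator[LogEntry]:
    """Yield entries whose name field is job_id or whose message mentions it."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.name == job_id or job_id in entry.message.split():
            yield entry


# Sample lines copied from the excerpts quoted in this thread.
sample = [
    "06/06/2006 17:00:29;0001; pbs_mom;Job;TMomFinalizeJob3;job 49.master started, pid = 20628",
    "06/06/2006 17:01:32;0008; pbs_mom;Job;49.master;Terminated",
    "06/06/2006 17:01:32;0001; pbs_mom;Job;49.master;server rejected job obit - 15008",
    "06/06/2006 17:01:27;0040;PBS_Server;Svr;master;Scheduler sent command time",
]

for e in job_entries(sample, "49.master"):
    print(e.timestamp, e.daemon, e.message)
```

To run it against a real file, pass an open file object instead of `sample`, e.g. `job_entries(open(path), "49.master")`, where `path` points at the mom or server log. The message is matched on whole whitespace-separated words so that a job such as 149.master is not picked up by mistake.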
> > 2-ganglia has gaps in the graphs. (See attached if it works)

> Can you perhaps take a screenshot of the entire page?

Well, whatever fit on one screen. see attached. If you need more I can do it at work across 2 screens in the morning.

Gaps are in each graph, so I would guess it's a database or logfile issue.

> What OS do you plan to install?

CentOS 4

I'll get the base, common, and rhel4 trunk files in the morning and give it a try.

Any comments/hints?

...John.
_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
