> Perhaps you can post the relevant bits from your mom log.

The job ran for 1 minute. I did a qdel the following day.

node: /var/spool/pbs/mom_logs/20060606
06/06/2006 17:00:29;0001;   pbs_mom;Job;TMomFinalizeJob3;job 49.master started, pid = 20628
06/06/2006 17:00:29;0008;   pbs_mom;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:01:32;0008;   pbs_mom;Job;49.master;Terminated
06/06/2006 17:01:32;0001;   pbs_mom;Job;49.master;server rejected job obit - 15008
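Each record above is a set of semicolon-separated fields (timestamp; event code; daemon; object type; object name; message). The following is a minimal Python sketch that assumes that layout and the "server rejected job obit - NNNNN" wording seen above. It lists every rejected obit in one or more mom logs, which makes it easy to check whether jobs other than 49.master hit the same 15008 error. The script name used in the usage comment is hypothetical.

```python
import re
import sys

# Message text as it appears in the mom log above (wording assumed stable).
OBIT_RE = re.compile(r"server rejected job obit - (\d+)")


def rejected_obits(lines):
    """Yield (timestamp, job_id, error_code) for each rejected-obit record."""
    for line in lines:
        # Assumed layout: timestamp;event;daemon;object type;object name;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        timestamp, _event, _daemon, _objtype, objname, message = fields
        match = OBIT_RE.search(message)
        if match:
            yield timestamp, objname, int(match.group(1))


if __name__ == "__main__":
    # Usage (hypothetical name): python find_obits.py /var/spool/pbs/mom_logs/20060606
    for path in sys.argv[1:]:
        with open(path) as log:
            for ts, job, code in rejected_obits(log):
                print(f"{ts}  {job}  obit rejected, error {code}")
```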

master: /var/spool/pbs/server_logs/pbs_server.log
06/06/2006 17:00:27;0008;PBS_Server;Job;49.master;Job Queued at request of [EMAIL PROTECTED], owner = [EMAIL PROTECTED], job name = STDIN, queue = parallel
06/06/2006 17:00:27;0040;PBS_Server;Svr;master;Scheduler sent command new
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Run at request of [EMAIL PROTECTED]
06/06/2006 17:00:29;0008;PBS_Server;Job;49.master;Job Modified at request of [EMAIL PROTECTED]
06/06/2006 17:01:27;0040;PBS_Server;Svr;master;Scheduler sent command time
06/06/2006 17:02:27;0040;PBS_Server;Svr;master;Scheduler sent command time
06/06/2006 17:03:27;0040;PBS_Server;Svr;master;Scheduler sent command time
. . .
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;Job deleted at request of [EMAIL PROTECTED]
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;Job sent signal SIGTERM on delete
06/07/2006 15:48:55;0008;PBS_Server;Job;49.master;MOM rejected signal during delete
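Reading the mom and server logs side by side is easier when the records for one job are merged into a single timeline. The following is a minimal Python sketch, assuming both logs use the semicolon-separated layout shown above and the MM/DD/YYYY HH:MM:SS timestamp format. A record is kept if its object name is the job ID or if the job ID appears in its message, which catches the mom's "TMomFinalizeJob3" start record. The script name is hypothetical.

```python
import sys
from datetime import datetime


def job_events(lines, job_id):
    """Yield (when, daemon, message) for records that mention job_id."""
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # skip "..." markers and malformed lines
        stamp, _event, daemon, _objtype, objname, message = fields
        if objname != job_id and job_id not in message:
            continue
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        yield when, daemon.strip(), message


def timeline(paths, job_id):
    """Merge the records for job_id from several PBS logs, oldest first."""
    events = []
    for path in paths:
        with open(path) as log:
            events.extend(job_events(log, job_id))
    return sorted(events)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage (hypothetical name): python job_timeline.py 49.master MOM_LOG SERVER_LOG
    for when, daemon, message in timeline(sys.argv[2:], sys.argv[1]):
        print(f"{when:%m/%d/%Y %H:%M:%S}  {daemon:<10}  {message}")
```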
 

> The gaps in the Ganglia graph look really strange - any errors in http error
> logs?
 
Only PHP "Notice" messages. The graphs are generated; just some of the data is
missing. Do you know whether the data comes from a database or from a log
file?

/var/log/httpd/error_log
[client 65.x.x.x] PHP Notice:  Undefined index:  G in /var/www/html/ganglia/get_context.php on line 9, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  s in /var/www/html/ganglia/get_context.php on line 13, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  cr in /var/www/html/ganglia/get_context.php on line 14, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  hc in /var/www/html/ganglia/get_context.php on line 15, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  sh in /var/www/html/ganglia/get_context.php on line 16, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  p in /var/www/html/ganglia/get_context.php on line 18, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  t in /var/www/html/ganglia/get_context.php on line 19, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  jr in /var/www/html/ganglia/get_context.php on line 21, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  js in /var/www/html/ganglia/get_context.php on line 23, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  gw in /var/www/html/ganglia/get_context.php on line 25, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  gs in /var/www/html/ganglia/get_context.php on line 27, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  g in /var/www/html/ganglia/graph.php on line 8, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  G in /var/www/html/ganglia/graph.php on line 9, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  me in /var/www/html/ganglia/graph.php on line 10, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined index:  vl in /var/www/html/ganglia/graph.php on line 15, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined variable:  command in /var/www/html/ganglia/graph.php on line 40, referer: http://142.x.x.x/ganglia/
[client 65.x.x.x] PHP Notice:  Undefined variable:  extras in /var/www/html/ganglia/graph.php on line 288, referer: http://142.x.x.x/ganglia/
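To check that error_log holds nothing worse than these notices, it can help to collapse it into a count per script and name. The following is a minimal Python sketch, assuming each entry sits on a single line (as it does in the real error_log) and follows the "PHP Notice:  Undefined index|variable:  NAME in FILE on line N" wording shown above. Entries that do not match are simply ignored, so any other message type would need its own pattern. The script name is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the notice wording shown in the error_log excerpt above.
NOTICE_RE = re.compile(
    r"PHP Notice:\s+Undefined (index|variable):\s+(\S+) in (\S+) on line (\d+)"
)


def summarize_notices(lines):
    """Count 'Undefined index/variable' notices per (script, kind, name)."""
    counts = Counter()
    for line in lines:
        match = NOTICE_RE.search(line)
        if match:
            kind, name, script, _lineno = match.groups()
            counts[(script, kind, name)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical name): python php_notices.py /var/log/httpd/error_log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for (script, kind, name), n in sorted(summarize_notices(log).items()):
                print(f"{n:6d}  {script}: undefined {kind} '{name}'")
```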

> You can also try updating to the newer version of Ganglia and see if that 
> helps:

I'll try that in the morning before I destroy the cluster (4.2.1 -> 5.)

Hi John:
 
Perhaps you can post the relevant bits from your mom log.
 
The gaps in the Ganglia graph look really strange - any errors in http error logs?
 
You can also try updating to the newer version of Ganglia and see if that helps:

http://svn.oscar.openclustergroup.org/oscar/trunk/packages/ganglia/distro/rhel4-i386/
 
CentOS 4 should be fairly stable - please test it out and let us know if you encounter any problems!
 
Cheers,
 
Bernard


From: John Meskes [mailto:[EMAIL PROTECTED]]
Sent: Wed 07/06/2006 18:48
To: Bernard Li
Cc: [email protected]
Subject: Re: RE: [Oscar-devel] job status

> > 1-after a job finishes, it stays in the qstat listing
> >  - I don't see an ending entry in the pbs/server_priv/accounting/  file
> What happens if you qdel it?

It goes away!
In the mom log I see error 15008, which looks like a "no access to host" error. Could this be a file permissions problem? I'm quite sure pfilter was deselected.
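The missing ending entry can be checked directly in the accounting files under pbs/server_priv/accounting/. The following is a minimal Python sketch, assuming the usual PBS accounting layout of semicolon-separated timestamp, one-letter record type (Q queued, S started, E ended, D deleted), job ID, and attribute list. It reports which record types exist for a job and whether an E record is among them. The script name and the example records in the usage note are hypothetical.

```python
import sys


def record_types(lines, job_id):
    """Return the accounting record types (Q, S, E, D, ...) logged for job_id."""
    types = []
    for line in lines:
        # Assumed layout: timestamp;record type;job id;attribute list
        fields = line.rstrip("\n").split(";", 3)
        if len(fields) >= 3 and fields[2] == job_id:
            types.append(fields[1])
    return types


def has_end_record(lines, job_id):
    """True if an 'E' (job ended) record exists for job_id."""
    return "E" in record_types(lines, job_id)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage (hypothetical name): python acct_check.py 49.master ACCOUNTING_FILE...
    job = sys.argv[1]
    for path in sys.argv[2:]:
        with open(path) as acct:
            types = record_types(acct, job)
        status = "has" if "E" in types else "has NO"
        print(f"{path}: {job} {status} end (E) record; types seen: {types}")
```

For a job that was queued, started, and later removed with qdel but never reported as finished, the output would list Q, S, and D but no E.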

> > 2-ganglia has gaps in the graphs. (See attached if it works)
> Can you perhaps take a screenshot of the entire page?

Well, whatever fit on one screen; see attached. If you need more, I can capture it at work across two screens in the morning.
The gaps appear in every graph, so I would guess it's a database or log-file issue.

> What OS do you plan to install?

CentOS 4
I'll get the base, common, and rhel4 trunk files in the morning and give it a try.
Any comments/hints?
...John.

_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel