You will get this message if the job (the LAM/MPI unit tests) was still 
running when the unit-test infrastructure timed out and marked the test as 
failed.

Note that I'm assuming you are looking at this after the job has actually 
completed.  You can verify this with

        qstat -a

If nothing is shown, the job completed.  If you see output like

 Job ID          Username  Queue  Jobname  ...
 --------------  --------  -----  -------
 n.oscar.opencl  oscartst  workq  lamtest

then the job is still running.  If it is still listed long after the test 
started, you can assume it's hung.  You can delete the job with

        qdel n

where n is from the status line.  Please let us know or file a bug on this.
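
As a rough sketch of the check above, the helper below reads "qstat -a" output 
and prints the ID of every job named lamtest.  The function name 
lamtest_job_ids is hypothetical, and it assumes the job name is the fourth 
whitespace-separated column, as in the sample status line; adjust it if your 
qstat prints a different layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSCAR): read `qstat -a` output on stdin
# and print the Job ID of every job whose Jobname column is "lamtest".
# Assumption: columns are Job ID, Username, Queue, Jobname, as in the
# sample output above, so the job name is field 4.
lamtest_job_ids() {
    awk '$4 == "lamtest" { print $1 }'
}
```

It would be used as "qstat -a | lamtest_job_ids".  No output means no lamtest 
job is queued or running; any ID it prints can then be passed to qdel by hand.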

If no job status is shown above, the "FAILURE" label on the cluster test could 
have been caused by

1) The job just didn't finish by the time the infrastructure timed out.  You 
can see if this is the case by looking for the "lamtest.err" and "lamtest.out" 
files in ~oscartst/lam.  If the .err file is empty and the .out file isn't, the 
infrastructure didn't wait long enough.  Please let us know or file a bug on 
this.

2) The job failed with an error.  In this case, you will hopefully see a 
non-empty ~oscartst/lam/lamtest.err. Please let us know or file a bug on this.
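
The two cases above can be told apart with a small shell function.  This is 
only a sketch: classify_lamtest and its "timeout" / "error" / "unknown" labels 
are made up for illustration, and it assumes both files live in the directory 
you pass in.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSCAR): classify a finished LAM/MPI test
# run from its output files.  $1 is the directory holding lamtest.err and
# lamtest.out.
classify_lamtest() {
    dir=$1
    err="$dir/lamtest.err"
    out="$dir/lamtest.out"
    if [ -s "$err" ]; then
        # Case 2: non-empty .err file, the job failed with an error.
        echo "error"
    elif [ -e "$err" ] && [ -s "$out" ]; then
        # Case 1: empty .err and non-empty .out, the infrastructure
        # did not wait long enough.
        echo "timeout"
    else
        # Files missing or both empty: nothing can be concluded.
        echo "unknown"
    fi
}
```

For example, "classify_lamtest ~oscartst/lam" prints one of the three labels; 
if it prints "error", the contents of lamtest.err are worth including in the 
bug report.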

-- 
David N. Lombard

My comments represent my opinions, not those of Intel Corporation.
________________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bernard Li
Sent: Tuesday, March 01, 2005 8:11 PM
To: Dave Thom; [email protected]
Subject: RE: [Oscar-users] No Free Nodes when testing

Hi Dave:

BTW, the 'No Free Nodes' message means that there are still jobs in the 
queue/running - you can check your queue status by running:

qstat

on your headnode.  You may need to manually delete jobs that were stuck, for 
instance.

Cheers,

Bernard

________________________________________
From: [EMAIL PROTECTED] on behalf of Bernard Li
Sent: Mon 28/02/2005 10:11 AM
To: Dave Thom; [email protected]
Subject: RE: [Oscar-users] No Free Nodes when testing

Hi Dave:

1) Have you tried re-running the tests?
2) Are you using PBS (which you downloaded from OPD) or Torque (which came
with the tarball)?

Cheers,

Bernard

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Dave Thom
> Sent: Thursday, February 24, 2005 7:21
> To: [email protected]
> Subject: Re: [Oscar-users] No Free Nodes when testing
>
> Hi All,
>
> Bernard, thanks for the reply.
> I took the easy option and started over from scratch, and now
> most things work, although I still get the following when
> running the tests via the wizard:
>
> Performing root tests...
> PBS node check                  [PASSED]
> PBS service check:pbs_server    [PASSED]
> Maui service check:maui         [PASSED]
> /home mounts                    [PASSED]
>
> Preparing user tests...
> Performing user tests...
> SSH ping test                   [PASSED]
> SSH server->node                [PASSED]
> SSH node->server                [PASSED]
> PVM (via PBS)                   [PASSED]
> PBS default queue definition    [PASSED]
> PBS Shell Test                  [PASSED]
> Ganglia test                    [PASSED]
> LAM/MPI (via PBS)               [FAILED]
> Checking for 5 free nodes:      [FAILED]
> Not enough free nodes. Tests incomplete.
>
> Checking http://localhost/ganglia (screenshot attached) shows
> all the hosts as up. Running tail -f on the pbs log while running the
> tests doesn't show anything wrong.
>
>
>
>
> On Wednesday 23 Feb 2005 18:00, Bernard Li wrote:
> > Hi Dave:
> >
> > gmond and gmetad are part of Ganglia - can you check to see if your
> > Ganglia graphs are coming up correctly (http://localhost/ganglia).
> > Also worth a shot is taking a look at gmond.conf and gmetad.conf
> > configuration files to see if anything is out of the ordinary.
> >
> > Having said that, this should not be related to the PBS/Torque
> > problems you are encountering - have you tried re-running the tests?
> >
> > Cheers,
> >
> > Bernard
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Dave Thom
> > > Sent: Wednesday, February 23, 2005 3:09
> > > To: [email protected]
> > > Subject: [Oscar-users] No Free Nodes when testing
> > >
> > > Hi all,
> > > I've installed OSCAR on 5 Dell GX150s (one server, 4 clients). The
> > > server has two NICs and the clients are all on a private subnet.
> > > There were a few issues with networking that I have sorted out, and
> > > everything is almost working - the last issue comes when running the
> > > "test cluster setup".
> > > The server node is a Fedora Core 2 workstation install, unpatched.
> > >
> > >
> > > Performing root tests...
> > > Maui service check:maui         [PASSED]
> > > PBS node check                  [PASSED]
> > > PBS service check:pbs_server    [PASSED]
> > > /home mounts                    [PASSED]
> > >
> > > Preparing user tests...
> > > Performing user tests...
> > > SSH ping test                   [PASSED]
> > > SSH server->node                [PASSED]
> > > SSH node->server                [PASSED]
> > > Checking for 4 free nodes:      [FAILED]
> > > Not enough free nodes. Tests incomplete.
> > > Ganglia test                    [PASSED]
> > > Checking for 4 free nodes:      [FAILED]
> > > Not enough free nodes. Tests incomplete.
> > > Checking for 4 free nodes:      [FAILED]
> > > Not enough free nodes. Tests incomplete.
> > > PBS default queue definition    [PASSED]
> > > Checking for 4 free nodes:      [FAILED]
> > > Not enough free nodes. Tests incomplete.
> > > There were issues running some user test scripts.  Please check your
> > > logs
> > >
> > > On tailing /var/log/messages on the server I see continual:
> > >
> > > Feb 23 10:47:12 cluster0 /usr/sbin/gmond[2246]: server_thread() Host
> > > xxx.xxx.xxx.xxx tried to connect and was refused
> > > Feb 23 10:47:12 cluster0 /usr/sbin/gmetad[1842]: Process XML (MATHS
> > > Cluster): XML_ParseBuffer() error at line 1: no element found
> > >
> > > Are the two related? Any hints on getting gmond running properly,
> > > or fixing the no free node error?
> > >
> > > many thanks for your time
> > >
> > > --
> > > Dave Thom
> > > IT Support
> > > Mathematics and Statistics, GU
> > > e: [EMAIL PROTECTED]
> > > t: 0141 330 3521
> > > f: 0141 330 4111
> >
> > -------------------------------------------------------
> > SF email is sponsored by - The IT Product Guide
> > Read honest & candid reviews on hundreds of IT Products from real users.
> > Discover which products truly live up to the hype. Start reading now.
> > http://ads.osdn.com/?ad_ide95&alloc_id396&op=Click
> > _______________________________________________
> > Oscar-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
>
> --
> Dave Thom
> IT Support
> Mathematics and Statistics, GU
> e: [EMAIL PROTECTED]
> t: 0141 330 3521
> f: 0141 330 4111
>


