Title: Re: [Oscar-users] help!. building client image (scientific linux 305)
Can you try to restart pbs_mom on all your compute nodes and then run pbsnodes -a to see if the state becomes "free"?
 
# cexec /etc/init.d/pbs_mom restart
 
# pbsnodes -a
 
Cheers,
 
Bernard


From: Neil Costigan [mailto:[EMAIL PROTECTED]
Sent: Wed 05/04/2006 04:58
To: Bernard Li
Cc: [email protected]
Subject: Re: [Oscar-users] help!. building client image (scientific linux 305)

Bernard Li wrote:

>What's the output of 'pbsnodes -a'?

pbsnodes -a reports every node as unknown or down:

[EMAIL PROTECTED] oscar]# pbsnodes -a
cc001.pg-207.computing.dcu.ie
     state = state-unknown,down
     np = 1
     properties = all
     ntype = cluster

cc002.pg-207.computing.dcu.ie
     state = state-unknown,down
     np = 1
     properties = all
     ntype = cluster

cc003.pg-207.computing.dcu.ie
     state = state-unknown,down
     np = 1
     properties = all
     ntype = cluster

cc004.pg-207.computing.dcu.ie
     state = state-unknown,down
     np = 1
     properties = all
     ntype = cluster
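
A minimal sketch for condensing this output to one line per problem node. It assumes the layout shown above (an unindented node name followed by indented "key = value" lines); list_unavailable_nodes is a hypothetical helper name, not part of OSCAR or TORQUE.

```shell
# Print "<node> <state>" for every node whose state is not exactly "free".
# Reads `pbsnodes -a` style output on stdin.
list_unavailable_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }    # unindented line: remember the node name
        $1 == "state" && $2 == "=" && $3 != "free" { print node, $3 }
    '
}

# Usage on the head node (needs pbsnodes in PATH):
#   pbsnodes -a | list_unavailable_nodes
```

An empty result would mean every node reports "free", which is what test_cluster is waiting for.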


>Is pbs_mom running on all your client nodes?

Yes: running ps aux | grep pbs_mom on every node shows that it is.


I have tried moving the pbs_oscar alias from the private to the public
address in /etc/hosts,
with no success.
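
A small sketch for checking which address a hosts-format file currently maps pbs_oscar to, so the head node and the compute nodes can be compared. lookup_host_alias is a hypothetical helper name; it only reads the file and ignores DNS or NIS.

```shell
# Print the address(es) that a hosts-format file maps a given name to.
# Arguments: name, optional file path (defaults to /etc/hosts).
lookup_host_alias() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip trailing comments
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "${2:-/etc/hosts}"
}

# Usage:
#   lookup_host_alias pbs_oscar
#   cexec grep pbs_oscar /etc/hosts    # same check on the compute nodes
```

If the head node and the clients print different addresses for pbs_oscar, that mismatch is worth ruling out first.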

To recap:

    * OSCAR version 4.2.1b5
    * Fedora Core 3
    * x86

- Successfully passed test_cluster after the initial setup with the head node
and two compute nodes. Happy days.
- The test fails after adding two new nodes, which are up and alive: they can
mount /home and pass the SSH pings, PVM etc.,
but fail PBS:

/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111

The run then fails with "Not enough free nodes".
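
For reference, error=111 on Linux is ECONNREFUSED ("Connection refused"): nothing accepted the TCP connection at the address pbs_oscar resolves to. A minimal sketch for probing that from the shell follows. It assumes bash and the coreutils timeout command are installed; port_open is a hypothetical helper name, and 15001 is assumed to be the pbs_server port (the usual TORQUE default, not confirmed for this cluster).

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds,
# non-zero otherwise. Uses bash's /dev/tcp redirection in a child bash.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (port number is an assumption, see above):
#   port_open pbs_oscar 15001 || echo "pbs_server is not accepting connections"
```

A refusal here while pbs_server is supposedly running would point at the name resolving to an interface the server is not listening on.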


/nc

>Cheers,
>
>Bernard
>

>
>>well it was going well
>>
>>i added two more nodes
>>and now it fails
>>
>>[EMAIL PROTECTED] oscar]# testing/test_cluster
>>Performing root tests...
>>Maui service check:maui                                    [PASSED]
>>Shutting down TORQUE Server:                               [  OK  ]
>>Connection refused
>>/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
>>Torque node check                                          [PASSED]
>>Starting TORQUE Server:                                    [  OK  ]
>>Torque service check:pbs_server                            [PASSED]
>>/home mounts                                               [PASSED]
>>
>>Preparing user tests...
>>Performing user tests...
>>SSH ping test                                              [PASSED]
>>SSH server->node                                           [PASSED]
>>SSH node->server                                           [PASSED]
>>Checking for 4 free nodes:                                 [FAILED]
>>Not enough free nodes. Tests incomplete.
>>Checking for 4 free nodes:                                 [FAILED]
>>Not enough free nodes. Tests incomplete.
>>Checking for 4 free nodes:                                 [FAILED]
>>Not enough free nodes. Tests incomplete.
>>Torque default queue definition                            [PASSED]
>>Checking for 4 free nodes:                                 [FAILED]
>>Not enough free nodes. Tests incomplete.
>>Ganglia setup test                                         [PASSED]
>>Ganglia node count test                                    [PASSED]
