Hi Olivier,

I am coming back to you with another problem.

I have installed a few nodes and am now performing oscar_wizard Step 7,
"Complete Cluster Setup".
After fixing a few issues with munge (discussed in our previous e-mail
exchanges), munge now seems to work fine, but I keep receiving messages
like the following during the setup of torque:

[torque] Updating pbs_server nodes
/opt/pbs/bin/pbsnodes: Invalid credential MSG=Hosts do not match
qmgr obj=<mynodename1> svr=default: Invalid credential MSG=Hosts do not match
create node <mynodename1>   np = 8 , properties = all
qmgr obj=<mynodename2>  svr=default: Invalid credential MSG=Hosts do not match
..

[torque] Creating TORQUE workq queue...
qmgr obj= svr=localhost: Invalid credential MSG=Hosts do not match
Max open servers: 9
create queue workq
qmgr obj=workq svr=default: Invalid credential MSG=Hosts do not match
create queue workq


and Step 7 fails.


In the /var/log/messages log file I found:

Mar 13 11:00:36 epsl90 PBS_Server: LOG_ERROR::svr_is_request, bad attempt to 
connect from
192.168.0.87:1018 (address not trusted - check entry in server_priv/nodes)

The server_priv/nodes file had not been created, so I created it with the
list of all my nodes:

<mynodename1>
<mynodename2>
..

This did not change anything.

cexec "tail /var/log/torque/mom_logs/20130313" returns entries like:

pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not 
contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Mismatching protocols. 
Expected protocol 4 but read reply for 0
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Could not read reply for 
protocol 4 command 4: End of File
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_update_stat, Couldn't read a 
reply from the server
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not 
contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Mismatching protocols. 
Expected protocol 4 but read reply for 0
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Could not read reply for 
protocol 4 command 4: End of File
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_update_stat, Couldn't read a 
reply from the server
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not 
contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;Torque Mom Version = 4.1.4-snap.201211201307, loglevel 
= 0

and the same entries appear on the master node.

pbsnodes -a returns:

# /opt/pbs/bin/pbsnodes -a
/opt/pbs/bin/pbsnodes: Invalid credential MSG=Hosts do not match


There is also something else that is strange:
qstat is a symlink to /etc/alternatives/qstat, which is a symlink to
/opt/pbs/bin/qstat-torque.
qsub is a symlink to /etc/alternatives/qsub, which is a symlink to
/usr/bin/qsub-torque, which does not exist.


Do you have any idea what is going wrong?


Kind Regards,
Costel