Hi Olivier,
I am coming back to you with another problem.
I have installed a few nodes and am now running oscar_wizard Step 7,
"Complete Cluster Setup".
After fixing a few issues with munge (discussed in our earlier e-mail
exchanges), which now seems to work fine, I keep receiving messages like
these during the TORQUE setup:
[torque] Updating pbs_server nodes
/opt/pbs/bin/pbsnodes: Invalid credential MSG=Hosts do not match
qmgr obj=<mynodename1> svr=default: Invalid credential MSG=Hosts do not match
create node <mynodename1> np = 8 , properties = all
qmgr obj=<mynodename2> svr=default: Invalid credential MSG=Hosts do not match
..
[torque] Creating TORQUE workq queue...
qmgr obj= svr=localhost: Invalid credential MSG=Hosts do not match
Max open servers: 9
create queue workq
qmgr obj=workq svr=default: Invalid credential MSG=Hosts do not match
create queue workq
and Step 7 fails.
In /var/log/messages I found:
Mar 13 11:00:36 epsl90 PBS_Server: LOG_ERROR::svr_is_request, bad attempt to connect from 192.168.0.87:1018 (address not trusted - check entry in server_priv/nodes)
The server_priv/nodes file did not exist, so I created it with the list of
all my nodes:
<mynodename1>
<mynodename2>
..
This did not change anything.
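
Since the log message points at server_priv/nodes, one way to double-check the file is to test whether a given hostname appears as the first field of an entry. The following is only a minimal sketch: the function name node_listed is made up for illustration, and the location of the nodes file depends on the TORQUE installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE or OSCAR).
# Returns 0 if NAME is the first field of some line in FILE, 1 otherwise.
# Comment lines never match because their first field starts with '#'.
node_listed() {
    file=$1
    name=$2
    awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"
}
```

It could be called as node_listed <path to server_priv/nodes> <mynodename1>, using whatever name the address 192.168.0.87 resolves to on the head node (for example as reported by getent hosts 192.168.0.87).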
Running cexec "tail /var/log/torque/mom_logs/20130313" returns entries like:
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Mismatching protocols. Expected protocol 4 but read reply for 0
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Could not read reply for protocol 4 command 4: End of File
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_update_stat, Couldn't read a reply from the server
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Mismatching protocols. Expected protocol 4 but read reply for 0
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Could not read reply for protocol 4 command 4: End of File
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_update_stat, Couldn't read a reply from the server
pbs_mom.2577;Svr;pbs_mom;LOG_ERROR::mom_server_all_update_stat, Could not contact any of the servers to send an update
pbs_mom.2577;Svr;pbs_mom;Torque Mom Version = 4.1.4-snap.201211201307, loglevel = 0
and the same entries appear on the master node.
Running pbsnodes -a returns:
# /opt/pbs/bin/pbsnodes -a
/opt/pbs/bin/pbsnodes: Invalid credential MSG=Hosts do not match
There is also something else that is strange:
qstat is a symlink to /etc/alternatives/qstat, which is a symlink to
/opt/pbs/bin/qstat-torque.
qsub is a symlink to /etc/alternatives/qsub, which is a symlink to
/usr/bin/qsub-torque, which does not exist.
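
The symlink chains described above can be walked hop by hop with a short script, which makes it easy to see where a chain breaks. This is only a sketch: follow_links is a hypothetical helper name, and it relies on nothing beyond the standard readlink and dirname utilities.

```shell
#!/bin/sh
# Hypothetical helper: print each hop of a symlink chain and report
# whether the final target exists. Returns 1 if the chain is broken
# or longer than 20 hops (a guard against circular links).
follow_links() {
    path=$1
    hops=0
    while [ -L "$path" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt 20 ]; then
            echo "too many hops at $path"
            return 1
        fi
        target=$(readlink "$path")
        # A relative target is relative to the directory holding the link.
        case $target in
            /*) ;;
            *) target=$(dirname "$path")/$target ;;
        esac
        echo "$path -> $target"
        path=$target
    done
    if [ -e "$path" ]; then
        echo "$path exists"
    else
        echo "$path MISSING"
        return 1
    fi
}
```

For example, follow_links "$(command -v qsub)" would list every link in the qsub chain and end with either "exists" or "MISSING" for the final path.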
Do you have any idea what is going wrong?
Kind Regards,
Costel
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users