Olivier,
I am not sure whether I selected the disable service opkg; I do not really
remember.
I checked everything line by line:
/var/lib/torque/server_priv/nodes : I created it myself and added the hostnames
of all present and future nodes, one per line.
/etc/torque/server_name: contains "pbs_oscar" on all the nodes and the master
I ran cexec iptables -L and iptables seems to be disabled. I even ran telnet
masternode 15001 and it looks OK.
I restarted pbs_mom on the nodes and pbs_server several times. I also restarted
the trqauthd processes.
munge is running fine on all nodes and the server.
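For anyone who wants to script the telnet check on port 15001 mentioned above, here is a minimal Python sketch. It only tests that a TCP connection can be opened; it says nothing about whether pbs_server accepts the credentials. The function name port_open and the 3 second timeout are my own choices, and "masternode" is the placeholder hostname from the telnet command.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and connects, like telnet does
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    # "masternode" is a placeholder; use the real head node name
    print(port_open("masternode", 15001))
```

A True result here corresponds to telnet connecting; it would not rule out the hostname mismatch described below.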
I raised the log level and the messages are more complete now. It looks like a
host resolution problem:
03/13/2013 15:51:28;0004;PBS_Server.4105;Svr;authenticate_user;Hosts do not
match: Requested host <eth0_hostname>: credential host: <eth1_hostname>
where eth0_hostname is the first name appearing in the /etc/hosts file for the
master (on the same line as pbs_server), and eth1_hostname is the FQDN, i.e.
the DNS hostname of the master as seen from outside the cluster.
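To make the mismatch easier to see, here is a minimal Python sketch that reads /etc/hosts-style text and reports the first (canonical) name on the line where a given name appears. This is only an illustration of how the first name on a line can differ from the FQDN; it is not how pbs_server itself resolves hosts. The helper names and every address and hostname in SAMPLE are hypothetical placeholders, not values from my cluster.

```python
def canonical_names(hosts_text):
    """Map each IP in /etc/hosts-style text to (first_name, other_names)."""
    table = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        ip, names = fields[0], fields[1:]
        # keep the first entry seen for an address
        table.setdefault(ip, (names[0], names[1:]))
    return table


def canonical_for(hosts_text, name):
    """Return the first name on the line where `name` appears, or None."""
    for canonical, aliases in canonical_names(hosts_text).values():
        if name == canonical or name in aliases:
            return canonical
    return None


# Hypothetical sample: internal (eth0) and external (eth1) lines for a master
SAMPLE = """\
127.0.0.1   localhost
10.0.0.1    master-eth0 pbs_oscar     # internal interface (made-up values)
192.0.2.10  master.example.org master # external interface (made-up values)
"""

if __name__ == "__main__":
    # On a real head node, pass open("/etc/hosts").read() instead of SAMPLE
    print(canonical_for(SAMPLE, "pbs_oscar"))
    print(canonical_for(SAMPLE, "master"))
```

With this sample, the name in server_name maps to the internal name while the external name maps to the FQDN, which is the same kind of disagreement the log message reports.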
Kind Regards,
Costel
From: LAHAYE Olivier [mailto:[email protected]]
Sent: Wednesday, March 13, 2013 2:27 PM
To: Costel Seitan
Cc: [email protected]
Subject: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in all
situations:-) and new oscar-utils package.
Did you select the disable service opkg? I don't remember if I recommended it.
It'll disable iptables if my memory is correct.
Can you check /var/lib/torque/server_priv/nodes?
Can you check /etc/torque/server_name?
Anyway, can you check that iptables is disabled on the nodes?
Can you restart pbs_mom on the nodes and pbs_server on the head?
Can you check that munge is running on the head and the nodes?
What does /opt/pbs/bin/pbsnodes report?
Note that it is recommended to avoid running step 7 unless all nodes are up
and running. I've fixed many post-install scripts so that they can be run
multiple times, but some things can only be run once. For example, cexec will
automatically disable nodes that are listed in /etc/c3.conf and that fail to
respond. There is no command to automatically re-enable dead nodes (I've asked
for the feature upstream and received positive feedback, but no timeframe for
when it will be available).
Best regards,
Olivier.
PS: I forgot to reply to oscar-users the first time, but I think this may be of
use to other OSCAR users, so I am posting my answer to the list again. Please
accept my apologies for that.
--
Olivier LAHAYE
CEA DRT/LIST/DCSI/DIR
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users