Olivier,

I am not sure whether I selected the disable service opkg; I do not really remember.


I checked line by line

/var/lib/torque/server_priv/nodes : I created it myself and added the hostnames 
of all present and future nodes, one per line.
/etc/torque/server_name: contains "pbs_oscar" on all the nodes and the master.
I ran cexec iptables -L and the firewall seems to be disabled. I even ran 
telnet masternode 15001 and it looks OK.
I restarted pbs_mom on nodes and pbs_server several times. I also restarted 
trqauthd processes.
munge is running fine on all nodes and the server.
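
The telnet test can also be scripted so that it is easy to repeat against several hosts. This is only a sketch: the helper name `port_open` is made up for illustration, and it checks nothing more than whether a TCP connection can be opened, which is what the telnet test shows.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

With the values from the telnet test, the call would be `port_open("masternode", 15001)`.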

I changed the log level and the messages are more complete now. It looks like a 
host resolution problem:

03/13/2013 15:51:28;0004;PBS_Server.4105;Svr;authenticate_user;Hosts do not 
match: Requested host <eth0_hostname>: credential host: <eth1_hostname>

Where
eth0_hostname is the first name appearing in the /etc/hosts file for the 
master (on the same line as pbs_server)
and
eth1_hostname is the FQDN, i.e. the DNS hostname of the master as seen from 
outside the cluster.
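
To see which name a given /etc/hosts line yields first, the file contents can be inspected with a short script. This is a sketch only: `canonical_name` is a hypothetical helper, and the sample addresses, hostnames, and alias placement below are invented placeholders. It relies on the usual hosts-file convention that the first name after the IP address is the canonical one; it does not reproduce how pbs_server itself resolves names.

```python
def canonical_name(hosts_text, name):
    """Return the first hostname on the hosts line that lists `name`, or None."""
    for line in hosts_text.splitlines():
        # Drop comments, then split the rest on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        names = fields[1:]  # fields[0] is the IP address
        if name in names:
            return names[0]
    return None


# Invented sample data, not taken from the real cluster.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.1    master-eth0 pbs_oscar   # internal interface
192.0.2.10  master.example.org master
"""

print(canonical_name(SAMPLE_HOSTS, "pbs_oscar"))           # master-eth0
print(canonical_name(SAMPLE_HOSTS, "master.example.org"))  # master.example.org
```

Running the same function over the real /etc/hosts text for both names from the log message would show whether they sit on different lines.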


Kind Regards,
Costel


From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, March 13, 2013 2:27 PM
To: Costel Seitan
Cc: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in all 
situations:-) and new oscar-utils package.

Did you select the disable service opkg? I don't remember if I recommended it. 
It'll disable iptables if my memory is correct.

Can you check /var/lib/torque/server_priv/nodes?
Can you check /etc/torque/server_name?
Also, can you check that iptables is disabled on the nodes?
Can you restart pbs_mom on the nodes and pbs_server on the head?
Can you check that munge is running on the head and the nodes?

What does /opt/pbs/bin/pbsnodes report?

Note that it is recommended to avoid running step 7 unless all nodes are up 
and running. I've fixed many post-install scripts so they can be run multiple 
times, but some things can only be run once. For example, cexec will 
automatically disable nodes that are in /etc/c3.conf and fail to respond. 
There is no command to automatically re-enable dead nodes (I've asked for the 
feature upstream and received positive feedback, but no timeline for its 
availability).

Best regards,

Olivier.
PS: I forgot to reply to oscar-users the first time, but I think this can be 
of use to other OSCAR users, so I am posting my answer to the list again. 
Please accept my apologies for that.
--
   Olivier LAHAYE
   CEA DRT/LIST/DCSI/DIR
