Hi Olivier,

I checked the entries you mentioned and everything looks correct.
I suspect the munge.key file is somehow involved. Let me explain:

On the master:
eth0 -> configured with a private IP address, as on all the nodes; this 
address corresponds to pbs_oscar and nfs_oscar; connected to the cluster
eth1 -> configured with a public IP address and the FQDN (= hostname of the 
master); connected to the outside world
/etc/hosts reflects the above situation and is synchronized on all nodes.
The nodes know and can address nfs_oscar, pbs_oscar, etc.

NOTE: I followed what we had on our previous OSCAR cluster, which worked. But 
some things have changed in the meantime, one of them being the use of munge.

Back to the configuration: the "hostname" command returns the FQDN on the 
master node (eth1), and the munge.key was generated with this hostname and 
copied to the nodes.
When I run munge -n | unmunge on the master, it returns the FQDN = hostname 
(eth1).
I am wondering whether this is the issue.
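
A minimal sketch of that comparison, assuming the alias pbs_oscar from 
/etc/hosts is the name the nodes use for the server. The function name 
hostname_matches_server and the resolve parameter are hypothetical, added 
only so the check can be tried without touching the real resolver. It 
compares the address of the local hostname with the address of the alias:

```python
import socket


def hostname_matches_server(server_alias="pbs_oscar",
                            resolve=socket.gethostbyname,
                            local_name=None):
    """Compare the IP of the local hostname with the IP of the server alias.

    Returns (same, local_ip, server_ip). `resolve` is any callable mapping
    a name to an IPv4 string; it defaults to the system resolver.
    """
    if local_name is None:
        local_name = socket.gethostname()  # what the "hostname" command reports
    local_ip = resolve(local_name)
    server_ip = resolve(server_alias)
    return local_ip == server_ip, local_ip, server_ip
```

On a master set up as described above, I would expect the first element to 
be False, because the hostname resolves to the public (eth1) address while 
pbs_oscar resolves to the private (eth0) one.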

The error message from the pbs_server shows a conflict between the name 
assigned to the private IP address (eth0) and the
hostname which corresponds to the public IP address (eth1).
When looking at the pbs_mom config with momctl, I see that all the master's 
IP addresses were configured to be accepted.

One idea was to change the hostname to the "private" name and regenerate the 
munge.key. I did that, but I am having some trouble now, so I will probably 
switch back.

Kind Regards,
Costel




From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, March 13, 2013 4:32 PM
To: Costel Seitan; oscar-users@lists.sourceforge.net
Subject: RE : [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in 
all situations:-) and new oscar-utils package.

Hi Costel,

Don't worry about the disable service opkg. If iptables is disabled, then it 
is OK.

If I'm correct, your nodes are on a private network connected to eth1 on your 
head (and eth0 is on the public network).
If this is the case, and if I remember my old cluster correctly (it had the 
same architecture), the pbs_oscar entry in /etc/hosts should point to the IP 
of eth1.
Check (on the head *and* on the nodes) that /etc/torque/server_name contains a 
hostname that can be resolved by all nodes and points to the eth1 IP. Check 
that /etc/hosts in the image, on the nodes and on the head has the correct 
entry for pbs_oscar (or the host that is in /etc/torque/server_name).
Then restart all pbs_mom, trqauthd and pbs_server services.
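
A rough sketch of the /etc/hosts part of that check, assuming the usual hosts 
format (an IP address followed by one or more names, with # starting a 
comment). The function name lookup_hosts_entry and the sample addresses are 
hypothetical, for illustration only:

```python
def lookup_hosts_entry(hosts_text, name):
    """Return (ip, names) for the first hosts line that lists `name`.

    Returns None if no line mentions the name.
    """
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if name in names:
            return ip, names
    return None


# Made-up sample content; on a real system, pass the text of /etc/hosts.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.1    oscar-head pbs_oscar nfs_oscar   # private side
192.0.2.10  head.example.com head            # public side
"""
```

Running it with the name found in /etc/torque/server_name, on the head and on 
each node, should give the same IP everywhere.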

If that doesn't fix the issue, as a last resort, check what the hostname 
command returns on the nodes and try to use that in 
/var/lib/torque/server_priv/nodes. If the hostnames are not correct, fix that 
in /etc/sysconfig/network.

Beyond that I don't have any more ideas.
Best regards,

PS: Why did you have to manually edit the nodes file? Did step 7 fail to set 
that up correctly? I almost completely rewrote the torque setup post install 
and handled many previously unhandled error situations... It seems that I 
missed some :( (If you can send me the log of the torque post install, it may 
help me.)

Olivier.
--
   Olivier LAHAYE
   CEA DRT/LIST/DCSI/DIR
________________________________
From: Costel Seitan [csei...@slb.com]
Sent: Wednesday, March 13, 2013 16:08
To: oscar-users@lists.sourceforge.net
Cc: LAHAYE Olivier
Subject: RE: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in 
all situations:-) and new oscar-utils package.

Olivier,

I am not sure whether I selected the disable service opkg; I do not really 
remember.


I checked line by line:

/var/lib/torque/server_priv/nodes: I created it myself and added the hostnames 
of all present and future nodes, one per line.
/etc/torque/server_name: contains "pbs_oscar" on all the nodes and the master.
I ran cexec iptables -L and it seems to be disabled. I even did telnet 
masternode 15001 and it looks OK.
I restarted pbs_mom on nodes and pbs_server several times. I also restarted 
trqauthd processes.
munge is running fine on all nodes and the server.
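
For reference, the telnet test can be scripted roughly as below. This is only 
a sketch: the function name port_open is hypothetical, and "masternode" and 
port 15001 are simply the values from the telnet command above.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable name, ...
        return False


# Example call, to be run from a node:
# port_open("masternode", 15001)
```

A True result only shows that something accepts connections on that port; it 
says nothing about whether the authentication step succeeds.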

I changed the log level and the messages are more complete now. It looks like a 
host resolution problem:

03/13/2013 15:51:28;0004;PBS_Server.4105;Svr;authenticate_user;Hosts do not 
match: Requested host <eth0_hostname>: credential host: <eth1_hostname>

where
eth0_hostname is the first name appearing in the /etc/hosts file for the 
master (on the same line as pbs_server)
and
eth1_hostname is the FQDN = DNS hostname of the master as seen from 
outside the cluster.
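
To pull the two names out of such messages automatically, something like the 
following sketch could be used. The pattern is inferred from the single log 
line quoted above, so the exact message format is an assumption, and the 
names HOST_MISMATCH and parse_host_mismatch are hypothetical:

```python
import re

# Pattern inferred from the one pbs_server log line quoted above.
HOST_MISMATCH = re.compile(
    r"Hosts do not match: Requested host\s+(\S+?):\s+credential host:\s+(\S+)"
)


def parse_host_mismatch(line):
    """Return (requested_host, credential_host), or None if no match.

    Whitespace is collapsed first, so a line wrapped by a mail client
    still matches.
    """
    match = HOST_MISMATCH.search(" ".join(line.split()))
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Comparing the two returned names against /etc/hosts and the output of the 
hostname command shows which interface each one belongs to.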


Kind Regards,
Costel


From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, March 13, 2013 2:27 PM
To: Costel Seitan
Cc: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in all 
situations:-) and new oscar-utils package.

Did you select the disable service opkg? I don't remember if I recommended it. 
It'll disable iptables if my memory is correct.

Can you check /var/lib/torque/server_priv/nodes?
Can you check /etc/torque/server_name?
Anyway, can you check that iptables is disabled on the nodes?
Can you restart pbs_mom on the nodes and pbs_server on the head?
Can you check that munge is running on the head and the nodes?

What does /opt/pbs/bin/pbsnodes report?

Note that it is recommended to avoid running step 7 when not all nodes are up 
and running. I've fixed many post install scripts so they can be run multiple 
times, but there are still some things that can only be run once. For example, 
cexec will automatically disable nodes that are in /etc/c3.conf and that fail 
to respond. There is no command to automatically re-enable dead nodes (I've 
asked for the feature upstream and received positive feedback, but no date for 
when it will be available).

Best regards,

Olivier.
PS: I forgot to reply to oscar-users the first time, but I think this can be 
of use to other OSCAR users, so I am posting my answer to the list again. 
Please accept my apologies for that.
--
   Olivier LAHAYE
   CEA DRT/LIST/DCSI/DIR
