From: [EMAIL PROTECTED] on behalf of Umberto Amato
Sent: Tue 03/01/2006 07:54
To: [email protected]
Subject: Re: [Oscar-users] Torque fails creating work queue at step 7
Dear Bernard,
I moved the pbs_oscar alias from the private to the public address in /etc/hosts,
and now Step 7 completes successfully (thanks again).
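The fix Bernard suggested amounts to moving one alias from the private line to the public line of /etc/hosts. A minimal Python sketch of that edit, assuming the whitespace-separated hosts layout quoted at the bottom of this thread; `move_alias` is a hypothetical helper that works on an in-memory string, not on the real file:

```python
def move_alias(hosts_text: str, alias: str, new_ip: str) -> str:
    """Return hosts_text with `alias` removed from every entry and appended
    to the entry whose address is `new_ip` (hypothetical helper)."""
    out = []
    for line in hosts_text.splitlines():
        body, sep, comment = line.partition("#")
        fields = body.split()
        if not fields:
            # Blank or comment-only line: keep it verbatim.
            out.append(line)
            continue
        ip = fields[0]
        names = [n for n in fields[1:] if n != alias]
        if ip == new_ip:
            names.append(alias)
        rebuilt = " ".join([ip] + names)
        if sep:
            # Re-attach a trailing comment if the line had one.
            rebuilt += " #" + comment
        out.append(rebuilt)
    return "\n".join(out) + "\n"
```

Usage: `move_alias(text, "pbs_oscar", "140.164.12.100")`. Note that it collapses any column alignment to single spaces, so review the result before copying it back over /etc/hosts.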
I'm now stuck at the Testing step:
[EMAIL PROTECTED] testing]# ./test_cluster
Performing root tests...
Connection refused
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
Torque node check [PASSED]
Starting TORQUE Server: [  OK  ]
Torque service check:pbs_server [PASSED]
Maui service check:maui [PASSED]
/home mounts [PASSED]
Preparing user tests...
Performing user tests...
SSH ping test [PASSED]
SSH server->node [PASSED]
SSH node->server [PASSED]
Checking for 18 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 18 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Can't find string terminator '"' anywhere before EOF at -e line 1.
Ganglia setup test [FAILED]
Torque default queue definition [PASSED]
Checking for 18 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 18 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
There were issues running some user test scripts. Please check your logs
located in /home/oscartst.
Run APItests...
Running Installation tests for pvm
[PASS] 2006-01-03T15:46:26Z pvmd-path-ls.apt
[PASS] 2006-01-03T15:46:26Z envvar-pvm_arch.apt
[PASS] 2006-01-03T15:46:26Z envvar-pvm_root.apt
[PASS] 2006-01-03T15:46:26Z pvmd-path-which.apt
[PASS] 2006-01-03T15:46:26Z modulecmd-path-ls.apt
[PASS] 2006-01-03T15:46:26Z pvm-module-list.apt
[PASS] 2006-01-03T15:46:26Z pvm-module-show-pvm_rsh.apt
[PASS] 2006-01-03T15:46:26Z pvm-module-show-pvm_arch.apt
[PASS] 2006-01-03T15:46:26Z pvm-module-show-pvm_root.apt
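The run above interleaves many PASSED and FAILED lines, and the same failing check repeats several times. A small Python sketch that lists each failing check once, assuming the `<description> [PASSED]` / `<description> [FAILED]` line format visible in that output (the regex is inferred from this run only, and `failed_checks` is a hypothetical helper):

```python
import re

# Pattern inferred from the test_cluster output quoted in this message.
STATUS_RE = re.compile(r"^(?P<check>.*?)\s*\[(?P<status>PASSED|FAILED)\]\s*$")


def failed_checks(output: str) -> list:
    """Return the distinct check names marked [FAILED], in first-seen order."""
    seen = []
    for line in output.splitlines():
        m = STATUS_RE.match(line.strip())
        if m and m.group("status") == "FAILED" and m.group("check") not in seen:
            seen.append(m.group("check"))
    return seen
```

Lines such as `Starting TORQUE Server: [  OK  ]` or the `[PASS] ...` API test results do not match the pattern and are ignored.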
More precisely, the real problem is this one (the rest is a consequence):
Connection refused
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
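On Linux, error=111 is ECONNREFUSED: the TCP connection reached the host, but nothing was accepting connections on the target port. A minimal Python sketch for checking the relevant ports by hand, assuming 15001 as the pbs_server port (the usual Torque default, not confirmed for this install) and 15004 as the scheduler port named in pbs_server.log; `probe_tcp`, `describe`, and `PORTS` are hypothetical names:

```python
import errno
import os
import socket

# Assumed ports: 15001 is the usual Torque pbs_server default;
# 15004 is the scheduler port reported in pbs_server.log.
PORTS = {"pbs_server": 15001, "scheduler": 15004}


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> int:
    """Attempt a TCP connect; return 0 on success, otherwise an errno value.

    A name that cannot be resolved raises socket.gaierror instead."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port))


def describe(code: int) -> str:
    """Turn a probe_tcp result into a readable string."""
    if code == 0:
        return "open"
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({code}): {os.strerror(code)}"
```

For example, `describe(probe_tcp("pbs_oscar", PORTS["scheduler"]))` run on the server shows whether anything is listening where pbs_server expects the scheduler.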
A look at /var/spool/pbs/server_logs/pbs_server.log shows:
01/03/2006 15:30:14;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 8680, loglevel=0
01/03/2006 15:30:17;0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
01/03/2006 15:31:14;0040;PBS_Server;Svr;lilligrid.na.iac.cnr.it;Scheduler sent command scheduler_first
01/03/2006 15:31:30;0002;PBS_Server;Svr;Log;Log opened
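Each pbs_server log line is a run of semicolon-separated fields. A sketch that splits them and keeps only the error entries, assuming the layout visible in this excerpt (timestamp; hexadecimal event code; daemon; object type; object name; message). The field names are illustrative labels, and treating bit 0x0001 as "error" is an assumption based on it being the code on the "Connection refused" line:

```python
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    event: int      # event code parsed from hex, e.g. 0x0001
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def parse_line(line: str) -> LogEntry:
    # Split at most 5 times so a ';' inside the message stays in the message.
    ts, event, daemon, obj_type, obj_name, message = line.rstrip("\n").split(";", 5)
    return LogEntry(ts, int(event, 16), daemon, obj_type, obj_name, message)


ERROR_BIT = 0x0001  # assumption: the code carried by the "Connection refused" line


def errors(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield entries whose event code has the (assumed) error bit set."""
    for line in lines:
        if line.count(";") >= 5:
            entry = parse_line(line)
            if entry.event & ERROR_BIT:
                yield entry
```

Feeding it the file line by line (`errors(open(path))`) filters out the routine "Server Ready" and "Log opened" entries.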
That is, the problem seems to arise in contact_sched.
The PBS forum did not offer much of a hint. Any further hint from anyone
would be highly appreciated.
Umberto
----- Original Message -----
From: "Bernard Li" <[EMAIL PROTECTED]>
To: "Umberto Amato" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Tuesday, January 03, 2006 12:52 AM
Subject: RE: [Oscar-users] Torque fails creating work queue at step 7
Hi Umberto:
Try putting "pbs_oscar" in your _external_ interface instead of your internal
interface and see if it works. Also, there are a number of log files in
/var/spool/pbs which you can take a look at.
Cheers,
Bernard
________________________________
From: [EMAIL PROTECTED] on behalf of Umberto Amato
Sent: Mon 02/01/2006 08:53
To: [email protected]
Subject: [Oscar-users] Torque fails creating work queue at step 7
Dear all,
I'm installing OSCAR 4.1 on a cluster of dual 64-bit Opteron boards running
the Scientific Linux 4.1 operating system. I have a problem that has already
been discussed on the list: Torque fails to create the work queue at Step 7.
In http://sourceforge.net/mailarchive/message.php?msg_id=11522706 the issue
was closed for lack of occurrences: here I am.
The relevant part of the oscarinstall.log is:
Updating pbs_server nodes
/opt/pbs/bin/pbsnodes: Server has no node list
qmgr obj=lilligridfast1.na.iac.cnr.it svr=default: Unauthorized Request
create node lilligridfast1.na.iac.cnr.it np = 2 , properties = all
qmgr obj=lilligridfast2.na.iac.cnr.it svr=default: Unauthorized Request
create node lilligridfast2.na.iac.cnr.it np = 2 , properties = all
qmgr obj=lilligridfast3.na.iac.cnr.it svr=default: Unauthorized Request
create node lilligridfast3.na.iac.cnr.it np = 2 , properties = all
qmgr obj=lilligridfast4.na.iac.cnr.it svr=default: Unauthorized Request
create node lilligridfast4.na.iac.cnr.it np = 2 , properties = all
qmgr obj=lilligridfast5.na.iac.cnr.it svr=default: Unauthorized Request
create node lilligridfast5.na.iac.cnr.it np = 2 , properties = all
Shutting down TORQUE Server: [  OK  ]
Starting TORQUE Server: [  OK  ]
Creating torque workq queue...
Max open servers: 4
qmgr obj=workq svr=default: Unauthorized Request
create queue workq
Configuration of Torque queues failed at /opt/oscar/packages/torque/scripts/post_install line 315
Script /opt/oscar/packages/torque/scripts/post_install exitted badly with exit code '2' at ./post_install line 44
Couldn't run 'post_install' script for torque at ./post_install line 45
Some of the post install scripts failed, please check your logs for more info at ./post_install line 50
--> Step 7: Failed to properly complete the cluster install; please check the logs
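Every qmgr call in this log fails with the same "Unauthorized Request", including the queue creation that finally aborts Step 7. A short Python sketch that lists the objects qmgr was refused, assuming the `qmgr obj=<name> svr=<server>: <error>` line format seen in the excerpt; `refused_objects` is a hypothetical helper:

```python
import re

# Pattern inferred from the oscarinstall.log lines quoted above.
QMGR_ERR = re.compile(r"^qmgr obj=(?P<obj>\S+) svr=(?P<svr>\S+): (?P<err>.+)$")


def refused_objects(log_text: str, error: str = "Unauthorized Request") -> list:
    """Return the qmgr object names whose request failed with `error`."""
    objs = []
    for line in log_text.splitlines():
        m = QMGR_ERR.match(line.strip())
        if m and m.group("err").strip() == error:
            objs.append(m.group("obj"))
    return objs
```

When every node and the queue are refused alike, one plausible reading is that the server does not recognize qmgr's caller as authorized at all, rather than that the queue definition itself is wrong.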
I also attach the /etc/hosts file, because the earlier mail exchange suggests
it is the source of the problem:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
192.168.1.100   lilligridfast100.na.iac.cnr.it lilligridfast100 oscar_server nfs_oscar pbs_oscar
140.164.12.100  lilligrid.na.iac.cnr.it lilligrid
# These entries are managed by SIS, please don't modify them.
192.168.1.1     lilligridfast1.na.iac.cnr.it lilligridfast1
192.168.1.2     lilligridfast2.na.iac.cnr.it lilligridfast2
192.168.1.3     lilligridfast3.na.iac.cnr.it lilligridfast3
192.168.1.4     lilligridfast4.na.iac.cnr.it lilligridfast4
192.168.1.5     lilligridfast5.na.iac.cnr.it lilligridfast5
Pinging any of the aliases of 192.168.1.100 (including pbs_oscar) succeeds
from the server and from the nodes, while the corresponding host command fails.
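That difference is expected: ping resolves names through the C library resolver, which reads /etc/hosts according to /etc/nsswitch.conf, whereas the host utility queries DNS servers directly and never consults /etc/hosts. A name defined only in /etc/hosts, such as pbs_oscar, therefore pings fine but fails with host (`getent hosts pbs_oscar` uses the same path as ping). A Python sketch contrasting the file's own mapping with the system resolver; `parse_hosts` and `resolve_system` are hypothetical helpers, and `socket.gethostbyname` goes through the same resolver ping uses:

```python
import socket
from typing import Optional


def parse_hosts(text: str) -> dict:
    """Map every name/alias in hosts-format text to its address.

    If a name appears on several lines, the first one is kept."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, ip)
    return table


def resolve_system(name: str) -> Optional[str]:
    """Resolve via the system resolver (NSS, like ping); None if it fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
```

On the server, comparing `parse_hosts(open("/etc/hosts").read())["pbs_oscar"]` with `resolve_system("pbs_oscar")` shows which address the Torque tools will actually connect to.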
Any help will be greatly appreciated.
Umberto Amato
Istituto per le Applicazioni del Calcolo
-Mauro Picone- CNR
Via Pietro Castellino 111
80131 Napoli
E-mail: [EMAIL PROTECTED]
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
