Michael Edwards wrote:
This sounds to me like a bug in how PBS works with LDAP authentication.
The default OSCAR setup assumes you are using the head node as your
accounts server and simply propagates /etc/passwd and related
files to the nodes. So if PBS is referring to those files for some
reason, I imagine they will be fairly meaningless. Have you checked
the /etc/nsswitch.conf file on the nodes and the host to see if they are
sane?
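For anyone wanting to compare nsswitch.conf across many nodes, here is a rough sketch of one way to pull out the lookup sources for a given database. The function name nss_sources is made up for illustration, and the parsing is deliberately simple (it ignores bracketed action items such as [NOTFOUND=return]); treat it as a starting point, not a full parser.

```python
import re


def nss_sources(conf_text, database="passwd"):
    """Return the lookup sources listed for one database in nsswitch.conf text."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            # Remove bracketed action items, keep only the service names.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []  # database not mentioned in the file
```

Feeding it the contents of /etc/nsswitch.conf from each node and checking that "ldap" appears in the passwd, shadow and group results would be one way to confirm the files agree.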
Yep. The files are accurate. The way I installed mine was:
Install server. No network to the world.
Install oscar and the nodes.
Install a second nic in the head node and configure.
Then config head node for ldap auth.
Then config ldap auth on the nodes.
I can ssh as an ldap user with no passwords or any warnings displayed
and then get the home directory mounted with automount. That user can
also mpirun.
So I am currently focusing on something to do with pbs/torque, because
I have now created a local user and pushed that passwd/shadow/group info to
the nodes. That user can ssh and use mpirun, but qsub returns some strange info
about trying to resolve a host known as rhel4.ehpctc.intern, and the job fails.
Will post any new discoveries.
Thanks,
jim
I haven't gotten around to it yet but I am planning on using LDAP at
our new facility so I am anxious to get this resolved.
On 2/16/06, Jim Summers <[EMAIL PROTECTED]> wrote:
Bernard Li wrote:
Hi Jim:
As that user, can you ssh to 192.168.0.6?
Yes.
[EMAIL PROTECTED] ~]$ ssh node6
[EMAIL PROTECTED] ~]$ ls -al
The error message "No Password Entry for User tmac0501" sounds fishy...
Strangest thing I ever saw.
does /etc/passwd, /etc/shadow look okay on that node?
Looks fine.
Regarding my local user test, I was wrong. Using a local user that can
ssh to all nodes and run mpirun without any errors, the pbs-submitted
jobs now return the following:
----------
p0_4722: (61.563956) Procgroup:
p0_4722: (61.564118) entry 0: node15.oscardomain 0 0 /admin/localpbs/clib/node-test localpbs
p0_4722: (61.564145) entry 1: rhel4.ehpctc.intern 1 1 /admin/localpbs/clib/node-test localpbs
p0_4722: p4_error: Could not gethostbyname for host rhel4.ehpctc.intern; may be invalid name
: 62
----------
This is repeatable even with a varying number of nodes. I am not sure
where it is getting the rhel4.ehpctc.intern entry. The machines.LINUX
file is fine, and mpirun works. It seems like there may be a problem in
the scheduler?
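One way to narrow this down might be to check, from inside a submitted job, which host names actually fail to resolve. The sketch below is only an illustration: the helper name unresolvable is invented, and it assumes the job script can run Python and that the host names are taken from the file PBS points to with $PBS_NODEFILE.

```python
import socket


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the host names for which the resolver raises an error."""
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            bad.append(host)
    return bad
```

Inside a job, the list could be built with something like open(os.environ["PBS_NODEFILE"]).read().split() and the result printed, to see whether a name like rhel4.ehpctc.intern comes from the node file or from somewhere else.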
Cheers,
Bernard
P.S. Did you run through the "Test OSCAR Setup" step and all tests passed?
Yes, most of the tests passed. It seems like one did fail; it was either
ganglia or some pvm tests. I have since configured ganglia and it
works fine. Since we aren't planning on using pvm, I disregarded it.
in the queue. I am seeing the following in the mom_logs on the nodes
involved:
=========================
02/15/2006 15:06:22;0008; pbs_mom;Job;6.master;No Password Entry for User tmac0501
02/15/2006 15:10:26;0008; pbs_mom;Job;6.master;ERROR: received request 'ABORT_JOB' from 192.168.0.6:1023 for job '6.master' (job does not exist locally)
========================
Not sure what I have to configure. I haven't seen anything in the pbs
docs regarding authentication yet.
The ldap users can ssh to each node and their home is mounted.
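Since ssh works but pbs_mom reports "No Password Entry", it may be worth checking the account lookup on the node separately from ssh. Below is a minimal sketch, assuming pbs_mom goes through the standard getpwnam() lookup (which honours nsswitch.conf); the helper name lookup_user is hypothetical.

```python
import pwd


def lookup_user(name, getpw=pwd.getpwnam):
    """Return (uid, home) for a user, or None if the lookup finds no entry."""
    try:
        entry = getpw(name)
    except KeyError:  # pwd.getpwnam raises KeyError for unknown users
        return None
    return entry.pw_uid, entry.pw_dir
```

Running lookup_user("tmac0501") as root on a compute node, ideally in the same environment the pbs_mom daemon was started from, would show whether the LDAP account is visible there or only in interactive sessions.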
TIA
--
Jim Summers
School of Computer Science-University of Oklahoma
-------------------------------------------------
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users