Title: Re: [Oscar-users] PBS Jobs
Hi Jim:
 
By default, OSCAR installs both LAM/MPI and MPICH.  Both provide "mpirun".  To switch between the two implementations, use the program called "switcher".
 
You probably want to figure out which MPI implementation you are actually using for that particular user, whether it is MPICH or LAM/MPI.  For instance:
 
[EMAIL PROTECTED] scripts]$ switcher mpi --show
system:default=lam-7.0.6
user:exists=1
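
If you want to check this from a script, here is a minimal Python sketch that reads that output. It assumes every meaningful line has the form "level:attribute=value", as in the two lines above, and that a "user:default" entry, when present, takes precedence over "system:default". The function names are made up for illustration; they are not part of switcher.

```python
def parse_switcher_show(text):
    """Parse `switcher <tag> --show` output into a dict.

    Assumes each meaningful line looks like `level:attribute=value`,
    as in the sample output; anything else is ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if ":" not in key:
            continue
        settings[key.strip()] = value.strip()
    return settings


def effective_default(settings):
    """Return the user-level default if set, else the system default.

    Assumption: `user:default` overrides `system:default`.
    """
    return settings.get("user:default", settings.get("system:default"))


# Sample text copied from the output shown in this message.
sample = "system:default=lam-7.0.6\nuser:exists=1\n"
print(effective_default(parse_switcher_show(sample)))
```

Run it against the output captured for the user who submits the PBS job, since the user-level setting can differ from account to account.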
 
Also, I can tell you what rhel4.ehpctc.intern is - it's the host which Erich builds the RPMs on ;-)  Anyways, I suppose there is a hostlist somewhere that has that hostname?  I have seen similar issues before, but I can't remember exactly what the issue was...
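
To hunt for that hostlist, a rough Python sketch like the following could scan a few files for the stray hostname. The two paths are the ones already mentioned in this thread; which other files are worth checking (for example a machines file shipped with the MPI package) is a guess on my part, so add candidates as you see fit. The function name is hypothetical.

```python
import os


def find_hostname(hostname, paths):
    """Return (path, line_number, line) for every line containing hostname.

    Missing or unreadable files are skipped silently.
    """
    hits = []
    for path in paths:
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if hostname in line:
                        hits.append((path, lineno, line.rstrip("\n")))
        except OSError:
            continue
    return hits


# Paths taken from this thread; extend the list with other candidates.
CANDIDATE_FILES = ["/var/spool/pbs/server_priv/nodes", "/etc/hosts"]

for path, lineno, line in find_hostname("rhel4.ehpctc.intern", CANDIDATE_FILES):
    print(f"{path}:{lineno}: {line}")
```

It would need to be run on the server and on at least one compute node, since the stale name may be in only one of those places.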
 
Cheers,
 
Bernard


From: [EMAIL PROTECTED] on behalf of Jim Summers
Sent: Thu 16/02/2006 14:23
Cc: Steve Barnet; OscarUsers
Subject: Re: [Oscar-users] PBS Jobs

Jim Summers wrote:
> Steve Barnet wrote:
>
>> Jim Summers wrote:
>>
>> Is that name how the nodes are referenced in
>> /var/spool/pbs/server_priv/nodes ?
>
>> I can't look at it currently, but I'd check the nodes file on the
>> server and the pbs_mom config files on the clients to make
>> sure that those names aren't embedded in the files. If they are
>> and they don't appear in LDAP/DNS/hosts files, that would be
>> a problem.
>
>
> I am going to go ahead and eliminate DNS from lookups via nsswitch.conf
> and hardcode all of the necessary entries into /etc/hosts and see if
> that helps.
>
> Will post with results.

Still no joy.  I have instructed the nodes to look only in their local
files for host information.  I still get the following from a
PBS-submitted job:
--------------------
p0_8254: (60.443138) Procgroup:
p0_8254: (60.443275)     entry 0: node15.oscardomain 0 0 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443301)     entry 1: rhel4.ehpctc.intern 1 1 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443322)     entry 2: rhel4.ehpctc.intern 1 2 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443343)     entry 3: rhel4.ehpctc.intern 1 3 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443364)     entry 4: rhel4.ehpctc.intern 1 4 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443384)     entry 5: rhel4.ehpctc.intern 1 5 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443405)     entry 6: rhel4.ehpctc.intern 1 6 /admin/localpbs/clib/node-test localpbs
p0_8254: (60.443426)     entry 7: rhel4.ehpctc.intern 1 7 /admin/localpbs/clib/node-test localpbs
p0_8254:  p4_error: Could not gethostbyname for host rhel4.ehpctc.intern; may be invalid name: 61
--------------------
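
For anyone wanting to check a dump like this mechanically, here is a minimal Python sketch that pulls the hostnames out of the "entry N:" lines and reports which ones do not resolve. It assumes the hostname is the first token after "entry N:", which matches the output above but is not taken from any MPICH documentation. The function names are hypothetical.

```python
import re
import socket

# Assumption: the hostname is the first token after "entry <n>:".
ENTRY_RE = re.compile(r"entry\s+\d+:\s+(\S+)")


def procgroup_hosts(log_text):
    """Return the distinct hostnames in a Procgroup dump, in order seen."""
    hosts = []
    for match in ENTRY_RE.finditer(log_text):
        host = match.group(1)
        if host not in hosts:
            hosts.append(host)
    return hosts


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts for which the resolver raises an error."""
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
    return bad


# Two entries copied from the job output above.
sample = (
    "p0_8254: (60.443275)     entry 0: node15.oscardomain 0 0\n"
    "p0_8254: (60.443301)     entry 1: rhel4.ehpctc.intern 1 1\n"
)
print(procgroup_hosts(sample))
```

Calling unresolvable(procgroup_hosts(text)) on a compute node uses that node's own resolver configuration, so it should reflect the nsswitch.conf and /etc/hosts changes described above.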

No idea at this point.  The p4 messages would suggest MPICH, but
when I run the job directly with mpirun it is happy.

Thanks


>
>>
>> Best,
>>
>> ---Steve
>
>

--
Jim Summers
School of Computer Science-University of Oklahoma
-------------------------------------------------

