Hi Robert,

One minor note: you should check whether a normal user, rather than root,
can SSH from the server to the client nodes and vice versa.
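The check above could be sketched roughly as follows. This is only an
illustration in Python, not anything OSCAR ships: the user name "testuser"
and the helper names build_ssh_check and can_ssh are hypothetical, and the
host names are the ones from the /etc/hosts file quoted further down. It
builds a non-interactive ssh test that fails quickly instead of hanging.

```python
import subprocess


def build_ssh_check(user, host, timeout=5):
    """Return an ssh command line that only tests whether login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout}",  # give up on the TCP connect instead of hanging
        f"{user}@{host}",
        "true",                             # trivial remote command
    ]


def can_ssh(user, host, timeout=5):
    """Run the check; True only if ssh exits with status 0."""
    try:
        result = subprocess.run(
            build_ssh_check(user, host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 10,           # also covers a hang after the connect
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Only print the commands here, so the sketch is safe to run anywhere.
# "testuser" is a hypothetical non-root account.
for user in ("testuser", "root"):
    for host in ("oscarnode01", "pharos"):
        print(" ".join(build_ssh_check(user, host)))
```

Calling can_ssh("testuser", "oscarnode01") on the head node and
can_ssh("testuser", "pharos") on the compute node, then repeating both as
root, would show whether the failure depends on the user or only on the
direction.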

Regards, 
- DongInn


Robert Ashcraft wrote:
> Thanks again for your help. 
> 
> We have tried a number of things but cannot seem to find what the
> problem is.  I will list a few of the things we have tried and some
> other information to see if it helps point to anything...
> 
> Problem:
> Cannot SSH from the compute back to the head node, which causes several
> problems.
> 
> General Stuff:
> - The server can SSH into the compute node with no problem
> - We can ping the server from the compute node
> - The node mounts /home via NFS fine, and /home is accessible on the
> compute node
> - pbsnodes -a displays node information correctly about the node and
> resources available
> - You can see submitted jobs using qstat
> - When you submit a job (either the test_cluster script or a user script
> with qsub), it will go to the node, run, and then return to the "Q"
> state after it runs.  I assume that normally, the node will communicate
> with the head node telling that it is done, then the head node will send
> a command to delete the job.  No output or error files are generated
> where the script was run.  The jobs can be manually deleted using qdel. 
> I assume this is caused by the failure of node->server SSH.
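Since ping and NFS work but SSH does not, it may help to see how far a plain
TCP connection to port 22 gets. The following is a minimal sketch in Python,
not a diagnosis: the function name probe_ssh and its result strings are made
up for illustration. It separates a refused connection, a connect timeout, and
a connection that is accepted but never sends the SSH banner.

```python
import socket
import sys


def probe_ssh(host, port=22, timeout=5.0):
    """Classify how far a TCP connection to an SSH port gets."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"            # host reachable, but the connection was actively refused
    except socket.timeout:
        return "connect-timeout"    # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}"

    with sock:
        try:
            banner = sock.recv(256)  # an SSH server sends "SSH-..." first
        except socket.timeout:
            return "connected-no-banner"
        except OSError as exc:
            return f"error: {exc}"

    if banner.startswith(b"SSH-"):
        return "banner: " + banner.decode("ascii", "replace").strip()
    return "connected-no-banner"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe_ssh.py <host>
    print(probe_ssh(sys.argv[1]))
```

Run on the compute node against the head node's internal address, a
"connect-timeout" result would point toward filtering or routing, while
"connected-no-banner" would suggest the connection is accepted but sshd stalls
before answering.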
> 
> 
> What we have tried:
> - The stuff mentioned thus far in this thread
> - Running the start_over script and reinstalling OSCAR with only one
> active ethernet connection set to the internal network.  The result is
> the same. 
> - Changing the internal IP and hostname of the head node
> - checked the sshd_config and hosts files
> - other things I can't remember at the moment.
> 
> 
> Does anyone have any other ideas why the SSH may only be one way in
> nature???  This is terribly frustrating and it seems like no one has had
> a similar problem when trying to set up a cluster.  I really am at a
> loss what could be the cause. 
> 
> Is there any reason why a driver problem would allow pings and NFS but
> only allow a one-way SSH connection?
> 
> Hardware information:
> Server:  Q6600 on an eVGA nVidia motherboard (nVidia ethernet and SCSI
> controller)
> Nodes: Dual E5345 on an ASUS Intel motherboard (Intel ethernet and ahci
> SCSI controller)
> (is there any reason why the hardware differences could cause such a
> problem...  I already put a modprobe.conf file from a compute node
> installed with linux in the image file prior to PXE boot, otherwise it
> would kernel panic on reboot after imaging)
> 
> Thanks again for any help you can provide...
> 
> Sincerely,
> 
> Rob
> 
> 
> 
> Michael Edwards wrote:
>> Okay, so looking at the original oscarinstall.log and reading your
>> original message again, two things jump out at me.  The first is that
>> in your /etc/hosts file you have one hostname mapping to two different
>> IP addresses.  This may cause confusion.
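The duplicate mapping described above can be spotted mechanically. Here is a
small sketch in Python, assuming the standard "IP name alias..." layout of
/etc/hosts; the function name hostnames_with_multiple_ips is made up for
illustration, and the sample data is the hosts file from the original message
with the mail client's link markup removed.

```python
from collections import defaultdict


def hostnames_with_multiple_ips(hosts_text):
    """Map each hostname that appears under more than one IP to those IPs."""
    ips_by_name = defaultdict(list)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if ip not in ips_by_name[name]:
                ips_by_name[name].append(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


SAMPLE_HOSTS = """\
127.0.0.1       localhost.localdomain localhost
192.168.0.1     pharos.mit.edu pharos oscar_server nfs_oscar pbs_oscar
18.80.7.242     pharos.mit.edu pharos
192.168.0.100   oscarnode01.mit.edu oscarnode01
"""

for name, ips in hostnames_with_multiple_ips(SAMPLE_HOSTS).items():
    print(f"{name}: {', '.join(ips)}")
```

With the sample data this reports both pharos.mit.edu and pharos as mapped to
192.168.0.1 and 18.80.7.242. To check a live system, pass the contents of
/etc/hosts instead of SAMPLE_HOSTS.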
>>
>> The other thing I notice is that OSCAR isn't seeing the hostname in
>> your /etc/hosts file at all, but a very long one instead that looks
>> like a DHCP assigned one.
>>
>> Take a look at my suggestions here (
>> http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipTwoNetworkInterfaces)
>> and see if they make any sense.
>>
>> Not sure what the issue with the new IP is but it seems like there was
>> some conflict with the old one since ssh is now at least trying to
>> connect.
>>
>> On 10/26/07, *Robert Ashcraft* <[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>     So, I changed the internal IP to 192.168.1.10 to prevent any
>>     would-be conflicts, ran the start_over script, and went through
>>     the usual setup process.
>>
>>     Basically the same thing happened, but I got the iptables and
>>     verbose SSH output.  Maybe you can help make a little sense out of
>>     it.
>>
>>     Here is the iptables output:
>>     =====
>>
>>     Chain INPUT (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     Chain FORWARD (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     Chain OUTPUT (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     =====
>>
>>     Here is the SSH -vvv command when run from oscarnode01 trying to
>>     get back into the head node:
>>
>>     =====
>>
>>     [EMAIL PROTECTED] ~]# ssh -vvv 192.168.1.10
>>     OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
>>     debug1: Reading configuration data /etc/ssh/ssh_config
>>     debug1: Applying options for *
>>     debug2: ssh_connect: needpriv 0
>>     debug1: Connecting to 192.168.1.10 [192.168.1.10] port 22.
>>     debug1: Connection established.
>>     debug1: permanently_set_uid: 0/0
>>     debug1: identity file //root//.ssh/identity type 0
>>     debug3: Not a RSA1 key file //root//.ssh/id_rsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>
>>     =====
>>
>>     Thanks again for your help.
>>
>>     Rob
>>
>>
>>
>>     Michael Edwards wrote:
>>>     OSCAR doesn't need a gateway on the head node to work.  One way
>>>     communication generally implies there is a firewall on the head
>>>     node or other routing problem.
>>>
>>>     What do you get from "iptables -L" on the head node?
>>>
>>>     You might try using a different address for the head node than
>>>     192.168.0.1; that is a common default address for networking
>>>     hardware and can cause problems like this occasionally.  I have
>>>     become fond of 10.0.0.x because it isn't used as much.
>>>
>>>     You could also change the switch address too, if that is the problem.
>>>
>>>     On 10/26/07, *Robert Ashcraft* < [EMAIL PROTECTED]
>>>     <mailto:[EMAIL PROTECTED]>> wrote:
>>>
>>>         Michael,
>>>
>>>         Thanks for the response.  My colleague has tried those things
>>>         and they did not seem to help.  The "ssh -vvv" command does
>>>         not provide any output and presumably just hangs somewhere in
>>>         the connection process.
>>>
>>>         Just as some information...  If we set up the compute node
>>>         to connect to DHCP over the external MIT network (not
>>>         through the switch), I was able to get two-way communication
>>>         (through the MIT network).  This seems to imply that
>>>         something is wrong with the static IP setup or something
>>>         related to the switch.  However, the one-way communication is
>>>         puzzling.  I don't think we have a gateway specified for the
>>>         head node internal IP address, only the IP (192.168.0.1) and
>>>         subnet mask (255.255.255.0).  Could that be the source of any
>>>         problems?
>>>
>>>         We will continue to try to diagnose the problem, but any more
>>>         insight would be welcomed.  Thanks,
>>>
>>>         Rob
>>>
>>>
>>>         Michael Edwards wrote:
>>>>         Do you have the firewall on the head node turned off?
>>>>
>>>>         You can check by doing "iptables -L" or checking under the
>>>>         "security level" utility.
>>>>
>>>>         You can also try doing "ssh -vvv [EMAIL PROTECTED] " and see if
>>>>         it gives you any clues.
>>>>
>>>>         On 10/25/07, *Robert Wilson Ashcraft* <[EMAIL PROTECTED]
>>>>         <mailto:[EMAIL PROTECTED]>> wrote:
>>>>
>>>>             Hi,
>>>>
>>>>             I am attempting to set up an OSCAR cluster.  I have
>>>>             gotten through everything past step 7, Complete Cluster
>>>>             Setup (which finished successfully).
>>>>
>>>>             However, when I run the cluster tests, I get several
>>>>             failures, most noticeably with the node --> server
>>>>             communication.
>>>>
>>>>             This is also confirmed by the fact that I can SSH to a
>>>>             node, but when I am
>>>>             logged into the node, I cannot SSH back into the server
>>>>             (it just hangs... no
>>>>             error message, but I can ctrl-C out of it)
>>>>
>>>>             Do you have any idea why the SSH from the client to
>>>>             server would not be working?
>>>>
>>>>             I have a feeling that if this problem is solved, the
>>>>             other failed tests will work themselves out.
>>>>
>>>>             I am attaching the oscarinstall.log file in case that helps.
>>>>
>>>>             Here is my /etc/hosts file if that helps:
>>>>             # Do not remove the following line, or various programs
>>>>             # that require network functionality will fail.
>>>>             127.0.0.1       localhost.localdomain   localhost
>>>>             192.168.0.1     pharos.mit.edu pharos oscar_server nfs_oscar pbs_oscar
>>>>             18.80.7.242     pharos.mit.edu pharos
>>>>
>>>>             # These entries are managed by SIS, please don't modify them.
>>>>             192.168.0.100   oscarnode01.mit.edu        oscarnode01
>>>>
>>>>
>>>>             Thanks for your help.
>>>>
>>>>             Rob Ashcraft
>>>>
>>>>
>>>>             
>>>> -------------------------------------------------------------------------
>>>>
>>>>             This SF.net email is sponsored by: Splunk Inc.
>>>>             Still grepping through log files to find problems?  Stop.
>>>>             Now Search log events and configuration files using AJAX
>>>>             and a browser.
>>>>             Download your FREE copy of Splunk now >>
>>>>             http://get.splunk.com/
>>>>             _______________________________________________
>>>>             Oscar-users mailing list
>>>>             Oscar-users@lists.sourceforge.net
>>>>             <mailto:Oscar-users@lists.sourceforge.net>
>>>>             https://lists.sourceforge.net/lists/listinfo/oscar-users
>>>>
>>>>
>>>>
>>
>>     -- 
>>
>>     Robert W. Ashcraft
>>
>>     Ph.D. Candidate
>>
>>     Dept. Chemical Engineering
>>
>>     Massachusetts Institute of Technology
>>
>>     77 Massachusetts Ave.
>>
>>     Room 66-264
>>
>>     Cambridge, MA 02139
>>
>>     Phone: 617-253-6554
>>
>>     E-mail: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>>
>>
> 

