Hi Robert,

One minor note: check whether a normal (non-root) user, not just root, can SSH from the server to the client nodes and vice versa.
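If it helps, here is a rough Python sketch of that check. It is only an illustration, not something OSCAR ships: the function names are made up, and it assumes an OpenSSH client that understands the `BatchMode` and `ConnectTimeout` options. Because the hang reported in this thread happens after the TCP connection is established, the sketch also puts a hard timeout around the whole command.

```python
import subprocess


def build_ssh_check_cmd(user, host, timeout=5):
    """Build an ssh command that fails fast instead of prompting or hanging."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",  # give up quickly if connect stalls
        "-l", user,
        host,
        "true",                             # trivial remote command
    ]


def can_ssh(user, host, timeout=5):
    """Return True if `user` can run a command on `host` non-interactively."""
    try:
        result = subprocess.run(
            build_ssh_check_cmd(user, host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 10,  # hard stop if ssh hangs after connecting
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

Run it as the normal user on the head node against each compute node, for example `can_ssh("someuser", "oscarnode01")` (the user name is a placeholder), and then again from a compute node against the head node. A `False` result only says that non-interactive SSH failed; it does not say why, so `ssh -vvv` is still the tool for digging further.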
Regards,

- DongInn

Robert Ashcraft wrote:
> Thanks again for your help.
>
> We have tried a number of things but cannot seem to find what the
> problem is. I will list a few of the things we have tried and some
> other information to see if it helps point to anything...
>
> Problem:
> Cannot SSH from the compute back to the head node, which causes several
> problems.
>
> General Stuff:
> - The server can SSH into the compute node with no problem
> - We can ping the server from the compute node
> - The node mounts /home via NFS fine, and /home is accessible on the
>   compute node
> - pbsnodes -a displays node information correctly about the node and
>   resources available
> - You can see submitted jobs using qstat
> - When you submit a job (either the test_cluster script or a user script
>   with qsub), it will go to the node, run, and then return to the "Q"
>   state after it runs. I assume that normally, the node will communicate
>   with the head node telling that it is done, then the head node will
>   send a command to delete the job. No output or error files are
>   generated where the script was run. The jobs can be manually deleted
>   using qdel. I assume this is caused by the failure of node->server SSH.
>
> What we have tried:
> - The stuff mentioned thus far in this thread
> - Running the start_over script and reinstalling OSCAR with only one
>   active ethernet connection set to the internal network. The result is
>   the same.
> - Changing the internal IP and hostname of the head node
> - checked the sshd_config and hosts files
> - other things I can't remember at the moment.
>
> Does anyone have any other ideas why the SSH may only be one way in
> nature??? This is terribly frustrating and it seems like no one has had
> a similar problem when trying to set up a cluster. I really am at a
> loss what could be the cause.
>
> Is there any reason why a driver problem would allow pings and NFS but
> only allow a one-way SSH connection?
>
> Hardware information:
> Server: Q6600 on an eVGA nVidia motherboard (nVidia ethernet and SCSI
> controller)
> Nodes: Dual E5345 on an ASUS Intel motherboard (Intel ethernet and ahci
> SCSI controller)
> (is there any reason why the hardware differences could cause such a
> problem... I already put a modprobe.conf file from a compute node
> installed with linux in the image file prior to PXE boot, otherwise it
> would kernel panic on reboot after imaging)
>
> Thanks again for any help you can provide...
>
> Sincerely,
>
> Rob
>
> Michael Edwards wrote:
>> Okay, so looking at the original oscarinstall.log and reading your
>> original message again, two things jump out at me. The first is that
>> in your /etc/hosts file you have one hostname mapping to two different
>> IP addresses. This may cause confusion.
>>
>> The other thing I notice is that OSCAR isn't seeing the hostname in
>> your /etc/hosts file at all, but a very long one instead that looks
>> like a DHCP assigned one.
>>
>> Take a look at my suggestions here
>> (http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipTwoNetworkInterfaces)
>> and see if they make any sense.
>>
>> Not sure what the issue with the new IP is but it seems like there was
>> some conflict with the old one since ssh is now at least trying to
>> connect.
>>
>> On 10/26/07, *Robert Ashcraft* <[EMAIL PROTECTED]> wrote:
>>
>>     So, I changed the internal IP to 192.168.1.10 to prevent any
>>     would-be conflicts, ran the start_over script, and went through
>>     the usual setup process.
>>
>>     Basically the same thing happened, but I got the iptable and
>>     verbose SSH output. Maybe you can help make a little sense out of
>>     it.
>>
>>     Here is the iptable output:
>>     =====
>>
>>     Chain INPUT (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     Chain FORWARD (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     Chain OUTPUT (policy ACCEPT)
>>     target     prot opt source               destination
>>
>>     =====
>>
>>     Here is the SSH -vvv command when run from oscarnode01 trying to
>>     get back into the head node:
>>
>>     =====
>>
>>     [EMAIL PROTECTED] ~]# ssh -vvv 192.168.1.10
>>     OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
>>     debug1: Reading configuration data /etc/ssh/ssh_config
>>     debug1: Applying options for *
>>     debug2: ssh_connect: needpriv 0
>>     debug1: Connecting to 192.168.1.10 [192.168.1.10] port 22.
>>     debug1: Connection established.
>>     debug1: permanently_set_uid: 0/0
>>     debug1: identity file //root//.ssh/identity type 0
>>     debug3: Not a RSA1 key file //root//.ssh/id_rsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_rsa type 1
>>     debug3: Not a RSA1 key file //root//.ssh/id_dsa.
>>     debug2: key_type_from_name: unknown key type '-----BEGIN'
>>     debug3: key_read: missing keytype
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug3: key_read: missing whitespace
>>     debug2: key_type_from_name: unknown key type '-----END'
>>     debug3: key_read: missing keytype
>>     debug1: identity file //root//.ssh/id_dsa type 2
>>
>>     =====
>>
>>     Thanks again for your help.
>>
>>     Rob
>>
>>     Michael Edwards wrote:
>>> OSCAR doesn't need a gateway on the head node to work.
>>> One-way communication generally implies there is a firewall on the
>>> head node or some other routing problem.
>>>
>>> What do you get from "iptables -L" on the head node?
>>>
>>> You might try using a different address for the head node than
>>> 192.168.0.1; that is a common default address for networking
>>> hardware and can cause problems like this occasionally. I have
>>> become fond of 10.0.0.x because it isn't used as much.
>>>
>>> You could also change the switch address, if that is the problem.
>>>
>>> On 10/26/07, *Robert Ashcraft* <[EMAIL PROTECTED]> wrote:
>>>
>>>     Michael,
>>>
>>>     Thanks for the response. My colleague has tried those things
>>>     and they did not seem to help. The "ssh -vvv" command does
>>>     not provide any output and presumably just hangs somewhere in
>>>     the connection process.
>>>
>>>     Just as some information... If we set up the compute node to
>>>     connect to DHCP over the external MIT network (not through the
>>>     switch), I was able to get two-way communication (through the
>>>     MIT network). This seems to imply that something is wrong with
>>>     the static IP setup or something related to the switch.
>>>     However, the one-way communication is puzzling. I don't think
>>>     we have a gateway specified for the head node internal IP
>>>     address, only the IP (192.168.0.1) and subnet mask
>>>     (255.255.255.0). Could that be the source of any problems?
>>>
>>>     We will continue to try to diagnose the problem, but any more
>>>     insight would be welcomed. Thanks,
>>>
>>>     Rob
>>>
>>>     Michael Edwards wrote:
>>>> Do you have the firewall on the head node turned off?
>>>>
>>>> You can check by doing "iptables -L" or checking under the
>>>> "security level" utility.
>>>>
>>>> You can also try doing "ssh -vvv [EMAIL PROTECTED]" and see if
>>>> it gives you any clues.
>>>>
>>>> On 10/25/07, *Robert Wilson Ashcraft* <[EMAIL PROTECTED]> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>     I am attempting to set up an OSCAR cluster. I have gotten
>>>>     through everything past step 7, Complete Cluster Setup (which
>>>>     finished successfully).
>>>>
>>>>     However, when I run the cluster tests, I get several failures,
>>>>     most noticeably with the node-->server communication.
>>>>
>>>>     This is also confirmed by the fact that I can SSH to a node,
>>>>     but when I am logged into the node, I cannot SSH back into the
>>>>     server (it just hangs... no error message, but I can ctrl-C
>>>>     out of it).
>>>>
>>>>     Do you have any idea why the SSH from the client to server
>>>>     would not be working?
>>>>
>>>>     I have a feeling that if this problem is solved, the other
>>>>     failed tests will work themselves out.
>>>>
>>>>     I am attaching the oscarinstall.log file in case that helps.
>>>>
>>>>     Here is my /etc/hosts file if that helps:
>>>>     # Do not remove the following line, or various programs
>>>>     # that require network functionality will fail.
>>>>     127.0.0.1       localhost.localdomain localhost
>>>>     192.168.0.1     pharos.mit.edu pharos oscar_server nfs_oscar pbs_oscar
>>>>     18.80.7.242     pharos.mit.edu pharos
>>>>
>>>>     # These entries are managed by SIS, please don't modify them.
>>>>     192.168.0.100   oscarnode01.mit.edu oscarnode01
>>>>
>>>>     Thanks for your help.
>>>>
>>>>     Rob Ashcraft
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.net email is sponsored by: Splunk Inc.
>>>> Still grepping through log files to find problems? Stop.
>>>> Now Search log events and configuration files using AJAX and a browser.
>>>> Download your FREE copy of Splunk now >> http://get.splunk.com/
>>>> _______________________________________________
>>>> Oscar-users mailing list
>>>> Oscar-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>>
>>     --
>>     Robert W. Ashcraft
>>     Ph.D. Candidate
>>     Dept. Chemical Engineering
>>     Massachusetts Institute of Technology
>>     77 Massachusetts Ave.
>>     Room 66-264
>>     Cambridge, MA 02139
>>     Phone: 617-253-6554
>>     E-mail: [EMAIL PROTECTED]