[ https://issues.apache.org/jira/browse/HADOOP-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HADOOP-8191:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.3
                   0.24.0
     Hadoop Flags: Reviewed
           Status: Resolved (was: Patch Available)

Committed to 23 and trunk. Thanks for reporting this, Philip.

> SshFenceByTcpPort uses netcat incorrectly
> -----------------------------------------
>
>                 Key: HADOOP-8191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 0.23.3
>            Reporter: Philip Zeyliger
>            Assignee: Todd Lipcon
>             Fix For: 0.24.0, 0.23.3
>
>         Attachments: hdfs-3081.txt
>
>
> SshFenceByTcpPort currently assumes that the NN is listening on localhost.
> Typical setups have the NameNode listening only on its own hostname, so
> "nc -z" against localhost fails to detect it.
> Here's an example in which the NN is running and listening on port 8020,
> but doesn't respond on "localhost 8020":
> {noformat}
> [root@xxx ~]# lsof -P -p 5286 | grep -i listen
> java    5286 root  110u  IPv4  1772357  TCP xxx:8020 (LISTEN)
> java    5286 root  121u  IPv4  1772397  TCP xxx:50070 (LISTEN)
> [root@xxx ~]# nc -z localhost 8020
> [root@xxx ~]# nc -z xxx 8020
> Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
> {noformat}
> Here's the likely offending code:
> {code}
> LOG.info(
>     "Indeterminate response from trying to kill service. " +
>     "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z localhost 8020");
> {code}
> Naively, we could run netcat against the correct hostname (since the NN ought
> to be listening on the hostname it's configured as), or just use fuser.
> fuser catches ports regardless of which IP addresses they're bound to:
> {noformat}
> [root@xxx ~]# fuser 1234/tcp
> 1234/tcp:  6766  6768
> [root@xxx ~]# jobs
> [1]-  Running  nc -l localhost 1234 &
> [2]+  Running  nc -l rhel56-18.ent.cloudera.com 1234 &
> [root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
> nc  6766  root  3u  IPv4  2563626  TCP localhost:1234 (LISTEN)
> nc  6768  root  3u  IPv4  2563671  TCP xxx:1234 (LISTEN)
> {noformat}
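The two options in the description can be sketched in Java, the language of the quoted code. This is a minimal, hypothetical sketch and not the committed patch. `FenceChecks`, `ncProbeCommand` and `listeningInodes` are illustrative names, and the sketch assumes the fencer knows the service's configured address as an `InetSocketAddress`. The first method builds the `nc -z` probe from that address instead of the hardcoded `localhost 8020`. The second roughly mimics the first step of how `fuser` on Linux matches `1234/tcp` regardless of bind address. It scans lines in the `/proc/net/tcp` layout, keeps sockets in LISTEN state whose local port matches, and ignores the local IP.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating the two options in the issue; this is
// not Hadoop's SshFenceByTcpPort code or the patch attached to HADOOP-8191.
final class FenceChecks {
  // Value of the "st" column for a socket in LISTEN state in /proc/net/tcp.
  private static final String TCP_LISTEN = "0A";

  private FenceChecks() {}

  /** Option 1: probe the configured address instead of localhost:8020. */
  static String ncProbeCommand(InetSocketAddress serviceAddr) {
    // getHostString() returns the configured hostname or IP literal without
    // a reverse DNS lookup. The value is not shell-quoted.
    return "nc -z " + serviceAddr.getHostString() + " " + serviceAddr.getPort();
  }

  /**
   * Option 2 (fuser-style): returns inodes of LISTEN sockets on the given
   * port, whatever local IP they are bound to. Input lines follow the
   * /proc/net/tcp layout; the header line and short lines are skipped.
   */
  static List<Long> listeningInodes(List<String> procNetTcpLines, int port) {
    List<Long> inodes = new ArrayList<>();
    for (String line : procNetTcpLines) {
      String[] f = line.trim().split("\\s+");
      if (f.length < 10 || !f[0].endsWith(":")) {
        continue; // header line ("sl local_address ...") or too few columns
      }
      // local_address is "HEXIP:HEXPORT"; only the port is compared.
      String local = f[1];
      int localPort = Integer.parseInt(local.substring(local.indexOf(':') + 1), 16);
      if (localPort == port && TCP_LISTEN.equals(f[3])) {
        inodes.add(Long.parseLong(f[9])); // 10th column is the socket inode
      }
    }
    return inodes;
  }

  public static void main(String[] args) {
    // Placeholder hostname; createUnresolved avoids a DNS lookup.
    InetSocketAddress nn = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
    System.out.println(ncProbeCommand(nn)); // prints "nc -z nn1.example.com 8020"

    // Hypothetical sample: listeners on port 1234 (0x04D2) bound to
    // 127.0.0.1 and 10.0.0.5, plus one listener on port 8020 (0x1F54).
    List<String> sample = List.of(
        "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
        "   0: 0100007F:04D2 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001 1",
        "   1: 0500000A:04D2 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1002 1",
        "   2: 0500000A:1F54 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1003 1");
    System.out.println(listeningInodes(sample, 1234)); // prints [1001, 1002]
  }
}
```

The host string is passed to the remote shell unquoted, so the first method assumes a trusted configuration value. Going from the returned inodes to PIDs, as fuser does, would additionally require scanning `/proc/<pid>/fd` for `socket:[inode]` links. IPv6 listeners appear in `/proc/net/tcp6` rather than `/proc/net/tcp`.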