[ https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lev Bronshtein updated SLIDER-1259:
-----------------------------------
    Description:
In an environment where Hadoop worker nodes bind the Node Manager to an interface whose hostname differs from the one returned by socket.getfqdn(), we are unable to introspect running jobs. In our test environment the difference is between f-bcpc-vm3 and plain bcpc-vm3; the latter is the hostname bound to the management interface, not the interface for hadoop/production traffic.

For example, running *slider registry --name slider_poc --listexp* results in the following output in the ResourceManager logs:
{quote}2018-01-26 17:30:32,147 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is accessing unchecked [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which is the app master GUI of application_1516910361403_0094 owned by ubuntu
2018-01-26 17:31:13,639 WARN org.mortbay.log: /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports: java.net.ConnectException: Connection timed out (Connection timed out)
{quote}

Note how the redirect is to [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports], whereas it should have been to [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports]. Renaming the host to f-bcpc-vm3 results in the appropriate behavior.

Perhaps *hostname.py* can be instructed to look at one of the following before registering:
*yarn.nodemanager.address*
*yarn.nodemanager.bind-host*
*yarn.nodemanager.hostname*

*hostname.public_hostname()* is called in Register.py:
register = {'responseId': int(id),
            'timestamp': timestamp,
            'label': self.config.getLabel(),
            *'publicHostname': hostname.public_hostname(),*
            'agentVersion': version,
            'actualState': actualState,
            'expectedState': expectedState,
            'allocatedPorts': allocated_ports,
            'logFolders': log_folders,
            'tags': tags
            }

  was:
In an environment where Hadoop worker nodes bind the Node Manager to an interface whose hostname differs from the one returned by socket.getfqdn(), we are unable to introspect running jobs. In our test environment the difference is between f-bcpc-vm3 and plain bcpc-vm3; the latter is the hostname bound to the management interface, not the interface for hadoop/production traffic.

For example, running *slider registry --name slider_poc --listexp* results in the following output in the ResourceManager logs:
{quote}2018-01-26 17:30:32,147 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is accessing unchecked [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which is the app master GUI of application_1516910361403_0094 owned by ubuntu
2018-01-26 17:31:13,639 WARN org.mortbay.log: /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports: java.net.ConnectException: Connection timed out (Connection timed out)
{quote}

Note how the redirect is to [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports], whereas it should have been to [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports]. Renaming the host to f-bcpc-vm3 results in the appropriate behavior.

Perhaps *hostname.py* can be instructed to look at one of the following before registering:
*yarn.nodemanager.address*
*yarn.nodemanager.bind-host*
*yarn.nodemanager.hostname*


> Slider does not work well in multi homed environments
> -----------------------------------------------------
>
>                 Key: SLIDER-1259
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1259
>             Project: Slider
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: Slider 0.92
>            Reporter: Lev Bronshtein
>            Priority: Minor
>
> In an environment where Hadoop worker nodes bind the Node Manager to an
> interface whose hostname differs from the one returned by socket.getfqdn(),
> we are unable to introspect running jobs. In our test environment the
> difference is between f-bcpc-vm3 and plain bcpc-vm3; the latter is the
> hostname bound to the management interface, not the interface for
> hadoop/production traffic.
>
> For example, running *slider registry --name slider_poc --listexp* results
> in the following output in the ResourceManager logs:
> {quote}2018-01-26 17:30:32,147 INFO
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is
> accessing unchecked
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which
> is the app master GUI of application_1516910361403_0094 owned by ubuntu
> 2018-01-26 17:31:13,639 WARN org.mortbay.log:
> /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports:
> java.net.ConnectException: Connection timed out (Connection timed out)
> {quote}
>
> Note how the redirect is to
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports],
> whereas it should have been to
> [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports].
> Renaming the host to f-bcpc-vm3 results in the appropriate behavior.
>
> Perhaps *hostname.py* can be instructed to look at one of the following
> before registering:
> *yarn.nodemanager.address*
> *yarn.nodemanager.bind-host*
> *yarn.nodemanager.hostname*
>
> *hostname.public_hostname()* is called in Register.py:
> register = {'responseId': int(id),
>             'timestamp': timestamp,
>             'label': self.config.getLabel(),
>             *'publicHostname': hostname.public_hostname(),*
>             'agentVersion': version,
>             'actualState': actualState,
>             'expectedState': expectedState,
>             'allocatedPorts': allocated_ports,
>             'logFolders': log_folders,
>             'tags': tags
>             }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
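
The lookup suggested in the description could be sketched roughly as follows. This is only an illustration, not Slider's actual code: the function name resolve_public_hostname, the yarn_conf dictionary of already-loaded YARN properties, the order in which the three properties are tried, and the rule of skipping wildcard or unexpanded values are all assumptions. IPv6 literals are not handled.

```python
import socket

# Hypothetical lookup order; the issue only names these properties as candidates.
CANDIDATE_KEYS = (
    "yarn.nodemanager.hostname",
    "yarn.nodemanager.bind-host",
    "yarn.nodemanager.address",
)


def resolve_public_hostname(yarn_conf, fallback=socket.getfqdn):
    """Return the first usable host found in yarn_conf, else fallback().

    yarn_conf is assumed to be a plain dict of YARN property names to
    string values. Values may be "host" or "host:port".
    """
    for key in CANDIDATE_KEYS:
        value = (yarn_conf.get(key) or "").strip()
        if not value:
            continue
        # Drop a trailing ":port" if one is present.
        host = value.rsplit(":", 1)[0]
        # Assumption: a wildcard address or an unexpanded ${...} reference
        # says nothing about which interface name to advertise.
        if not host or host == "0.0.0.0" or host.startswith("${"):
            continue
        return host
    # Nothing usable configured: keep the current behaviour.
    return fallback()
```

With something like this in place, the 'publicHostname' entry built in Register.py could take its value from the configured Node Manager name and fall back to socket.getfqdn() only when none of the properties is set.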