Sergey Shelukhin created SLIDER-1236:
----------------------------------------

             Summary: 10 second sleep before installation
                 Key: SLIDER-1236
                 URL: https://issues.apache.org/jira/browse/SLIDER-1236
             Project: Slider
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Noticed when starting LLAP on a 2-node cluster.
Slider AM logs:
{noformat}
2017-05-22 22:04:33,047 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
...
2017-05-22 22:04:34,946 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
{noformat}
Then nothing useful happens for a while, until:
{noformat}
2017-05-22 22:04:43,099 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Installing LLAP on container_1495490227300_0002_01_000002.
{noformat}

If you look at the corresponding logs from both agents, you can see that each has a gap of almost exactly 10 seconds between registering and sending its first heartbeat.
After the gap, each agent reports back to the AM; about 30 ms after the end of a container's gap, presumably on hearing from that agent, the AM starts installing LLAP on it.
{noformat}
INFO 2017-05-22 22:04:33,055 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK
INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490683064. Command(s) in progress: False. Components mapped: False

INFO 2017-05-22 22:04:34,948 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK
INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
INFO 2017-05-22 22:04:44,960 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490684959. Command(s) in progress: False. Components mapped: False
{noformat}

I've observed the same on multiple different clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
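
For anyone who wants to check the size of the gap on their own cluster, here is a minimal sketch that computes it from agent log timestamps. It is a standalone helper, not part of Slider; the names AGENT_LOGS, parse_ts and gap_seconds are made up for illustration, and it assumes every agent log line starts with "INFO <date> <time>,<millis>" as in the excerpts from this report.

```python
from datetime import datetime

# Agent log lines copied from this report: for each agent, the last line
# before the gap and the first line after it.
# The keys "agent 1" / "agent 2" are illustrative labels only.
AGENT_LOGS = {
    "agent 1": [
        "INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK",
        "INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}",
    ],
    "agent 2": [
        "INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK",
        "INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}",
    ],
}


def parse_ts(line):
    """Parse the timestamp of a line shaped like 'INFO <date> <time>,<millis> ...'."""
    _level, date, time = line.split(" ", 3)[:3]
    return datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S,%f")


def gap_seconds(lines):
    """Seconds elapsed between the first and the last line of an excerpt."""
    return (parse_ts(lines[-1]) - parse_ts(lines[0])).total_seconds()


for agent, lines in AGENT_LOGS.items():
    print("%s: %.3f s between registration response and next log line"
          % (agent, gap_seconds(lines)))
```

Both excerpts come out at just over 10 seconds, which matches the gap described above. Comparing these agent timestamps against the AM log would additionally assume that the clocks on the hosts are in sync.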