I think that is the problem. The hosts were originally on a private network and
registered as:
"href" : "http://localhost:8080/api/v1/hosts/pc1",
"Hosts" : {
"host_name" : "pc1"
}
but after the change, they identify themselves by their FQDN:
"href" : "http://localhost:8080/api/v1/hosts/pc1.foo.net",
"Hosts" : {
"host_name" : "pc1.foo.net"
}
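Here is a minimal sketch of the check Sumit describes below. It assumes you have saved the JSON returned by /api/v1/clusters/<cluster name>/hosts and /api/v1/hosts as strings, and that each listing wraps its entries in an "items" array shaped like the snippets above. That wrapper, the cluster name "c1", and the helper names host_names and renamed_hosts are hypothetical illustrations, not part of Ambari.

```python
import json


def host_names(payload: str) -> set:
    # Assumed Ambari-style listing:
    # {"items": [{"href": ..., "Hosts": {"host_name": ...}}, ...]}
    data = json.loads(payload)
    return {item["Hosts"]["host_name"] for item in data.get("items", [])}


def renamed_hosts(cluster_hosts: set, registered_hosts: set) -> dict:
    # Pair each cluster host that no longer appears among the registered hosts
    # with a registered name whose first DNS label matches it (pc1 -> pc1.foo.net).
    renamed = {}
    for old in sorted(cluster_hosts - registered_hosts):
        for new in sorted(registered_hosts):
            if new.split(".", 1)[0] == old:
                renamed[old] = new
                break
    return renamed


if __name__ == "__main__":
    # Hypothetical payloads modeled on the snippets above.
    cluster = '{"items": [{"href": "http://localhost:8080/api/v1/clusters/c1/hosts/pc1", "Hosts": {"host_name": "pc1"}}]}'
    registered = '{"items": [{"href": "http://localhost:8080/api/v1/hosts/pc1.foo.net", "Hosts": {"host_name": "pc1.foo.net"}}]}'
    print(renamed_hosts(host_names(cluster), host_names(registered)))
```

A non-empty result would be consistent with the agents now registering under new names; the sketch only detects the mismatch and does not change anything on the server.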
Is there some way to fix this?
TIA
Brian
On Jul 15, 2013, at 12:13 PM, Sumit Mohanty wrote:
> Is it possible that the FQDN/hostname of the agent hosts has changed?
> For example, the agents initially registered themselves as host A (you can
> see that using the API at server:8080/api/v1/clusters/<cluster name>/hosts),
> and after the network reconfiguration they started sending heartbeats as
> host B (server:8080/api/v1/hosts will tell you which hosts have
> registered).
>
> -Sumit
>
> On 7/15/13 8:47 AM, "Brian Jeltema" <[email protected]> wrote:
>
>> I had to do some network reconfiguration on our cluster. After rebooting
>> everything and restarting the Ambari server and the Ambari agents, the
>> server reports (via the UI) that it is not receiving heartbeats.
>> However, when I look at the server and agent logs, I see heartbeat
>> activity:
>>
>> agent:
>> INFO 2013-07-15 11:40:12,169 Heartbeat.py:61 - Sending heartbeat with response id: 251 and timestamp: 1373902812168
>> INFO 2013-07-15 11:40:12,214 Controller.py:176 - No commands sent from the Server.
>>
>> server:
>> 11:41:44,760 INFO HeartBeatHandler:108 - Received heartbeat from host, hostname=foo.net, currentResponseId=260, receivedResponseId=260
>> 11:41:44,761 INFO AgentResource:109 - Sending heartbeat response with response id 261
>>
>> (The response IDs don't match because I didn't capture the two logs at
>> the same time.) I suspect there may be persisted state in the Postgres
>> database from the previous network configuration that is causing the
>> problem. Any suggestions for a fix short of a complete redeploy?
>>
>> TIA
>>
>> Brian
>
>