[ https://issues.apache.org/jira/browse/AMBARI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758146#comment-13758146 ]

Tom Beerbower commented on AMBARI-3013:
---------------------------------------

For solution 1, if the server is down, the request can never return faster than 
the timeout. If we make the timeout too small, we risk timing out requests to 
hosts that are up but slow to respond.
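The comment does not spell out solution 1. Judging from the attachment names (conn_timeout1000, conn_timeout5000), it appears to be a shorter connection timeout on the per-host metrics requests. The minimal Java sketch below assumes that reading and only shows where such timeouts would be set with the standard `HttpURLConnection` API. The class name, helper name, and RM URL are hypothetical, not Ambari code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutSketch {
    // Hypothetical helper: creates (but does not yet connect) an HTTP connection
    // with explicit timeouts in milliseconds. Nothing touches the network until
    // connect() or getInputStream() is called.
    static HttpURLConnection openWithTimeouts(String url, int connectMs, int readMs) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Upper bound on waiting for the TCP connection. For a powered-off
            // host, each blocking request pays up to this much before failing.
            conn.setConnectTimeout(connectMs);
            // Upper bound on waiting for response data once connected.
            conn.setReadTimeout(readMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical RM metrics endpoint; 1000 ms mirrors one attachment name.
        HttpURLConnection conn =
                openWithTimeouts("http://rm-host.example:8088/jmx", 1000, 5000);
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```

This illustrates the trade-off in the comment: lowering `connectMs` shortens the wait on a dead host, but a healthy host that needs longer than `connectMs` to accept the connection is also reported as failed.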

I like solution 2. We should expose the heartbeat status for all hosts so that 
any provider can check it up front. I think we can assume that if there is no 
heartbeat, any request to the host will fail.
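A minimal Java sketch of the up-front check in solution 2, assuming each provider can see the last heartbeat time per host. `HeartbeatTracker`, `hostsToQuery`, and the 30-second staleness threshold are hypothetical names and values, not Ambari's actual classes or configuration. Time is passed in explicitly so the logic is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatSketch {
    // Hypothetical registry of the last heartbeat time (ms) seen from each host.
    static class HeartbeatTracker {
        private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
        private final long maxAgeMs; // assumed staleness threshold

        HeartbeatTracker(long maxAgeMs) {
            this.maxAgeMs = maxAgeMs;
        }

        void recordHeartbeat(String host, long nowMs) {
            lastHeartbeat.put(host, nowMs);
        }

        // A host with no heartbeat, or only a stale one, is treated as down.
        boolean isAlive(String host, long nowMs) {
            Long last = lastHeartbeat.get(host);
            return last != null && nowMs - last <= maxAgeMs;
        }
    }

    // Hypothetical provider step: only hosts with a live heartbeat get a metrics
    // request, so a powered-off host is skipped instead of waiting out a timeout.
    static List<String> hostsToQuery(List<String> hosts, HeartbeatTracker tracker, long nowMs) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            if (tracker.isAlive(host, nowMs)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(30_000);
        tracker.recordHeartbeat("host1", 100_000);
        tracker.recordHeartbeat("rm-host", 40_000); // last heard from 70 s ago
        // host3 has never sent a heartbeat.
        List<String> live = hostsToQuery(List.of("host1", "rm-host", "host3"), tracker, 110_000);
        System.out.println(live); // prints [host1]
    }
}
```

Unlike a timeout, this check costs a map lookup per host, so a dead host adds no latency. The cost is that a host is considered down for up to `maxAgeMs` after it recovers or before its death is noticed.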
                
> Powering off RM node increases API latency by a factor of 6
> -----------------------------------------------------------
>
>                 Key: AMBARI-3013
>                 URL: https://issues.apache.org/jira/browse/AMBARI-3013
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.4.0
>            Reporter: Srimanth Gunturi
>            Assignee: Mahadev konar
>              Labels: perfomance
>             Fix For: 1.4.0
>
>         Attachments: Response Time Graph_conn_timeout1000.png, Response Time Graph_conn_timeout5000.png, RMpaused.png
>
>
> On a 4-node cluster, I was testing the API call below.
> {noformat}
> /api/v1/clusters/${cluster}/services?fields=components/ServiceComponentInfo,components/host_components,components/host_components/HostRoles,components/host_components/metrics/jvm/memHeapUsedM,components/host_components/metrics/jvm/memHeapCommittedM,components/host_components/metrics/mapred/jobtracker/trackers_decommissioned,components/host_components/metrics/cpu/cpu_wio,components/host_components/metrics/rpc/RpcQueueTime_avg_time,components/host_components/metrics/flume/flume,components/host_components/metrics/yarn/Queue
> {noformat}
> When everything was working, the latency was ~500ms.
> I then powered off the RM node, and the call latency immediately spiked 
> 30-fold (~15000ms). After some time it dropped, but stayed at 6 times the 
> original latency (~3000ms). When the machine came back online, the call 
> latency fell back to its original ~500ms.
> Images attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
