[jira] [Updated] (AMBARI-11821) With HBase master HA Ambari sometimes displays incorrect dashboard information

Jaimin D Jetly (JIRA) Tue, 09 Jun 2015 13:32:52 -0700

     [ https://issues.apache.org/jira/browse/AMBARI-11821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jaimin D Jetly updated AMBARI-11821:
------------------------------------
    Description: 
PROBLEM:  When there is more than one HBase Master, certain metrics, in 
particular the "Average Load" on the dashboard, are incorrect.  In the case of 
"Average Load", the value reads "0".  After checking, I noticed that the web 
UI hits the following URL to refresh the metrics:

{code}
http://ADDRESS_OF_AMBARI_SERVER:8080/api/v1/clusters/CLUSTER_NAME/components/?ServiceComponentInfo/component_name=APP_TIMELINE_SERVER|ServiceComponentInfo/category=MASTER&fields=ServiceComponentInfo/Version,ServiceComponentInfo/StartTime,ServiceComponentInfo/HeapMemoryUsed,ServiceComponentInfo/HeapMemoryMax,ServiceComponentInfo/service_name,host_components/HostRoles/host_name,host_components/HostRoles/state,host_components/HostRoles/maintenance_state,host_components/HostRoles/stale_configs,host_components/HostRoles/ha_state,host_components/HostRoles/desired_admin_state,host_components/metrics/jvm/memHeapUsedM,host_components/metrics/jvm/HeapMemoryMax,host_components/metrics/jvm/HeapMemoryUsed,host_components/metrics/jvm/memHeapCommittedM,host_components/metrics/mapred/jobtracker/trackers_decommissioned,host_components/metrics/cpu/cpu_wio,host_components/metrics/rpc/RpcQueueTime_avg_time,host_components/metrics/dfs/FSNamesystem/*,host_components/metrics/dfs/namenode/Version,host_components/metrics/dfs/namenode/LiveNodes,host_components/metrics/dfs/namenode/DeadNodes,host_components/metrics/dfs/namenode/DecomNodes,host_components/metrics/dfs/namenode/TotalFiles,host_components/metrics/dfs/namenode/UpgradeFinalized,host_components/metrics/dfs/namenode/Safemode,host_components/metrics/runtime/StartTime,host_components/metrics/hbase/master/IsActiveMaster,ServiceComponentInfo/MasterStartTime,ServiceComponentInfo/MasterActiveTime,ServiceComponentInfo/AverageLoad,ServiceComponentInfo/Revision,ServiceComponentInfo/RegionsInTransition,metrics/api/v1/cluster/summary,metrics/api/v1/topology/summary,host_components/metrics/yarn/Queue,ServiceComponentInfo/rm_metrics/cluster/activeNMcount,ServiceComponentInfo/rm_metrics/cluster/lostNMcount,ServiceComponentInfo/rm_metrics/cluster/unhealthyNMcount,ServiceComponentInfo/rm_metrics/cluster/rebootedNMcount,ServiceComponentInfo/rm_metrics/cluster/decommissionedNMcount&minimal_response=true
{code}
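When narrowing this down, it can help to request only the HBase master fields instead of the full dashboard query. The sketch below is a hypothetical helper (`build_components_query` and the trimmed `HBASE_MASTER_FIELDS` list are illustrative assumptions, not part of Ambari) that assembles a query of the same shape as the URL above: a predicate (in Ambari's predicate syntax `|` means OR), a comma-separated `fields` list, and `minimal_response=true`.

```python
# Hypothetical helper: builds an Ambari "components" query of the same shape
# as the URL above. Nothing here is an Ambari client API.
def build_components_query(server: str, cluster: str, predicate: str,
                           fields: list, port: int = 8080) -> str:
    """Join a predicate, a comma-separated field list and minimal_response.

    The predicate is used verbatim. Special characters ("/", "|", "*", ",")
    are left unescaped to mirror the URL the web UI sends.
    """
    base = f"http://{server}:{port}/api/v1/clusters/{cluster}/components/"
    return f"{base}?{predicate}&fields={','.join(fields)}&minimal_response=true"


# A trimmed field list limited to what this bug is about (an assumption).
HBASE_MASTER_FIELDS = [
    "ServiceComponentInfo/AverageLoad",
    "ServiceComponentInfo/MasterActiveTime",
    "host_components/HostRoles/host_name",
    "host_components/metrics/hbase/master/IsActiveMaster",
]

url = build_components_query(
    "ADDRESS_OF_AMBARI_SERVER", "CLUSTER_NAME",
    "ServiceComponentInfo/component_name=HBASE_MASTER", HBASE_MASTER_FIELDS)
print(url)
```

In a response this small, the component-level AverageLoad and MasterActiveTime sit next to the per-host IsActiveMaster flags, which may make it easier to tell which master the component-level numbers came from.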

The results that come back sometimes seem to be for the wrong server, 
specifically this part:

{code}
      "ServiceComponentInfo" : {
        "AverageLoad" : 0.0,
        "HeapMemoryMax" : 2075918336,
        "HeapMemoryUsed" : 541616216,
        "MasterActiveTime" : 0,
        "MasterStartTime" : 1432752607527,
        "component_name" : "HBASE_MASTER",
        "service_name" : "HBASE"
      },
{code}
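The excerpt can be read more closely with a small helper. MasterStartTime is an epoch timestamp in milliseconds, and a MasterActiveTime of 0 is, assuming HBase's usual master metrics semantics, what a backup master reports before it has ever become active; if that holds, this block came from the standby master rather than the active one. `describe_master_info` below is a hypothetical helper, not an Ambari function.

```python
import json
from datetime import datetime, timezone

# The ServiceComponentInfo excerpt quoted above, wrapped so it parses as JSON.
EXCERPT = """
{
  "AverageLoad": 0.0,
  "HeapMemoryMax": 2075918336,
  "HeapMemoryUsed": 541616216,
  "MasterActiveTime": 0,
  "MasterStartTime": 1432752607527,
  "component_name": "HBASE_MASTER",
  "service_name": "HBASE"
}
"""


def describe_master_info(info: dict) -> dict:
    """Hypothetical helper: interpret the HBase master timing fields."""
    # MasterStartTime is epoch milliseconds; convert to an aware UTC datetime.
    started = datetime.fromtimestamp(info["MasterStartTime"] / 1000, tz=timezone.utc)
    return {
        "average_load": info["AverageLoad"],
        "started_utc": started,
        # Assumption: a backup HMaster that has never been active reports
        # MasterActiveTime == 0, so a zero here hints at the standby master.
        "looks_like_standby": info["MasterActiveTime"] == 0,
    }


summary = describe_master_info(json.loads(EXCERPT))
print(summary["started_utc"].isoformat(), summary["looks_like_standby"])
```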

I'm attaching a file with the output from two different clusters at the 
customer site.  In both cases the average load was not 0, but it shows up as 0 
in the JSON.  Also notice that IsActiveMaster is false for one of the clusters 
and true for the other.  There seems to be a disconnect in what that query URL 
returns.
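One way a client could avoid showing a standby master's numbers, sketched here as an assumption and not as Ambari's actual fix, is to take the value from the single host component whose IsActiveMaster flag (requested above as host_components/metrics/hbase/master/IsActiveMaster) is true. The flattened per-host input shape, the function name `active_master_load`, and the sample values are all hypothetical.

```python
from typing import Optional


def active_master_load(masters: list) -> Optional[float]:
    """Return AverageLoad from the single master reporting IsActiveMaster.

    Hypothetical, flattened input: one dict per HBASE_MASTER host, e.g.
    {"host_name": "h1", "IsActiveMaster": "true", "AverageLoad": 3.5}.
    Returns None when zero or several hosts claim to be active, so the
    caller can show "n/a" instead of a misleading 0.
    """
    # Accept either a JSON boolean or the string "true"/"false".
    active = [m for m in masters if str(m.get("IsActiveMaster")).lower() == "true"]
    if len(active) != 1:
        return None
    return active[0].get("AverageLoad")


# Illustrative values only.
masters = [
    {"host_name": "master1", "IsActiveMaster": "false", "AverageLoad": 0.0},
    {"host_name": "master2", "IsActiveMaster": "true", "AverageLoad": 4.0},
]
print(active_master_load(masters))
```

Returning None rather than 0 when the active master cannot be identified keeps an ambiguous state from looking like a real zero load.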

I reproduced this on a local cluster (HDP 2.2.4 with Ambari 2.0.0) as follows:

1.  Started the HBase service with 2 masters.  Average Load displayed the 
correct (non-zero) value.
2.  Restarted the current active master, which made the other master active.  
After a minute or so, Average Load dropped to 0.
3.  In one instance the problem did not appear right away; after I restarted 
ambari-server, it did.  I could restore the correct reading by making the 
other master active again.

> With HBase master HA Ambari sometimes displays incorrect dashboard information
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-11821
>                 URL: https://issues.apache.org/jira/browse/AMBARI-11821
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-web
>    Affects Versions: 2.1.0
>            Reporter: Jaimin D Jetly
>            Assignee: Jaimin D Jetly
>            Priority: Critical
>             Fix For: 2.1.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
