[ 
https://issues.apache.org/jira/browse/AMBARI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507459#comment-13507459
 ] 

Tom Beerbower commented on AMBARI-1044:
---------------------------------------

I don't see any related exceptions in the server log, which means that either 
it's not attempting to get the metrics for this host or they are just not being 
set on the host resource.

I think that I see what is happening. One of the arguments that can be 
specified for the rrd query is the Ganglia cluster (HDPHBaseMaster, 
HDPJobTracker, HDPNameNode or HDPSlaves). The question is, for a host level 
query which Ganglia cluster should we specify?

It's hard to say, since a host isn't necessarily associated with any of the 
services related to those clusters... or it may be associated with more than 
one. It turns out it doesn't really matter. In this case I can see the system 
level rrd files that we use for host level metrics for 
ip-10-224-42-108.ec2.internal under any of the Ganglia cluster folders. For 
example ...
{code}
[root@ip-10-40-91-121 rrds]# ls ./HDPHBaseMaster/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  
cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd   
 mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    
disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd 
 part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd

...

[root@ip-10-40-91-121 rrds]# ls HDPNameNode/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  
cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd   
 mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    
disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd 
 part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd
{code}
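The same check can be scripted instead of run by hand. The following is only a minimal sketch: the class name RrdHostLocator, the method name, and the assumption that the rrds directory contains one folder per Ganglia cluster with one sub-folder per host are illustrative and are not taken from the Ambari code base.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Ambari.
public class RrdHostLocator {

    // Returns the names of the Ganglia cluster folders under rrdsDir that
    // contain a sub-folder named after the given host, sorted by name.
    static List<String> clustersContainingHost(Path rrdsDir, String hostName)
            throws IOException {
        List<String> clusters = new ArrayList<>();
        try (Stream<Path> children = Files.list(rrdsDir)) {
            children.filter(Files::isDirectory)
                    .filter(clusterDir -> Files.isDirectory(clusterDir.resolve(hostName)))
                    .forEach(clusterDir -> clusters.add(clusterDir.getFileName().toString()));
        }
        Collections.sort(clusters);
        return clusters;
    }
}
```

If the host folder shows up under every cluster folder, as in the listing above, then the choice of Ganglia cluster should not affect the host level query.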
The approach that I've been using is to look through the host components for 
the host that we are interested in and try to map one of its component names 
back to a Ganglia cluster. In this case it looks like the host with the missing 
metrics is not associated with any component that would map back given the 
mapping method that I am using.

Given what I am currently seeing with the system level metrics, I think that it 
would be safe to simply use HDPSlaves as the Ganglia cluster for host level 
queries.
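
To make the difference between the two approaches concrete, here is a rough sketch. It is not the actual Ambari code: the class name, the method names, and the component names in the mapping (NAMENODE, JOBTRACKER, HBASE_MASTER, DATANODE) are assumptions chosen for illustration. Only the four Ganglia cluster names come from the comment above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Ambari implementation.
public class GangliaClusterResolver {

    // Ganglia cluster proposed above for all host level queries.
    static final String HOST_LEVEL_CLUSTER = "HDPSlaves";

    // Assumed component-name to Ganglia-cluster mapping (illustrative only).
    static final Map<String, String> COMPONENT_TO_CLUSTER = new HashMap<>();
    static {
        COMPONENT_TO_CLUSTER.put("NAMENODE", "HDPNameNode");
        COMPONENT_TO_CLUSTER.put("JOBTRACKER", "HDPJobTracker");
        COMPONENT_TO_CLUSTER.put("HBASE_MASTER", "HDPHBaseMaster");
        COMPONENT_TO_CLUSTER.put("DATANODE", "HDPSlaves");
    }

    // Mapping approach: return the cluster of the first host component that
    // maps, or null when none does (the case that leaves a host without metrics).
    static String fromHostComponents(List<String> componentNames) {
        for (String name : componentNames) {
            String cluster = COMPONENT_TO_CLUSTER.get(name);
            if (cluster != null) {
                return cluster;
            }
        }
        return null;
    }

    // Proposed approach: ignore the host components for host level queries.
    static String forHostLevelQuery() {
        return HOST_LEVEL_CLUSTER;
    }
}
```

With a mapping like this, a host that runs only unmapped components yields null from fromHostComponents, so no rrd query can be built for it, whereas forHostLevelQuery always yields a usable cluster.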
                
> API is not returning Ganglia metrics for one of the hosts in the cluster
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-1044
>                 URL: https://issues.apache.org/jira/browse/AMBARI-1044
>             Project: Ambari
>          Issue Type: Sub-task
>            Reporter: Tom Beerbower
>            Assignee: Tom Beerbower
>
> A cluster was deployed with 4 hosts, with Ambari Server running on a 
> different host.
> Host graphs are showing for 3 of the hosts.
> For one of the hosts, API is not returning any temporal data.
> Ganglia is showing host-level metrics.
> UI: 
> http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/#/main/hosts/ip-10-224-42-108.ec2.internal/summary
> Ganglia UI: 
> http://ec2-174-129-70-110.compute-1.amazonaws.com/ganglia/mobile_helper.php?show_host_metrics=1&h=ip-10-224-42-108.ec2.internal&c=HDPNameNode&r=hour&cs=&ce=
> API response:
> {
> "href" : 
> "http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/api/v1/clusters/C2/hosts/ip-10-224-42-108.ec2.internal?fields=metrics/cpu/cpu_user[1354227417,1354231017,15],metrics/cpu/cpu_wio[1354227417,1354231017,15],metrics/cpu/cpu_nice[1354227417,1354231017,15],metrics/cpu/cpu_aidle[1354227417,1354231017,15],metrics/cpu/cpu_system[1354227417,1354231017,15],metrics/cpu/cpu_idle[1354227417,1354231017,15]",
> "Hosts" :
> { "cluster_name" : "C2", "host_name" : "ip-10-224-42-108.ec2.internal" }
> }
> We need to understand the root cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
