Andrew Robertson created AMBARI-12995:
-----------------------------------------

             Summary: Ambari alerts report "UNKNOWN" error for secondary YARN 
RM and NM in a Kerberized YARN HA deployment
                 Key: AMBARI-12995
                 URL: https://issues.apache.org/jira/browse/AMBARI-12995
             Project: Ambari
          Issue Type: Bug
          Components: alerts
    Affects Versions: 2.1.1
         Environment: Requires YARN HA with Kerberos
            Reporter: Andrew Robertson
             Fix For: 2.1.2


What is observed:

On my currently active YARN NodeManager and ResourceManager, Ambari
alerts are fine.

On the secondary YARN NodeManager and ResourceManager, Ambari reports
"Status: Unknown" / "HTTP 200 response (metrics unavailable)".  This
is for the alerts:
 - NodeManager Health Summary
 - ResourceManager CPU Utilization
 - ResourceManager RPC Latency

The Ambari web interface does not make this error obvious, as it says
"0 alerts" in the top bar. But you can see the alerts with "Unknown"
status when you go to the Ambari alerts page, or if you query the
alerts API.
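
For anyone checking this through the API rather than the web UI, here is a
minimal sketch of filtering an alerts listing for UNKNOWN entries. It assumes
the response of the Ambari REST alerts endpoint has already been fetched and
parsed into a dict, and that it has an "items" list whose entries carry an
"Alert" object with "state", "label" and "host_name" fields. Those field names
and the helper name unknown_alerts are assumptions for illustration, not taken
from this report.

```python
def unknown_alerts(alerts_response):
    """Return (label, host) pairs for alerts whose state is UNKNOWN.

    `alerts_response` is assumed to be the parsed JSON of an alerts listing,
    shaped like {"items": [{"Alert": {"state": ..., "label": ..., ...}}]}.
    """
    found = []
    for item in alerts_response.get("items", []):
        alert = item.get("Alert", {})
        if alert.get("state") == "UNKNOWN":
            found.append((alert.get("label"), alert.get("host_name")))
    return found
```

Run against a listing from an affected cluster, this should surface the three
alerts named above for the standby hosts even though the top bar shows
"0 alerts".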

What is expected:
Ambari alerts should not raise any alarms for a secondary YARN HA node as long 
as the node is responsive.


---
A network dump of the Ambari poll against the secondary RM looks like:

Request:
"""
GET /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo HTTP/1.1
...
"""

Response:
"""
HTTP/1.1 200 OK
...
Refresh: 3; url=http://{my-primary-rm}:8088/jmx
Content-Length: 106
Server: Jetty(6.1.26.hwx)

This is standby RM. Redirecting to the current active RM:
http://{my-primary-rm}:8088/jmx
"""
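
The reply above can be recognised mechanically. A minimal sketch, assuming the
caller already has the status code, a dict of response headers and the body
text; the function names are hypothetical, and the header lookup is
case-sensitive for simplicity:

```python
import re


def parse_refresh_header(value):
    """Split a Refresh header value such as '3; url=http://host:8088/jmx'
    into (delay_seconds, url). Returns None if the value does not match."""
    match = re.match(r'\s*(\d+)\s*;\s*url\s*=\s*(\S+)\s*$', value, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def standby_redirect_target(status, headers, body):
    """Return the active RM URL if this looks like a standby RM reply, else None."""
    refresh = headers.get("Refresh")
    if status != 200 or refresh is None:
        return None
    # The standby RM announces itself in the body, as seen in the dump.
    if not body.startswith("This is standby RM"):
        return None
    parsed = parse_refresh_header(refresh)
    return parsed[1] if parsed else None
```

With the headers from the dump, standby_redirect_target returns the
http://{my-primary-rm}:8088/jmx URL, which a client could then poll instead.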

--
I'm also filing a JIRA against YARN (per request from jhurley) and will post 
that info here.
--

Comment from Jonathan Hurley:

This is caused by how YARN does HA mode. With two YARN RMs, the standby RM 
returns a 200 response with a JavaScript redirect instead of a 3xx 
redirection. When not using Kerberos, Ambari should be able to parse the 
headers and follow the JS-based redirect. However, on a Kerberized cluster, we 
use curl, which cannot do this. Therefore, requests against the secondary RM 
produce an UNKNOWN result: the request did get a 200, but the body carries no 
metrics. I think a few things can be improved here:

1) There should be a ticket filed for YARN to have their HA mode use a proper 
redirect
2) Ambari might not want to produce an UNKNOWN response here since it gives a 
false feeling that something went wrong.
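
One way point 2 might look, as a rough sketch only: this is not Ambari's
actual alert code, and the function name and the choice of result states are
assumptions. It treats the standby notice as a healthy reply and keeps UNKNOWN
for a 200 whose body has no usable metrics (a Hadoop /jmx reply is normally
JSON with a top-level "beans" list).

```python
import json

STANDBY_MARKER = "This is standby RM"


def classify_jmx_poll(status, body):
    """Map a /jmx poll result to a (state, text) pair. State names are illustrative."""
    if status != 200:
        return "CRITICAL", "HTTP %d response" % status
    # A responsive standby RM is not an error; the active RM serves the metrics.
    if body.lstrip().startswith(STANDBY_MARKER):
        return "OK", "standby RM is responsive; metrics are served by the active RM"
    try:
        beans = json.loads(body).get("beans", [])
    except (ValueError, AttributeError):
        # Body is not JSON, or is JSON but not an object.
        return "UNKNOWN", "HTTP 200 response (metrics unavailable)"
    if not beans:
        return "UNKNOWN", "HTTP 200 response (metrics unavailable)"
    return "OK", "metrics available"
```

Whether a standby node should map to OK or to some separate "skipped" state
is a design choice for the Ambari side; the sketch only shows that the two
cases can be told apart from the body alone.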



