Andrew Robertson created AMBARI-12995:
-----------------------------------------
Summary: Ambari alerts reports "UNKNOWN" error for secondary YARN
RM and NM in a Kerberized YARN HA deployment
Key: AMBARI-12995
URL: https://issues.apache.org/jira/browse/AMBARI-12995
Project: Ambari
Issue Type: Bug
Components: alerts
Affects Versions: 2.1.1
Environment: Requires YARN HA with Kerberos
Reporter: Andrew Robertson
Fix For: 2.1.2
What is observed:
On my currently active YARN NodeManager and ResourceManager, Ambari
alerts are fine.
On the secondary YARN NodeManager and ResourceManager, Ambari reports
"Status: Unknown" / "HTTP 200 response (metrics unavailable)". This
is for the alerts:
- NodeManager Health Summary
- ResourceManager CPU Utilization
- ResourceManager RPC Latency
The Ambari web interface does not make this error obvious, as it says
"0 alerts" in the top bar. But you can see the alerts with "unknown"
status when you go to the Ambari alerts page, or if you query the
alerts API.
What is expected:
Ambari alerts do not raise any alarms for a secondary YARN HA node as long
as the node is responsive.
---
A network dump of the Ambari poll against the secondary RM looks like:
Request:
"""
GET /jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo HTTP/1.1
...
"""
Response:
"""
HTTP/1.1 200 OK
...
Refresh: 3; url=http://{my-primary-rm}:8088/jmx
Content-Length: 106
Server: Jetty(6.1.26.hwx)
This is standby RM. Redirecting to the current active RM:
http://{my-primary-rm}:8088/jmx
"""
--
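The standby response in the network dump above carries the active RM's address in its Refresh header. As a rough sketch only (parse_refresh_header is a hypothetical helper, not an Ambari or YARN function), a client could pull the target URL out of that header like this:

```python
def parse_refresh_header(value):
    """Split a Refresh header value into (delay_seconds, target_url).

    Hypothetical helper for illustration; not part of Ambari or YARN.
    Returns None when the value does not start with an integer delay.
    """
    delay_part, _, rest = value.partition(";")
    try:
        delay = int(delay_part.strip())
    except ValueError:
        return None
    rest = rest.strip()
    # The target is conventionally written as "url=<address>".
    if rest.lower().startswith("url="):
        rest = rest[4:].strip()
    return delay, (rest or None)


# Header value copied from the dump above (host placeholder left as-is).
print(parse_refresh_header("3; url=http://{my-primary-rm}:8088/jmx"))
```

This only shows that the redirect target is recoverable from the headers; whether Ambari should follow it is a separate question.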
I'm also filing a JIRA against YARN (per request from jhurley) and will post
that info here.
--
Comment from Jonathan Hurley:
This is caused by how YARN does HA mode. With two YARN RMs, the standby RM
returns a 200 response with a JavaScript redirect instead of a 3xx
redirection. When not using Kerberos, Ambari should be able to parse the
headers and follow the JS-based redirect. However, on a Kerberized cluster, we
use curl, which cannot do this. Therefore, alerts that poll the secondary RM
report UNKNOWN: the request did get a 200, but the response contains no
metrics. I think a few things can be improved here:
1) There should be a ticket filed for YARN to have their HA mode use a proper
redirect
2) Ambari might not want to produce an UNKNOWN response here since it gives a
false feeling that something went wrong.
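To illustrate point 2, here is a minimal sketch of how a poll result could be classified so that a standby RM is not reported as UNKNOWN. Everything in it is an assumption for illustration: classify_rm_response and the "STANDBY" state are hypothetical names, not Ambari's alert code, and the check relies on the body text and Refresh header seen in the network dump, plus the assumption that a normal /jmx reply is a JSON object.

```python
# Text the standby RM puts at the start of its body, per the dump above.
STANDBY_MARKER = "This is standby RM"


def classify_rm_response(status, headers, body):
    """Return 'STANDBY', 'OK', or 'UNKNOWN' for one /jmx poll result.

    Hypothetical helper; the state names are illustrative only.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    text = body.lstrip()
    if status == 200 and "refresh" in lowered and text.startswith(STANDBY_MARKER):
        return "STANDBY"
    # Assumption: a real metrics reply from /jmx is a JSON object.
    if status == 200 and text.startswith("{"):
        return "OK"
    return "UNKNOWN"


standby_headers = {"Refresh": "3; url=http://{my-primary-rm}:8088/jmx"}
standby_body = "This is standby RM. Redirecting to the current active RM:"
print(classify_rm_response(200, standby_headers, standby_body))
```

An alert could then skip, or report as healthy, any node classified as STANDBY instead of surfacing "metrics unavailable".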