> On March 8, 2017, 3:01 a.m., Sebastian Toader wrote:
> > ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
> > Lines 152 (patched)
> > <https://reviews.apache.org/r/57410/diff/1/?file=1658685#file1658685line152>
> >
> >     Isn't kinit needed for the NN non-HA case as well?
> >     
> >     e.g. move this call right after
> >     ```
> >     if resolved_principal is not None:
> >             resolved_principal = resolved_principal.replace('_HOST', host_name)
> >     ```

The _added_ kinit is only needed before the `get_active_namenode` call.  The 
`_get_delegation_token` call uses `curl_krb_request`, which performs a kinit 
itself. 

Unfortunately, though both calls eventually use `curl` to execute the request, 
each uses a different ticket cache.  
- `curl_krb_request` places the obtained ticket in an alternate ticket cache 
(which is preferred) 
- `get_active_namenode` eventually calls `get_value_from_jmx` which executes 
`curl` assuming the default (user interactive) ticket cache is valid.  

So two kinit calls will be needed until a fix is made much deeper in the code. 
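
As a rough illustration of the two-cache situation (a sketch only, not the code in the patch), the hypothetical helper `build_kinit_command` below assembles a keytab-based `kinit` argument list. Without a cache argument the ticket lands in the user's default cache, which is what the plain `curl --negotiate` behind `get_value_from_jmx` reads; with `-c` it targets an alternate cache, in the manner described for `curl_krb_request`. The helper name, keytab path, principal, and cache path are all assumptions made up for the example.

```python
def build_kinit_command(keytab, principal, host_name, ccache=None, kinit_path="kinit"):
    """Return the argv for a keytab-based kinit.

    Hypothetical helper for illustration; it is not part of Ambari.
    """
    # Resolve the _HOST placeholder, as the alert script does for its principal.
    resolved_principal = principal.replace("_HOST", host_name)
    command = [kinit_path]
    if ccache is not None:
        # -c selects an alternate credential cache instead of the default one.
        command += ["-c", ccache]
    # -k -t <keytab> takes the key from a keytab rather than prompting.
    command += ["-k", "-t", keytab, resolved_principal]
    return command


# Made-up example values.
keytab = "/etc/security/keytabs/hdfs.headless.keytab"
principal = "hdfs/_HOST@EXAMPLE.COM"
host = "c6401.ambari.apache.org"

# Ticket for the default cache (what a bare `curl --negotiate` would pick up).
default_cache_cmd = build_kinit_command(keytab, principal, host)
# Ticket for an alternate cache (assumed path).
alternate_cache_cmd = build_kinit_command(keytab, principal, host, ccache="/tmp/alert_cc")

print(" ".join(default_cache_cmd))
print(" ".join(alternate_cache_cmd))
```

The only difference between the two commands is the `-c` argument, which is why a ticket obtained for one cache does not help a `curl` call that reads the other.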

This is even more unfortunate since the alert test is triggered every minute.


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168252
-----------------------------------------------------------


On March 7, 2017, 11:08 p.m., Robert Levas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57410/
> -----------------------------------------------------------
> 
> (Updated March 7, 2017, 11:08 p.m.)
> 
> 
> Review request for Ambari, Attila Magyar, bhuvnesh chaudhary, Balázs Bence 
> Sári, Eugene Chekanskiy, jun aoki, Laszlo Puskas, and Sebastian Toader.
> 
> 
> Bugs: AMBARI-20349
>     https://issues.apache.org/jira/browse/AMBARI-20349
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When SPNEGO authentication is enabled for Hadoop in a cluster where NN HA is 
> enabled, PXF Process alert fails with the following errors in the 
> ambari-agent.log file 
> 
> ```
> ERROR 2017-03-07 18:03:58,417 jmx.py:44 - Getting jmx metrics from NN failed. 
> URL: 
> http://c6401.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
> Traceback (most recent call last):
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py",
>  line 41, in get_value_from_jmx
>     data_dict = json.loads(data)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 
> 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 
> 335, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 
> 353, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> INFO 2017-03-07 18:04:02,769 logger.py:71 - call['ambari-sudo.sh su hdfs -l 
> -s /bin/bash -c 'curl --negotiate -u : -s 
> '"'"'http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"'
>  1>/tmp/tmphTXg76 2>/tmp/tmp5bm2nM''] {'quiet': False}
> INFO 2017-03-07 18:04:02,797 logger.py:71 - call returned (0, '')
> ERROR 2017-03-07 18:04:02,798 jmx.py:44 - Getting jmx metrics from NN failed. 
> URL: 
> http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
> Traceback (most recent call last):
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py",
>  line 41, in get_value_from_jmx
>     data_dict = json.loads(data)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 
> 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 
> 335, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 
> 353, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> ```
> 
> # Cause
> During the test for the _PXF Process_ alert, the Active NN is found using a 
> JMX call.  This call requires SPNEGO authentication since SPNEGO 
> authentication is turned on for the Hadoop web interfaces. However, a valid 
> Kerberos ticket is not found in the configured user's Kerberos ticket cache. 
> In this case, the configured user is the HDFS user - which technically is 
> not necessary. 
> 
> This occurs in `common-services/PXF/3.0.0/package/alerts/api_status.py:137`
> ```
>     if CLUSTER_ENV_SECURITY in configurations and configurations[CLUSTER_ENV_SECURITY].lower() == "true":
>       if 'dfs.nameservices' in configurations[HDFS_SITE]:
>         namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
>       else:
>         namenode_address = configurations[HDFS_SITE]['dfs.namenode.http-address']
> 
>       token = _get_delegation_token(namenode_address,
>                                      configurations[HADOOP_ENV_HDFS_USER],
>                                      configurations[HADOOP_ENV_HDFS_USER_KEYTAB],
>                                      configurations[HADOOP_ENV_HDFS_PRINCIPAL_NAME],
>                                      None)
>       commonPXFHeaders.update({"X-GP-TOKEN": token})
> ```
> 
> Inside the call at 
> 
> ```
> namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
> ```
> 
> # Solution
> Ensure the configured user's Kerberos ticket cache contains a valid ticket 
> before querying for the active NN. Possibly change the acting user to one 
> executing the PXF component.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
>  d0ed0a4 
> 
> 
> Diff: https://reviews.apache.org/r/57410/diff/1/
> 
> 
> Testing
> -------
> 
> Manually tested in cluster - Ambari 2.5 with HDP 2.5 and HDB 2.1.2
> 
> 
> Thanks,
> 
> Robert Levas
> 
>
