> On March 8, 2017, 3:01 a.m., Sebastian Toader wrote:
> > ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
> > Lines 152 (patched)
> > <https://reviews.apache.org/r/57410/diff/1/?file=1658685#file1658685line152>
> >
> > Isn't kinit needed for the NN non-HA case as well?
> >
> > e.g. move this call right after
> > ```
> > if resolved_principal is not None:
> >   resolved_principal = resolved_principal.replace('_HOST', host_name)
> > ```

The _added_ kinit is only needed before the `get_active_namenode` call. The `_get_delegation_token` call uses `curl_krb_request`, which performs a kinit itself. Unfortunately, though both calls eventually use `curl` to execute the request, each uses a different ticket cache.

- `curl_krb_request` places the obtained ticket in an alternate ticket cache (which is preferred)
- `get_active_namenode` eventually calls `get_value_from_jmx`, which executes `curl` assuming the default (user interactive) ticket cache is valid.

So two kinits will need to be made until a fix is made much deeper in the code. This is even more unfortunate since the alert test is triggered every minute.

- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168252
-----------------------------------------------------------


On March 7, 2017, 11:08 p.m., Robert Levas wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57410/
> -----------------------------------------------------------
>
> (Updated March 7, 2017, 11:08 p.m.)
>
>
> Review request for Ambari, Attila Magyar, bhuvnesh chaudhary, Balázs Bence Sári, Eugene Chekanskiy, jun aoki, Laszlo Puskas, and Sebastian Toader.
>
>
> Bugs: AMBARI-20349
>     https://issues.apache.org/jira/browse/AMBARI-20349
>
>
> Repository: ambari
>
>
> Description
> -------
>
> When SPNEGO authentication is enabled for Hadoop in a cluster where NN HA is enabled, the PXF Process alert fails with the following errors in the ambari-agent.log file:
>
> ```
> ERROR 2017-03-07 18:03:58,417 jmx.py:44 - Getting jmx metrics from NN failed.
> URL: http://c6401.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
>     data_dict = json.loads(data)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> INFO 2017-03-07 18:04:02,769 logger.py:71 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmphTXg76 2>/tmp/tmp5bm2nM''] {'quiet': False}
> INFO 2017-03-07 18:04:02,797 logger.py:71 - call returned (0, '')
> ERROR 2017-03-07 18:04:02,798 jmx.py:44 - Getting jmx metrics from NN failed.
> URL: http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
>     data_dict = json.loads(data)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> ```
>
> # Cause
> During the test for the _PXF Process_ alert, the Active NN is found using a JMX call. This call requires SPNEGO authentication since SPNEGO authentication is turned on for the Hadoop web interfaces. However, a valid Kerberos ticket is not found in the configured user's Kerberos ticket cache. In this case, the configured user is the HDFS user, which technically is not necessary.
>
> This occurs in `common-services/PXF/3.0.0/package/alerts/api_status.py:137`
> ```
> if CLUSTER_ENV_SECURITY in configurations and configurations[CLUSTER_ENV_SECURITY].lower() == "true":
>   if 'dfs.nameservices' in configurations[HDFS_SITE]:
>     namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
>   else:
>     namenode_address = configurations[HDFS_SITE]['dfs.namenode.http-address']
>
>   token = _get_delegation_token(namenode_address,
>                                 configurations[HADOOP_ENV_HDFS_USER],
>                                 configurations[HADOOP_ENV_HDFS_USER_KEYTAB],
>                                 configurations[HADOOP_ENV_HDFS_PRINCIPAL_NAME],
>                                 None)
>   commonPXFHeaders.update({"X-GP-TOKEN": token})
> ```
>
> Inside the call at
>
> ```
> namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
> ```
>
> # Solution
> Ensure the configured user's Kerberos ticket cache contains a valid ticket before querying for the active NN. Possibly change the acting user to one executing the PXF component.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py d0ed0a4
>
>
> Diff: https://reviews.apache.org/r/57410/diff/1/
>
>
> Testing
> -------
>
> Manually tested in cluster - Ambari 2.5 with HDP 2.5 and HDB 2.1.2
>
>
> Thanks,
>
> Robert Levas
>
>
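
To illustrate the two-cache situation described in the reply, here is a minimal Python sketch. It is not the Ambari code. It assumes the alternate cache is selected with `kinit -c` and the `KRB5CCNAME` environment variable; the helper names `kinit_argv` and `curl_env`, the keytab path, the principal, and the cache path are all hypothetical.

```python
def kinit_argv(keytab, principal, ccache=None, kinit_path="kinit"):
    """Build the argument list for a keytab-based kinit (hypothetical helper).

    With ccache=None the ticket lands in the caller's default cache,
    which is what a plain `curl --negotiate` would read. Passing a
    cache name adds `-c`, so the ticket goes to that alternate cache.
    """
    argv = [kinit_path]
    if ccache is not None:
        argv += ["-c", ccache]
    argv += ["-k", "-t", keytab, principal]
    return argv


def curl_env(base_env, ccache=None):
    """Return a copy of the environment a curl process would run with.

    Setting KRB5CCNAME points Kerberos-aware tools at the alternate
    cache; leaving it untouched means the default cache is used.
    """
    env = dict(base_env)
    if ccache is not None:
        env["KRB5CCNAME"] = ccache
    return env


# Hypothetical values, for illustration only.
KEYTAB = "/etc/security/keytabs/hdfs.headless.keytab"
PRINCIPAL = "hdfs@EXAMPLE.COM"
ALT_CACHE = "/tmp/alert_ccache"

# kinit needed before the JMX lookup, which reads the default cache.
default_kinit = kinit_argv(KEYTAB, PRINCIPAL)

# kinit of the kind a helper using an alternate cache would perform.
alternate_kinit = kinit_argv(KEYTAB, PRINCIPAL, ccache=ALT_CACHE)
alternate_env = curl_env({}, ccache=ALT_CACHE)
```

Because the two requests read different caches, a ticket obtained for one is invisible to the other, which is why two kinits are needed until the JMX path is changed to use the alternate cache as well.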