Aravindan Vijayan created AMBARI-18705:
------------------------------------------
Summary: All host metrics not being collected by AMS if user does
not have permissions to read mount point.
Key: AMBARI-18705
URL: https://issues.apache.org/jira/browse/AMBARI-18705
Project: Ambari
Issue Type: Bug
Components: ambari-metrics
Affects Versions: 2.4.2
Reporter: Aravindan Vijayan
Assignee: Aravindan Vijayan
Priority: Critical
Fix For: 2.4.2
PROBLEM
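Host metrics are not collected by AMS in environments where some mount points
have restrictive permissions.
The metrics monitor process, which collects the host metrics, does not have
permission to read one of the mount points. os.statvfs() raises OSError
(Errno 13) while the combined disk usage is being computed. The exception is not
handled, so the whole host collection event fails and none of the other host
metrics are sent.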
{code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 1083, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 45, in process_event
    self.process_host_collection_event(event)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 76, in process_host_collection_event
    metrics.update(self.host_info.get_combined_disk_usage())
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/host_info.py", line 188, in get_combined_disk_usage
    usage = psutil.disk_usage(part.mountpoint)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1690, in disk_usage
    return _psplatform.disk_usage(path)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/_psposix.py", line 121, in disk_usage
    st = os.statvfs(path)
{code}
{code}
[root@ctr-e45-1475874954070-13451-01-000004 ~]# ls -lrt /abc/def/mountpoint
total 4
drwx--x--- 4 nobody 3004 36 Oct 26 07:20 filecache
drwxr-s--- 5 nobody 3004 4096 Oct 26 07:21 container_e45_1475874954070_13451_01_000004
{code}
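The call that fails in the traceback is os.statvfs(), which psutil.disk_usage() uses on POSIX systems. As a hedged diagnostic sketch (not part of AMS; read_mountpoints and find_unreadable_mountpoints are hypothetical names), the same call can be run over every mount point to list the ones the current user cannot read. It assumes a Linux host whose mount table is available at /proc/mounts, and it only reports EACCES (Errno 13) failures.

```python
import errno
import os


def read_mountpoints(mounts_file="/proc/mounts"):
    """Return the mount point column (2nd field) of a Linux mount table."""
    mountpoints = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                # The kernel escapes spaces as \040; only that escape is decoded here.
                mountpoints.append(fields[1].replace("\\040", " "))
    return mountpoints


def find_unreadable_mountpoints(mountpoints):
    """Return the mount points on which os.statvfs() fails with EACCES.

    Other errors (for example ENOENT for a path that no longer exists)
    are deliberately ignored, since only permission failures matter here.
    """
    denied = []
    for path in mountpoints:
        try:
            os.statvfs(path)
        except OSError as exc:
            if exc.errno == errno.EACCES:
                denied.append(path)
    return denied
```

If this is run as the same user as the metrics monitor, find_unreadable_mountpoints(read_mountpoints()) would be expected to include /abc/def/mountpoint on the host above. Run as root it will usually return an empty list, because root normally bypasses these permission checks.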
FIX
In the AMS monitor, handle this failure instead of letting it abort the host
collection event, so that the other host metrics are still sent.
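One way to do this, sketched below under stated assumptions, is to catch OSError per mount point inside the disk usage aggregation, so that one unreadable mount point can no longer abort process_host_collection_event. This is not the actual AMBARI-18705 patch: the function names, the usage_fn parameter and the metric keys (disk_total, disk_used, disk_free) are hypothetical. The sketch also calls os.statvfs directly in place of psutil.disk_usage, computing total, used and free along the lines of psutil's POSIX implementation.

```python
import logging
import os

logger = logging.getLogger(__name__)


def statvfs_usage(path):
    """Return (total, used, free) in bytes for path, similar to psutil.disk_usage."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    free = st.f_bavail * st.f_frsize  # space available to non-root users
    return total, used, free


def get_combined_disk_usage(mountpoints, usage_fn=statvfs_usage):
    """Sum disk usage over mount points, skipping any that cannot be read.

    Returns (combined_metrics, skipped_mountpoints).
    """
    combined = {"disk_total": 0, "disk_used": 0, "disk_free": 0}
    skipped = []
    for path in mountpoints:
        try:
            total, used, free = usage_fn(path)
        except OSError as exc:
            # e.g. [Errno 13] Permission denied: log and move on instead of
            # letting the exception abort the whole host collection event.
            logger.warning("Skipping disk usage for %s: %s", path, exc)
            skipped.append(path)
            continue
        combined["disk_total"] += total
        combined["disk_used"] += used
        combined["disk_free"] += free
    return combined, skipped
```

Returning the skipped paths alongside the totals lets the caller decide whether to report them. The usage_fn parameter exists only so the aggregation can be exercised without a real unreadable mount point.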
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)