Aravindan Vijayan created AMBARI-18705:
------------------------------------------

             Summary: All host metrics not being collected by AMS if user does not have permissions to read mount point.
                 Key: AMBARI-18705
                 URL: https://issues.apache.org/jira/browse/AMBARI-18705
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics
    Affects Versions: 2.4.2
            Reporter: Aravindan Vijayan
            Assignee: Aravindan Vijayan
            Priority: Critical
             Fix For: 2.4.2


PROBLEM
AMS does not collect any host metrics in environments where some disk mount
points have restrictive permissions.

This happens because the metrics monitor process, which collects the host
metrics, lacks permission to read a mount point. When the disk usage lookup
for that mount point fails, the whole host collection event fails instead of
sending the remaining metrics.

{code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 1083, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 45, in process_event
    self.process_host_collection_event(event)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 76, in process_host_collection_event
    metrics.update(self.host_info.get_combined_disk_usage())
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/host_info.py", line 188, in get_combined_disk_usage
    usage = psutil.disk_usage(part.mountpoint)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1690, in disk_usage
    return _psplatform.disk_usage(path)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/_psposix.py", line 121, in disk_usage
    st = os.statvfs(path)
OSError: [Errno 13] Permission denied: '/abc/def/mountpoint'
{code}

{code}
[root@ctr-e45-1475874954070-13451-01-000004 ~]# ls -lrt /abc/def/mountpoint
total 4
drwx--x--- 4 nobody 3004   36 Oct 26 07:20 filecache
drwxr-s--- 5 nobody 3004 4096 Oct 26 07:21 container_e45_1475874954070_13451_01_000004
{code}

FIX
In the AMS monitor, tolerate this failure and still send the other host metrics.
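
A minimal sketch of what tolerating the failure could look like. This is not
the actual patch. The function name combined_disk_usage, the metric keys
(disk_total, disk_used, disk_free), and the injectable statvfs parameter are
hypothetical and chosen only for illustration. The sketch calls os.statvfs
directly, which is the call that raises in the traceback above, and skips any
mount point that raises OSError so the remaining ones are still aggregated.

```python
import logging
import os

logger = logging.getLogger(__name__)


def combined_disk_usage(mountpoints, statvfs=os.statvfs):
    """Sum disk usage over mountpoints, skipping unreadable ones.

    Hypothetical helper for illustration only; it is not the AMS
    implementation. Returns (metrics, skipped), where skipped lists the
    mount points that raised OSError (for example, Errno 13).
    """
    total = used = free = 0
    skipped = []
    for mountpoint in mountpoints:
        try:
            st = statvfs(mountpoint)
        except OSError as err:
            # Tolerate the failure: log it and continue with the other
            # mount points instead of aborting the whole collection.
            logger.warning("Skipping mount point %s: %s", mountpoint, err)
            skipped.append(mountpoint)
            continue
        total += st.f_blocks * st.f_frsize
        used += (st.f_blocks - st.f_bfree) * st.f_frsize
        free += st.f_bavail * st.f_frsize
    # Metric key names are assumptions, not the names AMS uses.
    metrics = {"disk_total": total, "disk_used": used, "disk_free": free}
    return metrics, skipped
```

With this shape, a caller such as process_host_collection_event would still
receive a metrics dictionary to merge with the CPU, memory, and network
metrics, and the unreadable mount point would only appear in the log.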




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)