[ https://issues.apache.org/jira/browse/AMBARI-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-24638:
-------------------------------------
    Description: 
One ambari-agent process started consuming memory rapidly at a certain point 
and grew to ~27GB of RSS before we eventually restarted it. This happened 
after a month of running 10 ambari-agent nodes.

    [root@andrew2-1n01 ~]# ps aux | grep ambari_agent
    root     39955  0.0  0.0  47580  6024 ?        S    Aug17   0:00 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/AmbariAgent.py start
    root     39959 20.4 10.2 31623096 27154348 ?   Sl   Aug17 7645:55 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/main.py start

This exception appears just before the growth in memory usage begins:

ERROR 2018-09-11 10:56:59,716 websocket.py:552 - Websocket connection was closed with an exception
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 549, in run
    if not self.once():
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 428, in once
    if not self.process(self.buf[:requested]):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 483, in process
    self.reading_buffer_size = s.parser.send(bytes) or DEFAULT_READING_SIZE
ValueError: generator already executing

This exception does not appear on any other node, nor on this node at any 
other time during the month, so I suspect it is the root cause.
Basically, this error means that the generator is being resumed while it is 
already running, most likely because it is used by multiple threads. So I will 
upload a fix that guards this call with a thread lock.
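
A minimal sketch of what thread-locking a shared generator could look like. 
This is not the attached patch and not the ambari_ws4py code: byte_counter, 
LockedGenerator and run_workers are hypothetical names used only for 
illustration, and the toy generator stands in for the real frame parser.

```python
import threading


def byte_counter():
    """Toy stand-in for a parser coroutine (hypothetical, not ws4py code).

    Accepts chunks via send() and yields the running byte total.
    """
    total = 0
    while True:
        chunk = yield total
        total += len(chunk)


class LockedGenerator(object):
    """Serializes send() calls so only one thread resumes the generator."""

    def __init__(self, gen):
        self._gen = gen
        self._lock = threading.Lock()
        next(self._gen)  # advance to the first yield so send() is legal

    def send(self, value):
        # Without the lock, a second thread calling send() while the
        # generator is running raises "ValueError: generator already executing".
        with self._lock:
            return self._gen.send(value)


def run_workers(parser, threads=4, sends=1000):
    """Hammer parser.send() from several threads; return any ValueErrors."""
    errors = []

    def worker():
        try:
            for _ in range(sends):
                parser.send(b"x")
        except ValueError as exc:
            errors.append(exc)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors
```

In websocket.py the equivalent change would presumably put the 
s.parser.send(bytes) call under such a lock. Whether that also stops the 
memory growth is untested.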

This fix is only a guess: it might work or it might not, and there is no real 
way to test it. But we should definitely try it.
    
This was observed in version ambari-2.7.1.0-73 as well.  



  was:
One ambari-agent process started consuming memory rapidly at a certain point 
and grew to ~27GB of RSS before we eventually restarted it. This happened 
after a month of running 10 ambari-agent nodes.

    (docker)[root@hcube2-1n01 ~]# ps aux | grep ambari_agent
    root     39955  0.0  0.0  47580  6024 ?        S    Aug17   0:00 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/AmbariAgent.py start
    root     39959 20.4 10.2 31623096 27154348 ?   Sl   Aug17 7645:55 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/main.py start

This exception appears just before the growth in memory usage begins:

ERROR 2018-09-11 10:56:59,716 websocket.py:552 - Websocket connection was closed with an exception
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 549, in run
    if not self.once():
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 428, in once
    if not self.process(self.buf[:requested]):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 483, in process
    self.reading_buffer_size = s.parser.send(bytes) or DEFAULT_READING_SIZE
ValueError: generator already executing

This exception does not appear on any other node, nor on this node at any 
other time during the month, so I suspect it is the root cause.
Basically, this error means that the generator is being resumed while it is 
already running, most likely because it is used by multiple threads. So I will 
upload a fix that guards this call with a thread lock.

This fix is only a guess: it might work or it might not, and there is no real 
way to test it. But we should definitely try it.
    
This was observed in version ambari-2.7.1.0-73 as well.  




> Ambari-agent process consuming more memory.
> -------------------------------------------
>
>                 Key: AMBARI-24638
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24638
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.2
>
>         Attachments: AMBARI-24638.patch
>
>
> One ambari-agent process started consuming memory rapidly at a certain point 
> and grew to ~27GB of RSS before we eventually restarted it. This happened 
> after a month of running 10 ambari-agent nodes.
>     [root@andrew2-1n01 ~]# ps aux | grep ambari_agent
>     root     39955  0.0  0.0  47580  6024 ?        S    Aug17   0:00 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/AmbariAgent.py start
>     root     39959 20.4 10.2 31623096 27154348 ?   Sl   Aug17 7645:55 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/main.py start
> This exception appears just before the growth in memory usage begins:
> ERROR 2018-09-11 10:56:59,716 websocket.py:552 - Websocket connection was closed with an exception
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 549, in run
>     if not self.once():
>   File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 428, in once
>     if not self.process(self.buf[:requested]):
>   File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 483, in process
>     self.reading_buffer_size = s.parser.send(bytes) or DEFAULT_READING_SIZE
> ValueError: generator already executing
> This exception does not appear on any other node, nor on this node at any 
> other time during the month, so I suspect it is the root cause.
> Basically, this error means that the generator is being resumed while it is 
> already running, most likely because it is used by multiple threads. So I 
> will upload a fix that guards this call with a thread lock.
> This fix is only a guess: it might work or it might not, and there is no 
> real way to test it. But we should definitely try it.
>     
> This was observed in version ambari-2.7.1.0-73 as well.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
