[jira] [Updated] (ATLAS-3878) Notifications: Improve Memory Usage in Scale Enviroment

Ashutosh Mestry (Jira) Wed, 08 Jul 2020 09:35:13 -0700


     [ 
https://issues.apache.org/jira/browse/ATLAS-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ashutosh Mestry updated ATLAS-3878:
-----------------------------------
    Description: 
*Background*

As part of entity creation, Atlas sends notifications of different types. 
In the current implementation, these are dispatched to listeners, which in 
turn perform specific tasks.

At a more concrete level, the _EntityAuditListenerV2_ will write audits and the 
_NotificationEntityChangeListener_ will send Kafka notifications.

Each of the listeners creates notification objects. The notification objects 
are large in number and short-lived.

The transient nature of the notification objects causes memory pressure in 
a scale environment.

*Solution*

Create an object pool for notification objects. This way, objects can be 
reused and the existing design can be kept intact. This also offers the 
benefit of using the existing test setup for verification.
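
As an illustration of the pooling idea only, here is a minimal sketch of a 
bounded object pool. The issue does not describe how the pool is implemented, 
so every name below (_PooledNotification_, _SimpleObjectPool_, _borrow_, 
_release_, _maxIdle_) is hypothetical and is not an Atlas class or API.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a notification payload; not an Atlas class.
final class PooledNotification {
    String entityGuid;
    String operation;

    // Clear all state so a reused instance never leaks data from its previous use.
    void reset() {
        entityGuid = null;
        operation = null;
    }
}

// Minimal bounded pool (assumed design): idle instances are kept for reuse,
// up to maxIdle; anything beyond that is simply dropped for the GC to collect.
final class SimpleObjectPool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> resetter;
    private final int maxIdle;

    SimpleObjectPool(Supplier<T> factory, Consumer<T> resetter, int maxIdle) {
        this.factory = factory;
        this.resetter = resetter;
        this.maxIdle = maxIdle;
    }

    // Hand out an idle instance if one exists, otherwise allocate a new one.
    synchronized T borrow() {
        T obj = idle.pollFirst();
        return obj != null ? obj : factory.get();
    }

    // Reset the instance and return it to the pool if there is room.
    synchronized void release(T obj) {
        resetter.accept(obj);
        if (idle.size() < maxIdle) {
            idle.addFirst(obj);
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

In a sketch like this, a listener would call _borrow_ before filling in a 
notification and _release_ in a finally block once the notification has been 
written or sent, so the same instances are reused across requests instead of 
being allocated and discarded each time.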

*Tests Used*

_Setup_ 

Create a test rig that spawns multiple workers, each of which invokes Atlas' 
bulk APIs for entity creation.

Node: 40 workers, 8 GB allocated memory and 40 cores.
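
A rig of this shape could be sketched roughly as follows. This is not the 
rig used for the test: _LoadRig_, _run_ and _callsPerWorker_ are hypothetical 
names, and the actual call to the Atlas bulk API is left to the caller as a 
_Runnable_, because the issue does not specify the client or endpoint used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load rig: N workers each repeat one caller-supplied action.
final class LoadRig {
    // Runs bulkCreateCall callsPerWorker times on each of `workers` threads
    // and returns how many calls completed.
    static long run(int workers, int callsPerWorker, Runnable bulkCreateCall) {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        AtomicLong completed = new AtomicLong();

        for (int w = 0; w < workers; w++) {
            executor.submit(() -> {
                for (int i = 0; i < callsPerWorker; i++) {
                    bulkCreateCall.run(); // assumed: one bulk entity-creation request
                    completed.incrementAndGet();
                }
            });
        }

        // Stop accepting new tasks and wait for the submitted workers to finish.
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

With the node described above, the first argument would be 40; the number of 
calls per worker would need to be large enough to keep the run going past the 
point where memory pressure appears.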

_Observation_

About 40 minutes into the exercise, memory pressure builds up, causing GC 
cycles to take longer. This causes a ZooKeeper (ZK) timeout and, finally, the 
Atlas process crashes.


> Notifications: Improve Memory Usage in Scale Enviroment
> -------------------------------------------------------
>
>                 Key: ATLAS-3878
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3878
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core
>    Affects Versions: 2.0.0, trunk
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>             Fix For: trunk
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
