[ https://issues.apache.org/jira/browse/ATLAS-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Mestry updated ATLAS-3878:
-----------------------------------
    Description:
*Background*

As part of entity creation, Atlas sends notifications of different types. In the current implementation, these notifications go to listeners, which in turn perform specific tasks.

At a more concrete level, the _EntityAuditListenerV2_ writes audits and the _NotificationEntityChangeListener_ sends Kafka notifications.

Each listener creates notification objects. These objects are large in number and short-lived, and their transient nature causes memory pressure in scale environments.

*Solution*

Create an object pool for notification objects. This way objects can be reused and the existing design can be kept intact. It also offers the benefit of using the existing test setup for verification.

*Tests Used*

_Setup_

Create a test rig that spawns multiple workers that invoke Atlas' bulk APIs for entity creation.

Node: 40 workers, 8 GB allocated memory and 40 cores.

_Observation_

About 40 minutes into the exercise, memory pressure builds up, causing GC collections to take longer. This causes a ZK timeout, and finally the Atlas process crashes.

> Notifications: Improve Memory Usage in Scale Environment
> --------------------------------------------------------
>
>                 Key: ATLAS-3878
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3878
>             Project: Atlas
>          Issue Type: Improvement
>          Components: atlas-core
>    Affects Versions: 2.0.0, trunk
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>             Fix For: trunk
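
The object pool proposed under *Solution* could look roughly like the sketch below. It is only an illustration of the general technique, not the Atlas implementation: the class names `NotificationPool` and `PooledNotification`, their fields, and the bounded-queue design are all assumptions made for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical bounded, thread-safe object pool (not taken from Atlas). */
class NotificationPool<T> {
    private final BlockingQueue<T> idle;   // instances waiting to be reused
    private final Supplier<T> factory;     // creates a new instance when none is idle
    private final Consumer<T> resetter;    // clears state before an instance is reused

    NotificationPool(int capacity, Supplier<T> factory, Consumer<T> resetter) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        this.resetter = resetter;
    }

    /** Returns an idle instance if one exists, otherwise allocates a new one. */
    T acquire() {
        T obj = idle.poll();               // non-blocking; null when nothing is idle
        return obj != null ? obj : factory.get();
    }

    /** Resets the instance and keeps it for reuse; it is dropped if the pool is full. */
    void release(T obj) {
        resetter.accept(obj);
        idle.offer(obj);                   // returns false when full; object is left to the GC
    }

    int idleCount() {
        return idle.size();
    }
}

/** Hypothetical stand-in for a notification payload. */
class PooledNotification {
    String entityGuid;
    String operation;

    void reset() {
        entityGuid = null;
        operation = null;
    }
}

class PoolDemo {
    public static void main(String[] args) {
        NotificationPool<PooledNotification> pool =
                new NotificationPool<>(2, PooledNotification::new, PooledNotification::reset);

        PooledNotification first = pool.acquire();
        first.entityGuid = "guid-1";
        first.operation = "ENTITY_CREATE";
        pool.release(first);               // listener is done with the object

        PooledNotification second = pool.acquire();
        System.out.println("reused: " + (second == first));   // prints "reused: true"
    }
}
```

Bounding the pool keeps its own footprint fixed: under a burst, extra objects are allocated and later discarded as before, while steady-state traffic reuses the pooled instances. Callers must not hold on to an object after releasing it, since the next `acquire()` may hand it to another thread.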