[ https://issues.apache.org/jira/browse/ATLAS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356501#comment-16356501 ]
Madhan Neethiraj commented on ATLAS-2434:
-----------------------------------------

[~ashutoshm] - the following change might miss updates to entities in notifications, for example when an import adds a process entity that connects two existing entities. In this case, the update to the existing entities should be handled as such (instead of being ignored). Please review the updated patch.

{code}
--- a/repository/src/main/java/org/apache/atlas/repository/store/graph/v1/EntityGraphMapper.java
+++ b/repository/src/main/java/org/apache/atlas/repository/store/graph/v1/EntityGraphMapper.java
@@ -403,7 +403,7 @@ public class EntityGraphMapper {
             // created new relationship,
             // record entity update on both vertices of the new relationship
-            if (currentEdge == null && newEdge != null) {
+            if (!context.isImport() && currentEdge == null && newEdge != null) {
                 // based on relationship edge direction record update only on attribute vertex
                 if (edgeDirection == IN) {
@@ -706,7 +706,7 @@ public class EntityGraphMapper {
         // if relationship did not exist before and new relationship was created
         // record entity update on both relationship vertices
-        if (!relationshipExists) {
+        if (!relationshipExists && !context.isImport()) {
             recordEntityUpdate(attributeVertex);
         }
     }
{code}

> Import: Performance Improvement
> -------------------------------
>
>                 Key: ATLAS-2434
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2434
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core
>    Affects Versions: trunk
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>             Fix For: trunk
>
>         Attachments: ATLAS-2434-2.patch, ATLAS-2434-Import-Perf-Improvement.patch
>
>
> *Background*
> The introduction of _relationships_ within Atlas caused the _EntityMutationResponse_ to contain many more entities marked as modified than before.
> This has an adverse impact on performance during bulk entity creation.
> Bulk entity creation happens during the import process. Single entity creation.
> *Behavior*
> During import, in a typical scenario where a database is being imported, the list of updated entities in the _EntityMutationResponse_ grows progressively. This happens because every edge created between database-table and table-column causes the entities it connects to be marked as updated.
> Import thus slows down progressively.
> A ZIP file used for benchmarks showed:
> * Branch-0.8 (last release): 2 minutes.
> * Master (current development): 40+ minutes.
> The behavior deteriorates as the size of the import increases.
> *Possible Solution*
> During the import process, avoid marking entities affected by relationship edge creation as modified.
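
To make the trade-off discussed in the comment concrete, here is a minimal toy sketch. It is not Atlas code and not the content of either attached patch: `ImportUpdateSketch`, `Context`, `recordBlanket` and `recordSelective` are hypothetical names. The first method mirrors the blanket `isImport` guard from the diff. The second is an assumed refinement in which only entities created by the same import are skipped, so entities that already existed are still reported as updated.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model only: none of these types are Atlas classes. */
public class ImportUpdateSketch {

    /** Hypothetical stand-in for the per-request mutation context. */
    static class Context {
        final boolean isImport;
        final Set<String> created = new LinkedHashSet<>(); // entities created by this request
        final Set<String> updated = new LinkedHashSet<>(); // entities reported as updated

        Context(boolean isImport) {
            this.isImport = isImport;
        }
    }

    /** Blanket guard, as in the diff: during import, a new edge records no updates. */
    static void recordBlanket(Context ctx, String... edgeEnds) {
        if (ctx.isImport) {
            return;
        }
        for (String guid : edgeEnds) {
            ctx.updated.add(guid);
        }
    }

    /**
     * Assumed refinement: during import, skip only entities created by the same
     * import, so pre-existing entities still show up as updated.
     */
    static void recordSelective(Context ctx, String... edgeEnds) {
        for (String guid : edgeEnds) {
            if (ctx.isImport && ctx.created.contains(guid)) {
                continue;
            }
            ctx.updated.add(guid);
        }
    }

    public static void main(String[] args) {
        // An import adds a process entity linking two tables that already exist.
        Context blanket = new Context(true);
        blanket.created.add("process-1");
        recordBlanket(blanket, "process-1", "table-a");
        recordBlanket(blanket, "process-1", "table-b");

        Context selective = new Context(true);
        selective.created.add("process-1");
        recordSelective(selective, "process-1", "table-a");
        recordSelective(selective, "process-1", "table-b");

        System.out.println("blanket:   " + blanket.updated);
        System.out.println("selective: " + selective.updated);
    }
}
```

In this toy scenario the blanket guard reports no updated entities at all, while the selective variant still reports `table-a` and `table-b`. That is the case the comment raises: a process entity connecting two existing entities.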