Ashutosh Mestry created ATLAS-2434:
--------------------------------------

             Summary: Import: Performance Improvement
                 Key: ATLAS-2434
                 URL: https://issues.apache.org/jira/browse/ATLAS-2434
             Project: Atlas
          Issue Type: Bug
          Components:  atlas-core
    Affects Versions: trunk
            Reporter: Ashutosh Mestry
            Assignee: Ashutosh Mestry
             Fix For: trunk


*Background*

The introduction of _relationships_ within Atlas caused the 
_EntityMutationResponse_ to report many more entities as modified than before.

This has an adverse impact on performance during bulk entity creation. 
Bulk entity creation happens during the import process. Single entity creation.

*Behavior*

During import, in a typical scenario where a database is being imported, the 
list of updated entities in the _EntityMutationResponse_ grows progressively. 
This happens because every edge created between a database and a table, or 
between a table and a column, is marked as an updated entity.

Import thus slows down progressively.
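
The growth described above can be illustrated with a small toy model. This is 
only a sketch: the class and method names are hypothetical and are not Atlas 
APIs, and it assumes that each new edge adds one entry, naming the entity on 
the parent end of the edge, to the list of updated entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model for illustration only; these are not Atlas classes.
class MutationResponseGrowth {

    /**
     * Counts the "updated" entries recorded while importing one database
     * with the given number of tables and columns per table.
     * Assumption: each new edge adds one entry for its parent-side entity.
     */
    static long updatedEntries(int tables, int columnsPerTable, boolean markEdgeEndpoints) {
        List<String> updated = new ArrayList<>();
        for (int t = 0; t < tables; t++) {
            String table = "table-" + t;
            if (markEdgeEndpoints) {
                updated.add("db");      // database-table edge marks the database
            }
            for (int c = 0; c < columnsPerTable; c++) {
                if (markEdgeEndpoints) {
                    updated.add(table); // table-column edge marks the table
                }
            }
        }
        return updated.size();
    }
}
```

For 100 tables of 10 columns each, this model records 1,100 updated entries 
when edge endpoints are marked and none when they are not.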

A benchmark import of a ZIP file showed:
 * Branch-0.8 (last release): 2 minutes.
 * Master (current development): 40+ minutes.

The behavior deteriorates as the size of the import increases.

*Possible Solution*

During the import process, avoid marking entities as modified when they are 
affected only by relationship edge creation.
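
One way this could look is sketched below. It is a minimal illustration, not 
the actual fix: _ImportAwareMutationTracker_ and its methods are hypothetical 
names, and the sketch assumes the code that creates relationship edges can 
tell whether an import is in progress.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative, not Atlas APIs.
class ImportAwareMutationTracker {
    private final boolean importInProgress;
    private final Set<String> created = new LinkedHashSet<>();
    private final Set<String> updated = new LinkedHashSet<>();

    ImportAwareMutationTracker(boolean importInProgress) {
        this.importInProgress = importInProgress;
    }

    /** Records an entity that was newly created. */
    void entityCreated(String guid) {
        created.add(guid);
    }

    /** Called when a relationship edge is attached to an existing entity. */
    void relationshipEdgeCreated(String affectedGuid) {
        if (importInProgress) {
            return; // during import, do not report the entity as modified
        }
        updated.add(affectedGuid);
    }

    Set<String> getCreated() {
        return created;
    }

    Set<String> getUpdated() {
        return updated;
    }
}
```

With the flag set, created entities are still reported while the set of 
updated entities stays empty. Outside of import the behavior is unchanged.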

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
