> On April 2, 2020, 5:39 a.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
> > Lines 344 (patched)
> > <https://reviews.apache.org/r/72287/diff/1/?file=2216514#file2216514line344>
> >
> >     edgeLabel is typically used to find a subset of edges from a given
> >     vertex. Having an edge index on the label probably won't help improve
> >     performance; however, we need to understand the impact of creating this
> >     index in an existing Atlas instance that has a large number of edges.
> >     1) Would the index be populated with existing edge labels?
> >     2) If yes, how long would the index creation take, say for 1m edges?
> >     3) If no, would search ignore edges that were not indexed?
> >
> >     I suggest measuring the performance impact of not having this index.

I did a run last night without the index, and it had no impact on performance. I have removed this change.

- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
-----------------------------------------------------------


On March 30, 2020, 11:19 p.m., Ashutosh Mestry wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72287/
> -----------------------------------------------------------
>
> (Updated March 30, 2020, 11:19 p.m.)
>
>
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues,
> and Sarath Subramanian.
>
>
> Bugs: ATLAS-3706
>     https://issues.apache.org/jira/browse/ATLAS-3706
>
>
> Repository: atlas
>
>
> Description
> -------
>
> **Approach**
>
> 1. Added metrics to most of the methods involved in entity creation. (The
> patch does not include these additional metrics.)
> 2. Started importing a large number of entities using the
> _ZipFileMigrationImporter_.
> 3. Observed the behavior of the import over 24 hours. Observations included
> CPU usage, memory usage, and import throughput, using _metric.log_.
> 4. Changes were added one at a time. The impact of each change on performance
> (via metric.log) and accuracy was observed before the next change was added.
>
> **Observations**
> * Relationship creation took an inordinately large amount of time under load.
> The time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This
> implementation also caused a memory build-up of _AtlasEdge_ objects, which
> stayed in memory for a long time. This had the secondary effect of slowing
> down entity creation operations after about 6 hours (this duration differed
> with node configuration).
> * _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is
> time consuming.
> * _GraphBackedSearchIndexer_ edge label index: the majority of edge creation
> operations included a lookup by edge label.
>
> **Configuration**
> Cluster: 3 nodes: 40 cores, 128 GB RAM, 1.5 TB of disk space.
> Atlas configuration: 32 GB RAM.
>
>
> Diffs
> -----
>
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java 647e3040c
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 5ab9f4d13
>
>
> Diff: https://reviews.apache.org/r/72287/diff/1/
>
>
> Testing
> -------
>
> **Manual tests**
> (See above).
> Accuracy verification.
>
> **Unit tests**
> Executed existing unit tests.
>
> **Pre-commit build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/
>
>
> Thanks,
>
> Ashutosh Mestry
>
>
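
For readers following the first two observations in the quoted description, here is a minimal sketch of the general pattern they describe. It is not the patch and does not use the Atlas or JanusGraph APIs: `Vertex`, `Edge`, `collectByLabel`, and `findFirst` are hypothetical stand-ins, assumed only for illustration. The first method materializes every matching edge before the caller inspects any of them; the second walks the edges once, compares the target vertex by id, and returns at the first hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EdgeLookupSketch {
    // Hypothetical stand-ins for graph types; NOT the Atlas AtlasVertex/AtlasEdge interfaces.
    public static final class Vertex {
        public final String id;
        public Vertex(String id) { this.id = id; }
    }

    public static final class Edge {
        public final String label;
        public final Vertex out;
        public final Vertex in;
        public Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    // Eager pattern: copies every edge with the label into a list before the caller
    // looks at any of them, so all matches stay referenced until the list is dropped.
    public static List<Edge> collectByLabel(Iterable<Edge> outEdges, String label) {
        List<Edge> matches = new ArrayList<>();
        for (Edge e : outEdges) {
            if (e.label.equals(label)) {
                matches.add(e);
            }
        }
        return matches;
    }

    // Streaming pattern: walks the edges once, compares the target by id only,
    // and returns at the first hit without building an intermediate list.
    public static Edge findFirst(Iterable<Edge> outEdges, String label, String inVertexId) {
        for (Edge e : outEdges) {
            if (e.label.equals(label) && e.in.id.equals(inVertexId)) {
                return e;
            }
        }
        return null; // a caller would create the edge in this case
    }

    public static void main(String[] args) {
        Vertex a = new Vertex("a");
        Vertex b = new Vertex("b");
        Vertex c = new Vertex("c");
        List<Edge> outEdges = Arrays.asList(
                new Edge("owns", a, b), new Edge("owns", a, c), new Edge("uses", a, c));

        System.out.println(collectByLabel(outEdges, "owns").size());
        System.out.println(findFirst(outEdges, "owns", "c") != null);
        System.out.println(findFirst(outEdges, "uses", "b") == null);
    }
}
```

Whether the streaming form helps in a real deployment depends on how the underlying graph store iterates edges, so it would need to be measured the same way as the changes described above.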