-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74130/
-----------------------------------------------------------
(Updated Sept. 22, 2022, 11:47 p.m.)


Review request for atlas, Jayendra Parab, Mandar Ambawane, and Pinal Shah.


Repository: atlas


Description (updated)
-------

Problem statement:
While working with a Kafka dump containing messages from Spark Streaming applications, we observed that when an application is updated, re-indexing its edges takes the longest time, and that "deleted" relationship edges were also re-indexed every time an application was updated by an incoming process message. Processing 35k processes took a few minutes, with an average time of 135 seconds, and this time grows as new processes enter the system.

The change makes relationship-edge processing consider only active edges, so only new, additional edges end up being processed and indexed. This significantly reduces processing time when an entity being updated has a large number of deleted edges.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 68d331dfd

Diff: https://reviews.apache.org/r/74130/diff/1/


Testing (updated)
-------

We re-ran the same Kafka dump with the change, and message processing time dropped significantly. Because only non-deleted edges were processed and re-indexed, processing time stayed consistently at around 1 to 2 seconds.


Thanks,

Sheetal Shah
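As a rough illustration of the idea described above (not the actual `EntityGraphMapper` change), the Java sketch below filters a list of relationship edges down to the active ones before any re-indexing work is done. `ActiveEdgeFilter`, `Edge`, `State` and `activeOnly` are hypothetical names, and the ACTIVE/DELETED states only loosely model the soft-delete status that Atlas keeps on graph elements; it assumes Java 16+ for records.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of relationship edges; this is NOT the Atlas AtlasEdge API.
final class ActiveEdgeFilter {

    // Assumed states, loosely mirroring soft-deleted vs. live elements in Atlas.
    enum State { ACTIVE, DELETED }

    // A minimal stand-in for a relationship edge: an id plus its current state.
    record Edge(String id, State state) {}

    /**
     * Returns only the edges whose state is ACTIVE, so soft-deleted edges are
     * skipped instead of being re-indexed on every entity update.
     */
    static List<Edge> activeOnly(List<Edge> edges) {
        return edges.stream()
                .filter(e -> e.state() == State.ACTIVE)
                .collect(Collectors.toList());
    }
}
```

With two deleted edges and one active edge, `activeOnly` returns a single edge, so only that edge would be handed on for indexing. The cost of an update then tracks the number of live edges rather than the entity's whole edge history, which matches the behaviour reported in the testing notes.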