-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74713/
-----------------------------------------------------------
(Updated Nov. 28, 2023, 3:26 a.m.)
Review request for atlas, Ashutosh Mestry, Jayendra Parab, Mandar Ambawane,
Pinal Shah, Sheetal Shah, and Sidharth Mishra.
Bugs: ATLAS-4803
https://issues.apache.org/jira/browse/ATLAS-4803
Repository: atlas
Description
-------
Kafka consumer lag was not decreasing for the ATLAS_HOOK topic, and the create
entity API was taking 50-60 seconds per request.
The hive_table type had 10 million records.
The impala_column_lineage type had 26 million records.
We were able to reproduce the issue.
Metrics
The timings below differ because, previously, even when fromVertex had no
edges, the search still iterated through all the edges of toVertex, which was
time-consuming.
Before: "getRelationshipEdge":{"count":100000,"timeTaken":50000}
After removing the if condition on toVertex.hasEdge:
"getRelationshipEdge":{"count":100000,"timeTaken":80}
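A simplified model of the cost difference explained above, assuming a toy in-memory graph. The names RelationshipEdgeLookup, Vertex, Edge, addEdge, findBefore and findAfter are hypothetical; they are not the Atlas or JanusGraph API, and this is a sketch of the described behavior rather than the actual patch. The "before" lookup scans every incoming edge of toVertex even when fromVertex has none; the "after" lookup starts from fromVertex, so an edgeless fromVertex costs nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the relationship-edge lookup described in this review.
// Hypothetical types and methods; NOT the Atlas or JanusGraph API.
public class RelationshipEdgeLookup {

    record Edge(String label, Vertex out, Vertex in) {}

    static final class Vertex {
        final String id;
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();

        Vertex(String id) {
            this.id = id;
        }
    }

    // Number of edges examined by lookups since the caller last reset it.
    static int inspected = 0;

    static Edge addEdge(String label, Vertex from, Vertex to) {
        Edge e = new Edge(label, from, to);
        from.outEdges.add(e);
        to.inEdges.add(e);
        return e;
    }

    // "Before": walks every incoming edge of toVertex, even when
    // fromVertex has no edges at all.
    static Edge findBefore(Vertex from, Vertex to, String label) {
        for (Edge e : to.inEdges) {
            inspected++;
            if (e.label().equals(label) && e.out() == from) {
                return e;
            }
        }
        return null;
    }

    // "After": walks the outgoing edges of fromVertex instead, so an
    // edgeless fromVertex returns immediately.
    static Edge findAfter(Vertex from, Vertex to, String label) {
        for (Edge e : from.outEdges) {
            inspected++;
            if (e.label().equals(label) && e.in() == to) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Vertex hub = new Vertex("hub");     // toVertex with many edges
        Vertex fresh = new Vertex("fresh"); // fromVertex with no edges yet
        for (int i = 0; i < 100_000; i++) {
            addEdge("example_label", new Vertex("v" + i), hub);
        }

        inspected = 0;
        Edge before = findBefore(fresh, hub, "example_label");
        int scannedBefore = inspected;

        inspected = 0;
        Edge after = findAfter(fresh, hub, "example_label");
        int scannedAfter = inspected;

        System.out.println("before: found=" + before + ", edges scanned=" + scannedBefore);
        System.out.println("after:  found=" + after + ", edges scanned=" + scannedAfter);
    }
}
```

Running main should report 100000 edges scanned by the first lookup and 0 by the second; both return null because no matching edge exists. When a matching edge does exist, both lookups return the same edge, since every edge is recorded on both of its endpoints.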
Diffs
-----
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
0dd573b89
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
ef0313e02
Diff: https://reviews.apache.org/r/74713/diff/1/
Testing (updated)
-------
What was the relationship type?
__hive_db.table, __hive_table.columns
Which entity types were identified and tested, i.e., for which vertex types did
finding edges take time?
impala_column_lineage, impala_process, hive_table, hive_column
What was the edge count for those entity types?
hive_column: 28 million
impala_column_lineage: 24 million
What were the timings before and after the change?
Before: "getRelationshipEdge":{"count":100000,"timeTaken":50000}
After removing the if condition on toVertex.hasEdge:
"getRelationshipEdge":{"count":100000,"timeTaken":80}
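For scale, the average time per getRelationshipEdge call and the overall speedup can be worked out directly from these two metrics. The unit of timeTaken is not stated in this request, so the averages below are expressed in whatever unit timeTaken uses:

```latex
\bar{t}_{\text{before}} = \frac{50000}{100000} = 0.5, \qquad
\bar{t}_{\text{after}} = \frac{80}{100000} = 0.0008, \qquad
\frac{\bar{t}_{\text{before}}}{\bar{t}_{\text{after}}} = \frac{50000}{80} = 625
```

That is, for the same 100,000 calls, the change makes the lookup roughly 625 times faster.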
Volume testing
Initiated a Kafka dump, and the lag started decreasing.
Thanks,
Paresh Devalia