[
https://issues.apache.org/jira/browse/ATLAS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yushi Hayasaka updated ATLAS-5095:
----------------------------------
Description:
Sometimes Atlas attempts to load an entity from the cache (e.g., to notify
listeners of processed entities after `createOrUpdate()`).
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java#L176]
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L595]
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L465]
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L418]
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L111-L115]
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java#L1145-L1146]
If the specified entity is not found in the cache, Atlas falls back to
retrieving it through `EntityGraphRetriever#toAtlasEntity`, which is a slow
path compared to the cache.
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java#L181]
Currently, we observe that Atlas retrieves shell entities through
`EntityGraphRetriever` instead of from the cache.
When an event contains many shell entities, this increases the operation
time.
As introduced in ATLAS-3405, if an event includes entities that do not exist
yet, Atlas creates shell entities for them.
In my understanding (please correct me if wrong), a shell entity should only
contain the properties that are set in `createShellEntityVertex`.
[https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java#L291-L312]
So, I guess it is safe to cache a shell entity after creation (e.g. right
after `createShellEntityVertex`), which should improve performance by
reducing calls to the slow path.
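To illustrate the idea, here is a minimal, self-contained sketch. It does not
use any Atlas classes: `SketchEntity`, `ShellEntityCacheSketch`, the two maps,
and the `cacheOnCreate` flag are all hypothetical stand-ins, and the counter
only models how often a lookup would have to take the slow path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an entity; not an Atlas class.
final class SketchEntity {
    final String guid;
    final String typeName;
    final boolean shell; // true when the entity was created as a shell

    SketchEntity(String guid, String typeName, boolean shell) {
        this.guid = guid;
        this.typeName = typeName;
        this.shell = shell;
    }
}

// Toy model of "cache first, slow path on miss" (all names are assumptions).
class ShellEntityCacheSketch {
    private final Map<String, SketchEntity> graph = new HashMap<>(); // models the graph store
    private final Map<String, SketchEntity> cache = new HashMap<>(); // models the entity cache
    private final boolean cacheOnCreate;
    private int slowPathCalls = 0;

    ShellEntityCacheSketch(boolean cacheOnCreate) {
        this.cacheOnCreate = cacheOnCreate;
    }

    // Models shell entity creation; optionally caches the new shell right away.
    SketchEntity createShellEntity(String guid, String typeName) {
        SketchEntity shell = new SketchEntity(guid, typeName, true);
        graph.put(guid, shell);
        if (cacheOnCreate) {
            cache.put(guid, shell); // the proposed change
        }
        return shell;
    }

    // Models a lookup: try the cache, fall back to the slow path on a miss.
    SketchEntity getEntity(String guid) {
        SketchEntity cached = cache.get(guid);
        if (cached != null) {
            return cached;
        }
        slowPathCalls++; // stands in for the expensive graph retrieval
        SketchEntity loaded = graph.get(guid);
        if (loaded != null) {
            cache.put(guid, loaded);
        }
        return loaded;
    }

    int getSlowPathCalls() {
        return slowPathCalls;
    }

    public static void main(String[] args) {
        for (boolean cacheOnCreate : new boolean[] {false, true}) {
            ShellEntityCacheSketch store = new ShellEntityCacheSketch(cacheOnCreate);
            for (int i = 0; i < 100; i++) {
                store.createShellEntity("guid-" + i, "example_type");
            }
            for (int i = 0; i < 100; i++) {
                store.getEntity("guid-" + i);
            }
            System.out.println("cacheOnCreate=" + cacheOnCreate
                    + " slowPathCalls=" + store.getSlowPathCalls());
        }
    }
}
```

In this toy model, each shell entity that is not cached at creation costs one
slow-path call on its first lookup, and caching at creation removes that call.
Whether the same holds in Atlas depends on the shell entity really carrying
nothing beyond what is set at creation time, as asked above.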
> Cache shell entity after creation to prevent cache misses
> ---------------------------------------------------------
>
> Key: ATLAS-5095
> URL: https://issues.apache.org/jira/browse/ATLAS-5095
> Project: Atlas
> Issue Type: Improvement
> Reporter: Yushi Hayasaka
> Priority: Minor
>
> Sometimes Atlas attempts to load an entity from the cache (e.g., to notify
> listeners of processed entities after `createOrUpdate()`).
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java#L176]
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L595]
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L465]
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L418]
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java#L111-L115]
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java#L1145-L1146]
> If the specified entity is not found in the cache, Atlas falls back to
> retrieving it through `EntityGraphRetriever#toAtlasEntity`, which is a slow
> path compared to the cache.
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java#L181]
> Currently, we observe that Atlas retrieves shell entities through
> `EntityGraphRetriever` instead of from the cache.
> When an event contains many shell entities, this increases the operation
> time.
> As introduced in ATLAS-3405, if an event includes entities that do not exist
> yet, Atlas creates shell entities for them.
> In my understanding (please correct me if wrong), a shell entity should only
> contain the properties that are set in `createShellEntityVertex`.
> [https://github.com/apache/atlas/blob/18d7f9dccf5658988d32e387339948286810f0a8/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java#L291-L312]
> So, I guess it is safe to cache a shell entity after creation (e.g. right
> after `createShellEntityVertex`), which should improve performance by
> reducing calls to the slow path.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)