Copilot commented on code in PR #9162:
URL: https://github.com/apache/gravitino/pull/9162#discussion_r2559215231


##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -57,20 +82,28 @@ public ReverseIndexCache() {
         GenericEntity.class, ReverseIndexRules.GENERIC_METADATA_OBJECT_REVERSE_RULE);
   }
 
-  public boolean remove(EntityCacheKey key) {
-    return reverseIndex.remove(key.toString());
-  }
-
   public Iterable<List<EntityCacheKey>> getValuesForKeysStartingWith(String keyPrefix) {
     return reverseIndex.getValuesForKeysStartingWith(keyPrefix);
   }
 
-  public Iterable<CharSequence> getKeysStartingWith(String keyPrefix) {
-    return reverseIndex.getKeysStartingWith(keyPrefix);
-  }
+  public boolean remove(EntityCacheKey key) {
+    List<EntityCacheKey> relatedKeys = entityToReverseIndexMap.remove(key);
+    if (CollectionUtils.isNotEmpty(relatedKeys)) {
+      for (EntityCacheKey relatedKey : relatedKeys) {
+        List<EntityCacheKey> existingKeys = reverseIndex.getValueForExactKey(relatedKey.toString());
+        if (existingKeys != null && existingKeys.contains((key))) {

Review Comment:
   Redundant parentheses in the `contains()` call: it should be `existingKeys.contains(key)` instead of `existingKeys.contains((key))`.
   ```suggestion
           if (existingKeys != null && existingKeys.contains(key)) {
   ```
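A minimal, hypothetical sketch of the cleanup this `remove()` appears to perform, assuming plain `String` keys in place of `EntityCacheKey` and a `HashMap` in place of the `ConcurrentRadixTree` (the diff above is truncated, so this is not the PR's code). It also shows that `List.remove(Object)` already reports whether the key was present, so the `contains()` check could be dropped entirely, and that an emptied list can be deleted so it does not stay in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, single-threaded stand-in for ReverseIndexCache; String keys replace EntityCacheKey. */
public class ReverseIndexRemovalSketch {
  // Reverse index: related key (e.g. securable object) -> entity keys that reference it.
  public final Map<String, List<String>> reverseIndex = new HashMap<>();
  // Forward map: entity key (e.g. role) -> related keys it was indexed under.
  public final Map<String, List<String>> entityToReverseIndexMap = new HashMap<>();

  public void put(String entityKey, String relatedKey) {
    reverseIndex.computeIfAbsent(relatedKey, k -> new ArrayList<>()).add(entityKey);
    entityToReverseIndexMap.computeIfAbsent(entityKey, k -> new ArrayList<>()).add(relatedKey);
  }

  /** Returns true if the entity had indexed relations that were cleaned up. */
  public boolean remove(String entityKey) {
    List<String> relatedKeys = entityToReverseIndexMap.remove(entityKey);
    if (relatedKeys == null || relatedKeys.isEmpty()) {
      return false;
    }
    for (String relatedKey : relatedKeys) {
      List<String> existingKeys = reverseIndex.get(relatedKey);
      // List.remove(Object) returns false when the key is absent, so no contains() check is needed.
      if (existingKeys != null && existingKeys.remove(entityKey) && existingKeys.isEmpty()) {
        // Drop the now-empty entry instead of leaving it in the index.
        reverseIndex.remove(relatedKey);
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ReverseIndexRemovalSketch cache = new ReverseIndexRemovalSketch();
    cache.put("role1", "s1");
    cache.put("role1", "s2");
    cache.put("role2", "s1");
    cache.remove("role1");
    System.out.println(cache.reverseIndex); // {s1=[role2]}
  }
}
```

This sketch ignores concurrency, which is raised separately in the thread-safety comment on this PR.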



##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -79,7 +112,8 @@ public int size() {
 
   public void put(
       NameIdentifier nameIdentifier, Entity.EntityType type, EntityCacheRelationKey key) {
-    EntityCacheKey entityCacheKey = EntityCacheKey.of(nameIdentifier, type);
+    EntityCacheRelationKey entityCacheKey = EntityCacheRelationKey.of(nameIdentifier, type);

Review Comment:
   Inconsistent key type usage: this method creates an `EntityCacheRelationKey` on line 115, but the `get()` method at line 133 uses `EntityCacheKey.of()` for the same purpose. This inconsistency could lead to subtle lookup bugs. Both methods should use the same key type: either both use `EntityCacheKey.of(nameIdentifier, type)` or both use `EntityCacheRelationKey.of(nameIdentifier, type)`. Since this variable serves as a lookup key in the reverse index and must match the key used in `get()`, it should be an `EntityCacheKey`.
   ```suggestion
       EntityCacheKey entityCacheKey = EntityCacheKey.of(nameIdentifier, type);
   ```
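To illustrate how mixing key types can fail silently, here is a hypothetical sketch. `CacheKey` and `RelationKey` are illustrative stand-ins, not Gravitino's classes, and the sketch assumes an `equals()` that compares the concrete class (a common pattern). Whether the real `EntityCacheKey`/`EntityCacheRelationKey` pair behaves this way depends on their actual `equals()`, `hashCode()`, and `toString()` implementations, which this sketch does not reproduce.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyTypeMismatchSketch {
  /** Hypothetical stand-in for EntityCacheKey. */
  public static class CacheKey {
    final String name;
    final String type;

    public CacheKey(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      // Assumption: equality also requires the same concrete class.
      if (o == null || getClass() != o.getClass()) return false;
      CacheKey other = (CacheKey) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  /** Hypothetical stand-in for EntityCacheRelationKey. */
  public static class RelationKey extends CacheKey {
    public RelationKey(String name, String type) {
      super(name, type);
    }
  }

  /** Writes with one key and reads with another; returns whether the read finds the entry. */
  public static boolean lookupHits(CacheKey writeKey, CacheKey readKey) {
    Map<CacheKey, String> index = new HashMap<>();
    index.put(writeKey, "value");
    return index.containsKey(readKey);
  }

  public static void main(String[] args) {
    // Same key type on both sides: the lookup succeeds.
    System.out.println(lookupHits(new CacheKey("role1", "ROLE"), new CacheKey("role1", "ROLE"))); // true
    // Written as RelationKey, read as CacheKey: same hash, but equals() fails, so the entry is missed.
    System.out.println(lookupHits(new RelationKey("role1", "ROLE"), new CacheKey("role1", "ROLE"))); // false
  }
}
```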



##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -45,6 +47,29 @@ public class ReverseIndexCache {
   /** Registers a reverse index processor for a specific entity class. */
   private final Map<Class<? extends Entity>, ReverseIndexRule> reverseIndexRules = new HashMap<>();
 
+  /**
+   * Map from data entity key to a list of entity cache relation keys. This is used for reverse
+   * indexing.
+   *
+   * <p>For example, a role entity may be related to multiple securable objects, so we need to
+   * maintain a mapping from the role entity key to the list of securable object keys. that is
+   * dataToReverseIndexMap: roleEntityKey -> [securableObjectKey1, securableObjectKey2, ...]
+   *
+   * <p>This map is used to quickly find all the related entity cache keys when we need to
+   * invalidate in the reverse index if a role entity is updated. The following is an example: a
+   * Role a has securable objects s1 and s2, so we have the following mapping: <br>
+   * cacheData: role1 -> role entity </br> <br>
+   * reverseIndex: s1 -> [role1], s2 -> [role1] </br>
+   *
+   * <p>This map will be: <br>
+   * role1 -> [s1, s2] </br>
+   *
+   * <p>When we update role1, we need to invalidate s1 and s2 from the reverse index, or the data
+   * will be in the memory forever. However, the main branch before this PR does not support this
+   * operation directly as we do not maintain such a map.
+   */
+  private Map<EntityCacheKey, List<EntityCacheKey>> entityToReverseIndexMap = Maps.newHashMap();

Review Comment:
   Thread-safety issue: `entityToReverseIndexMap` is initialized as a regular `HashMap` using `Maps.newHashMap()`, but it is accessed in concurrent methods (`put()` and `remove()`). While the `reverseIndex` uses a `ConcurrentRadixTree` for thread-safe operations, this new map could be accessed concurrently by different threads operating on different lock segments in `CaffeineEntityCache`. It should be changed to a thread-safe map such as `ConcurrentHashMap` to prevent potential race conditions and data corruption. Use `Maps.newConcurrentMap()` or `new ConcurrentHashMap<>()` instead.
   ```suggestion
     private Map<EntityCacheKey, List<EntityCacheKey>> entityToReverseIndexMap = Maps.newConcurrentMap();
   ```
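A sketch of one way to make the forward map safe, assuming `String` keys in place of `EntityCacheKey` (`record` and `removeEntity` are hypothetical names). Swapping in a `ConcurrentHashMap` protects the map structure, but the `List` values are still mutated after retrieval, so this sketch pairs the atomic `computeIfAbsent` with `CopyOnWriteArrayList` values. That is one option, not necessarily what the PR should adopt.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentForwardMapSketch {
  // String keys stand in for EntityCacheKey (assumption for illustration).
  public final ConcurrentMap<String, List<String>> entityToReverseIndexMap =
      new ConcurrentHashMap<>();

  public void record(String entityKey, String relatedKey) {
    // computeIfAbsent is atomic on ConcurrentHashMap; CopyOnWriteArrayList makes the
    // following add() safe when several threads append to the same list.
    entityToReverseIndexMap
        .computeIfAbsent(entityKey, k -> new CopyOnWriteArrayList<>())
        .add(relatedKey);
  }

  public List<String> removeEntity(String entityKey) {
    // Atomic removal: only one caller receives the list for a given key.
    return entityToReverseIndexMap.remove(entityKey);
  }

  /** Appends from several threads to the same entity key and returns the final list size. */
  public static int concurrentRecordCount(int threads, int perThread) {
    ConcurrentForwardMapSketch sketch = new ConcurrentForwardMapSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          sketch.record("role1", "s" + id + "-" + i);
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return sketch.entityToReverseIndexMap.get("role1").size();
  }

  public static void main(String[] args) {
    System.out.println(concurrentRecordCount(8, 500)); // 4000
  }
}
```

Even with both pieces, updates to `reverseIndex` and `entityToReverseIndexMap` remain two separate operations, so callers that need them to change together still depend on external locking such as the lock segments in `CaffeineEntityCache`.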



##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -45,6 +47,29 @@ public class ReverseIndexCache {
   /** Registers a reverse index processor for a specific entity class. */
   private final Map<Class<? extends Entity>, ReverseIndexRule> reverseIndexRules = new HashMap<>();
 
+  /**
+   * Map from data entity key to a list of entity cache relation keys. This is used for reverse
+   * indexing.
+   *
+   * <p>For example, a role entity may be related to multiple securable objects, so we need to
+   * maintain a mapping from the role entity key to the list of securable object keys. that is
+   * dataToReverseIndexMap: roleEntityKey -> [securableObjectKey1, securableObjectKey2, ...]

Review Comment:
   Documentation inconsistency: the comment on line 56 refers to "dataToReverseIndexMap", but the actual field name on line 71 is "entityToReverseIndexMap". The documentation should use the correct field name to avoid confusion.
   ```suggestion
      * entityToReverseIndexMap: roleEntityKey -> [securableObjectKey1, securableObjectKey2, ...]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
