kaori-seasons commented on issue #7370:
URL: https://github.com/apache/gravitino/issues/7370#issuecomment-2972319655
@yuqi1129
The root cause of this Metaspace OOM error is that IsolatedClassLoader
instances are not released promptly during high-frequency catalog attribute
updates, which leaks Metaspace memory.
- 1. Error triggering process
The loop in the user script triggers the following chain of problems:
- Catalog attribute update invalidates the cache: each PUT request that
updates a catalog attribute makes the alterCatalog method first invalidate the
catalog cache (CatalogManager.java:683) and then reload the catalog instance
(CatalogManager.java:703-706).
- Frequent creation of IsolatedClassLoader: each time the catalog is reloaded,
the createCatalogWrapper method creates a new IsolatedClassLoader instance
(CatalogManager.java:962).
- ServiceLoader loading consumes Metaspace: in the lookupCatalogProvider
method, ServiceLoader.load loads the CatalogProvider class through the new
IsolatedClassLoader (CatalogManager.java:1149). Classes are defined per class
loader, so every reload adds another copy of the class metadata to Metaspace.
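The chain above can be reproduced in miniature. The following is a minimal sketch, not Gravitino code: `ReloadingCatalogCache` is a hypothetical stand-in for the catalog cache, and a plain `URLClassLoader` with an empty classpath stands in for IsolatedClassLoader. It only illustrates that every alter call leaves one more class loader behind for the garbage collector.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the catalog cache; this is not CatalogManager. */
final class ReloadingCatalogCache {
  private final ConcurrentHashMap<String, URLClassLoader> loaders = new ConcurrentHashMap<>();
  private final AtomicInteger created = new AtomicInteger();

  /** Returns the cached loader, creating a new one on a cache miss. */
  URLClassLoader get(String catalog) {
    return loaders.computeIfAbsent(catalog, name -> {
      created.incrementAndGet();
      // A real isolated loader would point at the provider's jars.
      return new URLClassLoader(new URL[0], null);
    });
  }

  /** Mimics an alter call: invalidate the cache entry, then reload it at once. */
  URLClassLoader alter(String catalog) {
    URLClassLoader old = loaders.remove(catalog);
    if (old != null) {
      try {
        // close() releases jar handles; class metadata is freed only after
        // the loader becomes unreachable and a GC cycle unloads its classes.
        old.close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return get(catalog);
  }

  /** Total number of class loaders created so far. */
  int loadersCreated() {
    return created.get();
  }
}
```

One initial load followed by N alter calls creates N + 1 loaders, even though at most one is in the cache at any time.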
- 2. Memory leak mechanism
- ClassLoader accumulation: although the system is configured with a cache
cleanup mechanism (CatalogManager.java:302-306), under high-frequency
operations new IsolatedClassLoaders may be created faster than garbage
collection can reclaim the old ones.
- Metaspace recovery lag: each IsolatedClassLoader loads class metadata into
Metaspace. Although the CatalogWrapper.close() method closes the class loader
(CatalogManager.java:250), closing alone does not free that metadata. Metaspace
is reclaimed only after the loader and its classes become unreachable and a
garbage collection unloads them, so recovery may be delayed.
*Proposed solutions*
- 1. Short-term mitigation plan
Increase the JVM Metaspace limits (the two CMS flags take effect only with the
CMS collector, which was removed in JDK 14):
```
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=512m
-XX:+CMSClassUnloadingEnabled
-XX:+UseCMSInitiatingOccupancyOnly
```
Adjust the cache expiration time: extend the expiration time of the catalog
cache so that catalogs are reloaded less often.
- 2. Code optimization plan
Optimize the catalog attribute update mechanism:
- Analyze which attribute changes actually require recreating the catalog
instance
- For attribute updates that do not affect the core functions of the
catalog, consider applying them in place (hot update) instead of reloading
- Implement an incremental update mechanism for attribute changes
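One way to sketch the first two points is a small classifier that decides whether a change needs a full reload. This is only an illustration: `PropertyChangeClassifier` and the keys in `RELOAD_REQUIRED` are hypothetical, and the real set of reload-sensitive attributes would have to come from each catalog provider.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper; the key names below are placeholders for illustration. */
final class PropertyChangeClassifier {
  // Assumed examples of attributes that affect connections or class loading.
  private static final Set<String> RELOAD_REQUIRED = Set.of("jdbc-url", "metastore.uris");

  private PropertyChangeClassifier() {}

  /** Returns true if any reload-sensitive attribute differs between the two maps. */
  static boolean requiresReload(Map<String, String> oldProps, Map<String, String> newProps) {
    for (String key : RELOAD_REQUIRED) {
      if (!Objects.equals(oldProps.get(key), newProps.get(key))) {
        return true;
      }
    }
    return false;
  }
}
```

With a check like this, an update that only touches a descriptive attribute could be applied to the existing instance, and only a change to a reload-sensitive key would go through invalidation and a new class loader.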
Improve IsolatedClassLoader lifecycle management:
- Add stricter resource management in the createCatalogWrapper method
(CatalogManager.java:957-983)
- Consider reusing IsolatedClassLoader instances by sharing one class loader
among catalogs of the same provider type
Enhance the cleanup logic of the IsolatedClassLoader.close() method
(IsolatedClassLoader.java:150-158)
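The reuse idea could look roughly like the reference-counted pool below. It is a sketch under assumptions: `SharedLoaderPool` is a hypothetical class, a plain `URLClassLoader` stands in for IsolatedClassLoader, and sharing is assumed to be safe only when catalogs of the same provider use the same classpath and do not depend on per-catalog static state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that shares one class loader per provider type. */
final class SharedLoaderPool {
  private static final class Entry {
    final URLClassLoader loader;
    int refCount;

    Entry(URLClassLoader loader) {
      this.loader = loader;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  /** Returns the shared loader for the provider, creating it on first use. */
  synchronized ClassLoader acquire(String provider, URL[] classpath) {
    Entry entry =
        entries.computeIfAbsent(provider, p -> new Entry(new URLClassLoader(classpath, null)));
    entry.refCount++;
    return entry.loader;
  }

  /** Drops one reference; the loader is closed when the last user releases it. */
  synchronized void release(String provider) {
    Entry entry = entries.get(provider);
    if (entry == null) {
      return;
    }
    if (--entry.refCount == 0) {
      entries.remove(provider);
      try {
        entry.loader.close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Number of live shared loaders. */
  synchronized int size() {
    return entries.size();
  }
}
```

With this shape, repeatedly altering a catalog would acquire the new reference before releasing the old one, so the count never drops to zero and no new loader is created.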
Add protection mechanisms:
- Apply rate limiting when catalog operations are too frequent
- Add Metaspace usage monitoring and alerts
- Implement a batch processing mechanism for catalog updates
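As an illustration of the rate-limiting item, the sketch below enforces a minimum interval between accepted alter operations per catalog. `CatalogAlterThrottle` is a hypothetical name, and the clock is injected so that the behavior can be checked without sleeping; in production code `System::nanoTime` would be the natural choice.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-catalog throttle based on a minimum interval. */
final class CatalogAlterThrottle {
  private final long minIntervalNanos;
  private final LongSupplier clock;
  private final ConcurrentHashMap<String, Long> lastAccepted = new ConcurrentHashMap<>();

  CatalogAlterThrottle(long minIntervalNanos, LongSupplier clock) {
    this.minIntervalNanos = minIntervalNanos;
    this.clock = clock;
  }

  /** Returns true if the operation may proceed, false if it arrives too soon. */
  boolean tryAcquire(String catalog) {
    long now = clock.getAsLong();
    boolean[] allowed = {false};
    // compute() runs atomically per key, so concurrent callers cannot both pass.
    lastAccepted.compute(catalog, (name, last) -> {
      if (last == null || now - last >= minIntervalNanos) {
        allowed[0] = true;
        return now;
      }
      return last;
    });
    return allowed[0];
  }
}
```

A rejected request could be answered with an HTTP 429 or queued for later; which of the two fits better is a design decision this sketch leaves open.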
- 3. Monitoring and alerting
Add key indicator monitoring:
- Metaspace usage rate
- IsolatedClassLoader creation/destruction frequency
- Catalog cache hit rate
- ServiceLoader call frequency
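The first indicator can be read from the standard `java.lang.management` API. The sketch below assumes a HotSpot JVM, where the memory pool is named "Metaspace"; `MetaspaceProbe` is a hypothetical helper name. The usage ratio is only defined when a maximum is set, for example with `-XX:MaxMetaspaceSize`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.OptionalDouble;

/** Hypothetical helper that reads Metaspace and class-loading counters. */
final class MetaspaceProbe {
  private MetaspaceProbe() {}

  private static MemoryUsage metaspaceUsage() {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      // "Metaspace" is the pool name on HotSpot; other JVMs may differ.
      if ("Metaspace".equals(pool.getName())) {
        return pool.getUsage();
      }
    }
    return null;
  }

  /** Used Metaspace bytes, or -1 if no such pool is available. */
  static long usedBytes() {
    MemoryUsage usage = metaspaceUsage();
    return usage == null ? -1L : usage.getUsed();
  }

  /** used / max, or empty when the pool or its maximum is undefined. */
  static OptionalDouble usageRatio() {
    MemoryUsage usage = metaspaceUsage();
    if (usage == null || usage.getMax() <= 0) {
      return OptionalDouble.empty();
    }
    return OptionalDouble.of((double) usage.getUsed() / usage.getMax());
  }

  /** Number of classes currently loaded in the JVM. */
  static int loadedClassCount() {
    return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
  }

  /** Number of classes unloaded since the JVM started. */
  static long unloadedClassCount() {
    return ManagementFactory.getClassLoadingMXBean().getUnloadedClassCount();
  }
}
```

A loaded-class count that keeps growing while the unloaded count stays flat during repeated catalog updates would be consistent with the leak described in this comment.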
Implement a degradation strategy:
- When Metaspace usage is too high, suspend non-critical catalog operations
- Implement a queue mechanism for catalog updates to avoid concurrent
operations
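The queue mechanism could be as simple as funneling all catalog updates through a single worker thread. The following is a minimal sketch with a hypothetical `SerializedCatalogUpdater` class; it shows only the serialization, not the suspension of non-critical operations.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical updater that runs catalog updates one at a time, in order. */
final class SerializedCatalogUpdater implements AutoCloseable {
  // A single worker thread guarantees that no two updates run concurrently.
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  /** Queues an update and returns a future holding its result. */
  <T> CompletableFuture<T> submit(Supplier<T> update) {
    return CompletableFuture.supplyAsync(update, worker);
  }

  /** Stops accepting new updates; updates already queued still run. */
  @Override
  public void close() {
    worker.shutdown();
  }
}
```

Because updates run in submission order, at most one catalog reload is in flight at any moment, which bounds how quickly new class loaders can be created.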