Ethan Rose created HDDS-9367:
--------------------------------
Summary: Improve RocksDB related error handling within Recon
Key: HDDS-9367
URL: https://issues.apache.org/jira/browse/HDDS-9367
Project: Apache Ozone
Issue Type: Improvement
Reporter: Ethan Rose
For issues like HDDS-8734, Recon closes the RocksDB instance
[here|https://github.com/apache/ozone/blob/2c578c3ae73417d2fd2f24ab356632ddb5c427fb/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/recovery/ReconOmMetadataManagerImpl.java#L123],
but further up the call stack,
[here|https://github.com/apache/ozone/blob/2a826133d681e3ed8a789c77b6c6f8c622e4c743/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java#L543],
the exception is swallowed. This leaves Recon running with the OM DB
inaccessible. Recon returns HTTP 500 from any API that needs that DB and keeps
logging errors, but the initial cause of the DB close can be lost if the logs
roll over under the volume of follow-on errors. A Recon instance can remain in
this half-working state for quite some time if it is not actively used.
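
One possible direction, sketched below, is to record the first failure that
made the DB unusable so that later errors can always point back to it, even
after the original log line has rolled off. This is only an illustration under
assumptions: DbFailureTracker and its methods are hypothetical names and are
not existing Recon or Ozone classes, and the actual fix may look quite
different (for example, terminating Recon or reopening the DB).

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical helper (not part of Ozone): remembers the first failure that
 * made a DB unusable, so follow-on errors can report the root cause.
 */
final class DbFailureTracker {
  // Holds only the first failure; later failures never overwrite it.
  private final AtomicReference<Throwable> firstFailure =
      new AtomicReference<>();

  /**
   * Records the failure if none has been recorded yet.
   * Returns true only for the first recorded failure.
   */
  boolean recordFailure(Throwable t) {
    return firstFailure.compareAndSet(null, t);
  }

  /** Returns the first recorded failure, if any. */
  Optional<Throwable> initialCause() {
    return Optional.ofNullable(firstFailure.get());
  }

  /** True while no failure has been recorded. */
  boolean isHealthy() {
    return firstFailure.get() == null;
  }

  /**
   * Throws an IllegalStateException chained to the initial cause if a
   * failure has been recorded; otherwise does nothing.
   */
  void checkUsable() {
    Throwable cause = firstFailure.get();
    if (cause != null) {
      throw new IllegalStateException(
          "DB is unavailable; initial cause: " + cause, cause);
    }
  }
}
```

In this sketch, the code path that currently swallows the exception would call
recordFailure(e) instead, and callers that need the DB would call
checkUsable() first, so every later error carries the original exception as
its cause.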