Ethan Rose created HDDS-9367:
--------------------------------

             Summary: Improve RocksDB related error handling within Recon
                 Key: HDDS-9367
                 URL: https://issues.apache.org/jira/browse/HDDS-9367
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ethan Rose


For issues like HDDS-8734, Recon will close the RocksDB instance 
[here|https://github.com/apache/ozone/blob/2c578c3ae73417d2fd2f24ab356632ddb5c427fb/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/recovery/ReconOmMetadataManagerImpl.java#L123],
 but farther up the call stack 
[here|https://github.com/apache/ozone/blob/2a826133d681e3ed8a789c77b6c6f8c622e4c743/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java#L543]
 the exception is swallowed. This leaves Recon running with the OM DB 
inaccessible. Recon will return HTTP 500 from APIs that need that DB and will 
keep logging errors, but the initial cause of the DB close could be lost if 
the logs roll over under the volume of follow-on errors. A Recon instance 
could remain in this half-working state for quite some time if it is not 
actively being used.
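
One possible direction, as a rough sketch only and not Recon's actual code: 
instead of only logging the exception, remember the first failure so that 
later errors can still point back to the original cause after the logs have 
rolled over. All names below (DbFailureSketch, DbHealth, markFailed, 
checkHealthy, updateDb, syncOnce) are hypothetical and exist only for this 
illustration; updateDb() stands in for whatever step fails and closes the DB.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class DbFailureSketch {

  /** Hypothetical holder that remembers the first fatal DB error. */
  static final class DbHealth {
    private final AtomicReference<Throwable> firstFailure =
        new AtomicReference<>();

    /** Records the cause only if none is recorded yet. */
    boolean markFailed(Throwable cause) {
      return firstFailure.compareAndSet(null, cause);
    }

    boolean isHealthy() {
      return firstFailure.get() == null;
    }

    /** Throws an IOException chained to the first recorded cause, if any. */
    void checkHealthy() throws IOException {
      Throwable cause = firstFailure.get();
      if (cause != null) {
        throw new IOException(
            "OM DB unavailable; first failure: " + cause.getMessage(), cause);
      }
    }
  }

  /** Stand-in (assumption) for a sync step that fails and closes the DB. */
  static void updateDb() throws IOException {
    throw new IOException("simulated RocksDB failure");
  }

  /** Rather than swallowing the exception, record it for later callers. */
  static void syncOnce(DbHealth health) {
    try {
      updateDb();
    } catch (IOException e) {
      health.markFailed(e);
    }
  }

  public static void main(String[] args) {
    DbHealth health = new DbHealth();
    syncOnce(health);
    syncOnce(health); // a later failure does not overwrite the first cause
    try {
      health.checkHealthy();
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In a design like this, API handlers that need the OM DB could call 
checkHealthy() first, so each 500 response and log line carries the original 
cause. Whether Recon should instead reopen the DB or shut down on such an 
error is a separate decision that this sketch does not address.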



