[
https://issues.apache.org/jira/browse/ARTEMIS-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033516#comment-17033516
]
Francesco Nigro commented on ARTEMIS-2618:
------------------------------------------
I am not sure this is a good idea: there are several cases in the Linux kernel
where retrying I/O after an I/O error clears the error, so a real critical
error goes unnoticed and the file system ends up corrupted. That is why this
behaviour wasn't implemented in the past.
I still think that if there is a watchdog holding the file to be used, this
watchdog should be disabled, or it needs to run when there is no disk activity
instead.
The risk is that the same retry ends up being applied to writes too, which
would be even more dangerous from the point of view of disk integrity.
> Improve Handling of Shutdown on critical I/O Error
> --------------------------------------------------
>
> Key: ARTEMIS-2618
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2618
> Project: ActiveMQ Artemis
> Issue Type: Improvement
> Affects Versions: 2.11.0
> Reporter: Rico Neubauer
> Priority: Major
> Attachments: Improve-Handling-of-Shutdown-on-critic.patch
>
>
> Would like to request an improvement in the handling of critical I/O errors
> on opening journal files.
> If {{org.apache.activemq.artemis.core.io.nio.NIOSequentialFile}} fails to
> open a journal file, the whole server shuts down with {{@Message(id = 222010,
> value = "Critical IO Error, shutting down the server. file=1, message=0"}}.
> We have seen this in the wild, where backup software locked the file for a
> short time while the journal was about to be opened, resulting in the shutdown.
> The proposed improvement would be to have a short-running retry for opening
> the journal files and to fail fatally only if the error persists.
> Will attach a proposal patch. Can also create a PR if you accept.
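For illustration only, the short-running retry described in the issue could
look roughly like the sketch below. It is not the attached patch and not
Artemis code: the class RetryingOpen, the method openWithRetry and the
parameters maxAttempts and delayMillis are hypothetical names. It retries the
open call only and never a read or a write, which is where the concern raised
in the comment above applies most.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Artemis: bounded retry around opening a file.
final class RetryingOpen {

    private RetryingOpen() {
    }

    // Tries to open the file up to maxAttempts times, sleeping delayMillis
    // between attempts. Only the open call is retried; no write is repeated.
    static FileChannel openWithRetry(Path file, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return FileChannel.open(file,
                        StandardOpenOption.READ, StandardOpenOption.WRITE);
            } catch (IOException e) {
                // Remember the failure and wait before the next attempt, if any.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // The error persisted across all attempts: let the caller fail fatally.
        throw last;
    }
}
```

A caller would treat the IOException thrown after the last attempt exactly
like today's critical I/O error, so a persistent failure still shuts the
server down.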
--
This message was sent by Atlassian Jira
(v8.3.4#803005)