[
https://issues.apache.org/jira/browse/OAK-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660184#comment-16660184
]
Michael Dürig edited comment on OAK-7852 at 10/23/18 7:00 AM:
--------------------------------------------------------------
I implemented a patch for the different approach mentioned in my previous
comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2]. This
introduces two thresholds: after a certain time without {{flush}}, a warning is
written to the log for each further write operation, but no more than once per
second. After some more time without {{flush}}, when the second threshold is
reached, an error is written to the log and further write operations fail with
{{IOException: "Write operations disallowed: transient write operations not
flushed for too long"}} until a {{flush}} occurs.
[~frm], please have a look.
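For illustration, the two-threshold behaviour described above could be sketched roughly as follows. This is not the code from the linked branch: the class {{FlushGuard}}, its methods {{beforeWrite}} and {{flushed}}, the injectable clock, and the use of {{System.err}} in place of a real logger are all assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the two-threshold idea; not the code from the
 * OAK-7852-2 branch. All names and the use of System.err are assumptions.
 */
final class FlushGuard {
    private static final long WARN_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(1);
    static final String MESSAGE =
            "Write operations disallowed: transient write operations not flushed for too long";

    private final long warnAfterNanos;   // first threshold: start warning
    private final long errorAfterNanos;  // second threshold: reject writes
    private final LongSupplier clock;    // nanosecond clock, injectable for tests

    private long lastFlush;
    private long lastWarning;
    private boolean warnedSinceFlush;
    private boolean errorLogged;
    private int warningCount;

    FlushGuard(long warnAfterNanos, long errorAfterNanos, LongSupplier clock) {
        this.warnAfterNanos = warnAfterNanos;
        this.errorAfterNanos = errorAfterNanos;
        this.clock = clock;
        this.lastFlush = clock.getAsLong();
    }

    /** Call after every successful flush; this re-enables write operations. */
    synchronized void flushed() {
        lastFlush = clock.getAsLong();
        warnedSinceFlush = false;
        errorLogged = false;
    }

    /** Call before every write operation. */
    synchronized void beforeWrite() throws IOException {
        long now = clock.getAsLong();
        long sinceFlush = now - lastFlush;
        if (sinceFlush >= errorAfterNanos) {
            // Second threshold: log the error once, then fail every write.
            if (!errorLogged) {
                System.err.println("ERROR " + MESSAGE);
                errorLogged = true;
            }
            throw new IOException(MESSAGE);
        }
        if (sinceFlush >= warnAfterNanos
                && (!warnedSinceFlush || now - lastWarning >= WARN_INTERVAL_NANOS)) {
            // First threshold: warn on writes, but at most once per second.
            System.err.println("WARN no flush for "
                    + TimeUnit.NANOSECONDS.toSeconds(sinceFlush) + " s");
            warnedSinceFlush = true;
            lastWarning = now;
            warningCount++;
        }
    }

    /** Number of warnings emitted so far (only here to make the sketch testable). */
    synchronized int warningCount() {
        return warningCount;
    }
}
```

In this sketch a writer would call {{beforeWrite()}} ahead of each write and the background task would call {{flushed()}} after each successful flush. The threshold values are left to the caller because the comment above does not state them.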
was (Author: mduerig):
I implemented a patch for the different approach mentioned in my previous
comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2]. This
introduces two thresholds: after a certain time without {{flush}}, a warning is
written to the log for each further write operation, but no more than once per
second. After some more time without {{flush}}, when the second threshold is
reached, an error is written to the log and further write operations fail with
{{IOException: Write operations disallowed: transient write operations not
flushed for too long}}.
[~frm], please have a look.
> Blocked background flush can cause severe data loss
> ----------------------------------------------------
>
> Key: OAK-7852
> URL: https://issues.apache.org/jira/browse/OAK-7852
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Priority: Major
> Fix For: 1.10
>
>
> When the {{FileStore background task}} fails (e.g. because of a deadlock) and
> the {{FileStore}} is subsequently shut down in an unclean way ({{kill -9}}),
> there is a risk of severe data loss. Although a journal could be
> reconstructed from the segments, there is a chance that most if not all of
> the revisions written since the failure of the background task are
> inconsistent and fail with a {{SegmentNotFoundException}} ({{SNFE}}).
> The expectation for such a case should be that a journal could be
> reconstructed from the segments and that all but the last few revisions are
> consistent.