[ 
https://issues.apache.org/jira/browse/OAK-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660184#comment-16660184
 ] 

Michael Dürig edited comment on OAK-7852 at 10/23/18 7:00 AM:
--------------------------------------------------------------

I implemented a patch for the different approach mentioned in my previous 
comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2]. This 
introduces two thresholds. After a certain time without a {{flush}}, a warning 
is written to the log for each further write operation, but no more than once 
per second. After some more time without a {{flush}}, when the second 
threshold is reached, an error is written to the log and further write 
operations fail with {{IOException: "Write operations disallowed: transient 
write operations not flushed for too long"}} until a {{flush}} occurs.
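
For illustration, the two-threshold behaviour could be sketched roughly as 
follows. This is not the code from the branch above: the class name 
{{FlushGuard}}, its methods, the injected clock and the threshold parameters 
are all hypothetical, and only the exception message is taken from this 
comment.

```java
import java.io.IOException;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

/** Hypothetical sketch of the two thresholds; not the class from the patch. */
final class FlushGuard {
    private static final Logger LOG = Logger.getLogger(FlushGuard.class.getName());

    /** Warnings are logged at most once per this interval. */
    private static final long WARN_INTERVAL_MILLIS = 1000;

    private final LongSupplier clock;     // current time in milliseconds (assumed)
    private final long warnAfterMillis;   // first threshold: start warning
    private final long failAfterMillis;   // second threshold: disallow writes

    private long lastFlush;
    private long lastWarning;
    private boolean warned;
    private boolean errorLogged;

    /** Number of warnings logged so far; exposed only for illustration. */
    int warnings;

    FlushGuard(LongSupplier clock, long warnAfterMillis, long failAfterMillis) {
        this.clock = clock;
        this.warnAfterMillis = warnAfterMillis;
        this.failAfterMillis = failAfterMillis;
        this.lastFlush = clock.getAsLong();
    }

    /** Call after every successful flush; this re-enables write operations. */
    synchronized void flushed() {
        lastFlush = clock.getAsLong();
        warned = false;
        errorLogged = false;
    }

    /** Call before every write operation. */
    synchronized void checkWrite() throws IOException {
        long now = clock.getAsLong();
        long unflushed = now - lastFlush;

        if (unflushed >= failAfterMillis) {
            // Second threshold: log the error once, then fail every write
            // until flushed() is called.
            if (!errorLogged) {
                LOG.severe("No flush for " + unflushed + " ms, disallowing writes");
                errorLogged = true;
            }
            throw new IOException("Write operations disallowed: "
                    + "transient write operations not flushed for too long");
        }

        if (unflushed >= warnAfterMillis
                && (!warned || now - lastWarning >= WARN_INTERVAL_MILLIS)) {
            // First threshold: warn on writes, but no more than once a second.
            LOG.warning("No flush for " + unflushed + " ms");
            lastWarning = now;
            warned = true;
            warnings++;
        }
    }
}
```

In such a sketch the write path would call {{checkWrite()}} before each write 
and the flush path would call {{flushed()}} after each successful flush. 
Injecting the clock is only a convenience for exercising both thresholds 
without waiting.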

[~frm], please have a look.


> Blocked background flush can cause severe data loss
> ---------------------------------------------------
>
>                 Key: OAK-7852
>                 URL: https://issues.apache.org/jira/browse/OAK-7852
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Major
>             Fix For: 1.10
>
>
> When the {{FileStore}} background task fails (e.g. because of a deadlock) and 
> the {{FileStore}} is subsequently shut down in an unclean way ({{kill -9}}), 
> there is a risk of severe data loss. Although a journal could be 
> reconstructed from the segments, there is a chance that most if not all of 
> the revisions written since the failure of the background task are 
> inconsistent and fail with a {{SegmentNotFoundException}} ({{SNFE}}). 
> The expectation for such a case should be that a journal can be 
> reconstructed from the segments and that all but the last few revisions are 
> consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
