[ 
https://issues.apache.org/jira/browse/IGNITE-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515669#comment-16515669
 ] 

Alexey Goncharuk commented on IGNITE-8203:
------------------------------------------

[~alex_pl], I do not think the fix is correct. The method 
{{getOrAllocatePartitionMetas}} sets the allocated flag only if the partition 
meta page was allocated by this call.
Consider the following sequence:
1) {{getOrAllocatePartitionMetas()}} is called, the method allocates partition 
metas and formats the page
2) The thread continues to {{CacheDataTree}} initialization, but an interrupted 
exception is thrown
3) The thread catches the exception and goes to the next iteration
4) On the second iteration, {{getOrAllocatePartitionMetas()}} will return 
allocated=false, and the cache store will not proceed to tree initialization (or 
an invalid page content exception will be thrown)
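
The sequence above can be reproduced with a toy model. This is only a sketch: 
the class {{AllocatedFlagRace}} and its methods are hypothetical stand-ins and 
are not Ignite code. It assumes the allocated flag is true only on the call 
that actually allocates the meta page, as described above.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the retry problem. All names are hypothetical, not Ignite code. */
public class AllocatedFlagRace {
    /** Partitions whose meta page has already been allocated and formatted. */
    private final Set<Integer> allocatedMetas = new HashSet<>();

    /** Partitions whose tree has been fully initialized. */
    private final Set<Integer> initializedTrees = new HashSet<>();

    /** Returns true only if this call allocated the meta page. */
    public boolean getOrAllocateMeta(int part) {
        return allocatedMetas.add(part);
    }

    /** One iteration of the init loop; failInit simulates an interrupt during tree init. */
    public void initIteration(int part, boolean failInit) throws InterruptedException {
        boolean allocated = getOrAllocateMeta(part);

        if (allocated) {
            if (failInit)
                throw new InterruptedException("interrupted during tree initialization");

            initializedTrees.add(part);
        }
        // allocated == false: the tree is assumed to exist already, so init is skipped.
    }

    public boolean treeInitialized(int part) {
        return initializedTrees.contains(part);
    }

    public static void main(String[] args) throws InterruptedException {
        AllocatedFlagRace store = new AllocatedFlagRace();

        try {
            store.initIteration(0, true);  // steps 1 and 2: allocate, then get interrupted
        }
        catch (InterruptedException e) {
            // step 3: the exception is caught and the loop goes to the next iteration
        }

        store.initIteration(0, false);     // step 4: allocated == false, init is skipped

        // prints "tree initialized after retry: false"
        System.out.println("tree initialized after retry: " + store.treeInitialized(0));
    }
}
```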

I would catch the interrupted exception in the FilePageStore itself and reopen 
the file. In this case, the page read operation will be atomic and safe for all 
PageStore users.
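
A minimal sketch of that idea, using only standard {{java.nio}} calls. The 
class {{ReopeningPageReader}} is a hypothetical illustration and not the actual 
FilePageStore code. It assumes that reopening the file and retrying the whole 
read is acceptable, and it restores the thread's interrupt flag afterwards so 
callers can still observe the interruption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader that reopens its channel when an interrupt closes it. */
public class ReopeningPageReader implements AutoCloseable {
    private final Path file;
    private FileChannel ch;

    public ReopeningPageReader(Path file) throws IOException {
        this.file = file;
        this.ch = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Fills buf from the given file offset, retrying if the channel was closed by an interrupt. */
    public synchronized void read(ByteBuffer buf, long off) throws IOException {
        boolean interrupted = false;
        int startPos = buf.position();

        try {
            while (true) {
                try {
                    buf.position(startPos); // restart the read from scratch on every attempt
                    long pos = off;

                    while (buf.hasRemaining()) {
                        int n = ch.read(buf, pos);

                        if (n < 0)
                            throw new EOFException("unexpected end of file at " + pos);

                        pos += n;
                    }

                    return;
                }
                catch (ClosedByInterruptException e) {
                    // The channel is now closed and the interrupt flag is set.
                    // Clear the flag so the retry is not interrupted again, and remember it.
                    Thread.interrupted();
                    interrupted = true;
                    ch = FileChannel.open(file, StandardOpenOption.READ);
                }
            }
        }
        finally {
            if (interrupted)
                Thread.currentThread().interrupt(); // restore the flag for the caller
        }
    }

    @Override public synchronized void close() throws IOException {
        ch.close();
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("page", ".bin");
        Files.write(p, new byte[] {1, 2, 3, 4});

        try (ReopeningPageReader reader = new ReopeningPageReader(p)) {
            Thread.currentThread().interrupt(); // simulate a cancelled task

            ByteBuffer buf = ByteBuffer.allocate(4);
            reader.read(buf, 0);

            System.out.println("read ok, interrupt flag kept: " + Thread.interrupted());
        }
        finally {
            Files.deleteIfExists(p);
        }
    }
}
```

With this shape, the interrupt never escapes the read as an I/O failure; the 
caller sees a completed read and a thread that is still marked as interrupted.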

> Interrupting task can cause node fail with PersistenceStorageIOException. 
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8203
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8203
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 2.4
>            Reporter: Ivan Daschinskiy
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>             Fix For: 2.6
>
>         Attachments: GridFailNodesOnCanceledTaskTest.java
>
>
> Interrupting a task that performs simple cache operations (e.g. get, put) can 
> cause PersistenceStorageIOException. The main cause of this failure is the 
> lack of proper handling of InterruptedException in FilePageStore.init() etc. 
> This causes FileChannel.write() and similar calls to throw 
> ClosedByInterruptException. PersistenceStorageIOException is a critical 
> failure and typically causes the node to stop. As a workaround, I would 
> suggest enabling AsyncFileIO by default until the fix is available.
> A reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
