[ https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902941#comment-16902941 ]
Anton Kalashnikov commented on IGNITE-12048:
--------------------------------------------

[~DmitriyGovorukhin] thanks for your changes. Looks good to me.

> Bugs & tests fixes
> ------------------
>
>                 Key: IGNITE-12048
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12048
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Dmitriy Govorukhin
>            Assignee: Dmitriy Govorukhin
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Page replacement can reload an invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and the page replacement process:
> * Checkpointer thread begins a checkpoint
> * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page
> content *and clear dirty flag*
> * Page replacement tries to find a page for replacement and chooses this
> page, the page is thrown away
> * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which
> causes all kinds of corruption exceptions and assertions.
> ----
> checkpointReadLock() may hang during node stop
> I got this hang during one of the PDS (Indexing) runs (thread dump is attached).
> The following code hangs:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
>     .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is
> stopped, in which case {{cpBeginFut}} will never be completed.
> ----
> Fixed
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed *.testFailAfterStart
> Reduce test execution time (scale factor for long-running tests)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)