[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read
[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094492#comment-17094492 ]

Nikolay Izhikov commented on IGNITE-11687:
--

I'm excluding this ticket from the 2.8.1 due to inactivity.

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
>                 Key: IGNITE-11687
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11687
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitriy Govorukhin
>            Priority: Critical
>             Fix For: 2.9, 2.8.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cause is the way {{end}} is calculated for WAL iterator:
> {code}
> if (hnd != null)
>     end = hnd.position();
> {code}
> {code}
> @Override public FileWALPointer position() {
>     lock.lock();
>     try {
>         return new FileWALPointer(getSegmentId(), (int)written, 0);
>     }
>     finally {
>         lock.unlock();
>     }
> }
> {code}
> Consider a partially written entry. In this case, {{written}} has been
> already updated, concurrent WAL replay will attempt to read the incompletely
> written record and since {{end}} is not null, iterator will fail with CRC
> error.
> The issue may be rarely reproduced by {{IgniteWalSerializerVersionTest}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
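The failure mode in the description can be reproduced in miniature. The sketch below is not Ignite's WAL format: it assumes a toy record layout of [int length][payload][int CRC32], and the class and method names are hypothetical. It only shows why a reader that trusts an {{end}} position covering a record whose tail has not landed yet ends up with a CRC mismatch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Toy log record, NOT Ignite's WAL format: [int length][payload][int crc32(payload)]. */
final class PartialRecordCrcSketch {
    /** Serializes one record in the assumed toy layout. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);

        return ByteBuffer.allocate(4 + payload.length + 4)
            .putInt(payload.length)
            .put(payload)
            .putInt((int)crc.getValue())
            .array();
    }

    /** Returns true if the record at offset 0 of {@code file} passes its CRC check. */
    static boolean crcOk(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);

        int len = buf.getInt();
        byte[] payload = new byte[len];
        buf.get(payload);
        int stored = buf.getInt();

        CRC32 crc = new CRC32();
        crc.update(payload, 0, len);

        return stored == (int)crc.getValue();
    }

    /** The file as a reader sees it when only the first {@code flushed} bytes have landed. */
    static byte[] fileView(byte[] rec, int flushed) {
        byte[] file = new byte[rec.length]; // the unwritten tail is still zeroes

        System.arraycopy(rec, 0, file, 0, flushed);

        return file;
    }

    public static void main(String[] args) {
        byte[] rec = encode("123456789".getBytes(StandardCharsets.US_ASCII));

        // Record fully on disk: the CRC check passes.
        System.out.println(crcOk(fileView(rec, rec.length)));

        // 'end' already covers the record, but the CRC trailer has not been written yet.
        System.out.println(crcOk(fileView(rec, rec.length - 4)));
    }
}
```

In this toy model the second check fails because the reader was told the record ends at a position the writer has not actually reached.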
[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read
[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818481#comment-16818481 ]

Andrey Gura commented on IGNITE-11687:
--

[~agoncharuk] I've investigated the problem deeper. While the code snippet you pointed to is incorrect and must be fixed, it is never executed by the test because MMAP mode is switched on by default.

I think that the {{FileWriteHandleImpl#addRecord()}} method is the root of the problem. See the following code snippet:
{code:java}
fillBuffer(buf, rec);

if (mmap) {
    // written field must grow only, but segment with greater position can be serialized
    // earlier than segment with smaller position.
    while (true) {
        long written0 = written;

        if (seg.position() > written0) {
            if (WRITTEN_UPD.compareAndSet(this, written0, seg.position()))
                break;
        }
        else
            break;
    }
}

return ptr;
{code}
On a {{wal.replay()}} call, the WAL iterator reads the {{hnd.written}} field value while some WAL record before that position may still not be fully serialized.

What do you think?

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
>                 Key: IGNITE-11687
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11687
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Assignee: Andrey Gura
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
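The ordering problem described in this comment can be modelled without threads. The sketch below is a hypothetical stand-in, not {{FileWriteHandleImpl}}: it reuses the grow-only CAS loop from the quoted snippet and additionally tracks which byte ranges have been serialized. The {{contiguousPrefix()}} helper is an assumption added for illustration only (it is not a fix proposed in the ticket); it makes visible the gap between {{written}} and the position up to which every record is actually complete.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

/** Hypothetical model of a write handle; only the 'written' bookkeeping is reproduced. */
final class MonotonicWrittenSketch {
    private static final AtomicLongFieldUpdater<MonotonicWrittenSketch> WRITTEN_UPD =
        AtomicLongFieldUpdater.newUpdater(MonotonicWrittenSketch.class, "written");

    /** Highest end position of any record serialized so far. */
    volatile long written;

    /** Serialized ranges, start -> end. Illustration only. */
    private final TreeMap<Long, Long> done = new TreeMap<>();

    /** Called once the record occupying [start, end) has been fully serialized. */
    synchronized void recordSerialized(long start, long end) {
        done.put(start, end);

        // Same grow-only update as in the quoted snippet: 'written' never moves backwards.
        while (true) {
            long written0 = written;

            if (end <= written0 || WRITTEN_UPD.compareAndSet(this, written0, end))
                break;
        }
    }

    /** Largest position P such that every byte in [0, P) belongs to a serialized record. */
    synchronized long contiguousPrefix() {
        long pos = 0;
        Long end;

        while ((end = done.get(pos)) != null && end > pos)
            pos = end;

        return pos;
    }

    public static void main(String[] args) {
        MonotonicWrittenSketch hnd = new MonotonicWrittenSketch();

        hnd.recordSerialized(10, 20); // the record at the greater position finishes first
        System.out.println(hnd.written + " vs " + hnd.contiguousPrefix());

        hnd.recordSerialized(0, 10); // the earlier record catches up
        System.out.println(hnd.written + " vs " + hnd.contiguousPrefix());
    }
}
```

Between the two calls {{written}} is already 20 while nothing before position 10 is readable, which is the window in which a concurrent replay would hit an incomplete record.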
[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read
[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811031#comment-16811031 ]

Andrey Gura commented on IGNITE-11687:
--

[~agoncharuk] You are right. It could be fixed easily. The {{FileHandleManagerImpl.WALWriter#writeBuffer}} method has only one usage, so it can just return the delta that must be added to {{hnd.written}}.

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
>                 Key: IGNITE-11687
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11687
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Assignee: Andrey Gura
>            Priority: Critical
>             Fix For: 2.8
[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read
[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810748#comment-16810748 ]

Alexey Goncharuk commented on IGNITE-11687:
---

I believe this was broken long ago when {{SegmentedRingByteBuffer}} was introduced. In the WAL manager we have the following code:
{code}
hdl.written += hdl.fileIO.writeFully(buf);
{code}
which appears to write a fully serialized batch of records; however, this may not be the case when the ring byte buffer returns a list of buffers to write.

At first glance, it should be enough to update the {{written}} field after all buffers polled from the ring are written in {{WALWriter}}.

[~agura], what do you think?

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
>                 Key: IGNITE-11687
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11687
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Priority: Major
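The idea in this comment, publishing {{written}} once per batch rather than once per buffer, might look roughly like the sketch below. It is not the actual {{WALWriter}} code: the class, the {{writeBatch}} method and the use of a plain {{WritableByteChannel}} in place of Ignite's file I/O are all assumptions made to keep the example self-contained, and a single writer thread is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.List;

/** Hypothetical writer that advances 'written' only after a whole batch is on the channel. */
final class BatchedWrittenSketch {
    /** Position visible to readers. */
    private volatile long written;

    private final WritableByteChannel ch;

    BatchedWrittenSketch(WritableByteChannel ch) {
        this.ch = ch;
    }

    long written() {
        return written;
    }

    /** Writes one buffer completely and returns the number of bytes written (the delta). */
    private int writeFully(ByteBuffer buf) {
        int total = 0;

        try {
            while (buf.hasRemaining())
                total += ch.write(buf);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return total;
    }

    /** Writes every buffer polled for this batch, then publishes the new position in one step. */
    void writeBatch(List<ByteBuffer> bufs) {
        long delta = 0;

        for (ByteBuffer buf : bufs)
            delta += writeFully(buf); // 'written' stays untouched while the batch is incomplete

        written += delta; // single writer thread assumed, so this read-modify-write is not racy
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BatchedWrittenSketch w = new BatchedWrittenSketch(Channels.newChannel(out));

        w.writeBatch(List.of(ByteBuffer.wrap(new byte[] {1, 2, 3}), ByteBuffer.wrap(new byte[] {4, 5})));

        System.out.println(w.written() + " bytes published, " + out.size() + " bytes on the channel");
    }
}
```

A reader that samples {{written()}} in the middle of {{writeBatch}} still sees the position from before the batch, so it never treats a half-written batch as readable.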