[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read

2020-04-28 Thread Nikolay Izhikov (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094492#comment-17094492 ]

Nikolay Izhikov commented on IGNITE-11687:
--

I'm excluding this ticket from the 2.8.1 release due to inactivity.

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
> Key: IGNITE-11687
> URL: https://issues.apache.org/jira/browse/IGNITE-11687
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Alexey Goncharuk
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.9, 2.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cause is the way {{end}} is calculated for the WAL iterator:
> {code}
> if (hnd != null)
>     end = hnd.position();
> {code}
> {code}
> @Override public FileWALPointer position() {
>     lock.lock();
>     try {
>         return new FileWALPointer(getSegmentId(), (int)written, 0);
>     }
>     finally {
>         lock.unlock();
>     }
> }
> {code}
> Consider a partially written entry. In this case {{written}} has already been
> updated, so a concurrent WAL replay will attempt to read the incompletely
> written record; since {{end}} is not null, the iterator will fail with a CRC
> error.
> The issue can be reproduced, though rarely, by {{IgniteWalSerializerVersionTest}}.
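A minimal single-threaded sketch of the race above, in Java. It assumes a simplified record layout of {{[int length][payload][int CRC32]}} rather than Ignite's real WAL format, and the class {{ToyWal}} with its methods {{reserve}}, {{copyBody}}, {{copyCrc}} and {{readUpTo}} is hypothetical. The writer advances {{written}} before the record is fully copied, so a replay that takes its {{end}} bound from {{written}} hits a checksum mismatch; once the copy finishes, the same range reads cleanly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical toy WAL segment, NOT Ignite's FileWriteHandle. Record layout: [len][payload][crc]. */
class ToyWal {
    final ByteBuffer seg = ByteBuffer.allocate(1024);

    /** Plays the role of the 'written' field returned by position(). */
    int written;

    static int crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return (int)c.getValue();
    }

    /** Reserves space and advances 'written' before any bytes are copied (the racy order). */
    int reserve(int payloadLen) {
        int pos = written;
        written = pos + 4 + payloadLen + 4;
        return pos;
    }

    /** First half of the copy: length header and payload. */
    void copyBody(int pos, byte[] payload) {
        seg.putInt(pos, payload.length);
        for (int i = 0; i < payload.length; i++)
            seg.put(pos + 4 + i, payload[i]);
    }

    /** Second half of the copy: the trailing checksum. */
    void copyCrc(int pos, byte[] payload) {
        seg.putInt(pos + 4 + payload.length, crc(payload));
    }

    /** Replays records in [0, end) and returns their count; throws on a checksum mismatch. */
    int readUpTo(int end) {
        int pos = 0;
        int cnt = 0;
        while (pos < end) {
            int len = seg.getInt(pos);
            byte[] payload = new byte[len];
            for (int i = 0; i < len; i++)
                payload[i] = seg.get(pos + 4 + i);
            if (seg.getInt(pos + 4 + len) != crc(payload))
                throw new IllegalStateException("CRC error at position " + pos);
            pos += 4 + len + 4;
            cnt++;
        }
        return cnt;
    }

    public static void main(String[] args) {
        ToyWal wal = new ToyWal();
        byte[] rec = "hello".getBytes(StandardCharsets.UTF_8);

        // Writer: reserve space (advances 'written') and copy the body, but not the CRC yet.
        int pos = wal.reserve(rec.length);
        wal.copyBody(pos, rec);

        // Concurrent replay: end = hnd.position() already covers the unfinished record.
        try {
            wal.readUpTo(wal.written);
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // CRC error at position 0
        }

        // Once the writer finishes, the same range replays cleanly.
        wal.copyCrc(pos, rec);
        System.out.println(wal.readUpTo(wal.written)); // 1
    }
}
```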



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read

2019-04-15 Thread Andrey Gura (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818481#comment-16818481 ]

Andrey Gura commented on IGNITE-11687:
--

[~agoncharuk] I've investigated the problem further. While the code snippet you
pointed to is incorrect and must be fixed, it is never executed by the test
because MMAP mode is switched on by default. I think the
{{FileWriteHandleImpl#addRecord()}} method is the root of the problem. See the
following code snippet:

{code:java}
fillBuffer(buf, rec);

if (mmap) {
    // written field must grow only, but segment with greater position can be serialized
    // earlier than segment with smaller position.
    while (true) {
        long written0 = written;

        if (seg.position() > written0) {
            if (WRITTEN_UPD.compareAndSet(this, written0, seg.position()))
                break;
        }
        else
            break;
    }
}

return ptr;
{code}

The WAL iterator created by the {{wal.replay()}} call reads the {{hnd.written}}
field value while some earlier WAL record before this position is still not
fully serialized. What do you think?
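A hedged sketch of the effect described above, assuming record A occupies bytes [0, 100) and record B occupies [100, 150), with B finishing serialization first. {{onRecordSerializedMax}} mirrors the CAS loop from the snippet, using an {{AtomicLong}} instead of the {{WRITTEN_UPD}} field updater. {{onRecordSerializedContiguous}} is a hypothetical alternative that is not proposed in this ticket: it only advances the end past ranges that are complete and gap-free.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

class OutOfOrderWritten {
    /** Same shape as the mmap branch of addRecord(): 'written' only grows, via CAS. */
    final AtomicLong written = new AtomicLong();

    void onRecordSerializedMax(long recEnd) {
        while (true) {
            long written0 = written.get();

            if (recEnd > written0) {
                if (written.compareAndSet(written0, recEnd))
                    break;
            }
            else
                break;
        }
    }

    /** Hypothetical alternative: completed ranges keyed by start offset. */
    final TreeMap<Long, Long> done = new TreeMap<>();

    /** End of the gap-free prefix in which every record has been fully serialized. */
    long contiguous;

    synchronized void onRecordSerializedContiguous(long recStart, long recEnd) {
        done.put(recStart, recEnd);

        // Advance while the next completed range starts exactly at the current safe end.
        Long next;
        while ((next = done.remove(contiguous)) != null)
            contiguous = next;
    }

    public static void main(String[] args) {
        OutOfOrderWritten w = new OutOfOrderWritten();

        // Record B [100, 150) finishes before record A [0, 100).
        w.onRecordSerializedMax(150);
        w.onRecordSerializedContiguous(100, 150);
        System.out.println(w.written.get()); // 150: already covers unfinished A
        System.out.println(w.contiguous);    // 0: A is still in flight

        // Record A finishes.
        w.onRecordSerializedMax(100);
        w.onRecordSerializedContiguous(0, 100);
        System.out.println(w.written.get()); // 150
        System.out.println(w.contiguous);    // 150
    }
}
```

The max-based value jumps to 150 while A is still being serialized, which is the situation in which a replay iterator reading {{hnd.written}} would run into an unfinished record.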

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
> Key: IGNITE-11687
> URL: https://issues.apache.org/jira/browse/IGNITE-11687
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Goncharuk
>Assignee: Andrey Gura
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read

2019-04-05 Thread Andrey Gura (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811031#comment-16811031 ]

Andrey Gura commented on IGNITE-11687:
--

[~agoncharuk] You are right. It can be fixed easily. The
{{FileHandleManagerImpl.WALWriter#writeBuffer}} method has only one usage, so it
can simply return the delta that must be added to {{hnd.written}}.
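A minimal sketch of that refactoring, assuming a plain {{WritableByteChannel}} in place of the {{fileIO}} handle. {{DeltaWriter}}, {{writeBuffer}} and {{flushBatch}} here are hypothetical stand-ins, not the real {{WALWriter}} code: {{writeBuffer}} only reports how many bytes it wrote, and its single caller adds the total to {{written}} once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.List;

class DeltaWriter {
    volatile long written;

    /** Writes the whole buffer and returns the number of bytes written; does not touch 'written'. */
    static int writeBuffer(WritableByteChannel ch, ByteBuffer buf) throws IOException {
        int n = 0;
        while (buf.hasRemaining())
            n += ch.write(buf);
        return n;
    }

    /** The single call site: sum the deltas of all buffers, then publish once. */
    void flushBatch(WritableByteChannel ch, List<ByteBuffer> bufs) throws IOException {
        long delta = 0;
        for (ByteBuffer buf : bufs)
            delta += writeBuffer(ch, buf);

        // Assumes a single writer thread, so the read-modify-write is not contended.
        written += delta;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeltaWriter w = new DeltaWriter();
        w.flushBatch(Channels.newChannel(out), List.of(ByteBuffer.allocate(10), ByteBuffer.allocate(6)));
        System.out.println(w.written + " " + out.size()); // 16 16
    }
}
```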

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
> Key: IGNITE-11687
> URL: https://issues.apache.org/jira/browse/IGNITE-11687
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Goncharuk
>Assignee: Andrey Gura
>Priority: Critical
> Fix For: 2.8
>
>





[jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read

2019-04-05 Thread Alexey Goncharuk (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810748#comment-16810748 ]

Alexey Goncharuk commented on IGNITE-11687:
---

I believe this was broken long ago, when {{SegmentedRingByteBuffer}} was
introduced. In the WAL manager we have the following code:
{code}
hdl.written += hdl.fileIO.writeFully(buf);
{code}
which assumes that it writes a fully serialized batch of records; however, this
may not be the case when the ring byte buffer returns a list of buffers to write.

At first glance, it should be enough to update the {{written}} field only after
all buffers polled from the ring have been written in {{WALWriter}}. [~agura],
what do you think?
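To make the failure mode concrete, here is a hedged sketch that assumes one 12-byte record handed over by the ring as two chunks of 8 and 4 bytes, as could happen when the ring wraps around. {{RingBatchDemo}} and its methods are hypothetical, and {{observed}} stands for every value of {{written}} a concurrent reader could see. Updating per buffer exposes offset 8, which lies inside the record; updating once per batch exposes only the record boundary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;

class RingBatchDemo {
    long written;

    /** Every value of 'written' that a concurrent reader could have observed. */
    final List<Long> observed = new ArrayList<>();

    static int writeFully(WritableByteChannel ch, ByteBuffer buf) throws IOException {
        int n = 0;
        while (buf.hasRemaining())
            n += ch.write(buf);
        return n;
    }

    /** Current pattern: written += writeFully(buf) for each chunk. */
    void perBuffer(WritableByteChannel ch, List<ByteBuffer> chunks) throws IOException {
        for (ByteBuffer buf : chunks) {
            written += writeFully(ch, buf);
            observed.add(written);
        }
    }

    /** Suggested pattern: publish written only after all chunks are written to the channel. */
    void perBatch(WritableByteChannel ch, List<ByteBuffer> chunks) throws IOException {
        long delta = 0;
        for (ByteBuffer buf : chunks)
            delta += writeFully(ch, buf);
        written += delta;
        observed.add(written);
    }

    /** One 12-byte record split by the ring into 8 + 4 bytes (fresh buffers on each call). */
    static List<ByteBuffer> chunksOfOneRecord() {
        return List.of(ByteBuffer.allocate(8), ByteBuffer.allocate(4));
    }

    public static void main(String[] args) throws IOException {
        WritableByteChannel ch = Channels.newChannel(new ByteArrayOutputStream());

        RingBatchDemo a = new RingBatchDemo();
        a.perBuffer(ch, chunksOfOneRecord());
        System.out.println(a.observed); // [8, 12]: 8 is mid-record

        RingBatchDemo b = new RingBatchDemo();
        b.perBatch(ch, chunksOfOneRecord());
        System.out.println(b.observed); // [12]
    }
}
```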

> Concurrent WAL replay & log may fail with CRC error on read
> ---
>
> Key: IGNITE-11687
> URL: https://issues.apache.org/jira/browse/IGNITE-11687
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Goncharuk
>Priority: Major
>


