[ https://issues.apache.org/jira/browse/RATIS-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050702#comment-17050702 ]
Tsz-wo Sze commented on RATIS-815: ---------------------------------- Thanks [~ljain] for committing the patch. I just have found that there is a bug in flush(); filed RATIS-822. > Log entry corrupted with 0 checksum > ----------------------------------- > > Key: RATIS-815 > URL: https://issues.apache.org/jira/browse/RATIS-815 > Project: Ratis > Issue Type: Bug > Components: server > Affects Versions: 0.5.0 > Reporter: Attila Doroszlai > Assignee: Tsz-wo Sze > Priority: Blocker > Fix For: 0.6.0 > > Attachments: RATIS-815.temp.patch, dumps.tar.gz, logs.tar.gz, > r815_20200220.patch, r815_20200228.patch, r815_20200302.patch > > > After writing a few large keys (128MB) with very small chunks size (64KB) in > Ozone, Ratis reports log entry corruption due to checksum error: > {code} > 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:396 - > e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: > Rolling segment log-62379_62465 to index:62465 > 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:541 - > e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: > Rolled log segment from > /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62379 > to > /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_62379-62465 > 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:583 - > e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: > created new log segment > /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62466 > 2020-02-13 12:01:41 ERROR LogAppender:81 - > e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236->ac5b3434-874b-4375-8a03-989e8c7fb692-GrpcLogAppender-AppenderDaemon > failed RaftLog > org.apache.ratis.server.raftlog.RaftLogIOException: > org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated > checksum is CDFED097 but read checksum is 00000000. > at > org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:297) > at > org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:213) > at > org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:179) > at > org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:122) > at > org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:77) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: > Calculated checksum is CDFED097 but read checksum is 00000000. > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:312) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:194) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:129) > at > org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98) > at > org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202) > at > org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309) > ... 7 more > {code} > Steps to reproduce: > 1. Configure Ozone with 64KB chunk size and slightly higher buffer sizes: > {code} > ozone.scm.chunk.size: 64KB > ozone.client.stream.buffer.flush.size: 256KB > ozone.client.stream.buffer.max.size: 1MB > {code} > 2. Run Freon: > {code} > ozone freon ockg -n 1 -t 1 -p warmup > ozone freon ockg -p test -t 8 -s 134217728 -n 32 > {code} > Interestingly, even {{log_5106-5509}} has invalid entry (according to log > dump utility): > {code} > Processing Raft Log file: > /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_5106-5509 > size:1030796 > ... > (t:1, i:5161), STATEMACHINELOGENTRY, client-296B6A48E40D, cid=3307 > Exception in thread "main" org.apache.ratis.protocol.ChecksumException: Log > entry corrupted: Calculated checksum is 926127AE but read checksum is > 00000000. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)