[jira] [Commented] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient
[ https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038386#comment-17038386 ]

Hadoop QA commented on RATIS-816:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 5m 36s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 48s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 47s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| -0 | checkstyle | 0m 13s | root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 40m 42s | root in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 56m 44s | |

|| Reason || Tests ||
| Failed junit tests | ratis.tools.TestArithmeticLogDump |
| | ratis.logservice.TestLogServiceWithNetty |
| | ratis.logservice.TestLogServiceWithGrpc |
| | ratis.grpc.TestRaftStateMachineExceptionWithGrpc |
| | ratis.netty.TestRaftStateMachineExceptionWithNetty |
| | ratis.server.simulation.TestRaftStateMachineExceptionWithSimulatedRpc |
| | ratis.server.simulation.TestServerRestartWithSimulatedRpc |
| | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |
| | ratis.netty.TestRaftSnapshotWithNetty |
| | ratis.grpc.TestRaftAsyncWithGrpc |
| | ratis.grpc.TestRaftSnapshotWithGrpc |
| | ratis.server.simulation.TestRaftSnapshotWithSimulatedRpc |
| | ratis.grpc.TestWatchRequestWithGrpc |
| | ratis.server.simulation.TestRaftWithSimulatedRpc |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/ratis:date2020-02-17 |
| JIRA Issue | RATIS-816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993678/RATIS-816.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 856995d4f072 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 46f255c |
| maven | version: Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Default Java | 1.8.0_242 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/testReport/ |
| Max. process+thread count | 1370 (vs. ulimit
[jira] [Updated] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient
[ https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated RATIS-816:
------------------------------
    Attachment: RATIS-816.001.patch

> Use peerId in error log / exception of GrpcServerProtocolClient
> ---------------------------------------------------------------
>
> Key: RATIS-816
> URL: https://issues.apache.org/jira/browse/RATIS-816
> Project: Ratis
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
> Attachments: RATIS-816.001.patch
>
> GrpcServerProtocolClient is used to send out requestVote and appendLogEntry
> requests.
> I propose to persist raftPeerId in the constructor and use it in the error /
> exception message.
> This is not just about getting a more meaningful message (which is a
> nice-to-have): in HDDS-3023 I am modifying the byte code to mock the
> leader->follower communication, which is much easier to do if the required
> raftPeerId is available in the class.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient
Marton Elek created RATIS-816:
---------------------------------

Summary: Use peerId in error log / exception of GrpcServerProtocolClient
Key: RATIS-816
URL: https://issues.apache.org/jira/browse/RATIS-816
Project: Ratis
Issue Type: Improvement
Reporter: Marton Elek

GrpcServerProtocolClient is used to send out requestVote and appendLogEntry requests.

I propose to persist raftPeerId in the constructor and use it in the error / exception message.

This is not just about getting a more meaningful message (which is a nice-to-have): in HDDS-3023 I am modifying the byte code to mock the leader->follower communication, which is much easier to do if the required raftPeerId is available in the class.
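The proposal in RATIS-816 (keep the peer id handed to the constructor and use it in error / exception messages) could look roughly like the sketch below. It is only an illustration: `PeerClientSketch`, `wrapFailure` and the message wording are hypothetical names invented here, the peer id is modelled as a plain `String`, and none of it is taken from the actual `GrpcServerProtocolClient` or from the attached patch.

```java
import java.io.IOException;
import java.util.Objects;

// Hypothetical stand-in for a per-peer RPC client; NOT the real Ratis class.
class PeerClientSketch {
  // Peer id kept from the constructor so later messages can name the target.
  private final String raftPeerId;

  PeerClientSketch(String raftPeerId) {
    this.raftPeerId = Objects.requireNonNull(raftPeerId, "raftPeerId");
  }

  // Exposing the id also lets test tooling identify which peer a client talks to.
  String getRaftPeerId() {
    return raftPeerId;
  }

  // Wraps a failure so that the message states which peer the request was for.
  IOException wrapFailure(String operation, Throwable cause) {
    return new IOException(
        "Failed to send " + operation + " to peer " + raftPeerId, cause);
  }

  public static void main(String[] args) {
    PeerClientSketch client = new PeerClientSketch("s1");
    IOException e = client.wrapFailure(
        "requestVote", new RuntimeException("connection refused"));
    System.out.println(e.getMessage());
  }
}
```

Keeping the id as a final field means every log line and exception produced by the client can carry it without the caller having to pass it in again.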
[jira] [Assigned] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient
[ https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek reassigned RATIS-816:
---------------------------------
    Assignee: Marton Elek

> Use peerId in error log / exception of GrpcServerProtocolClient
> ---------------------------------------------------------------
>
> Key: RATIS-816
> URL: https://issues.apache.org/jira/browse/RATIS-816
> Project: Ratis
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
>
> GrpcServerProtocolClient is used to send out requestVote and appendLogEntry
> requests.
> I propose to persist raftPeerId in the constructor and use it in the error /
> exception message.
> This is not just about getting a more meaningful message (which is a
> nice-to-have): in HDDS-3023 I am modifying the byte code to mock the
> leader->follower communication, which is much easier to do if the required
> raftPeerId is available in the class.
[jira] [Commented] (RATIS-815) Log entry corrupted with 0 checksum
[ https://issues.apache.org/jira/browse/RATIS-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038174#comment-17038174 ]

Attila Doroszlai commented on RATIS-815:
----------------------------------------

Tested with RATIS-767 reverted: same problem happens.

> Log entry corrupted with 0 checksum
> -----------------------------------
>
> Key: RATIS-815
> URL: https://issues.apache.org/jira/browse/RATIS-815
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Attila Doroszlai
> Priority: Blocker
> Attachments: dumps.tar.gz, logs.tar.gz
>
> After writing a few large keys (128MB) with a very small chunk size (64KB) in
> Ozone, Ratis reports log entry corruption due to a checksum error:
> {code}
> 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:396 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: Rolling segment log-62379_62465 to index:62465
> 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:541 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: Rolled log segment from /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62379 to /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_62379-62465
> 2020-02-13 12:01:41 INFO SegmentedRaftLogWorker:583 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: created new log segment /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62466
> 2020-02-13 12:01:41 ERROR LogAppender:81 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236->ac5b3434-874b-4375-8a03-989e8c7fb692-GrpcLogAppender-AppenderDaemon failed RaftLog
> org.apache.ratis.server.raftlog.RaftLogIOException: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is CDFED097 but read checksum is .
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:297)
>     at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:213)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:179)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:122)
>     at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:77)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is CDFED097 but read checksum is .
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:312)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:194)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:129)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309)
>     ... 7 more
> {code}
>
> Steps to reproduce:
>
> 1. Configure Ozone with 64KB chunk size and slightly higher buffer sizes:
> {code}
> ozone.scm.chunk.size: 64KB
> ozone.client.stream.buffer.flush.size: 256KB
> ozone.client.stream.buffer.max.size: 1MB
> {code}
> 2. Run Freon:
> {code}
> ozone freon ockg -n 1 -t 1 -p warmup
> ozone freon ockg -p test -t 8 -s 134217728 -n 32
> {code}
>
> Interestingly, even {{log_5106-5509}} has an invalid entry (according to the log
> dump utility):
> {code}
> Processing Raft Log file: /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_5106-5509 size:1030796
> ...
> (t:1, i:5161), STATEMACHINELOGENTRY, client-296B6A48E40D, cid=3307
> Exception in thread "main" org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is 926127AE but read checksum is .
> {code}
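For readers unfamiliar with this failure mode, the following sketch shows in general terms how a reader can detect a corrupted record by comparing a recomputed checksum with the stored one, and how a zeroed checksum field would produce a message of the shape quoted in RATIS-815. Everything in it is an assumption made for illustration: the record layout (payload followed by a 4-byte checksum), the use of `java.util.zip.CRC32`, and the class and method names are hypothetical and do not reflect the actual Ratis segment format or `SegmentedRaftLogReader`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: an assumed layout of [payload][4-byte CRC32], not the Ratis format.
class ChecksumSketch {
  // Appends the CRC32 of the payload as a trailing 4-byte field.
  static byte[] encode(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return ByteBuffer.allocate(payload.length + 4)
        .put(payload)
        .putInt((int) crc.getValue())
        .array();
  }

  // Recomputes the checksum over the payload and compares it with the stored field.
  static byte[] decode(byte[] record) {
    if (record.length < 4) {
      throw new IllegalArgumentException("record too short");
    }
    ByteBuffer buf = ByteBuffer.wrap(record);
    byte[] payload = new byte[record.length - 4];
    buf.get(payload);
    int stored = buf.getInt();
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    int calculated = (int) crc.getValue();
    if (calculated != stored) {
      throw new IllegalStateException(String.format(
          "Log entry corrupted: Calculated checksum is %08X but read checksum is %08X",
          calculated, stored));
    }
    return payload;
  }

  public static void main(String[] args) {
    byte[] record = encode("entry".getBytes(StandardCharsets.UTF_8));
    decode(record); // an intact record decodes without error

    // Simulate a checksum field that reads back as all zeros.
    Arrays.fill(record, record.length - 4, record.length, (byte) 0);
    try {
      decode(record);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the payload bytes are intact and only the stored checksum is zero. Whether that matches what happened in the reported segment files is not established by the thread.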