[jira] [Commented] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038386#comment-17038386
 ] 

Hadoop QA commented on RATIS-816:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  5m 36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 13s{color} | {color:orange} root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 42s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 44s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.tools.TestArithmeticLogDump |
|   | ratis.logservice.TestLogServiceWithNetty |
|   | ratis.logservice.TestLogServiceWithGrpc |
|   | ratis.grpc.TestRaftStateMachineExceptionWithGrpc |
|   | ratis.netty.TestRaftStateMachineExceptionWithNetty |
|   | ratis.server.simulation.TestRaftStateMachineExceptionWithSimulatedRpc |
|   | ratis.server.simulation.TestServerRestartWithSimulatedRpc |
|   | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |
|   | ratis.netty.TestRaftSnapshotWithNetty |
|   | ratis.grpc.TestRaftAsyncWithGrpc |
|   | ratis.grpc.TestRaftSnapshotWithGrpc |
|   | ratis.server.simulation.TestRaftSnapshotWithSimulatedRpc |
|   | ratis.grpc.TestWatchRequestWithGrpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/ratis:date2020-02-17 |
| JIRA Issue | RATIS-816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993678/RATIS-816.001.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 856995d4f072 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 46f255c |
| maven | version: Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) |
| Default Java | 1.8.0_242 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/artifact/out/patch-unit-root.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1240/testReport/ |
| Max. process+thread count | 1370 (vs. ulimit 

[jira] [Updated] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient

2020-02-17 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated RATIS-816:
--
Attachment: RATIS-816.001.patch

> Use peerId in error log / exception of GrpcServerProtocolClient
> ---
>
> Key: RATIS-816
> URL: https://issues.apache.org/jira/browse/RATIS-816
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
> Attachments: RATIS-816.001.patch
>
>
> GrpcServerProtocolClient is used to send out requestVote and appendLogEntry 
> requests.
> I propose to store the raftPeerId in a field in the constructor and use it in
> the error log and exception messages.
> This is not only about getting more meaningful messages (which is nice to
> have): in HDDS-3023 I am modifying the bytecode to mock the leader->follower
> communication, and that is much easier to do if the required raftPeerId is
> available in the class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient

2020-02-17 Thread Marton Elek (Jira)
Marton Elek created RATIS-816:
-

 Summary: Use peerId in error log / exception of 
GrpcServerProtocolClient
 Key: RATIS-816
 URL: https://issues.apache.org/jira/browse/RATIS-816
 Project: Ratis
  Issue Type: Improvement
Reporter: Marton Elek


GrpcServerProtocolClient is used to send out requestVote and appendLogEntry 
requests.

I propose to store the raftPeerId in a field in the constructor and use it in
the error log and exception messages.

This is not only about getting more meaningful messages (which is nice to
have): in HDDS-3023 I am modifying the bytecode to mock the leader->follower
communication, and that is much easier to do if the required raftPeerId is
available in the class.
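The proposal can be illustrated with a minimal Java sketch. It is not the RATIS-816 patch and not the Ratis API: `SketchServerProtocolClient`, its nested `PeerId` (a stand-in for raftPeerId) and `wrapFailure` are hypothetical names. What it shows is that once the target peer id is kept in a final field set by the constructor, every error path can name the peer instead of only a network address.

```java
import java.io.IOException;
import java.util.Objects;

/**
 * Hypothetical sketch (not a Ratis class): a server-to-server RPC client that
 * remembers which peer it talks to, so log lines and exceptions can name it.
 */
final class SketchServerProtocolClient {
  /** Minimal stand-in for a peer identifier; an assumption, not RaftPeerId. */
  static final class PeerId {
    private final String id;

    PeerId(String id) {
      this.id = Objects.requireNonNull(id, "id");
    }

    @Override
    public String toString() {
      return id;
    }
  }

  private final PeerId targetPeerId; // stored once, in the constructor
  private final String address;

  SketchServerProtocolClient(PeerId targetPeerId, String address) {
    this.targetPeerId = Objects.requireNonNull(targetPeerId, "targetPeerId");
    this.address = Objects.requireNonNull(address, "address");
  }

  PeerId getTargetPeerId() {
    return targetPeerId;
  }

  /** Wraps a transport failure so the message names the peer, not just the address. */
  IOException wrapFailure(String rpcName, Throwable cause) {
    return new IOException(
        rpcName + " to " + targetPeerId + " (" + address + ") failed", cause);
  }

  public static void main(String[] args) {
    SketchServerProtocolClient client =
        new SketchServerProtocolClient(new PeerId("s1"), "localhost:9858");
    IOException e = client.wrapFailure("requestVote", new RuntimeException("UNAVAILABLE"));
    System.out.println(e.getMessage()); // requestVote to s1 (localhost:9858) failed
  }
}
```

Keeping the id in a field (rather than deriving it per call) is also what makes it reachable from code that intercepts or replaces the class, which is the HDDS-3023 use case described above.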





[jira] [Assigned] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient

2020-02-17 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek reassigned RATIS-816:
-

Assignee: Marton Elek






[jira] [Commented] (RATIS-815) Log entry corrupted with 0 checksum

2020-02-17 Thread Attila Doroszlai (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038174#comment-17038174
 ] 

Attila Doroszlai commented on RATIS-815:


Tested with RATIS-767 reverted: same problem happens.

> Log entry corrupted with 0 checksum
> ---
>
> Key: RATIS-815
> URL: https://issues.apache.org/jira/browse/RATIS-815
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Priority: Blocker
> Attachments: dumps.tar.gz, logs.tar.gz
>
>
> After writing a few large keys (128MB) with a very small chunk size (64KB) in
> Ozone, Ratis reports log entry corruption due to a checksum error:
> {code}
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:396 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  Rolling segment log-62379_62465 to index:62465
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:541 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  Rolled log segment from 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62379
>  to 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_62379-62465
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:583 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  created new log segment 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62466
> 2020-02-13 12:01:41 ERROR LogAppender:81 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236->ac5b3434-874b-4375-8a03-989e8c7fb692-GrpcLogAppender-AppenderDaemon
>  failed RaftLog
> org.apache.ratis.server.raftlog.RaftLogIOException: 
> org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated 
> checksum is CDFED097 but read checksum is .
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:297)
>   at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:213)
>   at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:179)
>   at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:122)
>   at 
> org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:77)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: 
> Calculated checksum is CDFED097 but read checksum is .
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:312)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:194)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:129)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309)
>   ... 7 more
> {code}
> Steps to reproduce:
> 1. Configure Ozone with 64KB chunk size and slightly higher buffer sizes:
> {code}
> ozone.scm.chunk.size: 64KB
> ozone.client.stream.buffer.flush.size: 256KB
> ozone.client.stream.buffer.max.size: 1MB
> {code}
> 2. Run Freon:
> {code}
> ozone freon ockg -n 1 -t 1 -p warmup
> ozone freon ockg -p test -t 8 -s 134217728 -n 32
> {code}
> Interestingly, even {{log_5106-5509}} has an invalid entry (according to the log
> dump utility):
> {code}
> Processing Raft Log file: 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_5106-5509
>  size:1030796
> ...
> (t:1, i:5161), STATEMACHINELOGENTRY, client-296B6A48E40D, cid=3307
> Exception in thread "main" org.apache.ratis.protocol.ChecksumException: Log 
> entry corrupted: Calculated checksum is 926127AE but read checksum is 
> .
> {code}
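To make the error message above concrete, here is a minimal Java sketch of how a reader detects this kind of corruption: it recomputes a per-entry checksum and compares it with the stored one. The record layout (4-byte length, payload, 4-byte checksum), the use of `java.util.zip.CRC32`, and the names `ChecksumCheckSketch` and `EntryChecksumException` are assumptions for illustration only; this is not Ratis's segment file format or the `SegmentedRaftLogReader` code. The `main` method zeroes the stored checksum to mimic the symptom in the issue title.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of per-entry checksum verification (not Ratis code). */
final class ChecksumCheckSketch {
  /** Thrown when the stored checksum does not match the recomputed one. */
  static final class EntryChecksumException extends Exception {
    EntryChecksumException(String message) {
      super(message);
    }
  }

  /** CRC32 of the payload, truncated to an int (assumed checksum algorithm). */
  static int checksum(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return (int) crc.getValue();
  }

  /** Assumed layout: 4-byte length, payload bytes, 4-byte checksum. */
  static ByteBuffer encode(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
    buf.putInt(payload.length).put(payload).putInt(checksum(payload));
    buf.flip(); // ready for reading from position 0
    return buf;
  }

  /** Reads one entry and fails if the stored checksum disagrees with the data. */
  static byte[] decode(ByteBuffer buf) throws EntryChecksumException {
    byte[] payload = new byte[buf.getInt()];
    buf.get(payload);
    int read = buf.getInt();
    int calculated = checksum(payload);
    if (calculated != read) {
      throw new EntryChecksumException(String.format(
          "Log entry corrupted: Calculated checksum is %X but read checksum is %X",
          calculated, read));
    }
    return payload;
  }

  public static void main(String[] args) throws Exception {
    byte[] payload = "entry-5161".getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(decode(encode(payload)), StandardCharsets.UTF_8));

    // Simulate the reported symptom: the checksum field on disk is all zeros.
    ByteBuffer corrupted = encode(payload);
    corrupted.putInt(4 + payload.length, 0);
    try {
      decode(corrupted);
    } catch (EntryChecksumException e) {
      System.out.println(e.getMessage());
    }
  }
}
```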


