[jira] [Updated] (RATIS-603) Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on errors

2019-09-13 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated RATIS-603:

Attachment: RATIS-603.005.patch

> Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on 
> errors
> ---
>
> Key: RATIS-603
> URL: https://issues.apache.org/jira/browse/RATIS-603
> Project: Ratis
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-603.001.patch, RATIS-603.002.patch, 
> RATIS-603.003.patch, RATIS-603.004.patch, RATIS-603.005.patch
>
>
> This jira proposes adding an SmLogEntryProto-to-String converter so that
> log entry information can be printed on errors and exceptions.
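To illustrate the idea in the description, here is a minimal sketch of an optional converter that is applied only when an error is reported. The class names, the Entry type, and the message format are all hypothetical stand-ins for illustration; they are not taken from the attached patches or from the Ratis API.

```java
import java.util.function.Function;

public class LogStringSupplierSketch {
    /** Hypothetical stand-in for a state machine log entry; not the Ratis proto class. */
    public static final class Entry {
        public final long term;
        public final long index;
        public final byte[] data;

        public Entry(long term, long index, byte[] data) {
            this.term = term;
            this.index = index;
            this.data = data;
        }
    }

    /** Default converter: prints metadata only, never the payload bytes. */
    public static final Function<Entry, String> DEFAULT_TO_STRING =
        e -> "(t:" + e.term + ", i:" + e.index + "), dataSize=" + e.data.length;

    /** Builds the error message; a null converter falls back to the default one. */
    public static String describeFailure(Entry entry, Function<Entry, String> converter, Exception cause) {
        Function<Entry, String> toString = converter != null ? converter : DEFAULT_TO_STRING;
        return "Failed to apply " + toString.apply(entry) + ": " + cause.getMessage();
    }
}
```

An application that understands its own payload could pass a richer converter, while the default keeps potentially large or sensitive data out of the logs.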



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-603) Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on errors

2019-09-13 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929691#comment-16929691
 ] 

Mukul Kumar Singh commented on RATIS-603:
-

Thanks for the review [~szetszwo]. Patch v5 addresses the review comments.

> Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on 
> errors
> ---
>
> Key: RATIS-603
> URL: https://issues.apache.org/jira/browse/RATIS-603
> Project: Ratis
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-603.001.patch, RATIS-603.002.patch, 
> RATIS-603.003.patch, RATIS-603.004.patch, RATIS-603.005.patch
>
>
> This jira proposes adding an SmLogEntryProto-to-String converter so that
> log entry information can be printed on errors and exceptions.





[jira] [Updated] (RATIS-569) StatusRuntimeException because Ratis clients do not shutdown the observer cleanly

2019-09-13 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated RATIS-569:

Summary: StatusRuntimeException because Ratis clients do not shutdown the 
observer cleanly  (was: StatusRuntimeException on the datanode because clients 
do not shutdown the observer cleanly.)

> StatusRuntimeException because Ratis clients do not shutdown the observer 
> cleanly
> -
>
> Key: RATIS-569
> URL: https://issues.apache.org/jira/browse/RATIS-569
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Blocker
>  Labels: ozone
> Attachments: RATIS-569.01.patch, RATIS-569.02.patch, 
> RATIS-569.03.patch
>
>
> Running TestDataValidate in Ozone frequently leads to a StatusRuntimeException
> on the datanode.
> This causes an unclean shutdown of the stream on the datanode.
> In GrpcClientProtocolClient, shutdownNow should be followed by an
> awaitTermination to wait for a clean shutdown.
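The shutdownNow-then-awaitTermination pattern from the description can be sketched as below. To keep the example self-contained it uses java.util.concurrent.ExecutorService as a stand-in; a gRPC ManagedChannel exposes methods with the same names. The class name, method name, and timeout are assumptions for illustration, not the actual fix in the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class CleanShutdownSketch {
    /**
     * Requests an immediate shutdown, then blocks until the resource has
     * actually terminated or the timeout elapses.
     * Returns true only if termination completed within the timeout.
     */
    public static boolean shutdownAndWait(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdownNow(); // interrupts running work; does not wait for it
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the wait, the caller can proceed while in-flight work is still being torn down, which is one plausible way for the other side to see an abrupt stream termination.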





[jira] [Commented] (RATIS-678) Notify Leader does not provide raft group id

2019-09-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929609#comment-16929609
 ] 

Hadoop QA commented on RATIS-678:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 3m 9s | master passed |
| +1 | compile | 0m 56s | master passed |
| +1 | checkstyle | 0m 19s | master passed |
| +1 | javadoc | 0m 43s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 35s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 16m 39s | root in the patch failed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 25m 23s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.netty.TestGroupManagementWithNetty |
|   | ratis.netty.TestRaftSnapshotWithNetty |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-09-13 |
| JIRA Issue | RATIS-678 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980310/RATIS-678.01.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux d49564fedb17 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 67ba9cd |
| maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) |
| Default Java | 1.8.0_222 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/970/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/970/testReport/ |
| Max. process+thread count | 1713 (vs. ulimit of 5000) |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/970/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Notify Leader does not provide raft group id
> 
>
> Key: RATIS-678
> URL: https://issues.apache.org/jira/browse/RATIS-678
> Project: Ratis
>  Issue Type: Improvement
>  Components: raft-group
>Affects Versions: 0.4.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-678.01.patch
>
>
> org.apache.ratis.statemachine.StateMachine#notifyLeader
> does not provide the group id for which leader election is complete.





[jira] [Updated] (RATIS-678) Notify Leader does not provide raft group id

2019-09-13 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated RATIS-678:
--
Attachment: RATIS-678.01.patch

> Notify Leader does not provide raft group id
> 
>
> Key: RATIS-678
> URL: https://issues.apache.org/jira/browse/RATIS-678
> Project: Ratis
>  Issue Type: Improvement
>  Components: raft-group
>Affects Versions: 0.4.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-678.01.patch
>
>
> org.apache.ratis.statemachine.StateMachine#notifyLeader
> does not provide the group id for which leader election is complete.





[jira] [Commented] (RATIS-677) Logentry marked corrupt due to ChecksumException

2019-09-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929571#comment-16929571
 ] 

Hadoop QA commented on RATIS-677:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 57s | Maven dependency ordering for branch |
| +1 | mvninstall | 2m 22s | master passed |
| +1 | compile | 0m 56s | master passed |
| +1 | checkstyle | 0m 20s | master passed |
| +1 | javadoc | 0m 42s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 6s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 12s | root: The patch generated 2 new + 8 unchanged - 0 fixed = 10 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 17m 4s | root in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 26m 9s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.grpc.TestWatchRequestWithGrpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-09-13 |
| JIRA Issue | RATIS-677 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980306/r677_20190913.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 6bdb21e7bbaf 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 67ba9cd |
| maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/969/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/969/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/969/testReport/ |
| Max. process+thread count | 2339 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-server ratis-test U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/969/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Logentry marked corrupt due to ChecksumException
> 
>
> Key: RATIS-677
> URL: https://issues.apache.org/jira/browse/RATIS-677
> Project: Ratis
>  Issue Type: Bug
>  Components: 

[jira] [Commented] (RATIS-677) Logentry marked corrupt due to ChecksumException

2019-09-13 Thread Tsz Wo Nicholas Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929563#comment-16929563
 ] 

Tsz Wo Nicholas Sze commented on RATIS-677:
---

r677_20190913.patch: adds a configuration property that controls how log
corruption is handled.

Will add some tests once we have agreed on the approach.
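For discussion, a rough sketch of what a configurable corruption policy could look like when an entry's checksum does not match. The enum values, class name, and the choice of CRC32 are assumptions made for this example; they are not taken from r677_20190913.patch.

```java
import java.util.zip.CRC32;

public class CorruptionPolicySketch {
    /** Hypothetical policy values; the names in the patch may differ. */
    public enum Policy { FAIL, WARN_AND_STOP }

    /** Computes a CRC32 checksum over the entry payload. */
    public static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /**
     * Returns the payload if the stored checksum matches. On a mismatch,
     * FAIL throws, while WARN_AND_STOP logs a warning and returns null so
     * the caller can stop reading the segment and continue startup.
     */
    public static byte[] readEntry(byte[] payload, long storedChecksum, Policy policy) {
        long calculated = checksum(payload);
        if (calculated == storedChecksum) {
            return payload;
        }
        String message = "LogEntry is corrupt. Calculated checksum is " + calculated
            + " but read checksum " + storedChecksum;
        if (policy == Policy.FAIL) {
            throw new IllegalStateException(message);
        }
        System.err.println("WARN: " + message);
        return null;
    }
}
```

Which behaviour should be the default is exactly the open question in this issue; the sketch only shows that both can sit behind one switch.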

> Logentry marked corrupt due to ChecksumException
> 
>
> Key: RATIS-677
> URL: https://issues.apache.org/jira/browse/RATIS-677
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Sammi Chen
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
> Attachments: r677_20190913.patch
>
>
> Steps:
> 1.  Ran Teragen and generated a few GB of data in a 4-datanode cluster.
> 2.  Stopped the datanodes through ./stop-ozone.sh.
> 3.  Changed the ozone binaries.
> 4.  Started the cluster through ./start-ozone.sh.
> 5.  Two datanodes registered to SCM. Two datanodes failed to appear at the SCM side.
>
> Checked these two failed nodes; the datanode process is still running. In the
> logfile, I found a lot of the following errors.
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
> at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
> at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)





[jira] [Updated] (RATIS-677) Logentry marked corrupt due to ChecksumException

2019-09-13 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-677:
--
Attachment: r677_20190913.patch

> Logentry marked corrupt due to ChecksumException
> 
>
> Key: RATIS-677
> URL: https://issues.apache.org/jira/browse/RATIS-677
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Sammi Chen
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
> Attachments: r677_20190913.patch
>
>
> Steps:
> 1.  Ran Teragen and generated a few GB of data in a 4-datanode cluster.
> 2.  Stopped the datanodes through ./stop-ozone.sh.
> 3.  Changed the ozone binaries.
> 4.  Started the cluster through ./start-ozone.sh.
> 5.  Two datanodes registered to SCM. Two datanodes failed to appear at the SCM side.
>
> Checked these two failed nodes; the datanode process is still running. In the
> logfile, I found a lot of the following errors.
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
> at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
> at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)





[jira] [Created] (RATIS-678) Notify Leader does not provide raft group id

2019-09-13 Thread Siddharth Wagle (Jira)
Siddharth Wagle created RATIS-678:
-

 Summary: Notify Leader does not provide raft group id
 Key: RATIS-678
 URL: https://issues.apache.org/jira/browse/RATIS-678
 Project: Ratis
  Issue Type: Improvement
  Components: raft-group
Affects Versions: 0.4.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
 Fix For: 0.5.0


org.apache.ratis.statemachine.StateMachine#notifyLeader

does not provide the group id for which leader election is complete.
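A small sketch of why the group id matters, assuming one listener instance receives notifications for several Raft groups. The interface, its signature, and the use of plain strings for ids are hypothetical; the real StateMachine#notifyLeader signature is not shown in this issue and may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class NotifyLeaderSketch {
    /** Hypothetical callback that carries the group id along with the leader id. */
    public interface LeaderListener {
        void notifyLeader(String groupId, String leaderId);
    }

    /** Records which group elected which leader, in arrival order. */
    public static class RecordingListener implements LeaderListener {
        public final List<String> events = new ArrayList<>();

        @Override
        public void notifyLeader(String groupId, String leaderId) {
            events.add(groupId + " -> " + leaderId);
        }
    }
}
```

With only a leader id, two notifications from different groups would be indistinguishable to the listener.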





[jira] [Commented] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-09-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929519#comment-16929519
 ] 

Hadoop QA commented on RATIS-647:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 3m 50s | Docker failed to build yetus/ratis:date2019-09-13. |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | RATIS-647 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980302/RATIS-647-001.patch |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/968/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-647-000.patch, RATIS-647-001.patch
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the raft log|
> |numRaftLogSyncOps|Number of raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a raft log entry from the actual raft log file and create a raft log entry (raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a raft log entry (this also includes writeStateMachineData, which will vary depending on the size of the data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process raft log segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the raft log|
> |raftLogCacheMissCount|Number of RaftLogCacheMisses|
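As a rough illustration of the first two rows (raftLogSyncLatency and numRaftLogSyncOps), a sync call can be wrapped in a timer that also counts invocations. This sketch uses only JDK classes; the class and method names are made up for the example and do not reflect the metrics library or naming used in the attached patches.

```java
import java.util.concurrent.atomic.LongAdder;

public class SyncMetricsSketch {
    private final LongAdder numSyncOps = new LongAdder();
    private final LongAdder totalSyncNanos = new LongAdder();

    /** Runs the sync action and records its duration, even if it throws. */
    public void timeSync(Runnable sync) {
        long start = System.nanoTime();
        try {
            sync.run();
        } finally {
            totalSyncNanos.add(System.nanoTime() - start);
            numSyncOps.increment();
        }
    }

    /** Number of sync calls observed so far. */
    public long getNumSyncOps() {
        return numSyncOps.sum();
    }

    /** Mean sync latency in nanoseconds, or 0 if no sync has run yet. */
    public double getMeanSyncLatencyNanos() {
        long n = numSyncOps.sum();
        return n == 0 ? 0.0 : (double) totalSyncNanos.sum() / n;
    }
}
```

A real implementation would more likely use a histogram-based timer so that percentiles are available, not just the mean.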





[jira] [Assigned] (RATIS-652) Add metrics related to snapshot and log purge

2019-09-13 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan reassigned RATIS-652:
---

Assignee: Aravindan Vijayan

> Add metrics related to snapshot and log purge
> -
>
> Key: RATIS-652
> URL: https://issues.apache.org/jira/browse/RATIS-652
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
>
> The following metrics would help determine the overall snapshot and log purge
> behaviour of a ratis pipeline:
>  
> |takeSnapshotLatency|Time taken to take a ratis snapshot.|
> |numSnapshots|Number of snapshots taken |
> |purgeLogRecordLatency|Time taken to purge logRecords.|
> |numPurgeLogCalls|Number of Purge log calls|
> |numInstallSnnapshotOps|Number of install snapshot calls|





[jira] [Updated] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-09-13 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated RATIS-647:

Attachment: RATIS-647-001.patch

> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-647-000.patch, RATIS-647-001.patch
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the raft log|
> |numRaftLogSyncOps|Number of raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a raft log entry from the actual raft log file and create a raft log entry (raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a raft log entry (this also includes writeStateMachineData, which will vary depending on the size of the data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process raft log segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the raft log|
> |raftLogCacheMissCount|Number of RaftLogCacheMisses|





[jira] [Commented] (RATIS-648) Add metrics related to GrpcLogAppendRequests

2019-09-13 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929509#comment-16929509
 ] 

Siddharth Wagle commented on RATIS-648:
---

Hi [~shashikant]/[~msingh]/[~avijayan], I have attached a preliminary patch. If
the changes look OK, I will proceed to write unit tests.

> Add metrics related to GrpcLogAppendRequests 
> -
>
> Key: RATIS-648
> URL: https://issues.apache.org/jira/browse/RATIS-648
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: RATIS-648.00.patch
>
>
> The following metrics related to GrpcLogAppends would be useful for performance
> and health monitoring and tuning:
> |GrpcLogAppenderLatency|Time taken to append a log entry to each follower and get an acknowledgement|
> |logAppendRetryCount|Total number of retried logAppend requests|
> |logAppendRequestCount|Total number of logAppendRequests|
> |appendEntryProcessingLatency|Time required to process an append entry request on each follower|
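One possible reading of logAppendRequestCount and logAppendRetryCount, sketched with JDK classes only: every send attempt counts as a request, and every attempt after the first for the same entry counts as a retry. That interpretation, along with the class and method names, is an assumption for illustration and is not taken from RATIS-648.00.patch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class AppendRetryMetricsSketch {
    private final AtomicLong requestCount = new AtomicLong();
    private final AtomicLong retryCount = new AtomicLong();

    /**
     * Calls send until it reports an acknowledgement or maxAttempts is reached.
     * Each attempt increments the request count; attempts after the first
     * also increment the retry count.
     */
    public boolean append(BooleanSupplier send, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            requestCount.incrementAndGet();
            if (attempt > 1) {
                retryCount.incrementAndGet();
            }
            if (send.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public long getRequestCount() {
        return requestCount.get();
    }

    public long getRetryCount() {
        return retryCount.get();
    }
}
```

Under this reading, the ratio of retries to requests gives a quick health signal for a follower connection.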





[jira] [Updated] (RATIS-648) Add metrics related to GrpcLogAppendRequests

2019-09-13 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated RATIS-648:
--
Attachment: RATIS-648.00.patch

> Add metrics related to GrpcLogAppendRequests 
> -
>
> Key: RATIS-648
> URL: https://issues.apache.org/jira/browse/RATIS-648
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: RATIS-648.00.patch
>
>
> The following metrics related to GrpcLogAppends would be useful for performance
> and health monitoring and tuning:
> |GrpcLogAppenderLatency|Time taken to append a log entry to each follower and get an acknowledgement|
> |logAppendRetryCount|Total number of retried logAppend requests|
> |logAppendRequestCount|Total number of logAppendRequests|
> |appendEntryProcessingLatency|Time required to process an append entry request on each follower|





[jira] [Updated] (RATIS-677) Logentry marked corrupt due to ChecksumException

2019-09-13 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-677:
--
Component/s: server

> Logentry marked corrupt due to ChecksumException
> 
>
> Key: RATIS-677
> URL: https://issues.apache.org/jira/browse/RATIS-677
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Sammi Chen
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
>
> Steps:
> 1.  Ran Teragen and generated a few GB of data in a 4-datanode cluster.
> 2.  Stopped the datanodes through ./stop-ozone.sh.
> 3.  Changed the ozone binaries.
> 4.  Started the cluster through ./start-ozone.sh.
> 5.  Two datanodes registered to SCM. Two datanodes failed to appear at the SCM side.
>
> Checked these two failed nodes; the datanode process is still running. In the
> logfile, I found a lot of the following errors.
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
> at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
> at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
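
The exception above reports a mismatch between a checksum recomputed from the entry bytes and the checksum read from the segment file. A minimal sketch of that kind of check follows. It assumes the JDK's CRC32 and a hypothetical class name, EntryChecksumSketch; it is not the SegmentedRaftLogReader code, and the checksum algorithm Ratis actually uses may differ.

```java
import java.util.zip.CRC32;

/** Hypothetical helper for illustration only; not part of Ratis. */
final class EntryChecksumSketch {
  /** CRC32 of the payload, narrowed to int as a 4-byte on-disk field would hold it. */
  static int checksumOf(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    // Narrowing to a signed int can yield a negative number.
    return (int) crc.getValue();
  }

  /** Throws if the stored checksum differs from the one recomputed from the payload. */
  static void verify(byte[] payload, int storedChecksum) {
    int calculated = checksumOf(payload);
    if (calculated != storedChecksum) {
      throw new IllegalStateException("LogEntry is corrupt. Calculated checksum is "
          + calculated + " but read checksum " + storedChecksum);
    }
  }
}
```

In this sketch a caller would invoke verify(payload, stored) for each entry read from a segment and decide at that point whether to abort loading or to stop at the first bad entry.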





[jira] [Commented] (RATIS-677) Logentry marked corrupt due to ChecksumException

2019-09-13 Thread Tsz Wo Nicholas Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929407#comment-16929407
 ] 

Tsz Wo Nicholas Sze commented on RATIS-677:
---

What is the expectation?  Should the server ignore the corrupted log and 
continue startup?

> Logentry marked corrupt due to ChecksumException
> 
>
> Key: RATIS-677
> URL: https://issues.apache.org/jira/browse/RATIS-677
> Project: Ratis
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
>
> Steps:
> 1.  Ran Teragen and generated a few GB of data in a cluster of 4 datanodes.
> 2.  Stopped the datanodes through ./stop-ozone.sh.
> 3.  Changed the ozone binaries.
> 4.  Started the cluster through ./start-ozone.sh.
> 5.  Two datanodes registered with SCM. Two datanodes failed to appear on the 
> SCM side.
>  
> Checked the two failed nodes; the datanode process is still running on both. 
> In the log file, I found many of the following errors.
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
> at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
> at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)





[jira] [Commented] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-09-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929117#comment-16929117
 ] 

Shashikant Banerjee commented on RATIS-647:
---

[~avijayan], let's follow the Hadoop convention of using camelCase for all.

> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-647-000.patch
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the raft log|
> |numRaftLogSyncOps|Number of raft log sync calls over time (equals the 
> number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a raft log entry from the actual 
> raft log file and create a raft log entry (raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a raft log entry (this also 
> includes writeStateMachineData, which will vary depending on the size of the 
> data to be written as well as on external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft 
> log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process raft log 
> segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives 
> the number of pending log entries to be committed to the raft log|
> |raftLogCacheMissCount|Number of raft log cache misses|
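
As a rough illustration of how the sync-related rows above (raftLogSyncLatency, numRaftLogSyncOps, raftLogSynBatchSize) could be recorded, here is a minimal sketch that uses only JDK classes. The class RaftLogSyncMetricsSketch and its methods are hypothetical; this is not the Ratis metrics API or the approach taken in the attached patch.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.IntSupplier;

/** Hypothetical recorder for illustration only; not part of Ratis. */
final class RaftLogSyncMetricsSketch {
  private final LongAdder numRaftLogSyncOps = new LongAdder();
  private final LongAdder totalSyncNanos = new LongAdder();
  private final LongAdder totalSyncedEntries = new LongAdder();

  /**
   * Runs one flush and records its duration and batch size.
   * The flush returns the number of entries it synced.
   * Nothing is recorded if the flush throws.
   */
  int timeSync(IntSupplier flush) {
    long start = System.nanoTime();
    int entries = flush.getAsInt();
    totalSyncNanos.add(System.nanoTime() - start);
    numRaftLogSyncOps.increment();
    totalSyncedEntries.add(entries);
    return entries;
  }

  /** Number of flush calls recorded so far. */
  long syncOps() {
    return numRaftLogSyncOps.sum();
  }

  /** Mean number of entries per flush call, or 0 before the first flush. */
  double meanBatchSize() {
    long ops = numRaftLogSyncOps.sum();
    return ops == 0 ? 0.0 : (double) totalSyncedEntries.sum() / ops;
  }

  /** Mean flush duration in nanoseconds, or 0 before the first flush. */
  double meanSyncLatencyNanos() {
    long ops = numRaftLogSyncOps.sum();
    return ops == 0 ? 0.0 : (double) totalSyncNanos.sum() / ops;
  }
}
```

LongAdder is used here instead of AtomicLong because these counters are written far more often than they are read; a real implementation would more likely register timers and counters with a metrics library.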





[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929101#comment-16929101
 ] 

Shashikant Banerjee commented on RATIS-670:
---

Thanks [~sdeka] for working on this. The patch overall looks good to me. A few 
comments inline:

 

1. Can we change the naming here? It looks confusing.
{code:java}
public static final String RATIS_APPLIED_INDEX_GAUGE =
    "ratis_applied_index";
public static final String STATEMACHINE_APPLIED_INDEX_GAUGE =
    "statemachine_applied_index";
{code}
2. Can we add a new test altogether that just verifies the metric value, 
instead of modifying the existing "testRequestTimeout" test?

 

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch, 
> RATIS-670.002.patch, RATIS-670.003.patch, RATIS-670.004.patch, 
> RATIS-670.005.patch
>
>
> Plotting the log apply index (the log index applied on the StateMachine) 
> against the RaftLog commit index is useful for monitoring the performance of 
> the state machine.
> This jira adds a metric/gauge which tracks the current value of the log apply 
> index.
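
To make the idea in the description concrete, the following is a minimal sketch of a gauge over the applied index, together with the gap to the commit index that one would plot. The class AppliedIndexGaugeSketch and its method names are hypothetical, and the use of -1 as the initial value is an assumption; it relies only on JDK classes and does not reflect the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical gauge holder for illustration only; not part of Ratis. */
final class AppliedIndexGaugeSketch {
  // -1 means "nothing committed or applied yet" in this sketch.
  private final AtomicLong commitIndex = new AtomicLong(-1);
  private final AtomicLong appliedIndex = new AtomicLong(-1);

  /** Records a newly committed index; the stored value never moves backwards. */
  void onCommitted(long index) {
    commitIndex.accumulateAndGet(index, Math::max);
  }

  /** Records a newly applied index; the stored value never moves backwards. */
  void onApplied(long index) {
    appliedIndex.accumulateAndGet(index, Math::max);
  }

  /** A gauge is a callback that a metrics reporter polls for the current value. */
  LongSupplier appliedIndexGauge() {
    return appliedIndex::get;
  }

  /**
   * Entries committed but not yet applied. The two reads are not atomic
   * together, so the result is approximate under concurrent updates.
   */
  long applyLag() {
    return commitIndex.get() - appliedIndex.get();
  }
}
```

A steadily growing applyLag() would suggest that the state machine is applying entries more slowly than the log commits them.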


