[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671135#comment-16671135 ]

Hadoop QA commented on RATIS-381:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 2s | master passed |
| +1 | compile | 1m 13s | master passed |
| +1 | checkstyle | 0m 18s | master passed |
| +1 | javadoc | 0m 34s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 6s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 48s | the patch passed |
| -1 | javac | 0m 48s | root generated 13 new + 94 unchanged - 0 fixed = 107 total (was 94) |
| -0 | checkstyle | 0m 15s | root: The patch generated 28 new + 137 unchanged - 8 fixed = 165 total (was 145) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 32s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 4m 28s | root in the patch failed. |
| +1 | asflicense | 0m 7s | The patch does not generate ASF License warnings. |
| | | 11m 0s | |

|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestRetryCacheWithSimulatedRpc |
| | ratis.server.simulation.TestRaftWithSimulatedRpc |
| | ratis.server.simulation.TestServerInformationWithSimulatedRpc |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-11-01 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946472/r381_20181101.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 127e6f8375db 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/479/testReport/ |
| modules | C: ratis-common ratis-server ratis-grpc U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/479/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> RaftTestUtil.waitForLeader should not return null
> -
>
>
[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671084#comment-16671084 ]

Hadoop QA commented on RATIS-381:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 53s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 0m 54s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 0m 17s | master passed |
| +1 | javadoc | 0m 35s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 5s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 49s | the patch passed |
| -1 | javac | 0m 49s | root generated 13 new + 94 unchanged - 0 fixed = 107 total (was 94) |
| -0 | checkstyle | 0m 15s | root: The patch generated 28 new + 137 unchanged - 8 fixed = 165 total (was 145) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 37s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 0m 20s | root in the patch failed. |
| +1 | asflicense | 0m 7s | The patch does not generate ASF License warnings. |
| | | 10m 8s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-11-01 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946472/r381_20181101.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 0068a2cf0e20 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/478/testReport/ |
| modules | C: ratis-common ratis-server ratis-grpc U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/478/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
> Issue Type: Improvement
> Components: test
> Reporter: Tsz Wo Nicholas Sze
>Assig
[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated RATIS-381:
--
Attachment: r381_20181101.patch

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
> Issue Type: Improvement
> Components: test
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Priority: Minor
> Attachments: r381_20181101.patch
>
>
> Some tests may fail with NullPointerException since RaftTestUtil.waitForLeader(..) may return null (and the tests do not check for null) when leader elections take a long time.
> It seems RaftTestUtil.waitForLeader(..) should wait for a longer period and, if there is no leader, throw an exception with a descriptive error message instead of returning null.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated RATIS-381:
--
Attachment: (was: r381_20181031.patch)

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
> Issue Type: Improvement
> Components: test
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Priority: Minor
> Attachments: r381_20181101.patch
>
>
> Some tests may fail with NullPointerException since RaftTestUtil.waitForLeader(..) may return null (and the tests do not check for null) when leader elections take a long time.
> It seems RaftTestUtil.waitForLeader(..) should wait for a longer period and, if there is no leader, throw an exception with a descriptive error message instead of returning null.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
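The behaviour proposed in RATIS-381 (poll for a leader, and fail with a descriptive exception instead of returning null) might look roughly like the sketch below. This is not the code from r381_20181101.patch: the class `LeaderWaitSketch`, the `findLeader` supplier, and the `timeoutMs`/`sleepMs` parameters are all hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not the actual RaftTestUtil code. */
final class LeaderWaitSketch {
  private LeaderWaitSketch() {}

  /**
   * Polls findLeader until it yields a value or the deadline passes.
   * Never returns null: on timeout it throws with a descriptive message.
   */
  static <T> T waitForLeader(Supplier<Optional<T>> findLeader,
                             long timeoutMs, long sleepMs)
      throws InterruptedException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    int attempts = 0;
    while (true) {
      attempts++;
      final Optional<T> leader = findLeader.get();
      if (leader.isPresent()) {
        return leader.get();
      }
      // Check the deadline only after a failed attempt, so at least one
      // attempt is always made even with a very small timeout.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("No leader elected within " + timeoutMs
            + " ms after " + attempts + " attempts");
      }
      Thread.sleep(sleepMs);
    }
  }
}
```

With a helper of this shape, a test that previously hit a NullPointerException would instead fail with a message saying how long it waited, which points at a slow election instead of at the test itself.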
[jira] [Assigned] (RATIS-372) Basic test harness for LogService
[ https://issues.apache.org/jira/browse/RATIS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser reassigned RATIS-372: Assignee: Josh Elser (was: Rajeshbabu Chintaguntla) > Basic test harness for LogService > - > > Key: RATIS-372 > URL: https://issues.apache.org/jira/browse/RATIS-372 > Project: Ratis > Issue Type: Task > Components: LogService >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > > We should have something that can stand up a logservice in a > pseudo-distributed manner. > Docker is all the rage right now, and would make it easy to deploy onto > something like k8s in the future. > Using [docker-compose|https://docs.docker.com/compose/] would provide us a > nice way to have one docker image for the metadata service daemons, another > for logservice daemons (if needed), and then create a network that connects > them all together. The final docker-compose yaml would be something like: > It would be nice to provide a "client" container in which we show a basic > create/write/read/delete example to give folks a starting point. > * 1 network > * 3 instances of metadata service statemachines > * 3 instances of log service statemachines > * 1 image with the example client. > [~chrajeshbab...@gmail.com], make sense to you? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-372) Basic test harness for LogService
[ https://issues.apache.org/jira/browse/RATIS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670845#comment-16670845 ] Josh Elser commented on RATIS-372: -- Talked to Rajesh offline. He had to take some unplanned leave. Stealing ownership for now. > Basic test harness for LogService > - > > Key: RATIS-372 > URL: https://issues.apache.org/jira/browse/RATIS-372 > Project: Ratis > Issue Type: Task > Components: LogService >Reporter: Josh Elser >Assignee: Rajeshbabu Chintaguntla >Priority: Major > > We should have something that can stand up a logservice in a > pseudo-distributed manner. > Docker is all the rage right now, and would make it easy to deploy onto > something like k8s in the future. > Using [docker-compose|https://docs.docker.com/compose/] would provide us a > nice way to have one docker image for the metadata service daemons, another > for logservice daemons (if needed), and then create a network that connects > them all together. The final docker-compose yaml would be something like: > It would be nice to provide a "client" container in which we show a basic > create/write/read/delete example to give folks a starting point. > * 1 network > * 3 instances of metadata service statemachines > * 3 instances of log service statemachines > * 1 image with the example client. > [~chrajeshbab...@gmail.com], make sense to you? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (RATIS-385) Create README for logservice
Josh Elser created RATIS-385: Summary: Create README for logservice Key: RATIS-385 URL: https://issues.apache.org/jira/browse/RATIS-385 Project: Ratis Issue Type: Sub-task Reporter: Josh Elser We should have a nice README at https://github.com/apache/incubator-ratis/tree/master/ratis-logservice to help guide people to the project and encourage contributions/involvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (RATIS-384) writeStateMachineData times out
Arpit Agarwal created RATIS-384: --- Summary: writeStateMachineData times out Key: RATIS-384 URL: https://issues.apache.org/jira/browse/RATIS-384 Project: Ratis Issue Type: Bug Affects Versions: 0.3.0 Reporter: Nilotpal Nandi Fix For: 0.3.0 datanode stopped due to following error : datanode.log {noformat} 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: Terminating with exit status 1: 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, i:182), STATEMACHINELOGENTRY, client-611073BBFA46, cid=127-writeStateMachineData at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-271) Ratis-backed distributed log: "LogService"
[ https://issues.apache.org/jira/browse/RATIS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670801#comment-16670801 ] Josh Elser commented on RATIS-271: -- Attached a doc that [~sergey.soldatov] had put together a while back that helps describe the Metadata Service for tracking the Logs. Thought that would be helpful to folks who come along later. > Ratis-backed distributed log: "LogService" > --- > > Key: RATIS-271 > URL: https://issues.apache.org/jira/browse/RATIS-271 > Project: Ratis > Issue Type: New Feature > Components: LogService >Reporter: Josh Elser >Priority: Major > Attachments: LogService Metadata Service.pdf > > > Umbrella issue for building a distributed log using Ratis: > Doc: > [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit] > Discuss: > https://lists.apache.org/thread.html/f80dc3900f6d9f4ee4d9f9e0898cee9a232e3b1ca9a4d9a53fea1d71@%3Cdev.ratis.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-271) Ratis-backed distributed log: "LogService"
[ https://issues.apache.org/jira/browse/RATIS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated RATIS-271: - Attachment: LogService Metadata Service.pdf > Ratis-backed distributed log: "LogService" > --- > > Key: RATIS-271 > URL: https://issues.apache.org/jira/browse/RATIS-271 > Project: Ratis > Issue Type: New Feature > Components: LogService >Reporter: Josh Elser >Priority: Major > Attachments: LogService Metadata Service.pdf > > > Umbrella issue for building a distributed log using Ratis: > Doc: > [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit] > Discuss: > https://lists.apache.org/thread.html/f80dc3900f6d9f4ee4d9f9e0898cee9a232e3b1ca9a4d9a53fea1d71@%3Cdev.ratis.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-329) Current Ratis heartbeats are missing for a heavily loaded cluster
[ https://issues.apache.org/jira/browse/RATIS-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated RATIS-329: Target Version/s: (was: 0.3.0) > Current Ratis heartbeats are missing for a heavily loaded cluster > - > > Key: RATIS-329 > URL: https://issues.apache.org/jira/browse/RATIS-329 > Project: Ratis > Issue Type: Bug > Components: server >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: ozone > Fix For: 0.3.0 > > > Currently while running Ratis with Ozone, Frequent leader elections can be > noticed in the datanode logs. This is happening because of missing heartbeats > from the leader to follower. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated RATIS-382: Fix Version/s: 0.3.0 > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Fix For: 0.3.0 > > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty
[ https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670619#comment-16670619 ] ASF GitHub Bot commented on RATIS-383: -- xiaoyuyao opened a new pull request #1: RATIS-383. Shade native library tcnative for grpc/netty in Ratis-Thir… URL: https://github.com/apache/incubator-ratis-thirdparty/pull/1 …dparty. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Shade native library tcnative for grpc/netty in Ratis-Thirdparty > > > Key: RATIS-383 > URL: https://issues.apache.org/jira/browse/RATIS-383 > Project: Ratis > Issue Type: Bug > Components: security >Reporter: Mukul Kumar Singh >Assignee: Xiaoyu Yao >Priority: Major > Labels: ozone > Attachments: RATIS-383.001.patch > > > RATIS-246 discusses making GRPC endpoint secure with mTLS. This jira is > needed as GRPC/netty has dependency on tcnative jar/libraries that need to be > shaded in Ratis-Thirdparty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670568#comment-16670568 ] Shashikant Banerjee commented on RATIS-382: --- Looking further at the nodes, the tmp chunk files do actually exist and are completely written: {code:java} -rw-r--r-- 1 root root 16M Oct 31 07:30 /tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15.tmp -rw-r--r-- 1 root root 16M Oct 31 07:30 /tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16.tmp{code} > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty
[ https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated RATIS-383: - Attachment: RATIS-383.001.patch > Shade native library tcnative for grpc/netty in Ratis-Thirdparty > > > Key: RATIS-383 > URL: https://issues.apache.org/jira/browse/RATIS-383 > Project: Ratis > Issue Type: Bug > Components: security >Reporter: Mukul Kumar Singh >Assignee: Xiaoyu Yao >Priority: Major > Labels: ozone > Attachments: RATIS-383.001.patch > > > RATIS-246 discusses making GRPC endpoint secure with mTLS. This jira is > needed as GRPC/netty has dependency on tcnative jar/libraries that need to be > shaded in Ratis-Thirdparty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty
[ https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated RATIS-383: - Description: RATIS-246 discusses making GRPC endpoint secure with mTLS. This jira is needed as GRPC/netty has dependency on tcnative jar/libraries that need to be shaded in Ratis-Thirdparty. (was: HDDS-115 discusses making GRPC endpoint secure with mTLS. This jira will track the work needed in Ratis to make grpc communication secure.) > Shade native library tcnative for grpc/netty in Ratis-Thirdparty > > > Key: RATIS-383 > URL: https://issues.apache.org/jira/browse/RATIS-383 > Project: Ratis > Issue Type: Bug > Components: security >Reporter: Mukul Kumar Singh >Assignee: Xiaoyu Yao >Priority: Major > Labels: ozone > > RATIS-246 discusses making GRPC endpoint secure with mTLS. This jira is > needed as GRPC/netty has dependency on tcnative jar/libraries that need to be > shaded in Ratis-Thirdparty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty
Xiaoyu Yao created RATIS-383: Summary: Shade native library tcnative for grpc/netty in Ratis-Thirdparty Key: RATIS-383 URL: https://issues.apache.org/jira/browse/RATIS-383 Project: Ratis Issue Type: Bug Components: security Reporter: Mukul Kumar Singh Assignee: Xiaoyu Yao HDDS-115 discusses making GRPC endpoint secure with mTLS. This jira will track the work needed in Ratis to make grpc communication secure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (RATIS-246) Support secure gRPC endpoint with mTLS in Ratis
[ https://issues.apache.org/jira/browse/RATIS-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned RATIS-246: Assignee: Xiaoyu Yao > Support secure gRPC endpoint with mTLS in Ratis > --- > > Key: RATIS-246 > URL: https://issues.apache.org/jira/browse/RATIS-246 > Project: Ratis > Issue Type: Bug > Components: security >Reporter: Mukul Kumar Singh >Assignee: Xiaoyu Yao >Priority: Major > Labels: ozone > > HDDS-115 discusses making GRPC endpoint secure with mTLS. This jira will > track the work needed in Ratis to make grpc communication secure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670435#comment-16670435 ]

Hadoop QA commented on RATIS-381:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 0m 56s | master passed |
| +1 | compile | 0m 47s | master passed |
| +1 | checkstyle | 0m 15s | master passed |
| +1 | javadoc | 0m 34s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 5s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 0m 53s | the patch passed |
| -1 | javac | 0m 53s | root generated 12 new + 94 unchanged - 0 fixed = 106 total (was 94) |
| -0 | checkstyle | 0m 15s | root: The patch generated 16 new + 135 unchanged - 1 fixed = 151 total (was 136) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 33s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 3m 38s | root in the patch failed. |
| +1 | asflicense | 0m 7s | The patch does not generate ASF License warnings. |
| | | 9m 34s | |

|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestRetryCacheWithSimulatedRpc |
| | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-10-31 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946346/r381_20181031.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 85168c03c176 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/477/testReport/ |
| modules | C: ratis-common ratis-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/477/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira
[jira] [Commented] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670433#comment-16670433 ] Shashikant Banerjee commented on RATIS-382: --- From logs on node hadoop-root-datanode-ctr-e138-1518143905142-53-01-08.hwx.site : {code:java} 2018-10-31 07:31:06,654 ERROR org.apache.ratis.server.storage.RaftLogWorker: Terminating with exit status 1: 54026017-a738-45f5-92f9-c50a0fc24a9f-RaftLogWorker failed. org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:57: (t:3, i:57), STATEMACHINELOGENTRY, client-81616CC8EE42, cid=163-writeStateMachineData at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) {code} The TimeoutException happened around 07:31. 
From Ozone.log: {code:java} 2018-10-31 07:30:50,691 [pool-3-thread-48] DEBUG (ChunkManagerImpl.java:85) - writing chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15 chunk stage:WRITE_DATA chunk file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15 tmp chunk file 2018-10-31 07:30:51,768 [pool-3-thread-49] DEBUG (ChunkManagerImpl.java:85) - writing chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16 chunk stage:WRITE_DATA chunk file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16 tmp chunk file 2018-10-31 07:30:53,757 [pool-10-thread-1] DEBUG (ChunkManagerImpl.java:85) - writing chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14 chunk stage:COMMIT_DATA chunk file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14 tmp chunk file 2018-10-31 07:31:06,673 [shutdown-hook-0] INFO (LogAdapter.java:51) - SHUTDOWN_MSG: // raftServer Stopped {code} These are the 2 write chunks in flight during writeStateMachineData. The commit for them has not happened yet. It looks like it indeed took more than 10 seconds for chunkFile *chunk file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15* to be written completely. Maybe increasing the timeout would help here. 
> writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7
[jira] [Resolved] (RATIS-349) Include "incubating" in source release file name
[ https://issues.apache.org/jira/browse/RATIS-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser resolved RATIS-349. -- Resolution: Done Oops, forgot I made this. Committed as a part of RATIS-344 > Include "incubating" in source release file name > > > Key: RATIS-349 > URL: https://issues.apache.org/jira/browse/RATIS-349 > Project: Ratis > Issue Type: Task > Components: thirdparty >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > > Feedback from 0.1.0 rc1: need to get "incubating" in the artifact name > somewhere. > Will also include a rename of the modules to be a little more concise (remove > the "ratis-thirdparty-parent") as that shows up in the final name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670233#comment-16670233 ] Tsz Wo Nicholas Sze commented on RATIS-382: --- Would it be the case that the ContainerStateMachine cannot handle leader change correctly? Since the stateMachineFuture is returned by the stateMachine, it seems Ratis cannot do anything if it times out. > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 
3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670144#comment-16670144 ] Mukul Kumar Singh commented on RATIS-382: - Looked into the logs; this issue is happening on the node when it transitions from leader to follower. hadoop-root-datanode-ctr-e138-1518143905142-53-01-08.hwx.site.log {code} 2018-10-31 07:31:06,631 INFO org.apache.ratis.server.impl.RaftServerImpl: 54026017-a738-45f5-92f9-c50a0fc24a9f changes role from CANDIDATE to FOLLOWER at term 6 for changeToFollower 2018-10-31 07:31:06,631 INFO org.apache.ratis.server.impl.RoleInfo: 54026017-a738-45f5-92f9-c50a0fc24a9f: shutdown LeaderElection 2018-10-31 07:31:06,632 INFO org.apache.ratis.server.impl.RoleInfo: 54026017-a738-45f5-92f9-c50a0fc24a9f: start FollowerState 2018-10-31 07:31:06,654 ERROR org.apache.ratis.server.storage.RaftLogWorker: Terminating with exit status 1: 54026017-a738-45f5-92f9-c50a0fc24a9f-RaftLogWorker failed. org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:57: (t:3, i:57), STATEMACHINELOGENTRY, client-81616CC8EE42, cid=163-writeStateMachineData at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) ... 
3 more {code} hadoop-root-datanode-ctr-e138-1518143905142-541661-01-02.hwx.site.log {code} 2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RaftServerImpl: 9fab9937-fbcd-4196-8014-cb165045724b changes role from CANDIDATE to FOLLOWER at term 9 for changeToFollower 2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RoleInfo: 9fab9937-fbcd-4196-8014-cb165045724b: shutdown LeaderElection 2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RoleInfo: 9fab9937-fbcd-4196-8014-cb165045724b: start FollowerState 2018-10-31 09:12:00,782 WARN org.apache.ratis.grpc.client.GrpcClientProtocolService: 9fab9937-fbcd-4196-8014-cb165045724b-7: onError: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close 2018-10-31 09:12:00,786 INFO org.apache.ratis.server.impl.RaftServerImpl: 9fab9937-fbcd-4196-8014-cb165045724b: change Leader from null to f0291cb4-7a48-456a-847f-9f91a12aa850 at term 10 for appendEntries , leader elected after 1131ms 2018-10-31 09:12:02,353 INFO org.apache.ratis.grpc.server.GrpcServerProtocolService: 9fab9937-fbcd-4196-8014-cb165045724b: appendEntries completed 2018-10-31 09:12:04,516 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker index to:169 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, ce0084c2-97 cd-4c97-9378-e5175daad18b:172.27.15.139:9858, f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: Terminating with exit status 1: 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, i:182), STATEMACHINELOGENTRY, client-611073BBFA46, cid=127-writeStateMachineData at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) ... 3 more {code} > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-80
[jira] [Commented] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670125#comment-16670125 ] Tsz Wo Nicholas Sze commented on RATIS-382: --- The TimeoutIOException was caused by a stateMachineFuture.get() timeout. This does not look like a Ratis bug. > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
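The stack trace in this thread shows a timed CompletableFuture.get() raising java.util.concurrent.TimeoutException, which is then rethrown as an I/O exception. The following is a minimal sketch of that pattern, not the actual Ratis code: the class TimedFutureGet and the helper getOrTimeout are hypothetical names, and a plain java.io.IOException stands in for org.apache.ratis.protocol.TimeoutIOException.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedFutureGet {
  /**
   * Hypothetical helper: waits up to timeoutMs for the future and converts a
   * timeout into an IOException (the real code throws TimeoutIOException).
   */
  static <T> T getOrTimeout(CompletableFuture<T> future, long timeoutMs, String name)
      throws IOException {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The caller cannot complete the future itself; it can only give up waiting.
      throw new IOException("Timeout: " + name, e);
    } catch (ExecutionException e) {
      throw new IOException("Failed: " + name, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted: " + name, e);
    }
  }

  public static void main(String[] args) {
    // A future that is never completed, standing in for a slow state machine write.
    CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
    try {
      getOrTimeout(neverCompleted, 100, "writeStateMachineData");
    } catch (IOException e) {
      System.out.println(e.getMessage() + ", cause=" + e.getCause().getClass().getSimpleName());
    }
  }
}
```

The sketch illustrates why the waiting side has little room to recover: the future is owned by whoever returned it, so the waiter can only choose how long to wait and how to report the failure.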
[jira] [Moved] (RATIS-382) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh moved HDDS-768 to RATIS-382: -- Affects Version/s: (was: 0.3.0) 0.3.0 Workflow: no-reopen-closed, patch-avail (was: patch-available, re-open possible) Key: RATIS-382 (was: HDDS-768) Project: Ratis (was: Hadoop Distributed Data Store) > writeStateMachineData times out > --- > > Key: RATIS-382 > URL: https://issues.apache.org/jira/browse/RATIS-382 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Blocker > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 
3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669782#comment-16669782 ] Tsz Wo Nicholas Sze commented on RATIS-381: --- r381_20181031.patch: throws an exception instead of returning null. > RaftTestUtil.waitForLeader should not return null > - > > Key: RATIS-381 > URL: https://issues.apache.org/jira/browse/RATIS-381 > Project: Ratis > Issue Type: Improvement > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: r381_20181031.patch > > > Some tests may fail with NullPointerException since > RaftTestUtil.waitForLeader(..) may return null (and the tests do not check > for null) when leader elections take a long time. > It seems RaftTestUtil.waitForLeader(..) should wait for a longer period and, > if there is no leader, throw an exception with a descriptive error > message instead of returning null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
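The behavior requested in RATIS-381 can be illustrated with a small polling helper that throws a descriptive exception on timeout rather than returning null. This is only a sketch under assumptions: WaitForLeaderSketch, waitFor and the Supplier-based lookup are hypothetical and are not taken from r381_20181031.patch or from RaftTestUtil.

```java
import java.util.function.Supplier;

public class WaitForLeaderSketch {
  /**
   * Hypothetical helper: polls leaderSupplier until it yields a non-null value.
   * On timeout it throws instead of returning null, so callers never need a null check.
   */
  static <T> T waitFor(Supplier<T> leaderSupplier, long timeoutMs, long sleepMs)
      throws InterruptedException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    for (;;) {
      final T leader = leaderSupplier.get();
      if (leader != null) {
        return leader;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("No leader found after waiting " + timeoutMs + " ms");
      }
      Thread.sleep(sleepMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // A leader is available immediately.
    System.out.println(waitFor(() -> "s1", 1000, 10));
    // No leader is ever elected: the helper throws with a descriptive message.
    try {
      waitFor(() -> null, 100, 10);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape, a test that previously failed with a NullPointerException somewhere downstream would instead fail at the wait itself, with a message that names the cause.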
[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null
[ https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-381: -- Attachment: r381_20181031.patch > RaftTestUtil.waitForLeader should not return null > - > > Key: RATIS-381 > URL: https://issues.apache.org/jira/browse/RATIS-381 > Project: Ratis > Issue Type: Improvement > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: r381_20181031.patch > > > Some tests may fail with NullPointerException since > RaftTestUtil.waitForLeader(..) may return null (and the tests do not check > for null) when leader elections take a long time. > It seems RaftTestUtil.waitForLeader(..) should wait for a longer period and, > if there is no leader, throw an exception with a descriptive error > message instead of returning null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-359) Add timeout support for Watch requests
[ https://issues.apache.org/jira/browse/RATIS-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669721#comment-16669721 ] Tsz Wo Nicholas Sze commented on RATIS-359: --- No, they are not the same: - RATIS-345: bypass sliding windows so that the watch requests won't be blocked by the write requests. - RATIS-359: add timeout so that a watch request may fail by timeout. > Add timeout support for Watch requests > -- > > Key: RATIS-359 > URL: https://issues.apache.org/jira/browse/RATIS-359 > Project: Ratis > Issue Type: Improvement > Components: server >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > > After a watch request is added to a server, it will stay there until the > watch condition is satisfied. In this JIRA, we propose adding timeout > support so that a watch request will be failed and removed from the server > when it times out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
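The proposal in RATIS-359 (fail a watch request and remove it from the server when it times out) could look roughly like the sketch below. Everything here is an assumption made for illustration: the class WatchTimeoutSketch, its methods, and the simplification of one pending watch per log index are hypothetical and do not reflect the Ratis server implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WatchTimeoutSketch {
  // Pending watches keyed by log index (simplification: one watch per index).
  private final Map<Long, CompletableFuture<Long>> pending = new ConcurrentHashMap<>();
  // Daemon timer thread so the sketch does not keep the JVM alive.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "watch-timeout");
        t.setDaemon(true);
        return t;
      });

  /** Registers a watch that fails with TimeoutException if not satisfied in time. */
  CompletableFuture<Long> watch(long index, long timeoutMs) {
    final CompletableFuture<Long> f = new CompletableFuture<>();
    pending.put(index, f);
    final ScheduledFuture<?> timer = scheduler.schedule(() -> {
      // Remove first, so a timed-out watch no longer stays on the server.
      if (pending.remove(index, f)) {
        f.completeExceptionally(new TimeoutException(
            "watch for index " + index + " timed out after " + timeoutMs + " ms"));
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
    // Once the watch completes for any reason, the timer is no longer needed.
    f.whenComplete((v, e) -> timer.cancel(false));
    return f;
  }

  /** Satisfies every pending watch whose index is at or below committedIndex. */
  void onCommitted(long committedIndex) {
    pending.forEach((index, f) -> {
      if (index <= committedIndex && pending.remove(index, f)) {
        f.complete(index);
      }
    });
  }

  int pendingCount() {
    return pending.size();
  }

  public static void main(String[] args) throws Exception {
    WatchTimeoutSketch server = new WatchTimeoutSketch();
    CompletableFuture<Long> satisfied = server.watch(5, 10_000);
    CompletableFuture<Long> expired = server.watch(9, 50);
    server.onCommitted(5);
    System.out.println("satisfied at index " + satisfied.get());
    expired.exceptionally(e -> { System.out.println(e.getMessage()); return -1L; }).get();
  }
}
```

The conditional remove(index, f) on both paths is the design choice worth noting: whichever of the timer or the commit notification wins the race completes the future, and the loser does nothing.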