[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671135#comment-16671135
 ] 

Hadoop QA commented on RATIS-381:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 48s{color} | {color:red} root generated 13 new + 94 unchanged - 0 fixed = 107 total (was 94) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 15s{color} | {color:orange} root: The patch generated 28 new + 137 unchanged - 8 fixed = 165 total (was 145) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  4m 28s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  7s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m  0s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestRetryCacheWithSimulatedRpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.server.simulation.TestServerInformationWithSimulatedRpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-11-01 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946472/r381_20181101.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 127e6f8375db 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/479/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/479/testReport/ |
| modules | C: ratis-common ratis-server ratis-grpc U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/479/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> RaftTestUtil.waitForLeader should not return null
> -
>
>  

[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671084#comment-16671084
 ] 

Hadoop QA commented on RATIS-381:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  5s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 49s{color} | {color:red} root generated 13 new + 94 unchanged - 0 fixed = 107 total (was 94) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 15s{color} | {color:orange} root: The patch generated 28 new + 137 unchanged - 8 fixed = 165 total (was 145) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 20s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  7s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 10m  8s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-11-01 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946472/r381_20181101.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 0068a2cf0e20 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/478/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/478/testReport/ |
| modules | C: ratis-common ratis-server ratis-grpc U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/478/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
>  Issue Type: Improvement
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assig

[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-381:
--
Attachment: r381_20181101.patch

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
>  Issue Type: Improvement
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r381_20181101.patch
>
>
> Some tests may fail with NullPointerException since 
> RaftTestUtil.waitForLeader(..) may return null (and the tests do not check 
> for null) when leader elections take a long time.
> It seems better for RaftTestUtil.waitForLeader(..) to wait for a longer 
> period and, if there is no leader, throw an exception with a descriptive 
> error message instead of returning null.
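
The behavior proposed in the description can be sketched as follows. This is only an illustration under stated assumptions, not the code in the attached patch: the class name LeaderWaitSketch, the Supplier-based signature, and the timeout and poll parameters are all hypothetical and do not come from RaftTestUtil.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a test utility; NOT the actual RaftTestUtil code. */
final class LeaderWaitSketch {
  private LeaderWaitSketch() {}

  /**
   * Polls leaderSupplier until it yields a non-null value or timeoutMs elapses.
   * Never returns null: on timeout it throws an exception that describes
   * how long it waited and how many times it polled.
   */
  static <L> L waitForLeader(Supplier<L> leaderSupplier, long timeoutMs, long pollMs) {
    final long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
    int attempts = 0;
    while (true) {
      attempts++;
      final L leader = leaderSupplier.get();
      if (leader != null) {
        return leader;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadlineNanos >= 0) {
        throw new IllegalStateException("No leader elected after " + timeoutMs
            + " ms (" + attempts + " attempts)");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and fail loudly rather than return null.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for a leader", e);
      }
    }
  }
}
```

With a helper shaped like this, a test that forgets a null check fails with a message naming the election timeout rather than with a bare NullPointerException at some later line.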



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-381:
--
Attachment: (was: r381_20181031.patch)

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
>  Issue Type: Improvement
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r381_20181101.patch
>
>
> Some tests may fail with NullPointerException since 
> RaftTestUtil.waitForLeader(..) may return null (and the tests do not check 
> for null) when leader elections take a long time.
> It seems better for RaftTestUtil.waitForLeader(..) to wait for a longer 
> period and, if there is no leader, throw an exception with a descriptive 
> error message instead of returning null.





[jira] [Assigned] (RATIS-372) Basic test harness for LogService

2018-10-31 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned RATIS-372:


Assignee: Josh Elser  (was: Rajeshbabu Chintaguntla)

> Basic test harness for LogService
> -
>
> Key: RATIS-372
> URL: https://issues.apache.org/jira/browse/RATIS-372
> Project: Ratis
>  Issue Type: Task
>  Components: LogService
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> We should have something that can stand up a logservice in a 
> pseudo-distributed manner.
> Docker is all the rage right now, and would make it easy to deploy onto 
> something like k8s in the future.
> Using [docker-compose|https://docs.docker.com/compose/] would provide us a 
> nice way to have one docker image for the metadata service daemons, another 
> for logservice daemons (if needed), and then create a network that connects 
> them all together. The final docker-compose yaml would be something like:
> It would be nice to provide a "client" container in which we show a basic 
> create/write/read/delete example to give folks a starting point.
> * 1 network
> * 3 instances of metadata service statemachines
> * 3 instances of log service statemachines
> * 1 image with the example client.
> [~chrajeshbab...@gmail.com], make sense to you?





[jira] [Commented] (RATIS-372) Basic test harness for LogService

2018-10-31 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670845#comment-16670845
 ] 

Josh Elser commented on RATIS-372:
--

Talked to Rajesh offline. He had to take some unplanned leave. Stealing 
ownership for now.

> Basic test harness for LogService
> -
>
> Key: RATIS-372
> URL: https://issues.apache.org/jira/browse/RATIS-372
> Project: Ratis
>  Issue Type: Task
>  Components: LogService
>Reporter: Josh Elser
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>
> We should have something that can stand up a logservice in a 
> pseudo-distributed manner.
> Docker is all the rage right now, and would make it easy to deploy onto 
> something like k8s in the future.
> Using [docker-compose|https://docs.docker.com/compose/] would provide us a 
> nice way to have one docker image for the metadata service daemons, another 
> for logservice daemons (if needed), and then create a network that connects 
> them all together. The final docker-compose yaml would be something like:
> It would be nice to provide a "client" container in which we show a basic 
> create/write/read/delete example to give folks a starting point.
> * 1 network
> * 3 instances of metadata service statemachines
> * 3 instances of log service statemachines
> * 1 image with the example client.
> [~chrajeshbab...@gmail.com], make sense to you?





[jira] [Created] (RATIS-385) Create README for logservice

2018-10-31 Thread Josh Elser (JIRA)
Josh Elser created RATIS-385:


 Summary: Create README for logservice
 Key: RATIS-385
 URL: https://issues.apache.org/jira/browse/RATIS-385
 Project: Ratis
  Issue Type: Sub-task
Reporter: Josh Elser


We should have a nice README at 
https://github.com/apache/incubator-ratis/tree/master/ratis-logservice to help 
guide people to the project and encourage contributions/involvement.





[jira] [Created] (RATIS-384) writeStateMachineData times out

2018-10-31 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created RATIS-384:
---

 Summary: writeStateMachineData times out
 Key: RATIS-384
 URL: https://issues.apache.org/jira/browse/RATIS-384
 Project: Ratis
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Nilotpal Nandi
 Fix For: 0.3.0


The datanode stopped due to the following error:

datanode.log
{noformat}
2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
[9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
i:182), STATEMACHINELOGENTRY, client-611073BBFA46, cid=127-writeStateMachineData
 at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
 at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
 at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
 at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
 ... 3 more{noformat}
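
The trace above shows a timed CompletableFuture.get raising a TimeoutException that is then rethrown as an I/O exception. A minimal sketch of that pattern follows, assuming a plain java.io.IOException in place of Ratis's TimeoutIOException; the class TimedFutureSketch and the method getWithTimeout are hypothetical names for illustration and are not the Ratis IOUtils implementation.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper illustrating the pattern; NOT the Ratis IOUtils code. */
final class TimedFutureSketch {
  private TimedFutureSketch() {}

  /**
   * Waits up to timeoutMs for the future. A timeout is converted into an
   * IOException whose cause is the original TimeoutException, which is the
   * shape of the "Caused by" chain in the log above.
   */
  static <T> T getWithTimeout(CompletableFuture<T> future, String name, long timeoutMs)
      throws IOException {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      throw new IOException("Timeout: " + name, e);
    } catch (ExecutionException e) {
      // Unwrap so the caller sees the real failure as the cause.
      throw new IOException("Failed: " + name, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted: " + name, e);
    }
  }
}
```

In this sketch a timeout only tells the caller that the future had not completed within the limit; it says nothing about whether the underlying write eventually finishes.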





[jira] [Commented] (RATIS-271) Ratis-backed distributed log: "LogService"

2018-10-31 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670801#comment-16670801
 ] 

Josh Elser commented on RATIS-271:
--

Attached a doc that [~sergey.soldatov] had put together a while back that helps 
describe the Metadata Service for tracking the Logs. Thought that would be 
helpful to folks who come along later.

> Ratis-backed distributed log: "LogService" 
> ---
>
> Key: RATIS-271
> URL: https://issues.apache.org/jira/browse/RATIS-271
> Project: Ratis
>  Issue Type: New Feature
>  Components: LogService
>Reporter: Josh Elser
>Priority: Major
> Attachments: LogService Metadata Service.pdf
>
>
> Umbrella issue for building a distributed log using Ratis:
> Doc: 
> [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit]
> Discuss: 
> https://lists.apache.org/thread.html/f80dc3900f6d9f4ee4d9f9e0898cee9a232e3b1ca9a4d9a53fea1d71@%3Cdev.ratis.apache.org%3E





[jira] [Updated] (RATIS-271) Ratis-backed distributed log: "LogService"

2018-10-31 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated RATIS-271:
-
Attachment: LogService Metadata Service.pdf

> Ratis-backed distributed log: "LogService" 
> ---
>
> Key: RATIS-271
> URL: https://issues.apache.org/jira/browse/RATIS-271
> Project: Ratis
>  Issue Type: New Feature
>  Components: LogService
>Reporter: Josh Elser
>Priority: Major
> Attachments: LogService Metadata Service.pdf
>
>
> Umbrella issue for building a distributed log using Ratis:
> Doc: 
> [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit]
> Discuss: 
> https://lists.apache.org/thread.html/f80dc3900f6d9f4ee4d9f9e0898cee9a232e3b1ca9a4d9a53fea1d71@%3Cdev.ratis.apache.org%3E





[jira] [Updated] (RATIS-329) Current Ratis heartbeats are missing for a heavily loaded cluster

2018-10-31 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-329:

Target Version/s:   (was: 0.3.0)

> Current Ratis heartbeats are missing for a heavily loaded cluster
> -
>
> Key: RATIS-329
> URL: https://issues.apache.org/jira/browse/RATIS-329
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
>
> Currently, while running Ratis with Ozone, frequent leader elections can be 
> noticed in the datanode logs. This happens because heartbeats from the leader 
> to the followers are missing.





[jira] [Updated] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-382:

Fix Version/s: 0.3.0

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> The datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}





[jira] [Commented] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670619#comment-16670619
 ] 

ASF GitHub Bot commented on RATIS-383:
--

xiaoyuyao opened a new pull request #1: RATIS-383. Shade native library 
tcnative for grpc/netty in Ratis-Thir…
URL: https://github.com/apache/incubator-ratis-thirdparty/pull/1
 
 
   …dparty.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Shade native library tcnative for grpc/netty in Ratis-Thirdparty
> 
>
> Key: RATIS-383
> URL: https://issues.apache.org/jira/browse/RATIS-383
> Project: Ratis
>  Issue Type: Bug
>  Components: security
>Reporter: Mukul Kumar Singh
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-383.001.patch
>
>
> RATIS-246 discusses making the GRPC endpoint secure with mTLS. This jira is 
> needed because GRPC/netty has a dependency on the tcnative jar/libraries, 
> which need to be shaded in Ratis-Thirdparty.





[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670568#comment-16670568
 ] 

Shashikant Banerjee commented on RATIS-382:
---

Looking further at the nodes, the tmp chunk files do actually exist and are 
completely written:
{code:java}
-rw-r--r-- 1 root root 16M Oct 31 07:30 
/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15.tmp
-rw-r--r-- 1 root root 16M Oct 31 07:30 
/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16.tmp{code}

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> The datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}





[jira] [Updated] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty

2018-10-31 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated RATIS-383:
-
Attachment: RATIS-383.001.patch

> Shade native library tcnative for grpc/netty in Ratis-Thirdparty
> 
>
> Key: RATIS-383
> URL: https://issues.apache.org/jira/browse/RATIS-383
> Project: Ratis
>  Issue Type: Bug
>  Components: security
>Reporter: Mukul Kumar Singh
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-383.001.patch
>
>
> RATIS-246 discusses making the GRPC endpoint secure with mTLS. This jira is 
> needed because GRPC/netty has a dependency on the tcnative jar/libraries, 
> which need to be shaded in Ratis-Thirdparty.





[jira] [Updated] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty

2018-10-31 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated RATIS-383:
-
Description: RATIS-246 discusses making GRPC endpoint secure with mTLS. 
This jira is needed as GRPC/netty has dependency on tcnative jar/libraries that 
need to be shaded in Ratis-Thirdparty.  (was: HDDS-115 discusses making GRPC 
endpoint secure with mTLS. This jira will track the work needed in Ratis to 
make grpc communication secure.)

> Shade native library tcnative for grpc/netty in Ratis-Thirdparty
> 
>
> Key: RATIS-383
> URL: https://issues.apache.org/jira/browse/RATIS-383
> Project: Ratis
>  Issue Type: Bug
>  Components: security
>Reporter: Mukul Kumar Singh
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: ozone
>
> RATIS-246 discusses making the GRPC endpoint secure with mTLS. This jira is 
> needed because GRPC/netty has a dependency on the tcnative jar/libraries, 
> which need to be shaded in Ratis-Thirdparty.





[jira] [Created] (RATIS-383) Shade native library tcnative for grpc/netty in Ratis-Thirdparty

2018-10-31 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created RATIS-383:


 Summary: Shade native library tcnative for grpc/netty in 
Ratis-Thirdparty
 Key: RATIS-383
 URL: https://issues.apache.org/jira/browse/RATIS-383
 Project: Ratis
  Issue Type: Bug
  Components: security
Reporter: Mukul Kumar Singh
Assignee: Xiaoyu Yao


HDDS-115 discusses making GRPC endpoint secure with mTLS. This jira will track 
the work needed in Ratis to make grpc communication secure.





[jira] [Assigned] (RATIS-246) Support secure gRPC endpoint with mTLS in Ratis

2018-10-31 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned RATIS-246:


Assignee: Xiaoyu Yao

> Support secure gRPC endpoint with mTLS in Ratis
> ---
>
> Key: RATIS-246
> URL: https://issues.apache.org/jira/browse/RATIS-246
> Project: Ratis
>  Issue Type: Bug
>  Components: security
>Reporter: Mukul Kumar Singh
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: ozone
>
> HDDS-115 discusses making GRPC endpoint secure with mTLS. This jira will 
> track the work needed in Ratis to make grpc communication secure.





[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670435#comment-16670435
 ] 

Hadoop QA commented on RATIS-381:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  5s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 53s{color} | {color:red} root generated 12 new + 94 unchanged - 0 fixed = 106 total (was 94) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 15s{color} | {color:orange} root: The patch generated 16 new + 135 unchanged - 1 fixed = 151 total (was 136) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 38s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  7s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  9m 34s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestRetryCacheWithSimulatedRpc |
|   | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-10-31 |
| JIRA Issue | RATIS-381 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946346/r381_20181031.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 85168c03c176 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 1d2ebee |
| Default Java | 1.8.0_181 |
| javac | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/diff-compile-javac-root.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/477/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/477/testReport/ |
| modules | C: ratis-common ratis-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/477/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira

[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670433#comment-16670433
 ] 

Shashikant Banerjee commented on RATIS-382:
---

From logs on node 
hadoop-root-datanode-ctr-e138-1518143905142-53-01-08.hwx.site:

 
{code:java}
2018-10-31 07:31:06,654 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
54026017-a738-45f5-92f9-c50a0fc24a9f-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:57: (t:3, 
i:57), STATEMACHINELOGENTRY, client-81616CC8EE42, cid=163-writeStateMachineData
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
at 
org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
{code}
Timeout Exception happened around 07:31.

 

From Ozone.log:

 
{code:java}
2018-10-31 07:30:50,691 [pool-3-thread-48] DEBUG (ChunkManagerImpl.java:85) - 
writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15
 chunk stage:WRITE_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15
 tmp chunk file
2018-10-31 07:30:51,768 [pool-3-thread-49] DEBUG (ChunkManagerImpl.java:85) - 
writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16
 chunk stage:WRITE_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16
 tmp chunk file

2018-10-31 07:30:53,757 [pool-10-thread-1] DEBUG (ChunkManagerImpl.java:85)     
- writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14
 chunk stage:COMMIT_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14
 tmp chunk file

2018-10-31 07:31:06,673 [shutdown-hook-0] INFO  (LogAdapter.java:51)     - 
SHUTDOWN_MSG: // raftServer Stopped
{code}
 

These are the 2 chunk writes (chunk_15 and chunk_16) that were in flight during 
writeStateMachineData. The commit for these has not happened yet. It looks like 
it indeed took more than 10 seconds for the chunk file 
*/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15*
 to get written completely. Maybe increasing the timeout would help here.
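
As a rough check of the "more than 10 seconds" estimate, the gap between the two timestamps quoted above can be computed directly. This is only a sketch: the class and method names (LogGapSketch, between) are made up for illustration, and it assumes that the WRITE_DATA line for chunk_15 (07:30:50,691) and the RaftLogWorker failure (07:31:06,654) bracket the write that timed out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Ratis or Ozone: measures the gap
// between two log timestamps in the "yyyy-MM-dd HH:mm:ss,SSS" layout.
final class LogGapSketch {
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  static Duration between(String start, String end) {
    return Duration.between(
        LocalDateTime.parse(start, LOG_TIME),
        LocalDateTime.parse(end, LOG_TIME));
  }

  public static void main(String[] args) {
    // chunk_15 WRITE_DATA logged -> RaftLogWorker timeout logged
    Duration gap = between("2018-10-31 07:30:50,691", "2018-10-31 07:31:06,654");
    System.out.println(gap.toMillis() + " ms"); // prints "15963 ms"
  }
}
```

That is roughly 16 seconds between the start of the chunk_15 write and the failure, which is consistent with the write not finishing inside the timeout.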

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to following error :
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7

[jira] [Resolved] (RATIS-349) Include "incubating" in source release file name

2018-10-31 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved RATIS-349.
--
Resolution: Done

Oops, forgot I made this. Committed as a part of RATIS-344

> Include "incubating" in source release file name
> 
>
> Key: RATIS-349
> URL: https://issues.apache.org/jira/browse/RATIS-349
> Project: Ratis
>  Issue Type: Task
>  Components: thirdparty
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> Feedback from 0.1.0 rc1: need to get "incubating" in the artifact name 
> somewhere.
> Will also include a rename of the modules to be a little more concise (remove 
> the "ratis-thirdparty-parent") as that shows up in the final name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670233#comment-16670233
 ] 

Tsz Wo Nicholas Sze commented on RATIS-382:
---

Would it be the case that the ContainerStateMachine cannot handle leader change 
correctly?  Since the stateMachineFuture is returned by the stateMachine, it 
seems Ratis cannot do anything if it times out.



[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670144#comment-16670144
 ] 

Mukul Kumar Singh commented on RATIS-382:
-

Looked into the logs; this issue is happening on the node when it transitions 
from leader to follower.

hadoop-root-datanode-ctr-e138-1518143905142-53-01-08.hwx.site.log
{code}
2018-10-31 07:31:06,631 INFO org.apache.ratis.server.impl.RaftServerImpl: 
54026017-a738-45f5-92f9-c50a0fc24a9f changes role from CANDIDATE to FOLLOWER at 
term 6 for changeToFollower
2018-10-31 07:31:06,631 INFO org.apache.ratis.server.impl.RoleInfo: 
54026017-a738-45f5-92f9-c50a0fc24a9f: shutdown LeaderElection
2018-10-31 07:31:06,632 INFO org.apache.ratis.server.impl.RoleInfo: 
54026017-a738-45f5-92f9-c50a0fc24a9f: start FollowerState
2018-10-31 07:31:06,654 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
54026017-a738-45f5-92f9-c50a0fc24a9f-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:57: (t:3, 
i:57), STATEMACHINELOGENTRY, client-81616CC8EE42, cid=163-writeStateMachineData
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
at 
org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
... 3 more
{code}

hadoop-root-datanode-ctr-e138-1518143905142-541661-01-02.hwx.site.log 
{code}
2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9fab9937-fbcd-4196-8014-cb165045724b changes role from CANDIDATE to FOLLOWER at 
term 9 for changeToFollower
2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RoleInfo: 
9fab9937-fbcd-4196-8014-cb165045724b: shutdown LeaderElection
2018-10-31 09:11:59,883 INFO org.apache.ratis.server.impl.RoleInfo: 
9fab9937-fbcd-4196-8014-cb165045724b: start FollowerState
2018-10-31 09:12:00,782 WARN 
org.apache.ratis.grpc.client.GrpcClientProtocolService: 
9fab9937-fbcd-4196-8014-cb165045724b-7: onError: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
 cancelled before receiving half close
2018-10-31 09:12:00,786 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9fab9937-fbcd-4196-8014-cb165045724b: change Leader from null to 
f0291cb4-7a48-456a-847f-9f91a12aa850 at term 10 for appendEntries
, leader elected after 1131ms
2018-10-31 09:12:02,353 INFO 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
9fab9937-fbcd-4196-8014-cb165045724b: appendEntries completed
2018-10-31 09:12:04,516 INFO org.apache.ratis.server.storage.RaftLogWorker: 
Rolling segment:9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker index to:169
2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
[9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, ce0084c2-97
cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
i:182), STATEMACHINELOGENTRY, client-611073BBFA46, cid=127-writeStateMachineData
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
at 
org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
... 3 more
{code}


[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670125#comment-16670125
 ] 

Tsz Wo Nicholas Sze commented on RATIS-382:
---

TimeoutIOException was caused by stateMachineFuture.get() timeout.  This does 
not look like a Ratis bug.
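
To illustrate the point, here is a minimal sketch of how a timed get() on a future that never completes turns into an IOException whose cause is a TimeoutException, matching the shape of the stack trace in the issue. The class and method names (FutureTimeoutSketch, awaitOrTimeout) are hypothetical; this is not the Ratis IOUtils code, only the general pattern.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: the caller can only wait; whether the future ever
// completes is decided by whoever created it (here, the state machine).
final class FutureTimeoutSketch {
  static <T> T awaitOrTimeout(CompletableFuture<T> future, String name, long timeoutMs)
      throws IOException {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The future did not complete in time: surface it as an I/O failure.
      throw new IOException("Timeout: " + name, e);
    } catch (ExecutionException e) {
      throw new IOException("Failed: " + name, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted: " + name, e);
    }
  }

  public static void main(String[] args) {
    // A future that nobody completes stands in for a stalled write.
    CompletableFuture<Void> stalled = new CompletableFuture<>();
    try {
      awaitOrTimeout(stalled, "writeStateMachineData", 100);
    } catch (IOException e) {
      System.out.println(e.getMessage() + ", cause: " + e.getCause());
    }
  }
}
```

In this pattern the waiting side has no way to make the future complete, so the fix has to come from the code that produces the future or from a longer timeout.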



[jira] [Moved] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh moved HDDS-768 to RATIS-382:
--

Affects Version/s: (was: 0.3.0)
   0.3.0
 Workflow: no-reopen-closed, patch-avail  (was: patch-available, 
re-open possible)
  Key: RATIS-382  (was: HDDS-768)
  Project: Ratis  (was: Hadoop Distributed Data Store)



[jira] [Commented] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669782#comment-16669782
 ] 

Tsz Wo Nicholas Sze commented on RATIS-381:
---

r381_20181031.patch: throws an exception instead of returning null.
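
The general idea can be sketched as follows. This is not the content of r381_20181031.patch; WaitForLeaderSketch, its parameters and the exception type are assumptions chosen for illustration. The helper polls a lookup until it yields a non-null leader and, when the deadline passes, throws with a descriptive message rather than handing null to the test.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the RATIS-381 patch: poll until a leader is
// visible, and fail loudly instead of returning null to the caller.
final class WaitForLeaderSketch {
  static <T> T waitForLeader(Supplier<T> leaderLookup, long timeoutMs, long pollMs) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    int attempts = 0;
    while (true) {
      attempts++;
      final T leader = leaderLookup.get();
      if (leader != null) {
        return leader;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("No leader found after " + attempts
            + " attempt(s) within " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for a leader", e);
      }
    }
  }

  public static void main(String[] args) {
    String leader = WaitForLeaderSketch.<String>waitForLeader(() -> "s1", 1000, 10);
    System.out.println("leader = " + leader);
    try {
      WaitForLeaderSketch.<String>waitForLeader(() -> null, 50, 10);
    } catch (IllegalStateException e) {
      System.out.println("failed as expected: " + e.getMessage());
    }
  }
}
```

A test that uses such a helper fails with a message about the missing leader instead of a NullPointerException at some later line.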

> RaftTestUtil.waitForLeader should not return null
> -
>
> Key: RATIS-381
> URL: https://issues.apache.org/jira/browse/RATIS-381
> Project: Ratis
>  Issue Type: Improvement
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r381_20181031.patch
>
>
> Some tests may fail with NullPointerException since 
> RaftTestUtil.waitForLeader(..) may return null (and the tests do not check 
> for null) when leader elections take a long time.
> It seems RaftTestUtil.waitForLeader(..) should wait for a longer period and, 
> if there is no leader, throw an exception with a descriptive error 
> message instead of returning null.





[jira] [Updated] (RATIS-381) RaftTestUtil.waitForLeader should not return null

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-381:
--
Attachment: r381_20181031.patch



[jira] [Commented] (RATIS-359) Add timeout support for Watch requests

2018-10-31 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669721#comment-16669721
 ] 

Tsz Wo Nicholas Sze commented on RATIS-359:
---

No, they are not the same:
- RATIS-345: bypass sliding windows so that the watch requests won't be blocked 
by the write requests.
- RATIS-359: add timeout so that a watch request may fail by timeout.
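
The RATIS-359 item can be sketched roughly as below. None of this is Ratis code: WatchTimeoutSketch, watch, onCommitted and pendingCount are hypothetical names, and the sketch keeps at most one watch per log index for brevity. A pending watch is held in a map; a scheduled task removes it and fails its future if the watch condition has not been met in time.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of "a watch request may fail by timeout".
final class WatchTimeoutSketch {
  private final Map<Long, CompletableFuture<Long>> pending = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "watch-timeout");
        t.setDaemon(true); // do not keep the JVM alive for the sketch
        return t;
      });

  /** Registers a watch on a log index; it fails if not satisfied in time. */
  CompletableFuture<Long> watch(long index, long timeoutMs) {
    final CompletableFuture<Long> f = new CompletableFuture<>();
    pending.put(index, f);
    scheduler.schedule(() -> {
      // Remove only if this exact request is still pending, then fail it.
      if (pending.remove(index, f)) {
        f.completeExceptionally(new TimeoutException(
            "watch(" + index + ") timed out after " + timeoutMs + " ms"));
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
    return f;
  }

  /** Called when the watch condition for the index is satisfied. */
  void onCommitted(long index) {
    final CompletableFuture<Long> f = pending.remove(index);
    if (f != null) {
      f.complete(index);
    }
  }

  int pendingCount() {
    return pending.size();
  }
}
```

Without the scheduled task, a watch whose condition is never met would stay in the map indefinitely, which is the behaviour the issue description sets out to change.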

> Add timeout support for Watch requests
> --
>
> Key: RATIS-359
> URL: https://issues.apache.org/jira/browse/RATIS-359
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>
> After a watch request is added to a server, it will stay there until the 
> watch condition is satisfied.  In this JIRA, we propose adding timeout 
> support so that a watch request will be failed and removed from the server 
> when it times out.


