[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-17 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954002#comment-16954002
 ] 

Lokesh Jain commented on RATIS-706:
---

[~elek]  Thanks for reporting the issue! [~szetszwo] Thanks for the 
contribution! I have committed the patch to the master branch.

> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt, r706_20191011.patch, r706_20191011b.patch
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953937#comment-16953937
 ] 

Hadoop QA commented on RATIS-706:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
55s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
5s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m  2s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.logservice.TestLogServiceWithNetty |
|   | ratis.logservice.TestLogServiceWithGrpc |
|   | ratis.logservice.server.TestMetaServer |
|   | ratis.grpc.TestServerRestartWithGrpc |
|   | ratis.grpc.TestRaftReconfigurationWithGrpc |
|   | ratis.server.simulation.TestRaftReconfigurationWithSimulatedRpc |
|   | ratis.netty.TestRaftReconfigurationWithNetty |
|   | ratis.server.raftlog.TestRaftLogMetrics |
|   | ratis.grpc.TestRaftAsyncWithGrpc |
|   | ratis.grpc.TestWatchRequestWithGrpc |
|   | ratis.examples.filestore.TestFileStoreWithGrpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-17 |
| JIRA Issue | RATIS-706 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982735/r706_20191011b.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
checkstyle  compile  |
| uname | Linux 51181d5e7d30 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh
 |
| git revision | master / 9cbf1ef |
| maven | version: Apache Maven 3.6.2 
(40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1075/artifact/out/patch-unit-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1075/testReport/ |
| Max. process+thread count | 3368 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-test U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1075/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-17 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953915#comment-16953915
 ] 

Lokesh Jain commented on RATIS-706:
---

[~szetszwo] Thanks for working on this! The changes look good to me. +1.

> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt, r706_20191011.patch, r706_20191011b.patch
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
>   - waiting to lock 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-14 Thread Marton Elek (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951107#comment-16951107
 ] 

Marton Elek commented on RATIS-706:
---

Thanks the quick fix [~szetszwo]. I deployed it and worked well, couldn't see 
the dead lock any more. 

 

I am not sure If I can judge the code but will try if nobody else will review 
it...

> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt, r706_20191011.patch, r706_20191011b.patch
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949196#comment-16949196
 ] 

Hadoop QA commented on RATIS-706:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 39s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.examples.filestore.TestFileStoreWithNetty |
|   | ratis.netty.TestRaftSnapshotWithNetty |
|   | ratis.server.simulation.TestRaftExceptionWithSimulation |
|   | ratis.grpc.TestGroupInfoWithGrpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.grpc.TestRaftAsyncWithGrpc |
|   | ratis.netty.TestRetryCacheWithNettyRpc |
|   | ratis.grpc.TestRaftWithGrpc |
|   | ratis.grpc.TestLeaderElectionWithGrpc |
|   | ratis.grpc.TestServerRestartWithGrpc |
|   | ratis.netty.TestGroupManagementWithNetty |
|   | ratis.netty.TestRaftWithNetty |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-11 |
| JIRA Issue | RATIS-706 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982735/r706_20191011b.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
checkstyle  compile  |
| uname | Linux 8f427da481b7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh
 |
| git revision | master / 699792d |
| maven | version: Apache Maven 3.6.2 
(40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1056/artifact/out/patch-unit-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1056/testReport/ |
| Max. process+thread count | 2689 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-test U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-RATIS-Build/1056/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-10 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949150#comment-16949150
 ] 

Tsz-wo Sze commented on RATIS-706:
--

r706_20191011b.patch: adds a test.

> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt, r706_20191011.patch, r706_20191011b.patch
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
>   - waiting to lock <0x00033eb99b10> (a 
> 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-10 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949096#comment-16949096
 ] 

Tsz-wo Sze commented on RATIS-706:
--

r706_20191011.patch: close proxy without holding lock.

Will add a test.



> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt, r706_20191011.patch
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
>   - waiting to lock <0x00033eb99b10> (a 
> 

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-10 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949072#comment-16949072
 ] 

Tsz-wo Sze commented on RATIS-706:
--

Thanks for filing this bug.  It is a real deadlock.  One easy fix is to change 
PeerProxyMap to close a proxy without holding the lock.  Will post a patch.

> Dead lock in GrpcClientRpc
> --
>
> Key: RATIS-706
> URL: https://issues.apache.org/jira/browse/RATIS-706
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Marton Elek
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: jstack.txt
>
>
> I started an Ozone cluster on Kubernetes and started a freon test (ozone 
> freon ockg -n1)
> After a while I found that the one freon instance is not creating keys any 
> more. I checked the om RPC endpoint with ozone insight and no RPC messages 
> has been arrived.
> Based on the jstack output we have a deadlock between 
> PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync.
> I am not sure (yet) what is the exact problem, but based on the stack traces 
> It seems to be Ratis related.
> {code}
> Found one Java-level deadlock:
> =
> "pool-2-thread-6":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x01170980 (object 0x00033eb99b10, a 
> org.apache.ratis.util.SlidingWindow$Client),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]"
> "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty 
> queue]":
>   waiting to lock monitor 0x7f80356c8800 (object 0x00033eb70a00, a 
> java.lang.Object),
>   which is held by 
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]"
> Java stack information for the threads listed above:
> ===
> "pool-2-thread-6":
>   at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103)
>   - waiting to lock <0x00033eb70a00> (a java.lang.Object)
>   at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75)
>   at 
> org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59)
>   at 
> org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
>   - locked <0x0003f2ba4240> (a 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty 
> queue]":
>   at 
> org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
>