[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-10-29 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962740#comment-16962740
 ] 

Li Cheng commented on HDDS-2186:


The OOM issue is resolved in https://github.com/apache/hadoop-ozone/pull/28.

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: flaky-test
> Fix For: 0.5.0
>
>
> After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a 
> bunch of 'out of memory' exceptions in ratis. Attached sample stacks.
>  
> 2019-09-26 15:12:22,824 [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] ERROR segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(323)) - 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker hit exception
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:694)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>   at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
>   at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
>   at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
>   at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
>   at java.lang.Thread.run(Thread.java:748)
>  
> which leads to:
> 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 for c1f4d375-683b-42fe-983b-428a63aa8803
> org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 2999881264ns
>   at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82)
>   at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147)
>   at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94)
>   at org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278)
>   at org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205)
>   at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142)
>   at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
>   at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
>   at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
>   at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
>   at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatis

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-10-17 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953568#comment-16953568
 ] 

Li Cheng commented on HDDS-2186:


Trying to resolve this in [https://github.com/apache/hadoop-ozone/pull/28].

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: flaky-test
> Fix For: 0.5.0

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-10-11 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949932#comment-16949932
 ] 

Li Cheng commented on HDDS-2186:


[https://github.com/apache/hadoop/pull/1431]

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: flaky-test
> Fix For: HDDS-1564

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-10-11 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949933#comment-16949933
 ] 

Li Cheng commented on HDDS-2186:


[https://github.com/apache/hadoop/pull/1431]

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: flaky-test
> Fix For: HDDS-1564

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-10-11 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949930#comment-16949930
 ] 

Li Cheng commented on HDDS-2186:


It turns out that MiniOzoneCluster runs out of memory because pipelines are 
created without any limit. Logic to restrict pipeline creation is added in 
[https://github.com/apache/hadoop/pull/1431]. 

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: flaky-test

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-09-27 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939296#comment-16939296
 ] 

Li Cheng commented on HDDS-2186:


[~ljain] After some investigation, it turned out that MiniOzoneCluster is 
abusing resources to create pipelines. It didn't have problems before because 
every datanode could be assigned to only one pipeline, so the quota ran out 
fast. Now that limit has been removed, and nothing stops the cluster from 
creating pipelines until Ratis reports that a resource such as memory is 
exhausted. I'm adding logic to prevent this, but unfortunately factor ONE and 
factor THREE pipelines need to be handled differently, so the logic keeps 
growing more complex. 
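
For illustration only, the kind of guard described above could look like the sketch below. It is not the logic from the actual patch: the class name PipelineLimiter, the method tryReserve, and the per-datanode cap are all hypothetical, and datanodes are identified by plain strings. A factor ONE pipeline would pass one datanode and a factor THREE pipeline three distinct ones, and a reservation succeeds only if every member is still below the cap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: caps how many pipelines each datanode may join. */
public class PipelineLimiter {
  private final int maxPipelinesPerDatanode;
  // Number of pipelines each datanode currently participates in.
  private final Map<String, Integer> pipelineCount = new HashMap<>();

  public PipelineLimiter(int maxPipelinesPerDatanode) {
    this.maxPipelinesPerDatanode = maxPipelinesPerDatanode;
  }

  /**
   * Tries to reserve one pipeline slot on every (distinct) member datanode.
   * Returns false and reserves nothing if any member is already at the cap.
   */
  public synchronized boolean tryReserve(List<String> datanodes) {
    for (String dn : datanodes) {
      if (pipelineCount.getOrDefault(dn, 0) >= maxPipelinesPerDatanode) {
        return false;
      }
    }
    for (String dn : datanodes) {
      pipelineCount.merge(dn, 1, Integer::sum);
    }
    return true;
  }

  /** Returns how many pipelines the given datanode currently belongs to. */
  public synchronized int count(String datanode) {
    return pipelineCount.getOrDefault(datanode, 0);
  }
}
```

With a cap of 2, a datanode that already belongs to one factor THREE and one factor ONE pipeline would cause any further pipeline that includes it to be rejected, so creation stops before Ratis runs out of direct memory.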

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Priority: Major
>  Labels: flaky-test

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-09-27 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939290#comment-16939290
 ] 

Lokesh Jain commented on HDDS-2186:
---

[~timmylicheng] You are right. This might be related to multiple ratis 
pipelines in the datanode. I would suggest taking a heap dump and analysing the 
heap and direct memory usage.
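
As a rough aid for the direct memory part of that analysis, the standard java.lang.management API exposes the JVM's direct buffer pool. The sketch below is only an assumption about how one might sample it from inside a test; the class name DirectMemoryReport and the method directMemoryUsed are hypothetical, and the pool name "direct" is the one HotSpot JVMs use.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Hypothetical helper: reports bytes held by the JVM's direct buffer pool. */
public class DirectMemoryReport {

  /** Returns the bytes used by the "direct" buffer pool, or -1 if no such pool is found. */
  public static long directMemoryUsed() {
    for (BufferPoolMXBean pool
        : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        return pool.getMemoryUsed();
      }
    }
    return -1L;
  }

  public static void main(String[] args) {
    long before = directMemoryUsed();
    // Allocate 1 MiB off-heap, the same kind of allocation that fails in the trace.
    ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
    long after = directMemoryUsed();
    System.out.println("direct bytes before=" + before + " after=" + after
        + " allocated=" + buffer.capacity());
  }
}
```

Logging this value as pipelines are created would show whether direct memory grows with the pipeline count. A heap dump (for example via jmap -dump:format=b,file=heap.hprof <pid>) complements it, since direct buffers live off-heap and the dump only contains the DirectByteBuffer objects that reference them.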

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: HDDS-1564
>Reporter: Li Cheng
>Priority: Major
>  Labels: flaky-test
>
> After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a 
> bunch of 'out of memory' exceptions in ratis. Attached sample stacks.
>  
> 2019-09-26 15:12:22,824 
> [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker]
>  ERROR segmented.SegmentedRaftLogWorker 
> (SegmentedRaftLogWorker.java:run(323)) - 
> 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker
>  hit exception2019-09-26 15:12:22,824 
> [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker]
>  ERROR segmented.SegmentedRaftLogWorker 
> (SegmentedRaftLogWorker.java:run(323)) - 
> 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker
>  hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at 
> java.nio.Bits.reserveMemory(Bits.java:694) at 
> java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at 
> java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at 
> org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41)
>  at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72)
>  at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
>  at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
>  at java.lang.Thread.run(Thread.java:748)
>  
> which leads to:
> 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 for c1f4d375-683b-42fe-983b-428a63aa8803
> org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 2999881264ns
>  at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82)
>  at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147)
>  at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94)
>  at org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278)
>  at org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177)
>  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
>  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
>  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>  at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
>  at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
>  at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
>  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>  at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>  at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipel

[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions

2019-09-26 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938670#comment-16938670
 ] 

Li Cheng commented on HDDS-2186:


Note that CI is also affected: it cannot print the correct output because of 
the memory issues.
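
For anyone trying to confirm the 'Direct buffer memory' error locally, here is a minimal sketch of how direct-buffer usage can be watched from inside a test JVM. It relies only on the JDK's BufferPoolMXBean. The class name DirectMemoryProbe, the writer count and the 1 MiB buffer size are made up for illustration and are not taken from Ratis or MiniOzoneCluster.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ozone or Ratis.
public class DirectMemoryProbe {

  /** Bytes the JVM reports for its "direct" buffer pool, or -1 if that pool is not exposed. */
  public static long directMemoryUsed() {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        return pool.getMemoryUsed();
      }
    }
    return -1L;
  }

  /**
   * Allocates one direct buffer per simulated log writer and keeps them all
   * reachable, loosely mimicking many writers living in a single JVM.
   * The counts and sizes passed in are assumptions, not Ratis defaults.
   */
  public static List<ByteBuffer> allocateWriters(int writers, int bufferBytes) {
    List<ByteBuffer> held = new ArrayList<>();
    for (int i = 0; i < writers; i++) {
      held.add(ByteBuffer.allocateDirect(bufferBytes));
    }
    return held;
  }

  public static void main(String[] args) {
    long before = directMemoryUsed();
    List<ByteBuffer> held = allocateWriters(8, 1 << 20);
    long after = directMemoryUsed();
    System.out.println("direct bytes before=" + before
        + " after=" + after + " buffers held=" + held.size());
  }
}
```

The ceiling these allocations count against is set with the JVM option -XX:MaxDirectMemorySize, so logging directMemoryUsed() around cluster startup is one way to see how close a test run gets to that limit.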

> Fix tests using MiniOzoneCluster for its memory related exceptions
> --
>
> Key: HDDS-2186
> URL: https://issues.apache.org/jira/browse/HDDS-2186
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Affects Versions: HDDS-1564
> Reporter: Li Cheng
> Priority: Major
> Labels: flaky-test
>
> With multi-raft in use, MiniOzoneCluster behaves unreliably and reports 
> many 'out of memory' exceptions in Ratis. Sample stack traces follow.
>  
> 2019-09-26 15:12:22,824 [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] ERROR segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(323)) - 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker hit exception
> java.lang.OutOfMemoryError: Direct buffer memory
>  at java.nio.Bits.reserveMemory(Bits.java:694)
>  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>  at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
>  at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
>  at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
>  at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
>  at java.lang.Thread.run(Thread.java:748)
>  
> which leads to:
> 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 for c1f4d375-683b-42fe-983b-428a63aa8803
> org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 2999881264ns
>  at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82)
>  at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147)
>  at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94)
>  at org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278)
>  at org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177)
>  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
>  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
>  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>  at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
>  at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
>  at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
>  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>  at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>  at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
>  at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171)
>  at java.util.concurre