[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883675#comment-16883675 ]

Hudson commented on HDDS-1384:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16900 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16900/])
HDDS-1384. TestBlockOutputStreamWithFailures is failing (elek: rev 9119ed07ff32143b548316bf69c49695196f8422)
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestMiniOzoneCluster.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/XceiverServerGrpc.java


> TestBlockOutputStreamWithFailures is failing
> --------------------------------------------
>
> Key: HDDS-1384
> URL: https://issues.apache.org/jira/browse/HDDS-1384
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> TestBlockOutputStreamWithFailures is failing with the following error
> {noformat}
> 2019-04-04 18:52:43,240 INFO volume.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,240 INFO volume.HddsVolumeChecker (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,241 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
> java.io.IOException: Failed to bind
>   at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
>   at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
>   at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
>   at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
>   at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   ... 1 more
> {noformat}
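The root cause at the bottom of the trace, `java.net.BindException: Address already in use`, is what the JDK reports when a socket tries to bind a port that another socket is already listening on. A minimal sketch that reproduces it in isolation, assuming plain `java.net` sockets on the loopback interface rather than the Netty/gRPC stack from the trace (the class name `BindConflictDemo` is purely illustrative):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/**
 * Illustrative reproduction of "java.net.BindException: Address already in use":
 * a second server socket tries to bind a port that a first socket still holds.
 */
public class BindConflictDemo {

    /** Returns true if binding a second socket to an occupied port fails with BindException. */
    static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any currently free port.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(loopback, port));
                return false; // unexpected: the port was not exclusively held
            } catch (BindException e) {
                // The port is already in use by the first (listening) socket.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```

In the failing test the "first socket" is presumably some other component (another datanode in the same MiniOzoneCluster, or an unrelated process) that took the port after it was selected.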
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875064#comment-16875064 ]

Elek, Marton commented on HDDS-1384:
------------------------------------

This is still a serious problem: I see related (flaky) failures every other day.

I uploaded my second attempt to fix it, in a simpler way than the first one. It fixes the race condition the same way Hadoop RPC handles ports: when port=0 is configured, the port reported to SCM is updated from the real (bound) socket address.
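The port=0 pattern can be illustrated with plain `java.net` sockets. A minimal sketch, assuming a single server socket on loopback; `PortZeroDemo` and `bindAndGetReportedPort` are hypothetical names, not Ozone, Ratis, or Hadoop RPC APIs. The OS picks the port inside `bind()` itself, so there is no gap between choosing the port and holding it, and the value passed on (to SCM, in the Ozone case) is read back from the bound socket:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Illustrative sketch of "bind to port 0, then report the real port". */
public class PortZeroDemo {

    /**
     * Binds the (still unbound) server socket to configuredPort on loopback and
     * returns the port that should be advertised to other components.
     * configuredPort == 0 means "let the OS choose any free port".
     */
    static int bindAndGetReportedPort(ServerSocket server, int configuredPort)
            throws IOException {
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), configuredPort));
        // For port 0 the real port is only known after bind(); read it back
        // from the socket instead of reporting the configured value.
        return configuredPort == 0 ? server.getLocalPort() : configuredPort;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket()) {
            int reported = bindAndGetReportedPort(server, 0);
            System.out.println("port to report: " + reported);
        }
    }
}
```

The design point is ordering: bind first, publish second. Picking a "free" port first and binding later is what opens the window for the `BindException` above.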
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834102#comment-16834102 ]

Eric Yang commented on HDDS-1384:
---------------------------------

Maybe a better way to fix the port-binding race condition is to set the ephemeral port range to 1 and above, with the ports used for dynamic binding kept separate from the ephemeral ports.

Ephemeral ports:
{code}
sudo sysctl -w net.ipv4.ip_local_port_range="1 65535"
{code}

Dynamic ports:
{code}
sudo sysctl -w net.ipv4.ip_local_reserved_ports="6000, 9000"
{code}

Keep in mind that reserved ports are specified as individual numbers rather than ranges.
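Whether a fixed port can collide with OS-assigned ports depends on whether it falls inside the kernel's ephemeral range. A small sketch, assuming a Linux host where `/proc/sys/net/ipv4/ip_local_port_range` contains two whitespace-separated numbers; `EphemeralRangeCheck` is a hypothetical helper and port 9000 is simply taken from the sysctl example above. It deliberately ignores `ip_local_reserved_ports`, which the kernel also skips during automatic assignment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative check: does a fixed port fall inside the Linux ephemeral port range?
 * Ports inside that range can be handed out by the OS to other sockets at any time.
 * Assumes the procfs format "<low> <high>" (whitespace separated).
 */
public class EphemeralRangeCheck {

    /** Parses the contents of ip_local_port_range into {low, high}. */
    static int[] parseRange(String contents) {
        String[] parts = contents.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected format: " + contents);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True if port lies within the inclusive [low, high] range. */
    static boolean inEphemeralRange(int port, int[] range) {
        return port >= range[0] && port <= range[1];
    }

    public static void main(String[] args) throws IOException {
        Path procFile = Paths.get("/proc/sys/net/ipv4/ip_local_port_range");
        if (!Files.exists(procFile)) {
            System.out.println("not a Linux host; skipping");
            return;
        }
        int[] range = parseRange(new String(Files.readAllBytes(procFile)));
        int fixedPort = 9000; // example port from the sysctl command above
        System.out.println(fixedPort + " in ephemeral range: "
            + inEphemeralRange(fixedPort, range));
    }
}
```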
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834101#comment-16834101 ]

Hudson commented on HDDS-1384:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16507 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16507/])
Revert "HDDS-1384. TestBlockOutputStreamWithFailures is failing" (elek: rev fb7c1cad0ea93406a7272872c888d06e4e56620a)
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834083#comment-16834083 ]

Elek, Marton commented on HDDS-1384:
------------------------------------

This has been reverted because it introduced other problems: https://builds.apache.org/job/hadoop-multibranch/job/PR-773/3/testReport/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834085#comment-16834085 ]

Arpit Agarwal commented on HDDS-1384:
-------------------------------------

Thanks for reverting this [~elek]!
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830268#comment-16830268 ] Hudson commented on HDDS-1384: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16481/]) HDDS-1384. TestBlockOutputStreamWithFailures is failing (elek: rev dead9b4049484c31e0608956e53a9ef07a45819d) * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java > TestBlockOutputStreamWithFailures is failing > > > Key: HDDS-1384 > URL: https://issues.apache.org/jira/browse/HDDS-1384 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Nanda kumar >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > TestBlockOutputStreamWithFailures is failing with the following error > {noformat} > 2019-04-04 18:52:43,240 INFO volume.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for > org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a > 2019-04-04 18:52:43,240 INFO volume.HddsVolumeChecker > (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for > volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a > 2019-04-04 18:52:43,241 ERROR server.GrpcService > (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to > start Grpc server > java.io.IOException: Failed to bind > at > org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81) > at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) > at > 
org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558) > at > org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486) > at > org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254) > at > 
org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) > at > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > ... 1 more > {noformat}
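The `BindException: Address already in use` in the trace above matches the race Elek, Marton describes in the comment below. A port is found free, released, and only bound later by the Ratis gRPC server. The comment proposes incremental ports with fallback to the next free one. The Java sketch below illustrates both the racy probe and that proposal. `IncrementalPortAllocator`, `probeRandomFreePort`, `nextFreePort` and `isBindable` are hypothetical names chosen for illustration. This is not the code committed for HDDS-1384. Only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the code committed for HDDS-1384.
final class IncrementalPortAllocator {
    private final AtomicInteger next; // next candidate port in the range
    private final int end;            // exclusive upper bound of the range

    IncrementalPortAllocator(int start, int end) {
        if (start < 1 || end > 65536 || start >= end) {
            throw new IllegalArgumentException("invalid port range");
        }
        this.next = new AtomicInteger(start);
        this.end = end;
    }

    // The racy pattern: ask the OS for a free ephemeral port, then release it.
    // Between close() and the real server's bind(), anything else on the host
    // may be given the same port, which surfaces as "Address already in use".
    static int probeRandomFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    // Hand out ports sequentially and skip any port that cannot be bound now.
    // The shared counter guarantees that callers in this JVM never receive the
    // same port twice, but a process outside the JVM can still take a port
    // after the check, so this narrows the race window rather than closing it.
    int nextFreePort() {
        while (true) {
            int port = next.getAndIncrement();
            if (port >= end) {
                throw new IllegalStateException("port range exhausted");
            }
            if (isBindable(port)) {
                return port;
            }
            // Port already in use: move on to the next one in the range.
        }
    }

    // Returns true if a listening socket could be bound to the port just now.
    private static boolean isBindable(int port) {
        try (ServerSocket s = new ServerSocket()) {
            s.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Where the server supports it, a fully race-free alternative is to bind the real server to port 0 and read back the port it was assigned, so no probe socket is ever released before use.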
[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821021#comment-16821021 ] Elek, Marton commented on HDDS-1384: Thanks to the comment from [~shashikant] on HDDS-1282, I learned that this failure can be caused by a race condition. There is a short window between identifying a free port for RATIS and actually using it, so a port that was free when it was chosen may already be taken by the time the server tries to bind to it. I am trying to address this by using fixed, incrementally assigned ports instead of random ones (and moving on to the next port in the range if a port is not available). > TestBlockOutputStreamWithFailures is failing > > > Key: HDDS-1384 > URL: https://issues.apache.org/jira/browse/HDDS-1384 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Nanda kumar >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > TestBlockOutputStreamWithFailures is failing with the following error > {noformat} > 2019-04-04 18:52:43,240 INFO volume.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for > org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a > 2019-04-04 18:52:43,240 INFO volume.HddsVolumeChecker > (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for > volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a > 2019-04-04 18:52:43,241 ERROR server.GrpcService > (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to > start Grpc server > java.io.IOException: Failed to bind > at > org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81) > at 
org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) > at > org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558) > at > org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486) > at > 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019) > at > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254) > at > org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) > at > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) > at > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.ja