[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-07-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883675#comment-16883675
 ] 

Hudson commented on HDDS-1384:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16900 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16900/])
HDDS-1384. TestBlockOutputStreamWithFailures is failing (elek: rev 9119ed07ff32143b548316bf69c49695196f8422)
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestMiniOzoneCluster.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/XceiverServerGrpc.java


> TestBlockOutputStreamWithFailures is failing
> 
>
> Key: HDDS-1384
> URL: https://issues.apache.org/jira/browse/HDDS-1384
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> TestBlockOutputStreamWithFailures is failing with the following error
> {noformat}
> 2019-04-04 18:52:43,240 INFO  volume.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,240 INFO  volume.HddsVolumeChecker (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,241 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
> java.io.IOException: Failed to bind
>   at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
>   at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
>   at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
>   at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
>   at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(Ni

[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-06-28 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875064#comment-16875064
 ] 

Elek, Marton commented on HDDS-1384:


This is still a serious problem: I see related (flaky) failures every 
second day.

I uploaded my second attempt at fixing this (simpler than the earlier one). It 
fixes the race condition in a way similar to how port handling works for 
Hadoop RPC: when port=0, the port (which is reported to SCM) should be updated 
based on the real socket address.
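
The port=0 handling described above can be sketched roughly as follows. This is only an illustration built on the plain java.net API, not the actual XceiverServerRatis or XceiverServerGrpc change; the class EphemeralPortExample, the holder ReportedEndpoint and the method startAndUpdate are hypothetical names made up for this sketch.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration; not the actual Ozone server code.
class EphemeralPortExample {

  /** Holds the port that a (hypothetical) datanode would report to SCM. */
  static final class ReportedEndpoint {
    private int port;

    ReportedEndpoint(int configuredPort) {
      this.port = configuredPort;
    }

    int getPort() {
      return port;
    }

    void setPort(int port) {
      this.port = port;
    }
  }

  /**
   * Binds a server socket on the configured port. If the configured port is 0,
   * the OS picks a free port at bind time, and the reported port is then
   * updated from the bound socket instead of being guessed in advance.
   */
  static ServerSocket startAndUpdate(ReportedEndpoint endpoint) throws IOException {
    ServerSocket server = new ServerSocket();
    server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), endpoint.getPort()));
    if (endpoint.getPort() == 0) {
      endpoint.setPort(server.getLocalPort());
    }
    return server;
  }

  public static void main(String[] args) throws IOException {
    ReportedEndpoint endpoint = new ReportedEndpoint(0);
    try (ServerSocket server = startAndUpdate(endpoint)) {
      System.out.println("Port to report: " + endpoint.getPort());
    }
  }
}
```

Because the port is chosen by the OS during the bind itself, there is no window between "find a free port" and "bind to it" in which another process could take the port.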


[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-05-06 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834102#comment-16834102
 ] 

Eric Yang commented on HDDS-1384:
-

Maybe a better way to fix the port-binding race condition is to ensure the 
ephemeral port range is 1 and above, with dynamic port bindings kept in a 
range separate from the ephemeral ports.

Ephemeral ports
{code}
sudo sysctl -w net.ipv4.ip_local_port_range="1 65535"
{code}

Dynamic ports
{code}
sudo sysctl -w net.ipv4.ip_local_reserved_ports="6000, 9000"
{code}

Keep in mind that reserved ports are given as a comma-separated list; each entry is a single port number or a range such as 6000-6010.
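
To make the idea concrete, here is a small sketch that checks whether a fixed service port could be handed out by the kernel as an ephemeral port, given the two settings above. It is an illustration only: the class PortRangeCheck and its methods are hypothetical helpers, not part of Hadoop or Ozone, and port 7000 is a made-up example of an unreserved port.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hadoop or Ozone.
class PortRangeCheck {

  /** Parses a "low high" pair in the form used by net.ipv4.ip_local_port_range. */
  static int[] parseRange(String value) {
    String[] parts = value.trim().split("\\s+");
    if (parts.length != 2) {
      throw new IllegalArgumentException("Expected two numbers: " + value);
    }
    int low = Integer.parseInt(parts[0]);
    int high = Integer.parseInt(parts[1]);
    if (low > high) {
      throw new IllegalArgumentException("Low bound exceeds high bound: " + value);
    }
    return new int[] {low, high};
  }

  /** True if the port lies in the ephemeral range and is not reserved. */
  static boolean mayBeTakenAsEphemeral(int port, int[] range, Set<Integer> reserved) {
    boolean inRange = port >= range[0] && port <= range[1];
    return inRange && !reserved.contains(port);
  }

  public static void main(String[] args) {
    // Values copied from the sysctl commands in the comment above.
    int[] range = parseRange("1 65535");
    Set<Integer> reserved = new HashSet<>(Arrays.asList(6000, 9000));
    // 7000 is a made-up example of a port that was not reserved.
    for (int port : new int[] {6000, 9000, 7000}) {
      System.out.println(port + " may be taken as ephemeral: "
          + mayBeTakenAsEphemeral(port, range, reserved));
    }
  }
}
```

A service that binds to a fixed port is only safe from this kind of collision if the check returns false for that port.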


[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-05-06 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834101#comment-16834101
 ] 

Hudson commented on HDDS-1384:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16507 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16507/])
Revert "HDDS-1384. TestBlockOutputStreamWithFailures is failing" (elek: rev fb7c1cad0ea93406a7272872c888d06e4e56620a)
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java



[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-05-06 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834083#comment-16834083
 ] 

Elek, Marton commented on HDDS-1384:


This has been reverted because it introduced other problems: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-773/3/testReport/

> TestBlockOutputStreamWithFailures is failing
> 
>
> Key: HDDS-1384
> URL: https://issues.apache.org/jira/browse/HDDS-1384
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> TestBlockOutputStreamWithFailures is failing with the following error
> {noformat}
> 2019-04-04 18:52:43,240 INFO  volume.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,240 INFO  volume.HddsVolumeChecker (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,241 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
> java.io.IOException: Failed to bind
>   at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
>   at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
>   at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
>   at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
>   at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
>   at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
>   at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
>   at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
>   at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   ... 1 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr..

[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-05-06 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834085#comment-16834085
 ] 

Arpit Agarwal commented on HDDS-1384:
-

Thanks for reverting this [~elek]!






[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-04-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830268#comment-16830268
 ] 

Hudson commented on HDDS-1384:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16481/])
HDDS-1384. TestBlockOutputStreamWithFailures is failing (elek: rev dead9b4049484c31e0608956e53a9ef07a45819d)
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java


>   at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   ... 1 more
> {noformat}



[jira] [Commented] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-04-18 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821021#comment-16821021
 ] 

Elek, Marton commented on HDDS-1384:


Thanks to the comment from [~shashikant] at HDDS-1282, I learned that this 
failure can be caused by a race condition. There can be a short gap between 
the moment RATIS identifies a free port and the moment the port is actually 
used, so a port that was free when it was chosen may no longer be free by the 
time the server tries to bind to it.

I am trying to address this issue by using fixed, incrementally assigned ports 
instead of random ports (falling back to the next port in the range if a port 
is not available). 
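
To illustrate the idea, here is a minimal sketch of incremental port selection 
with fallback. It is not the actual patch: the class name 
IncrementalPortBinder, the method bindFirstFree and the port range are 
hypothetical, and it uses a plain java.net.ServerSocket rather than the 
Ratis/gRPC server. The point is that the bind attempt itself decides whether a 
port is usable, so there is no window between checking and using it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper for illustration; not part of Ozone or Ratis. */
final class IncrementalPortBinder {

  private IncrementalPortBinder() { }

  /**
   * Tries each port in [firstPort, lastPort] in order and returns the first
   * socket that binds successfully. The caller owns the returned socket.
   */
  static ServerSocket bindFirstFree(int firstPort, int lastPort)
      throws IOException {
    IOException lastFailure = null;
    for (int port = firstPort; port <= lastPort; port++) {
      ServerSocket socket = new ServerSocket();
      try {
        // Binding is the availability check: no separate "is it free?" step.
        socket.bind(new InetSocketAddress(port));
        return socket;
      } catch (IOException e) {
        // Typically a BindException ("Address already in use"): try the next.
        socket.close();
        lastFailure = e;
      }
    }
    throw new IOException(
        "No free port in range " + firstPort + "-" + lastPort, lastFailure);
  }
}
```

A caller would read the chosen port from getLocalPort() on the returned socket. 
Handing back the bound socket instead of a port number is what closes the gap 
described above; a server that can only be configured with a port number would 
need to retry its own start-up on the next port instead.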

> TestBlockOutputStreamWithFailures is failing
> 
>
> Key: HDDS-1384
> URL: https://issues.apache.org/jira/browse/HDDS-1384
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
> Reporter: Nanda kumar
> Assignee: Elek, Marton
> Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestBlockOutputStreamWithFailures is failing with the following error
> {noformat}
> 2019-04-04 18:52:43,240 INFO  volume.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,240 INFO  volume.HddsVolumeChecker 
> (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for 
> volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,241 ERROR server.GrpcService 
> (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
> start Grpc server
> java.io.IOException: Failed to bind
>   at 
> org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
>   at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at 
> org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
>   at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
>   at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
>   at 
> org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
>   at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>   at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
>   at 
> org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
>   at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.ja