[ https://issues.apache.org/jira/browse/CASSANDRA-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Brown updated CASSANDRA-14362: ------------------------------------ Summary: Bind to correct local address in 4.0 streaming (was: Node unable to join cluster in trunk) > Bind to correct local address in 4.0 streaming > ---------------------------------------------- > > Key: CASSANDRA-14362 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14362 > Project: Cassandra > Issue Type: Bug > Reporter: Lerh Chuan Low > Assignee: Jason Brown > Priority: Major > Fix For: 4.0 > > > I'm not sure if anybody has tried using {{trunk}} and starting an actual EC2 > cluster, but with a 3 node setup that works on 3.11, it fails to work on > {{trunk}}. I mistakenly stumbled into it while testing my own code changes. > My setup is as follows: > broadcast_rpc_address: publicIP > broadcast_address: publicIP > listen_address: omitted. Ends up as privateIP. > Works on 3.11 just fine. > On {{trunk}} though, it never works. My node is never able to join the > cluster: > {code} > Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO [main] 2018-04-03 > 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range > (-128373781239966537,-122439194129870521] exists on 52.88.241.181:7000 for > keyspace system_traces > Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO [main] 2018-04-03 > 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range > (6968670424536541270,6973888347502882935] exists on 52.88.241.181:7000 for > keyspace system_traces > Apr 03 05:57:42 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 05:57:42,298 FailureDetector.java:324 - Not marking nodes down due > to local pause of 26215173446ns > 5000000000ns > Apr 03 05:57:53 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 05:57:53,035 FailureDetector.java:324 - Not marking nodes down due > to local pause of 10736485907ns > 5000000000ns > Apr 03 05:58:30 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 05:58:30,790 Gossiper.java:814 - Gossip stage has 28 pending > tasks; skipping status check (no nodes will be marked down) > Apr 03 05:58:33 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 05:58:33,060 Gossiper.java:814 - Gossip stage has 20 pending > tasks; skipping status check (no nodes will be marked down) > Apr 03 06:04:33 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 06:04:33,826 FailureDetector.java:324 - Not marking nodes down due > to local pause of 400790432954ns > 5000000000ns > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: WARN [GossipTasks:1] > 2018-04-03 06:04:49,133 Gossiper.java:814 - Gossip stage has 2 pending tasks; > skipping status check (no nodes will be marked down) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR > [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,138 > StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] > Streaming error occurred on session with peer 52.88.241.181:7000 > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.io.IOException: failed > to connect to 52.88.241.181:7000 (STREAM) for streaming data > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:98) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:57) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.async.NettyStreamingMessageSender.createChannel(NettyStreamingMessageSender.java:183) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.async.NettyStreamingMessageSender.setupControlMessageChannel(NettyStreamingMessageSender.java:165) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:222) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:146) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:271) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.lang.Thread.run(Thread.java:748) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: Caused by: > io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Cannot > assign requested address > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.unix.Errors.newIOException(Errors.java:117) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.unix.Socket.bind(Socket.java:266) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.epoll.AbstractEpollStreamChannel.doConnect(AbstractEpollStreamChannel.java:729) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.epoll.EpollSocketChannel.doConnect(EpollSocketChannel.java:184) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.connect(AbstractEpollStreamChannel.java:797) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1274) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:545) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:530) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:999) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.AbstractChannel.connect(AbstractChannel.java:260) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:254) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:312) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ... 1 common frames > omitted > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: INFO > [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,140 > StreamResultFuture.java:197 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] > Session with 52.88.241.181:7000 is complete > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR > [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,146 > StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] > Streaming error occurred on session with peer 52.88.241.181:7000 > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.lang.RuntimeException: > stream has been closed, cannot send Prepare SYN (3 requests, 0 files} > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:209) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.StreamSession.onInitializationComplete(StreamSession.java:495) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:272) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: at > java.lang.Thread.run(Thread.java:748) > {code} > Remote debugging it, it eventually seems to come to be due to trying to call > {{socket.bind}} in {{doConnect}} in {{AbstractEpollStreamChannel}}. Local > address is <publicIP>:0. I think port 0 is fine because it just means pick > the next ephemeral port available, but there's no way of binding to the > public IP. > I made a commit > https://github.com/apache/cassandra/commit/00222ddf0694f1c35c2bdd84fd7407174c3fc57a > that seems to have fixed it so I can continue on with my purposes, and that > is have {{StreamSession}} use listen address instead of broadcast address > which may be public. In the event listen address is public I guess that means > if it was able to bind to that interface to begin with then {{socket.bind}} > should also work. > Not sure if this is the right way though, and it seems to have been > introduced by the Streaming rework to netty. > https://issues.apache.org/jira/browse/CASSANDRA-12229. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org