Re: Streaming never completes during nodetool rebuild

2013-09-10 Thread Paulo Motta
Thanks for the reply Robert!

Actually, increasing the streaming_socket_timeout_in_ms property fixed the
problem. :)

It seems 60 seconds is too low a value for this property for inter-region
streaming of very large files.

I increased it to 600 seconds, but a lower value should be enough.
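
For anyone hitting the same issue, the change is a single line in
cassandra.yaml. This is only a sketch: the property takes milliseconds, so
600 seconds is 600000, and I'm assuming it is set on every node involved in
the streaming and that each node is restarted to pick it up.

```yaml
# cassandra.yaml (sketch; value in milliseconds, 600 s = 600000 ms)
# Previously 60000 (60 s) in our setup, which was too low for
# inter-region streaming of very large files.
streaming_socket_timeout_in_ms: 600000
```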


2013/9/9 Robert Coli rc...@eventbrite.com

 On Mon, Sep 9, 2013 at 12:28 PM, Paulo Motta pauloricard...@gmail.com wrote:

 I've been trying to add a new data center to our Cassandra 1.1.10 cluster
 for the last few days, but I've been unable to successfully rebuild the
 nodes on the new DC due to streaming problems.


 There are some upstream streaming fixes in 1.2. However, I do not know
 whether they would help in this case. A brief glance at the CHANGES.txt is
 not suggestive.

  Unfortunately the only solution to hung streaming is to restart the
 affected nodes.

 https://issues.apache.org/jira/browse/CASSANDRA-3486
 https://issues.apache.org/jira/browse/CASSANDRA-5286

 =Rob




-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com


Streaming never completes during nodetool rebuild

2013-09-09 Thread Paulo Motta
Hello,

I've been trying to add a new data center to our Cassandra 1.1.10 cluster
for the last few days, but I've been unable to successfully rebuild the
nodes on the new DC due to streaming problems.

I have followed the procedure described in
http://www.datastax.com/docs/1.1/cluster_management#adding-capacity (section
Adding a Data Center to a Cluster), but some of the new nodes hang
forever during the nodetool rebuild operation. A nodetool netstats |
grep -v 0% on the node with a frozen rebuild (205.229.68.48) will show:

Mode: NORMAL
Not sending any streams.
*Streaming from: /253.126.57.150*
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         686479
Responses                       n/a         0        7533195

But if I connect to the source node 253.126.57.150 and issue a nodetool
netstats, it will show:

Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0      879835258
Responses                       n/a         0      611936734

If I check the C* logs of this machine (253.126.57.150), I will find:

2013 Sep  9 17:50:40 ip-10-177-14-80 ERROR [Streaming to /205.229.68.48:8]
2013-09-09 17:50:40,689 AbstractCassandraDaemon.java (line 135) Exception
in thread Thread[Streaming to /205.229.68.48,5,main]
2013 Sep  9 17:50:40 ip-10-177-14-80 java.lang.RuntimeException:
*java.net.SocketTimeoutException: Read timed out*
[...]

So, it seems the streaming source (253.126.57.150) waits for a response
from the streaming destination (205.229.68.48), but the
*streaming_socket_timeout_in_ms* is reached (*60s* in our case) and the
source stops streaming. However, *the tricky part is that the destination
node never times out*, so it never retries streaming the problematic
file and hangs there forever, never completing the rebuild operation.

I'd appreciate it if anyone could give me a hand with this. Could it be a
bug, or is there some configuration tuning that could help? I will try
increasing the *streaming_socket_timeout_in_ms* property, but if the
problem is on the destination this won't help (since the socket only times
out at the source).

I've investigated the source code of IncomingStreamReader
and StreamInSession, but didn't notice any blocking operation apart from
the DataInputStream.read, which should time out after
*streaming_socket_timeout_in_ms*, but it doesn't.
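
To illustrate the behaviour I would expect, here is a minimal standalone
sketch (not Cassandra code) of how a read timeout set with
Socket.setSoTimeout makes a blocking DataInputStream.read throw
java.net.SocketTimeoutException. The class SoTimeoutDemo and the method
readTimesOut are hypothetical names made up for this example. Whether the
destination's socket in 1.1.10 actually has such a timeout applied is exactly
what I could not confirm.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Cassandra.
public class SoTimeoutDemo {

    // Connects to a local peer that never sends any data and reports whether
    // the blocking read gave up with a SocketTimeoutException.
    // A timeout of 0 means "block forever", so callers should pass a value > 0.
    public static boolean readTimesOut(int timeoutMs) throws IOException {
        try (ServerSocket silentPeer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket =
                     new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {
            // Without this call the read below would block indefinitely.
            socket.setSoTimeout(timeoutMs);
            DataInputStream in = new DataInputStream(socket.getInputStream());
            try {
                in.read(); // blocks: the peer keeps the connection open but never writes
                return false;
            } catch (SocketTimeoutException e) {
                return true; // "Read timed out", as in the log on the source node
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

If the timeout were never set on the receiving socket (or were set to 0), the
read in this sketch would simply block, which matches what the destination
node appears to do.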

Any help would be very much appreciated,

Thanks,

Paulo


Re: Streaming never completes during nodetool rebuild

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 12:28 PM, Paulo Motta pauloricard...@gmail.com wrote:

 I've been trying to add a new data center to our Cassandra 1.1.10 cluster
 for the last few days, but I've been unable to successfully rebuild the
 nodes on the new DC due to streaming problems.


There are some upstream streaming fixes in 1.2. However, I do not know
whether they would help in this case. A brief glance at the CHANGES.txt is
not suggestive.

Unfortunately the only solution to hung streaming is to restart the
affected nodes.

https://issues.apache.org/jira/browse/CASSANDRA-3486
https://issues.apache.org/jira/browse/CASSANDRA-5286

=Rob