[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2016-05-25 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300475#comment-15300475
 ] 

Paulo Motta commented on CASSANDRA-8621:


Closing this because the issue that originated this ticket was likely caused by 
CASSANDRA-11286 and stream sockets will no longer be idle after 
CASSANDRA-11841, so a closed/reset stream socket will generally mean the node 
is unreachable (see more details above).

> For streaming operations, when a socket is closed/reset, we should 
> retry/reinitiate that stream
> ---
>
> Key: CASSANDRA-8621
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
> Project: Cassandra
> Issue Type: Improvement
> Components: Streaming and Messaging
> Reporter: Jeremy Hanna
> Assignee: Paulo Motta
>
> Currently we have a setting (streaming_socket_timeout_in_ms) that will 
> time out and retry the stream operation in the case where TCP is idle for a 
> period of time.  However, in the case where the socket is closed or reset, we 
> do not retry the operation.  This can happen for a number of reasons, 
> including when a firewall sends a reset message on a socket during a 
> streaming operation, such as a nodetool rebuild (necessarily across DCs) or 
> a repair.
> Doing a retry would make the streaming operations more resilient.  It would 
> be good to log the retry clearly as well (with the stream session ID and node 
> address).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2016-03-11 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191015#comment-15191015
 ] 

Paulo Motta commented on CASSANDRA-8621:


Given that the stalled stream issue that originated this ticket was likely 
caused by CASSANDRA-11286, and that with that fix in place and a properly 
configured network (i.e., a smaller keepalive interval) connections won't die 
if there is no network partition, I think this feature loses relevance, as it 
would add more state/complexity to the streaming protocol without clear 
benefits. So I propose we close this as Later and re-evaluate if there are 
still broken connections after CASSANDRA-11286. WDYT [~yukim]?


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2016-03-11 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191010#comment-15191010
 ] 

Paulo Motta commented on CASSANDRA-8621:


You probably want to check your TCP keepalive settings: 
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
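
For illustration only: keepalive has an application side and an OS side. The sketch below shows the application side in Java, enabling SO_KEEPALIVE on a socket. The class and method names are hypothetical and are not taken from Cassandra. How often probes are actually sent is decided by the operating system (on Linux, the net.ipv4.tcp_keepalive_* sysctls), which is what the linked page is about.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical sketch: illustrative names, not Cassandra's streaming code.
final class KeepAliveSketch {

    /** Creates an unconnected socket with SO_KEEPALIVE enabled. */
    static Socket newKeepAliveSocket() {
        try {
            Socket socket = new Socket();
            // Ask the OS to send keepalive probes on this connection when idle.
            // The probe timing itself comes from OS-level settings.
            socket.setKeepAlive(true);
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether SO_KEEPALIVE is enabled on the given socket. */
    static boolean keepAliveEnabled(Socket socket) {
        try {
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Enabling the option alone may not help if the OS keepalive interval is longer than the firewall's idle timeout, so both sides need checking.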


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-10-30 Thread Nicholas Gaugler (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982842#comment-14982842
 ] 

Nicholas Gaugler commented on CASSANDRA-8621:
-

I constantly suffer from Broken Pipe issues.  Although I've attempted to tweak 
the value of streaming_socket_timeout_in_ms to work around it, rebuilds still 
completely fail.  Is this related?

> For streaming operations, when a socket is closed/reset, we should 
> retry/reinitiate that stream
> ---
>
> Key: CASSANDRA-8621
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jeremy Hanna
> Assignee: Paulo Motta

[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-07-21 Thread Paulo Motta (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635393#comment-14635393
]

Paulo Motta commented on CASSANDRA-8621:

I'd like to discuss/validate a possible solution before diving into
implementation.

Upon receiving a SocketException during an established StreamSession, the
reconnection initiator will:
# Mark its view of the StreamSession as isReconnecting;
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we
may ignore/log them while isReconnecting is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Send a StreamInitMessage to the session peer with isReconnecting flag set
to true.
# After the initialization is complete, the StreamSession.isReconnecting flag
is set to false and the onInitializationComplete() is called to resume the
streaming protocol.
# In case of failure during the process, the initiator will retry establishing
the connection up to max_streaming_retries times, and fail the stream
session if it is not able to reconnect.
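
The initiator steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an implementation: Session, Reconnector and tryReconnect are hypothetical names, the Reconnector callback stands in for closing the old sockets, opening new ones and sending the StreamInitMessage, and maxRetries plays the role of max_streaming_retries.

```java
import java.net.SocketException;
import java.util.UUID;

// Hypothetical sketch: these types are illustrative, not Cassandra classes.
final class ReconnectSketch {

    /** Stand-in for steps 2-4: close old sockets, open new ones, send the init message. */
    interface Reconnector {
        void reconnect() throws SocketException;
    }

    /** Minimal view of a stream session: just the ID, the peer, and the reconnect flag. */
    static final class Session {
        final UUID sessionId;
        final String peerAddress;
        volatile boolean isReconnecting;

        Session(UUID sessionId, String peerAddress) {
            this.sessionId = sessionId;
            this.peerAddress = peerAddress;
        }
    }

    /**
     * Tries to re-establish the session up to maxRetries times.
     * Returns true on success; on false the caller should fail the session.
     */
    static boolean tryReconnect(Session session, Reconnector reconnector, int maxRetries) {
        session.isReconnecting = true;
        try {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                try {
                    reconnector.reconnect();
                    return true;
                } catch (SocketException e) {
                    // Log each failed attempt with the session ID and node address.
                    System.err.printf("Stream session %s: reconnect attempt %d/%d to %s failed: %s%n",
                            session.sessionId, attempt, maxRetries, session.peerAddress, e.getMessage());
                }
            }
            return false;
        } finally {
            // Cleared both on success (protocol resumes) and after giving up.
            session.isReconnecting = false;
        }
    }
}
```

Bounding the loop keeps a permanently unreachable peer from holding the session open forever, and the log line carries the session ID and peer address that the ticket description asks for.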

Upon receiving a StreamInitMessage with isReconnecting=true the reconnection
follower will:
# Fetch the StreamSession object for that session:
#* If StreamSession.isReconnecting is set to true on the reconnection follower,
it means that the peer is also trying to act as a reconnection initiator, so we
have a conflict. We can use the node identifier or IP as a universal
tie-breaker. Only the peer with the lowest IP/ID will have its
StreamInitMessage accepted by the other peer in case of a conflict. The other
peer will have its init socket closed.
#* Otherwise, it will set its StreamSession.isReconnecting flag to true.
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we
may ignore them while isReconnecting is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Attach the outgoing socket to the new outgoing message handler.
# After the incoming socket is attached to the incoming message handler, the
session is re-established and StreamSession.isReconnecting is set to false.
# The session is re-established and everybody is happy.
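
The tie-breaker for the conflict case could look something like the sketch below, assuming the IP address is used as the identifier. TieBreakSketch, acceptRemoteInit and ip are hypothetical names for illustration. The comparison is on unsigned bytes, since a signed comparison would order 10.0.0.200 before 10.0.0.100.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Hypothetical sketch: illustrative names, not Cassandra's streaming code.
final class TieBreakSketch {

    /**
     * Called when both peers are reconnecting at once. Returns true if the
     * remote peer's StreamInitMessage should be accepted, i.e. the remote
     * peer has the lower address. Equal addresses are not accepted.
     */
    static boolean acceptRemoteInit(InetAddress local, InetAddress remote) {
        // Lexicographic comparison of the raw address bytes, treated as unsigned (Java 9+).
        return Arrays.compareUnsigned(remote.getAddress(), local.getAddress()) < 0;
    }

    /** Parses an IP literal, rethrowing the checked exception as unchecked. */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because both peers evaluate the same ordering on the same pair of addresses, exactly one of them accepts the other's init message, which is what makes the rule usable as a universal tie-breaker.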

What do you think of this approach [~yukim]?

For streaming operations, when a socket is closed/reset, we should
retry/reinitiate that stream
---

Key: CASSANDRA-8621
URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jeremy Hanna
Assignee: Paulo Motta


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-01-15 Thread Jonathan Shook (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279768#comment-14279768
]

Jonathan Shook commented on CASSANDRA-8621:
---

For the scenario that prompted this ticket, it appeared that the streaming
process was completely stalled. One side of the stream (the sender side) had an
exception that appeared to be a connection reset. The receiving side appeared
to think that the connection was still active, at least in terms of the
netstats reported by nodetool. We were unable to verify whether this was
specifically the case in terms of connected sockets because there
were multiple streams for those peers, and there is no simple way to correlate
a specific stream to a TCP session.

[~yukim]
If there is a diagnostic method that we can use to provide more information
about specific stalled streams, please let us know so that we can approach the
user to get more data.

For streaming operations, when a socket is closed/reset, we should
retry/reinitiate that stream
---

Key: CASSANDRA-8621
URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jeremy Hanna
Assignee: Yuki Morishita


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-01-15 Thread Jonathan Shook (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279774#comment-14279774
 ] 

Jonathan Shook commented on CASSANDRA-8621:
---

Also, there were no TCP-level errors showing on the receiving side. So it 
is unclear whether exceptions are being omitted, or whether there was something 
really strange occurring with the network.

