[ 
https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735349#comment-15735349
 ] 

Paulo Motta edited comment on CASSANDRA-12905 at 12/9/16 1:46 PM:
------------------------------------------------------------------

Thanks for your great investigative work and proposed patch. Please find a 
bellow a summary of the problems identified to make sure we're on the same page:
1 - Streaming of MVs may never complete due to lock contention throwing WTE 
during MV mutation  application (bug).
2 - MV hint delivery may throw WTE on MV lock contention, even if the node is 
not overloaded (bug)
3 - Streaming of MVs is slow due to MV mutation application (performance 
limitation)

Your original patch fixes 1 and 2 by retrying to acquire MV lock on contention 
instead of throwing WTE for mutations originating from streaming or hints.

While 3 is not actually a bug but a potential performance limitation, it seems 
from your results that the cost of performing bootstrap (and large stream 
sessions in general) of large base tables may become prohibitive due to a large 
amount of mutations going through the write path.

While your proposed approach of performing sstable-based streaming for base and 
MVs may be valid in the consistent range movement case like bootstrap or 
decommission, it is not correct in all scenarios for repair or inconsistent 
range movements like replace and removenode, because it may generate permanent 
inconsistencies in the views which are only fixable by dropping and re-creating 
the view. We may address this by allowing operators to disable mutation-based 
streaming for MVs and streaming if they know what they are doing and can deal 
with the inconsistencies that may arise with their own ways (for instance, by 
ensuring writes are append-only or dealing with it in the application layer), 
but in the general case I'm afraid we will have to live with this until we find 
a better way to deal with it globally, with things like CASSANDRA-9779, 
CASSANDRA-9308, providing a custom tool to repair inconsistencies in views, or 
even redesigning MV consistency ensuring mechanisms.

With this said, since this is blocking release I propose we fix 1 and 2 on this 
ticket with your original patch, and open a new ticket to discuss ways to 
mitigate and/or fix 3, without the pressure of having to block release on this 
and taking the risk of delivering a half-baked solution.

Overall your original patch (without the sstable-based streaming fix) and 
approach looks good, I have the following comments:
- invert the variable from {{dontTimeOut}} to {{droppable}} to facilitate 
understanding, since this is the term used for requests that can throw 
{{WriteTimeoutException}}
- I think we can make hint mutation requests {{deferrable}} and 
{{non-droppable}} to avoid blocking the thread on failure to acquire the lock - 
for this we need to do hint replying asynchronously on {{HintVerbHandler}} 
(similar to what is done on {{MutationVerbHandler}}). The right thing to do 
would be to make hints droppable similar to remote mutations, but this would 
require us to send the hint timestamp down to {{keyspace.mutateMV}}, so if you 
find a non-disruptive way to do this than even better, otherwise we can leave 
it as {{deferrable}} and {{non-droppable}} for the time being.
- Instead of making {{applyBlocking}} throw {{ExecutionException}}, I think we 
can handle this on {{applyBlocking}} itself instead of handling on consumers 
with {{Throwables.propagate(e.getCause())}}.

Since those are mostly cosmetic changes I will kick-off an initial round of 
tests with your initial patch to make sure everything is working as expected:
||3.0||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-12905]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-dtest/lastCompletedBuild/testReport/]|

Please let me if you need help or will not have time to address those and I 
will take care of the changes myself if you agree with those. Thanks again!


was (Author: pauloricardomg):
Thanks for your great investigative work and proposed patch. Please find a 
bellow a summary of the problems identified to make sure we're on the same page:
1 - Streaming of MVs may never complete due to lock contention throwing WTE 
during MV mutation  application (bug).
2 - MV hint delivery may throw WTE on MV lock contention, even if the node is 
not overloaded (bug)
3 - Streaming of MVs is slow due to MV mutation application (performance 
limitation)

Your original patch fixes 1 and 2 by retrying to acquire MV lock on contention 
instead of throwing WTE for mutations originating from streaming or hints.

While 3 is not actually a bug but a potential performance limitation, it seems 
from your results that the cost of performing bootstrap (and large stream 
sessions in general) of large base tables may become prohibitive due to a large 
amount of mutations going through the write path.

While your proposed approach of performing sstable-based streaming for base and 
MVs may be valid in the consistent range movement case like bootstrap or 
decommission, it is not correct in all scenarios for repair or inconsistent 
range movements like replace and removenode, because it may generate permanent 
inconsistencies in the views which are only fixable by dropping and re-creating 
the view. We may address this by allowing operators to disable mutation-based 
streaming for MVs if they know what they are doing and  can deal with the 
inconsistencies that may arise with their own ways (for instance, by ensuring 
writes are append-only or dealing with it in the application layer), but in the 
general case I'm afraid we will have to live with this until we find a better 
way to deal with it globally, with things like CASSANDRA-9779, CASSANDRA-9308, 
providing a custom tool to repair inconsistencies in views, or even redesigning 
MV consistency ensuring mechanisms.

With this said, since this is blocking release I propose we fix 1 and 2 on this 
ticket with your original patch, and open a new ticket to discuss ways to 
mitigate and/or fix 3, without the pressure of having to block release on this 
and taking the risk of delivering a half-baked solution.

Overall your original patch (without the sstable-based streaming fix) and 
approach looks good, I have the following comments:
- invert the variable from {{dontTimeOut}} to {{droppable}} to facilitate 
understanding, since this is the term used for requests that can throw 
{{WriteTimeoutException}}
- I think we can make hint mutation requests {{deferrable}} and 
{{non-droppable}} to avoid blocking the thread on failure to acquire the lock - 
for this we need to do hint replying asynchronously on {{HintVerbHandler}} 
(similar to what is done on {{MutationVerbHandler}}). The right thing to do 
would be to make hints droppable similar to remote mutations, but this would 
require us to send the hint timestamp down to {{keyspace.mutateMV}}, so if you 
find a non-disruptive way to do this than even better, otherwise we can leave 
it as {{deferrable}} and {{non-droppable}} for the time being.
- Instead of making {{applyBlocking}} throw {{ExecutionException}}, I think we 
can handle this on {{applyBlocking}} itself instead of handling on consumers 
with {{Throwables.propagate(e.getCause())}}.

Since those are mostly cosmetic changes I will kick-off an initial round of 
tests with your initial patch to make sure everything is working as expected:
||3.0||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-12905]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-dtest/lastCompletedBuild/testReport/]|

Please let me if you need help or will not have time to address those and I 
will take care of the changes myself if you agree with those. Thanks again!

> Retry acquire MV lock on failure instead of throwing WTE on streaming
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12905
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: centos 6.7 x86_64
>            Reporter: Nir Zilka
>            Assignee: Benjamin Roth
>            Priority: Critical
>             Fix For: 3.10
>
>
> Hello,
> I performed two upgrades to the current cluster (currently 15 nodes, 1 DC, 
> private VLAN),
> first it was 2.2.5.1 and repair worked flawlessly,
> second upgrade was to 3.0.9 (with upgradesstables) and also repair worked 
> well,
> then i upgraded 2 weeks ago to 3.9 - and the repair problems started.
> there are several errors types from the system.log (different nodes) :
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> ----
> i use 3.9 default configuration with the cluster settings adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (86400000).
> i'm afraid from consistency problems while i'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to