[jira] [Comment Edited] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming

Benjamin Roth (JIRA) Tue, 29 Nov 2016 00:59:48 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704715#comment-15704715
 ]


Benjamin Roth edited comment on CASSANDRA-12905 at 11/29/16 8:58 AM:
---------------------------------------------------------------------

Problem 3.
All MV updates that happen during bootstrap are sent to the batchlog (see 
StorageProxy.mutateMV). This puts heavy pressure on the BatchlogManager and 
causes a huge number of compactions of system.batches during bootstrap. The 
fact that the batchlog implementation is an antipattern (don't use Cassandra 
as a queue) does not improve the situation. The batchlog has to deal with more 
and more tombstones the larger the log gets. I observed batchlogs of 60 GB.
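
As a rough illustration of the tombstone effect described above, here is a toy 
Python model of a queue kept in an append-only store where a delete only marks 
a row instead of removing it. This is not Cassandra's batchlog code: the class 
LogStructuredQueue, the function scan_cost and the compact_every parameter are 
hypothetical names made up for this sketch. It only shows that the cost of 
reading the head of the queue grows with the number of tombstones that have 
not yet been compacted away.

```python
from dataclasses import dataclass, field


@dataclass
class LogStructuredQueue:
    """Toy append-only queue: a delete marks a row, it does not remove it."""
    rows: list = field(default_factory=list)  # each entry: [payload, is_tombstoned]
    cells_scanned: int = 0                    # total rows touched by all reads

    def enqueue(self, payload):
        self.rows.append([payload, False])

    def dequeue(self):
        # Reading the head has to walk past every tombstone that is still on disk.
        for row in self.rows:
            self.cells_scanned += 1
            if not row[1]:
                row[1] = True  # "delete" by writing a tombstone marker
                return row[0]
        return None

    def compact(self):
        # Compaction is the only step that physically drops tombstoned rows.
        self.rows = [row for row in self.rows if not row[1]]


def scan_cost(n, compact_every=None):
    """Enqueue n items, drain them, and return the total number of rows scanned."""
    queue = LogStructuredQueue()
    for i in range(n):
        queue.enqueue(i)
    for i in range(n):
        queue.dequeue()
        if compact_every and (i + 1) % compact_every == 0:
            queue.compact()
    return queue.cells_scanned


if __name__ == "__main__":
    print("no compaction:         ", scan_cost(1000))
    print("compact every 100 reads:", scan_cost(1000, compact_every=100))
```

Without compaction the scan cost in this model grows quadratically with the 
queue length. Frequent compaction keeps reads cheap, but the compactions 
themselves are then the price that is paid.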

Not sending tables with MVs through the regular write path on bootstrap would 
solve problems (1.) and (3.). Problem (2.) still persists but can easily be 
handled by disabling the timeout for mutations from hints.

I would even go so far as to say that sending streams through the regular 
write path for MVs is never a good idea. Avoiding it would also alleviate other 
problems, such as incremental repairs for MVs (CASSANDRA-12888). That may be a 
different story, but IMHO it is still worth a discussion.


> Retry acquire MV lock on failure instead of throwing WTE on streaming
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12905
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: centos 6.7 x86_64
>            Reporter: Nir Zilka
>            Priority: Critical
>             Fix For: 3.9
>
>
> Hello,
> I performed two upgrades on the current cluster (currently 15 nodes, 1 DC, 
> private VLAN).
> First it was 2.2.5.1 and repair worked flawlessly.
> The second upgrade was to 3.0.9 (with upgradesstables) and repair also worked 
> well.
> Then I upgraded to 3.9 two weeks ago, and the repair problems started.
> There are several types of errors in the system.log (on different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> ----
> I use the 3.9 default configuration with cluster-specific adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (86400000).
> I'm afraid of consistency problems while I'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
