[ 
https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818955#comment-15818955
 ] 

Benjamin Roth edited comment on CASSANDRA-12888 at 3/3/17 9:57 AM:
-------------------------------------------------------------------

Hi Victor,

We use MVs in production with billions of records and without known data
loss. "Painful + slow" refers to repairs and range movements (e.g. bootstrap
+ decommission). Also, as mentioned in this ticket, incremental repairs don't
work, so full repairs create some overhead. Before 3.10 there are bugs
leading to write timeouts, even to NPEs and completely blocked mutation
stages. This could even bring your cluster down. In 3.10 some of these
issues have been resolved - we actually run a patched trunk version that is
1-2 months old.

Depending on your model, MVs can help a lot from a developer perspective.
Some cases are very resource-intensive to manage without MVs, requiring
distributed locks and/or CAS.
For append-only workloads it may be simpler NOT to use MVs at the moment:
such models aren't very complex, and MVs won't help that much compared to
the problems that may arise with them.
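
To illustrate (just a sketch with made-up table names, not our actual
schema): an MV can replace a hand-maintained lookup table that would
otherwise need read-before-write plus LWT or locking to stay correct under
concurrent updates.

{code}
-- Base table (hypothetical names).
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
);

-- With an MV, Cassandra keeps the lookup in sync, including removing
-- the old view row when a user's email changes.
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT email, user_id, name
    FROM users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

-- Without the MV you maintain users_by_email yourself: read the old
-- email, delete the stale lookup row, insert the new one - and guard
-- the sequence with LWT (IF conditions) or a distributed lock to
-- survive concurrent updates.
{code}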

Painful scenarios: there is no recipe for that. You may or may not
encounter performance issues, depending on your model and your workload.
I'd recommend not using MVs whose partition key differs from the one on the
base table, as this requires inter-node communication for EVERY write
operation. So you can easily kill your cluster with bulk operations
(as in streaming).
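
As a rough illustration (hypothetical schema): keeping the base partition
key in the view keeps the view rows on the same replicas as the base rows,
while a different view partition key means a view mutation has to be shipped
to other replicas on every base write.

{code}
CREATE TABLE events (
    device_id uuid,
    ts        timeuuid,
    kind      text,
    PRIMARY KEY (device_id, ts)
);

-- Different partition key (kind): view rows live on other replicas,
-- so each base write has to send a view update across the network.
CREATE MATERIALIZED VIEW events_by_kind AS
    SELECT kind, device_id, ts
    FROM events
    WHERE kind IS NOT NULL AND device_id IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY (kind, device_id, ts);

-- Same partition key (device_id): base and view rows are co-located,
-- so the view update stays local to the replica.
CREATE MATERIALIZED VIEW events_by_device_and_kind AS
    SELECT device_id, kind, ts
    FROM events
    WHERE device_id IS NOT NULL AND ts IS NOT NULL AND kind IS NOT NULL
    PRIMARY KEY (device_id, kind, ts);
{code}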

At the moment our cluster runs stably, but it took months to find all the
bottlenecks, race conditions, failure-recovery issues and so on. So my
recommendation: you can get it to work, but you need time, and you should
not start with critical data, at least not data that isn't backed by another
stable storage. And you should use 3.10 once it is finally released, or
build your own version from trunk. I would not recommend using anything
below 3.10 for MVs.

Btw: our own patched version does some dirty tricks that may lead to
inconsistencies in some situations, but we prefer possible inconsistencies
(that we can deal with) over performance bottlenecks. I have created several
tickets to improve MV performance in some streaming situations, but it will
take some time to really improve that.

Does this answer your question?


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


> Incremental repairs broken for MVs and CDC
> ------------------------------------------
>
>                 Key: CASSANDRA-12888
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Stefan Podkowinski
>            Assignee: Benjamin Roth
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x
>
>
> SSTables streamed during the repair process will first be written locally and 
> afterwards either simply added to the pool of existing sstables or, in the 
> case of existing MVs or active CDC, replayed on a per-mutation basis:
> As described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we 
> must put all partitions through the same write path as normal mutations. This 
> also ensures any 2is are also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through 
> the CommitLog so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental 
> repairs, as we lose the {{repaired_at}} state in the process. Eventually the 
> streamed rows will end up in the unrepaired set, in contrast to the rows on 
> the sender side, which are moved to the repaired set. The next repair run 
> will stream the same data back again, causing rows to bounce on and on 
> between nodes on each repair.
> See the linked dtest for steps to reproduce. An example for reproducing this 
> manually using ccm can be found 
> [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8]



