[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

Nate McCall (JIRA) Tue, 03 Apr 2018 13:33:13 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424554#comment-16424554
 ]


Nate McCall commented on CASSANDRA-14346:
-----------------------------------------

bq. Do you think it is good for the community that every user is inventing this 
(complex) functionality again and again with different requirements on external 
tools?
 
Absolutely! This gets folks involved in the ecosystem, gaining an understanding 
of a critical piece of functionality while allowing them to do so in an 
environment in which they are comfortable. 
 
We saw this with thrift-based drivers early on. There were at one point eight 
Java drivers, but Astyanax eventually won out because it was a better design 
that catered to the most common Java programming paradigms. The net effect of 
this is that we trained a whole lot of devs on how to effectively use the APIs 
and were at the point where we as a community answered thrift API and data 
modeling questions in minutes regardless of time of day or channel in which 
they came in. 
 
bq. We continue doing nothing and the community just solves this in different 
ways.
 
So we find ourselves again in a spot where opinionated designs are competing 
and a vendor is offering a commercial solution. I don't call that nothing. We 
have multiple working solutions _right now_ that folks will pick based on the 
needs of their environments. Operations of distsys (at any scale) is quite 
different from one shop to another. 
 
We _are_ duplicating effort, but fundamentally it is broader community effort, not 
necessarily core Cassandra development resources. The largest benefit of this 
is that we will be stressing/soak testing all the recent work done on repair so 
the mechanism itself will be solid. Once we figure out what works best for most 
users, we build from there, perhaps focusing our efforts in the meantime on a 
meaningful feedback and control mechanism to make this a whole lot easier when 
we do. 
 
I want to be clear that from what I have read and seen so far, I think 
[~jolynch] and [~vinaykumarcse] have done excellent work on thinking this 
through. I'm calling into question the timing and prioritization (vs. 
CASSANDRA-12944 and/or general purpose management plumbing revamp) and maybe 
still whether we are in process, side-car'ed or externally managed (or some 
combination?), but I'll admit there is a quite debatable set of pros and cons 
for each when these are all listed out.
 
My thoughts at this point are that, for 4.0, we ensure repair works really 
well out of the box, invoked much as it is today, and provide links to 
external options, and that we continue this ticket/general discussion targeting 
'trunk' in a post-4.0-release world. 

> Scheduled Repair in Cassandra
> -----------------------------
>
>                 Key: CASSANDRA-14346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Repair
>            Reporter: Joseph Lynch
>            Priority: Major
>              Labels: CommunityFeedbackRequested
>             Fix For: 4.0
>
>         Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
