[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182634#comment-16182634 ]

Jeremy Hanna commented on CASSANDRA-10070:
------------------------------------------

It doesn't look like CASSANDRA-8911 is moving forward, so it seems like this ticket, the things [~vinaykumarcse] was talking about at NGCC yesterday, or a combination of the two could move forward.

> Automatic repair scheduling
> ---------------------------
>
>          Key: CASSANDRA-10070
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-10070
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Marcus Olsson
>     Assignee: Marcus Olsson
>     Priority: Minor
>      Fix For: 4.x
>
>  Attachments: Distributed Repair Scheduling.doc, Distributed Repair Scheduling_V2.doc
>
> Scheduling and running repairs in a Cassandra cluster is most often a required task, but it can be hard for new users and it also requires a bit of manual configuration. There are good tools out there that can be used to simplify things, but wouldn't this be a good feature to have inside of Cassandra? To automatically schedule and run repairs, so that when you start up your cluster it basically maintains itself in terms of normal anti-entropy, with the possibility for manual configuration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324979#comment-15324979 ]

Paulo Motta commented on CASSANDRA-10070:
-----------------------------------------

After discussion at NGCC we decided to put this on hold until we have a better definition of mutation-based repair (MBR, CASSANDRA-8911). If that moves forward we will deprecate merkle-tree based repair in favor of MBR, which removes the need for automatic repair scheduling since MBR will be continuous.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270836#comment-15270836 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

[~jbellis]
bq. How closely does this match the design doc from February? Is it worth posting an updated design for those of us joining late?

I'd say there have been enough changes for it to be a good idea to update the document, so I'll work on that! :)
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269618#comment-15269618 ]

Jonathan Ellis commented on CASSANDRA-10070:
--------------------------------------------

How closely does this match the design doc from February? Is it worth posting an updated design for those of us joining late?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176519#comment-15176519 ]

Paulo Motta commented on CASSANDRA-10070:
-----------------------------------------

bq. We start with the resource locking and then move on to the maintenance scheduling API. And after that I think most tasks could be discussed in parallel. Also I removed the task for management commands since I think it would be easier to add them while implementing the features.

+1

bq. I've now created the sub-tasks and linked them to this issue. I didn't include the node configuration since it might be redundant but we might add it later on if we feel the need to.

Awesome!
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169498#comment-15169498 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

I've now created the sub-tasks and linked them to this issue. I didn't include the node configuration since it might be redundant, but we might add it later on if we feel the need to.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163462#comment-15163462 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

For the basic implementation I think the tasks could be broken down as:

- Resource locking API & implementation
-- Maintenance scheduling API & basic repair scheduling
--- Rejection policy interface & default implementations
--- Configuration support
    Table configuration
    Global configuration (for pausing repairs in the basic implementation)
    Node configuration
--- Aborting/interrupting repairs (Requires CASSANDRA-3486, CASSANDRA-11190)
    Polling and monitoring module
    Failure handling and retry

The idea is that we start with the resource locking and then move on to the maintenance scheduling API; after that I think most tasks could be discussed in parallel. Also, I removed the task for management commands since I think it would be easier to add them while implementing the features.

WDYT?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159286#comment-15159286 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

bq. Sounds good! We could ask the user to pause, but I think doing that automatically via "system interrupts" is better. It just occurred to me that both "the pause" and "system interrupts" will prevent new repairs from starting, but what about already running repairs? We will probably want to interrupt already running repairs as well in some situations. For this reason CASSANDRA-3486 is also relevant for this ticket (adding it as a dependency of this ticket).

+1

bq. Then I think we should either have a timeout, or add an ability to cancel/interrupt a running scheduled repair in the initial version, to prevent hanging repairs from rendering the automatic repair scheduling useless.

I think the timeout would be good enough in the initial version. I guess the interruption of repairs would be handled by CASSANDRA-3486? Perhaps it would be possible to extend that feature later to be able to cancel a scheduled repair? Here I'm thinking that interrupting stops the running repair and allows the scheduled job to retry it immediately, while cancelling would prevent the scheduled job from retrying it immediately.

bq. WDYT? Feel free to update or break up into smaller or larger subtasks, and then create the actual subtasks to start work on them.

Sounds good, I'll have a closer look at the subtasks tomorrow! I guess we will have sort of a dependency tree for some of the tasks.
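The interrupt/cancel distinction proposed above could be modeled roughly as follows. This is a hypothetical sketch, not Cassandra code; the `ScheduledRepair` class and its fields are invented for illustration. The key difference is that interrupting stops the running session but leaves it eligible for immediate retry, while cancelling also removes it from the retry queue.

```python
# Hypothetical sketch of the interrupt-vs-cancel semantics discussed above.
# None of these names exist in Cassandra; they only illustrate the proposal.

class ScheduledRepair:
    def __init__(self, keyspace):
        self.keyspace = keyspace
        self.running = False
        self.eligible_for_retry = True  # may the scheduler re-run it right away?

    def start(self):
        self.running = True

    def interrupt(self):
        # Stop the running session, but let the scheduler retry immediately.
        self.running = False
        self.eligible_for_retry = True

    def cancel(self):
        # Stop the running session AND keep the scheduler from retrying it.
        self.running = False
        self.eligible_for_retry = False

job = ScheduledRepair("ks1")
job.start()
job.interrupt()
assert not job.running and job.eligible_for_retry

job.start()
job.cancel()
assert not job.running and not job.eligible_for_retry
```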
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158968#comment-15158968 ]

Paulo Motta commented on CASSANDRA-10070:
-----------------------------------------

bq. But in that case the pause/stop feature should be implemented as early as possible to avoid having an upgrade scenario that requires the user to upgrade to the version that introduces the pause feature before upgrading to the latest. Another way would be to have the "system interrupts" feature in place early, so that the repairs would be paused during an upgrade.

Sounds good! We could ask the user to pause, but I think doing that automatically via "system interrupts" is better. It just occurred to me that both "the pause" and "system interrupts" will prevent new repairs from starting, but what about already running repairs? We will probably want to interrupt already running repairs as well in some situations. For this reason CASSANDRA-3486 is also relevant for this ticket (adding it as a dependency of this ticket).

bq. I think the timeout might be good to have to prevent a hang from stopping the entire repair process. But I think it would only work if the repair would only hang occasionally, otherwise the same repair would be retried until it is marked as a "fail".

+1. Then I think we should either have a timeout, or add an ability to cancel/interrupt a running scheduled repair in the initial version, to prevent hanging repairs from rendering the automatic repair scheduling useless.

bq. Another option is to have a "slow repair" detector that would log a warning if a repair session is taking too long, to avoid aborting it if it's actually repairing, leaving it up to the user to handle it. Either way I'd say it's out of the scope of the initial version.

bq. We might also want to be able to detect if it would be impossible to repair the whole cluster within gc grace and report it to the user. This could happen for multiple reasons, like too many tables, too many nodes, too few parallel repairs or simply overload. I guess it would be hard to make accurate predictions with all of these variables, so it might be good enough to check through the history of the repairs, do an estimation of the time and compare it to gc grace? I think this is something out of scope for the first version, but I thought I'd just mention it here to remember it.

Nice! These could probably live in a separate repair metrics and alert module in the future, allowing users to track statistics, issue alerts/warnings based on history, and allowing the scheduler to perform more advanced adaptive scheduling. Some metrics to track:

* Repair time per session
** Breakup of time per phase (validation, sync, anticompaction, etc.)
* Repair time per node
* Validation mismatch %
* Fail count

bq. Should we maybe compile a list of "features that should be in the initial version" and also an "improvements" list for future work to make the scope clear?

Sounds good! Below is a suggested list of subtasks:

* Basic functionality
** Resource locking API and implementation
** Maintenance scheduling API and metadata
** Basic scheduling support
** Polling and monitoring module
** Pausing and aborting support
** Rejection policies (includes system interrupts and maintenance windows)
** Failure handling and retry
** Configuration support
** Frontend support (table options, management commands)
* Optional/deferred functionality
** Parallel repair session support
** Subrange repair support
** Maintenance history
** Timeout
** Metrics
** Alerts

WDYT? Feel free to update or break up into smaller or larger subtasks, and then create the actual subtasks to start work on them.
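The gc grace feasibility check mentioned above could, as a rough illustration, be an estimate from repair history: average the recent per-session durations, scale by the number of sessions a full cluster pass needs, and compare against gc_grace_seconds. This is a hypothetical sketch; the function name and its inputs are invented, and a real implementation would need to account for load and topology changes.

```python
# Rough, hypothetical estimate of whether a full-cluster repair can finish
# within gc_grace_seconds, based on observed repair history.

def full_repair_fits_in_gc_grace(session_durations, sessions_remaining,
                                 parallel_repairs, gc_grace_seconds):
    """session_durations: recent per-session times in seconds.
    sessions_remaining: how many sessions a full cluster pass needs.
    parallel_repairs: how many sessions may run concurrently."""
    if not session_durations:
        return True  # no history yet; nothing to warn about
    avg = sum(session_durations) / len(session_durations)
    estimated_total = avg * sessions_remaining / max(parallel_repairs, 1)
    return estimated_total <= gc_grace_seconds

# 100 sessions averaging ~2000s, one at a time: ~200000s fits in 10 days...
assert full_repair_fits_in_gc_grace([1900, 2000, 2100], 100, 1, 864000)
# ...but not in 1 day, which is when the user should be warned.
assert not full_repair_fits_in_gc_grace([1900, 2000, 2100], 100, 1, 86400)
```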
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158634#comment-15158634 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

bq. We could probably replace the single resource lock ('RepairResource-{dc}-{i}') with global ('Global-{dc}-{i}') or mutually exclusive resources ('CleanupAndRepairResource-{dc}-{i}') later if necessary. We'll probably only need some special care during upgrades when we introduce these new locks, but other than that I don't see any problem that could arise with renaming the resources later if necessary. Do you see any issue with this approach?

No, that should work, so we can have it as 'RepairResource-{dc}-{i}' for now. For the upgrades we could add a release note that says something like "pause/stop all scheduled repairs while upgrading from x.y to x.z". But in that case the pause/stop feature should be implemented as early as possible to avoid having an upgrade scenario that requires the user to upgrade to the version that introduces the pause feature before upgrading to the latest. Another way would be to have the "system interrupts" feature in place early, so that the repairs would be paused during an upgrade.

bq. Created CASSANDRA-11190 for failing repairs fast and linked as a requirement of this ticket.

Great!

bq. No unless there is a bug. Repair messages are undroppable, and the nodes report to the coordinator on failure.

bq. We could probably handle explicit failures in CASSANDRA-11190, making sure all nodes are properly informed and abort their operations in case of failures in any of the nodes. The timeout in this context could be helpful in case of hangs in streaming or validation. But I suppose that as the protocol becomes more mature/correct and with fail-fast in place these hanging situations will become more rare, so I'm not sure timeouts would be required if we assume there are no hangs. I guess we can leave them out of the initial version for simplicity and add them later if necessary.

I think the timeout might be good to have to prevent a hang from stopping the entire repair process. But I think it would only work if the repair would only hang occasionally, otherwise the same repair would be retried until it is marked as a "fail". Another option is to have a "slow repair" detector that would log a warning if a repair session is taking too long, to avoid aborting it if it's actually repairing, leaving it up to the user to handle it. Either way I'd say it's out of the scope of the initial version.

---

We might also want to be able to detect if it would be impossible to repair the whole cluster within gc grace and report it to the user. This could happen for multiple reasons, like too many tables, too many nodes, too few parallel repairs or simply overload. I guess it would be hard to make accurate predictions with all of these variables, so it might be good enough to check through the history of the repairs, do an estimation of the time and compare it to gc grace? I think this is something out of scope for the first version, but I thought I'd just mention it here to remember it.

Should we maybe compile a list of "features that should be in the initial version" and also an "improvements" list for future work to make the scope clear?
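The "slow repair" detector suggested above could look roughly like this: flag and warn about sessions that have been running past a threshold instead of aborting them, leaving the decision to the operator. A hypothetical sketch; the function and its threshold are invented for illustration.

```python
# Hypothetical "slow repair" detector as discussed above: warn when a session
# runs past a threshold, rather than aborting a repair that may still be
# making progress. Names are invented for illustration.
import logging

def check_slow_sessions(running_sessions, now, warn_after_seconds):
    """running_sessions: {session_id: start_time_seconds}.
    Returns the ids that were flagged as slow."""
    slow = [sid for sid, started in running_sessions.items()
            if now - started > warn_after_seconds]
    for sid in slow:
        logging.warning("repair session %s has been running for more than %ds",
                        sid, warn_after_seconds)
    return slow

sessions = {"s1": 0, "s2": 9000}
# Only s1 (running for 10000s) exceeds the one-hour warning threshold.
assert check_slow_sessions(sessions, now=10000, warn_after_seconds=3600) == ["s1"]
```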
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154215#comment-15154215 ]

Paulo Motta commented on CASSANDRA-10070:
-----------------------------------------

bq. Another thing we should probably consider is whether or not multiple types of maintenance work should run simultaneously. If we need to add this constraint, should they use the same lock resources?

We could probably replace the single resource lock ('RepairResource-{dc}-{i}') with global ('Global-{dc}-{i}') or mutually exclusive resources ('CleanupAndRepairResource-{dc}-{i}') later if necessary. We'll probably only need some special care during upgrades when we introduce these new locks, but other than that I don't see any problem that could arise with renaming the resources later if necessary. Do you see any issue with this approach?

bq. Sounds good, let's start with the lockResource field in the repair session and move to scheduled repairs all together later on (maybe optionally scheduled via JMX at first?).

+1

bq. But as you said, it should be done in a separate ticket.

Created CASSANDRA-11190 for failing repairs fast and linked it as a requirement of this ticket.

bq. Would it be possible for a node to "drop" a validation/streaming without notifying the repair coordinator?

No, unless there is a bug. Repair messages are undroppable, and the nodes report to the coordinator on failure.

bq. Do we have any timeout scenarios that we could foresee before they occur? If we could detect that, it would be good to abort the repair as early as possible, assuming that the timeout would be set rather high.

We could probably handle explicit failures in CASSANDRA-11190, making sure all nodes are properly informed and abort their operations in case of failures in any of the nodes. The timeout in this context could be helpful in case of hangs in streaming or validation. But I suppose that as the protocol becomes more mature/correct and with fail-fast in place these hanging situations will become more rare, so I'm not sure timeouts would be required if we assume there are no hangs. I guess we can leave them out of the initial version for simplicity and add them later if necessary.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148842#comment-15148842 ]

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

bq. Do we intend to reuse the lock table for other maintenance tasks as well? If so, we must add a generic "holder" column to the lock table so we can reuse it to identify resources other than the parent repair session in the future. We could also add an "attributes" map in the lock table to store additional attributes such as status, or have a separate table to maintain status to keep the lock table simple.

I think it could be reused, so it's probably better to make it generic from the start. I think that as long as we don't put too much data in the attributes map, it could be stored in the lock table. Another thing is that it's tightly bound to the lock itself, since we will use it to clean up repairs without a lock, which means keeping it in a single table is probably the easiest solution. Another thing we should probably consider is whether or not multiple types of maintenance work should run simultaneously. If we need to add this constraint, should they use the same lock resources?

bq. Ideally all repairs would go through this interface, but this would probably add complexity at this stage. So we should probably just add a "lockResource" attribute to each repair session object, and each node would go through all repairs currently running, checking if it still holds the lock in case the "lockResource" field is set.

Sounds good, let's start with the lockResource field in the repair session and move to scheduled repairs all together later on (maybe optionally scheduled via JMX at first?).

{quote}
It would probably be safe to abort ongoing validation and stream background tasks and clean up repair state on all involved nodes before starting a new repair session in the same ranges. This doesn't seem to be done currently. As far as I understood, if there are nodes A, B, C running repair, A is the coordinator. If validation or streaming fails on node B, the coordinator (A) is notified and fails the repair session, but node C will keep doing validation and/or streaming, which could cause problems (or increased load) if we start another repair session on the same range. We will probably need to extend the repair protocol to perform this cleanup/abort step on failure. We already have a legacy cleanup message that doesn't seem to be used in the current protocol that we could maybe reuse to clean up repair state after a failure. This repair abortion will probably intersect with CASSANDRA-3486. In any case, this is a separate (but related) issue and we should address it in an independent ticket, and make this ticket dependent on that.
{quote}

Right now it seems that the cleanup message is only used to remove the parent repair session from the ActiveRepairService's map. I guess that if we are to use it we would have to rewrite it to stop validation and streaming as well. But as you said, it should be done in a separate ticket.

bq. Another unrelated option that we should probably include in the future is a timeout, and abort repair sessions running longer than that.

Agreed. Do we have any timeout scenarios that we could foresee before they occur? Would it be possible for a node to "drop" a validation/streaming without notifying the repair coordinator? If we could detect that, it would be good to abort the repair as early as possible, assuming that the timeout would be set rather high.
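The lock table shape discussed above (a generic "holder" column plus an "attributes" map, with a periodic task cleaning up repairs whose lock is gone) could be sketched as follows. This is a hypothetical in-memory model, not the proposed CQL schema; all names are invented for illustration.

```python
# Hypothetical in-memory model of the proposed lock table: a generic "holder"
# column identifies the owner (e.g. a parent repair session id), and a small
# attributes map rides along with the lock. A periodic task then reaps any
# running repair whose lock no longer exists. Schema and names are
# illustrative only.

locks = {}  # resource -> {"holder": ..., "attributes": {...}}

def try_lock(resource, holder, attributes=None):
    if resource in locks:
        return False  # already held by someone else
    locks[resource] = {"holder": holder, "attributes": attributes or {}}
    return True

def release(resource, holder):
    if locks.get(resource, {}).get("holder") == holder:
        del locks[resource]

def reap_orphans(running_repairs):
    """running_repairs: {parent_session_id: lock_resource}. Returns the
    sessions that should be aborted because their lock is gone."""
    return [sid for sid, res in running_repairs.items()
            if locks.get(res, {}).get("holder") != sid]

assert try_lock("RepairResource-dc1-1", "parent-1", {"status": "running"})
assert not try_lock("RepairResource-dc1-1", "parent-2")  # already held
release("RepairResource-dc1-1", "parent-1")
# parent-1's lock is gone, so its repair session is now an orphan:
assert reap_orphans({"parent-1": "RepairResource-dc1-1"}) == ["parent-1"]
```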
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147751#comment-15147751 ] Paulo Motta commented on CASSANDRA-10070: - Starting with a single repair per dc and adding support for parallel repair sessions later sounds like a good idea. bq. I agree and we could probably store the parent repair session id in an extra column of the lock table and have a thread wake up periodically to see if there are repair sessions without locks. Do we intend to reuse the lock table for other maintenance tasks as well? If so, we must add a generic "holder" column to the lock table so we can reuse to identify resources other than the parent repair session in the future. We could also add an "attributes" map in the lock table to store additional attributes such as status, or have a separate table to maintain status to keep the lock table simple. bq. But then we must somehow be able to differentiate user-defined and automatically scheduled repair sessions. It could be done by having all repairs go through this scheduling interface, which also would reduce user mistakes with multiple repairs in parallel. Another alternative is to have a custom flag in the parent repair that makes the garbage collector ignore it if it's user-defined. I think that the garbage collector/cancel repairs when unable to lock feature is something that should be included in the first pass. Ideally all repairs would go through this interface, but this would probably add complexity at this stage. So we should probably just add a "lockResource" attribute to each local repair session object (as opposed to only the parent repair object), and each node would go through all repairs currently running checking if it still holds the lock if the "lockResource" field is set. bq. The most basic failure scenarios should be covered by retrying a repair if it fails and log a warning/error based on how many times it failed. 
Could the retry behaviour cause some unexpected consequences? It would probably be safe to abort ongoing validation and stream background tasks and cleanup repair state on all involved nodes before starting a new repair session in the same ranges. This doesn't seem to be done currently. As far as I understood, if there are nodes A, B, C running repair, A is the coordinator. If validation or streaming fails on node B, the coordinator (A) is notified and fails the repair session, but node C will remain doing validation and/or streaming, what could cause problems (or increased load) if we start another repair session on the same range. We will probably need to extend the repair protocol to perform this cleanup/abort step on failure. We already have a legacy cleanup message that doesn't seem to be used in the current protocol that we could maybe reuse to cleanup repair state after a failure. This repair abortion will probably have intersection with CASSANDRA-3486. In any case, this is a separate (but related) issue and we should address it in an independent ticket, and make this ticket dependent on that. Another unrelated option that we should probably include in the future is a timeout, and abort repair sessions running longer than that. > Automatic repair scheduling > --- > > Key: CASSANDRA-10070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10070 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Olsson >Assignee: Marcus Olsson >Priority: Minor > Fix For: 3.x > > Attachments: Distributed Repair Scheduling.doc > > > Scheduling and running repairs in a Cassandra cluster is most often a > required task, but this can both be hard for new users and it also requires a > bit of manual configuration. There are good tools out there that can be used > to simplify things, but wouldn't this be a good feature to have inside of > Cassandra? 
To automatically schedule and run repairs, so that when you start > up your cluster it basically maintains itself in terms of normal > anti-entropy, with the possibility for manual configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
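The retry-with-cleanup behaviour discussed in the comment above can be sketched as a small wrapper. This is a hypothetical illustration, not Cassandra code: `run_repair` and `cleanup_repair_state` stand in for whatever the scheduler would actually call to start a repair session and to abort leftover validation/streaming tasks on the involved nodes before retrying the same ranges.

```python
import logging

log = logging.getLogger("repair-scheduler")

def run_with_retries(run_repair, cleanup_repair_state, max_attempts=3):
    """Retry a failed repair, cleaning up repair state on all involved
    nodes between attempts so no node keeps validating/streaming, and
    log a warning or error based on how many times it failed."""
    for attempt in range(1, max_attempts + 1):
        if run_repair():
            return True
        cleanup_repair_state()  # abort leftover validation/stream tasks
        if attempt < max_attempts:
            log.warning("repair failed (attempt %d), retrying", attempt)
    log.error("repair failed after %d attempts", max_attempts)
    return False
```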
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147579#comment-15147579 ] Marcus Olsson commented on CASSANDRA-10070: ---
{quote}
All data centers involved in a repair must be available for a repair to start/succeed, so if we make the lock resource dc-aware and try to create the lock by contacting a node in each involved data center with LOCAL_SERIAL consistency that should be sufficient to ensure correctness without the need for a global lock. This will also play along well with both dc_parallelism global option and with the --local or --dcs table repair options.
{quote}
{quote}
The second alternative is probably the most desirable. Actually dc_parallelism by itself might cause problems, since we can have a situation where all repairs run in a single node or range, overloading those nodes. If we are to support concurrent repairs in the first pass, I think we need both dc_parallelism and node_parallelism options together.
{quote}
{quote}
This is becoming a bit complex and there probably are some edge cases and/or starvation scenarios so we should think carefully about it before jumping into implementation. What do you think about this approach? Should we stick to a simpler non-parallel version in the first pass or think this through and already support parallelism in the first version?
{quote}
I like the approach of using LOCAL_SERIAL for each dc and having specialized keys. I think we could include the dc parallelism lock with "RepairResource-\{dc}-\{i}" but only allow one repair per data center by hardcoding "i" to 1 in the first pass. This should make upgrades easier when we do allow parallel repairs. I like the node locks approach as well, but as you say there are probably some edge cases, so we could hold off on adding them until we allow parallel repairs; I don't think introducing them later would break upgrades.
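The naming scheme above, with "i" hardcoded to 1 in the first pass so the key format stays stable when parallelism is raised later, could look like the following sketch (hypothetical helper, not Cassandra code):

```python
def lock_resources(dcs, dc_parallelism=1):
    """One lock resource per data center and slot; the first pass hardcodes
    the slot count to 1, so raising it later keeps the same key format."""
    return [f"RepairResource-{dc}-{i}"
            for dc in dcs
            for i in range(1, dc_parallelism + 1)]

names = lock_resources(["dc1", "dc2"])  # one lock per data center
```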
{quote}
We should also think better about possible failure scenarios and network partitions. What happens if the node cannot renew locks in a remote DC due to a temporary network partition but the repair is still running? We should probably cancel a repair if not able to renew the lock and also have some kind of garbage collector to kill ongoing repair sessions without associated locks to protect from disrespecting the configured dc_parallelism and node_parallelism.
{quote}
I agree, and we could probably store the parent repair session id in an extra column of the lock table and have a thread wake up periodically to see if there are repair sessions without locks. But then we must somehow be able to differentiate user-defined and automatically scheduled repair sessions. It could be done by having all repairs go through this scheduling interface, which would also reduce user mistakes with multiple repairs in parallel. Another alternative is to have a custom flag in the parent repair that makes the garbage collector ignore it if it's user-defined. I think that the garbage collector/cancel-repairs-when-unable-to-lock feature is something that should be included in the first pass. The most basic failure scenarios should be covered by retrying a repair if it fails and logging a warning/error based on how many times it failed. Could the retry behaviour cause some unexpected consequences?
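The garbage-collector idea above (a parent repair session id stored in an extra lock-table column, plus a thread that periodically looks for repair sessions without locks) can be sketched as follows; all names are hypothetical illustrations of the idea, not Cassandra code:

```python
def find_orphaned_repairs(running_sessions, lock_entries):
    """running_sessions: ids of repair sessions currently running.
    lock_entries: lock resource -> parent repair session id (the extra
    column discussed above). Returns sessions that no longer hold a lock
    (e.g. the TTL expired), which the GC thread should cancel."""
    held = set(lock_entries.values())
    return {s for s in running_sessions if s not in held}

# "s2" lost its lock, so the periodic GC pass should cancel it.
orphans = find_orphaned_repairs({"s1", "s2"}, {"RepairResource-dc1-1": "s1"})
```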
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142861#comment-15142861 ] Paulo Motta commented on CASSANDRA-10070: - Sorry for the delay, will try to be faster on next iterations. Below are some comments on your previous reply:

bq. A problem with this table is that if we have a setup with two data centers and three replicas in each data center, then we have a total of six replicas and QUORUM would require four replicas to succeed. This would require that both data centers are available to be able to run repair.

All data centers involved in a repair must be available for a repair to start/succeed, so if we make the lock resource dc-aware and try to create the lock by contacting a node in each involved data center with LOCAL_SERIAL consistency, that should be sufficient to ensure correctness without the need for a global lock. This will also play along well with both the dc_parallelism global option and with the {{\-\-local}} or {{\-\-dcs}} table repair options. I thought of something along those lines:
{noformat}
dc_locks = {}
dcs = repair_dcs(keyspace, table) # this will depend on both keyspace settings and table repair settings (--local or --dcs)
for dc in dcs:
    for i in 0..dc_parallelism(dc):
        if ((lock = get_node(dc).execute("INSERT INTO lock (resource) VALUES ('RepairResource-{dc}-{i}') IF NOT EXISTS USING TTL 30;", LOCAL_SERIAL)) != nil):
            dc_locks[dc] = lock
            break
if len(dc_locks) != len(dcs):
    release_locks(dc_locks)
else:
    start_repair(table)
{noformat}

bq. Just a question regarding your suggestion with the node_repair_parallelism. Should it be used to specify the number of repairs a node can initiate or how many repairs the node can be an active part of in parallel? I guess the second alternative would be harder to implement, but it is probably what one would expect.

The second alternative is probably the most desirable.
Actually dc_parallelism by itself might cause problems, since we can have a situation where all repairs run on a single node or range, overloading those nodes. If we are to support concurrent repairs in the first pass, I think we need both the dc_parallelism and node_parallelism options together. I thought we could extend the previous lock acquiring algorithm with:
{noformat}
dc_locks = previous algorithm
if len(dc_locks) != len(dcs):
    release_locks(dc_locks)
    return
node_locks = {}
nodes = repair_nodes(table, range)
for node in nodes:
    for i in 0..node_parallelism(node):
        if ((lock = node.execute("INSERT INTO lock (resource) VALUES ('RepairResource-{node}-{i}') IF NOT EXISTS USING TTL 30;", LOCAL_SERIAL)) != nil):
            node_locks[node] = lock
            break
if len(node_locks) != len(nodes):
    release_locks(dc_locks)
    release_locks(node_locks)
else:
    start_repair(table)
{noformat}
This is becoming a bit complex and there are probably some edge cases and/or starvation scenarios, so we should think carefully about it before jumping into implementation. What do you think about this approach? Should we stick to a simpler non-parallel version in the first pass, or think this through and already support parallelism in the first version?

bq. It should be possible to extend the repair scheduler with subrange repairs

I like the token_division approach for supporting subrange repairs in addition to {{-pr}}, but we can think about this later.

bq. Agreed, are there any other scenarios that we might have to take into account?

I can only think of upgrades and range movements (bootstrap, move, removenode, etc.) right now. We should also think better about possible failure scenarios and network partitions. What happens if the node cannot renew locks in a remote DC due to a temporary network partition but the repair is still running?
We should probably cancel a repair if not able to renew the lock and also have some kind of garbage collector to kill ongoing repair sessions without associated locks to protect from disrespecting the configured {{dc_parallelism}} and {{node_parallelism}}.
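As a toy model of the two-level locking in the pseudocode above, the sketch below uses an in-memory "insert if not exists" to stand in for the LWT lock table, and backs off completely when either the dc-level or node-level locks cannot all be taken. Everything here is a hypothetical illustration, not Cassandra's API:

```python
class LockTable:
    """In-memory stand-in for the LWT lock table (TTL not modelled)."""
    def __init__(self):
        self.held = set()

    def try_insert(self, resource):  # models INSERT ... IF NOT EXISTS
        if resource in self.held:
            return False
        self.held.add(resource)
        return True

    def release(self, resources):
        self.held -= set(resources)

def acquire_all(table, resources):
    """Acquire every resource or none: release partial progress on failure."""
    got = []
    for r in resources:
        if table.try_insert(r):
            got.append(r)
        else:
            table.release(got)
            return None
    return got

def try_start_repair(table, dcs, nodes):
    dc_locks = acquire_all(table, [f"RepairResource-{dc}-1" for dc in dcs])
    if dc_locks is None:
        return False
    node_locks = acquire_all(table, [f"RepairResource-{n}-1" for n in nodes])
    if node_locks is None:
        table.release(dc_locks)  # don't hold dc locks without node locks
        return False
    return True  # start_repair(table) would go here
```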
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134231#comment-15134231 ] Marcus Olsson commented on CASSANDRA-10070: --- [~yukim] [~pauloricardomg] Thanks for the comments, great questions/suggestions! Regarding your questions about the locking:
{quote}
* What would "lock resource" be like for repair scheduling? I think the value controls the number of repair jobs running at a given time in the whole cluster, and we don't want to run too many repair jobs at once.
* I second Yuki Morishita's first question above, in that we need to better specify how cluster-wide repair parallelism is handled: is it fixed or configurable? Can a node run repair for multiple ranges in parallel? Perhaps we should have a node_repair_parallelism (default 1) and dc_repair_parallelism (default 1) global config and reject starting repairs above those thresholds.
{quote}
The thought with the lock resource was that it could be something simple, like a table defined as:
{noformat}
CREATE TABLE lock (
    resource text PRIMARY KEY
)
{noformat}
The different nodes would then try to get the lock using LWT with a TTL:
{noformat}
INSERT INTO lock (resource) VALUES ('RepairResource') IF NOT EXISTS USING TTL 30;
{noformat}
After that the node would have to keep updating the locked resource while running the repair, to prevent someone else from taking the lock. The value "RepairResource" could just as easily be defined as "RepairResource-N", so that it would be possible to allow repairs to run in parallel. A problem with this table is that if we have a setup with two data centers and three replicas in each data center, then we have a total of six replicas and QUORUM would require four replicas to succeed. This would require that both data centers are available to be able to run repair.
Since some of the keyspaces might not be replicated across both data centers, we would still have to be able to run repair even if one of the data centers is unavailable. This also applies if we should "force" local dc repairs when a data center has been unavailable too long. There are two options as I see it on how to solve this:
* Get the lock with LOCAL_SERIAL during these scenarios.
* Have a separate lock table for each data center *and* a global one.
I guess the easiest solution would be to use LOCAL_SERIAL, but I'm not sure if it might cause some unexpected behavior. If we went for the other option with separate tables, it would probably increase the overall complexity, but it would make it easier to restrict the number of parallel repairs in a single data center. Just a question regarding your suggestion with node_repair_parallelism: should it be used to specify the number of repairs a node can initiate, or how many repairs the node can be an active part of in parallel? I guess the second alternative would be harder to implement, but it is probably what one would expect.
---
{quote}
* It seems the scheduling only makes sense for repairing the primary range of the node ('nodetool -pr'), since we end up repairing all nodes eventually. Are you considering other options like subrange ('nodetool -st -et') repair?
* For subrange repair, we could maybe have something similar to reaper's segmentCount option, but since this would add more complexity we could leave it for a separate ticket.
{quote}
It should be possible to extend the repair scheduler with subrange repairs, either by having it as an option per table or by having a separate scheduler for it. The separate scheduler would just be another plugin that could replace the default repair scheduler.
If we go for a table configuration, it could be that the user either specifies pr or the number of segments to divide the token range in, something like:
{noformat}
repair_options = {..., token_division='pr'};   // Use primary range repair
or
repair_options = {..., token_division='2048'}; // Divide the token range in 2048 slices
{noformat}
If we had a separate scheduler, it could just be a configuration for it. Personally I would prefer to have it all in a single scheduler, and I agree that it should probably be a separate ticket to keep the complexity of the base scheduler to a minimum. But I think this is a feature that will be very much needed, both with non-vnode token assignment and also with the possibility to reduce the number of vnodes as of CASSANDRA-7032.
---
{quote}
* While pausing repair is a nice feature for user-based interruptions, we could probably embed system-known interruptions (such as when a bootstrap or upgrade is going on) in the default rejection logic.
{quote}
Agreed, are there any other scenarios that we might have to take into account?
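A token_division='2048'-style segmentation could be sketched as below, splitting the full Murmur3 token range into equal contiguous slices to be repaired one at a time; the function name and layout are hypothetical:

```python
# Murmur3 partitioner token bounds.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def split_range(n):
    """Divide the full token range into n contiguous (start, end) slices."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

segments = split_range(2048)  # token_division='2048'
```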
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133253#comment-15133253 ] Paulo Motta commented on CASSANDRA-10070: - Nice work [~molsson]. Overall the design doc looks great and addresses most of the issues raised previously, just a few minor comments/questions:
* I second [~yukim]'s first question above, in that we need to better specify how cluster-wide repair parallelism is handled: is it fixed or configurable? Can a node run repair for multiple ranges in parallel? Perhaps we should have a {{node_repair_parallelism}} (default 1) and {{dc_repair_parallelism}} (default 1) global config and reject starting repairs above those thresholds.
* For subrange repair, we could maybe have something similar to [reaper|https://github.com/spotify/cassandra-reaper]'s {{segmentCount}} option, but since this would add more complexity we could leave it for a separate ticket.
* While pausing repair is a nice feature for user-based interruptions, we could probably embed system-known interruptions (such as when a bootstrap or upgrade is going on) in the default rejection logic.
Maybe the spotify reaper folks have something to add based on their experience with automatic repair scheduling (cc [~Bj0rn], [~zvo]).
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132680#comment-15132680 ] Yuki Morishita commented on CASSANDRA-10070: [~molsson] Thanks for the write-up. I have a couple of questions:
* What would "lock resource" be like for repair scheduling? I think the value controls the number of repair jobs running at a given time in the whole cluster, and we don't want to run too many repair jobs at once.
* It seems the scheduling only makes sense for repairing the primary range of the node ('nodetool -pr'), since we end up repairing all nodes eventually. Are you considering other options like subrange ('nodetool -st -et') repair?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128529#comment-15128529 ] Jonathan Ellis commented on CASSANDRA-10070: [~devdazed], you had some great suggestions above. Do you have time to look at the draft Marcus attached?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108704#comment-15108704 ] Marcus Olsson commented on CASSANDRA-10070: --- I completely agree; I should create a document describing these things. I've also thought about making a high-level document for the whole proposal, to see whether everyone agrees that this is the way to go about the distributed scheduling. Then we can take it from there, revise the proposal, and hopefully later on break the JIRA into several tasks to make it easier to review and develop this feature. I think this document should contain:
* A high-level description of the proposal (flow charts, etc.)
* Problems that could occur and possible solutions
Any thoughts or ideas on this?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107570#comment-15107570 ] Jonathan Ellis commented on CASSANDRA-10070: Marcus, I think Russell has made some very valuable suggestions as to the kind of complications we need to be thinking about here. Before jumping back to another patch, I think it would be useful to put together a high level design document that thinks through these questions and proposes approaches to deal with them. Then we can get feedback to you faster than at the level of actual code.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050440#comment-15050440 ] Marcus Olsson commented on CASSANDRA-10070: ---
{quote}
While it may intuitively seem like you want to kick off a repair as soon as a node comes back online, it can be very dangerous in a production environment. Starting the most resource-intensive process on a node that is already problematic, in a cluster that is already having issues, can exacerbate the issue and lead to a longer outage, or degradation, than anticipated.
{quote}
True, it should probably be a feature enabled by the user, and maybe with a configurable delay before it actually performs the repair?
{quote}
Network reliability is also another aspect of this. Let's say you have 3 nodes, RF=3, and there is a partition dividing node A and node B. All nodes are actually still up, but in this case node A will start a repair on B and B will start a repair on A. Now 2/3 of your cluster is unnecessarily repairing, which can cause serious performance problems, especially when running a loaded cluster.
{quote}
The repairs are still executed with respect to the distributed locking, so there would only be one node running repair at a time. But they would send the job information to each other in parallel.
{quote}
Also: Other times you might not want a repair automatically started:
* The cluster is in the middle of a rolling upgrade where streaming is broken between versions.
* Heavily loaded clusters during normal operation (some users schedule repairs at night to not affect performance during normal hours of operation)
* Clusters where the read consistency is high enough to account for the hints beyond the window, allowing the user to schedule the repair for a time that makes sense for their cluster and use case.
{quote}
* This is something that the repair scheduler should be handling either way, to avoid repairing if the cluster is unable to perform it
(version incompatibility, nodes are down, etc.)
* There is a plug-in point for schedule policies that can be used to decide if repairs should run, so it would be possible to prevent repairs due to some condition(s). The conditions could be based on what the user wants, be it maintenance windows or resource usage. It would also be possible to prevent normal scheduled repairs during some hours, but allow manually scheduled repairs at all times.
* This would be possible by making this feature optional.
---
{quote}
I don't know much about Cassandra internals, so one of the regular devs would know better, but my thought would be that during a restart, somewhere it figures out that it needs to replay part of the commit log to rebuild memtables that hadn't been flushed to disk. The timestamp of the last thing in the commit log might be a good estimate of when the node went down, and you could compare that to the current time to figure out how long the node was down. I wouldn't worry about the second case since it would be hard to get that right.
{quote}
Looking at the commitlog might be a good enough approach. I'll look into that.
---
Overall I'd say that if this feature (exceeding-hint-window repairs) should exist, it should probably be something that is enabled per table, but disabled by default.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044921#comment-15044921 ] Jim Meyer commented on CASSANDRA-10070: --- Wouldn't it be safer if node A checked for itself how long it had been down and scheduled its own repairs? Why have node B guess that node A was down? I've seen cases where nodes couldn't communicate, so they think the other node is down, when actually both nodes are up.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045189#comment-15045189 ] Marcus Olsson commented on CASSANDRA-10070: --- I agree that it would probably be safer for node A to check how long it has been down itself, but I'm not sure how that can be done reliably. But also, if nodes A and B couldn't communicate for a time period longer than the hint window, they will not have hints. So in that case they should do a repair even if both were up the whole time. Note that I'm not against having the check on the node that was down; it's just that I think both the case where a node was down and the case where two nodes were unable to communicate should require a repair. If the second case is not required, do you have any suggestions on how the self-check could be implemented?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045293#comment-15045293 ] Russell Bradberry commented on CASSANDRA-10070: --- While it may intuitively seem like you want to kick off a repair as soon as a node comes back online, it can be very dangerous in a production environment. Starting the most resource-intensive process on a node that is already problematic, in a cluster that is already having issues, can exacerbate the issue and lead to a longer outage, or degradation, than anticipated. Network reliability is also another aspect of this. Let's say you have 3 nodes, RF=3, and there is a partition dividing node A and node B. All nodes are actually still up, but in this case node A will start a repair on B and B will start a repair on A. Now 2/3 of your cluster is unnecessarily repairing, which can cause serious performance problems, especially when running a loaded cluster. Also: Other times you might not want a repair automatically started:
- The cluster is in the middle of a rolling upgrade where streaming is broken between versions.
- Heavily loaded clusters during normal operation (some users schedule repairs at night to not affect performance during normal hours of operation)
- Clusters where the read consistency is high enough to account for the hints beyond the window, allowing the user to schedule the repair for a time that makes sense for their cluster and use case.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044637#comment-15044637 ] Marcus Olsson commented on CASSANDRA-10070: --- [~zemeyer] I've added the possibility to schedule a job remotely, so that one node can tell another node to run a certain job. Right now it's used when a node discovers that another node has been down longer than the hint window; it then tells that node to repair its ranges ASAP. The remote scheduling uses the distributed locking mechanism to prevent multiple nodes from telling the same node to run the repair at the same time. So a simple flow could be:
- Node A goes down at 12:00.
- Node B recognizes it and saves "Node A DOWN @ 12:00" locally.
- Node A comes back up at 16:00.
- Node B sees Node A as online again at 16:00 and sees that Node A has been down since 12:00, i.e. 4 hours.
- Node B sends a repair job to Node A for each table that has a hint window of 4 hours or less.
- Node A runs all repairs.
--- I'll continue to work on the feature of pausing all repairs and also on the prevention mechanism. I've done some work on the prevention mechanism for jobs, in that it checks the job history for repairs and only reports that it *can* run a repair if any range hasn't been repaired within the hint window (it's still based on the interval though, so the repair shouldn't run more than once per interval in the normal case). To the prevention mechanism I should probably add a way to avoid running multiple repairs for a single node at the same time. After that I'll add the possibility to run parallel repair tasks across the cluster. --- The git branch is [here|https://github.com/emolsson/cassandra/commits/10070].
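The down-time flow above can be sketched roughly as follows; {{DowntimeTracker}} and its method names are illustrative stand-ins, not actual Cassandra classes:

```python
from datetime import datetime, timedelta

class DowntimeTracker:
    """Sketch of the flow described above: record when a peer is seen DOWN,
    and when it comes back UP decide which tables need an immediate repair
    because the outage exceeded their hint window. Illustrative only."""

    def __init__(self):
        self.down_since = {}  # node -> datetime when it was first seen DOWN

    def on_down(self, node, now):
        self.down_since[node] = now

    def on_up(self, node, now, hint_windows):
        """hint_windows: table -> hint window as a timedelta.
        Returns the tables whose hint window was exceeded by the outage."""
        started = self.down_since.pop(node, None)
        if started is None:
            return []
        outage = now - started
        return [table for table, window in hint_windows.items()
                if window <= outage]
```

With the example timeline above (down at 12:00, up at 16:00), only tables whose hint window is four hours or less would be sent a repair job.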
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045336#comment-15045336 ] Jim Meyer commented on CASSANDRA-10070: --- I don't know much about Cassandra internals, so one of the regular devs would know better, but my thought would be that during a restart, somewhere it figures out that it needs to replay part of the commit log to rebuild memtables that hadn't been flushed to disk. The timestamp of the last thing in the commit log might be a good estimate of when the node went down, and you could compare that to the current time to figure out how long the node was down. I wouldn't worry about the second case since it would be hard to get that right.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045350#comment-15045350 ] Jim Meyer commented on CASSANDRA-10070: --- I think this is part of the motivation for building repair scheduling into Cassandra. When we write an external repair scheduler, it has no idea what the state of the cluster is, so it just blindly issues repairs based on a time schedule.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980310#comment-14980310 ] Jon Haddad commented on CASSANDRA-10070: [~amandava] I just opened CASSANDRA-10619
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980078#comment-14980078 ] Marcus Olsson commented on CASSANDRA-10070: --- Just to clarify, the automatic scheduling is done on a node level. The way it distributes is by "competing" with the other nodes over who has the highest need for a repair, and then using a CAS lock to obtain the right to run a repair. So the repair process would continue during an upgrade, but I assume it would fail as it is right now and that the repair job would be retried. The problem here is that this job would try to run until it succeeded, since it has the highest priority, even if there are other repair jobs that could run (e.g. if only a part of the cluster was upgraded). To allow repairs during an upgrade scenario I think we need to have both CASSANDRA-7530 & CASSANDRA-8110 in place. Until then I see two options:
* Make it possible to "pause" all repair scheduling, e.g. during upgrade scenarios.
* Make the repair job recognize that it cannot run at this time and allow another repair job to run instead.
I wouldn't mind implementing both options, since there might be scenarios where both are needed, even if we can repair between versions.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978802#comment-14978802 ] Avinash Mandava commented on CASSANDRA-10070: - Right now streaming between Cassandra versions isn't recommended, but I'm wondering how we would upgrade to a new version with automatic repairs running. If I'm doing a rolling upgrade, right now I have to stop the repair process to prevent streaming between nodes and then upgrade, and then resume the repair process. But if we are thinking of including automatic repairs, might it be valuable to allow people to keep the repair process going while they upgrade? I can see how an upgrade is infrequent enough for this suggestion to be overkill, but curious what people think.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14718230#comment-14718230 ] Marcus Olsson commented on CASSANDRA-10070: --- This is an optional feature in the way it's implemented; you can disable it and do only manual/scripted repairs if you want. I think this is fundamental functionality that should be part of the codebase, otherwise the same argument could be made against compactions, since those \*could\* also be handled in an external tool. Note that the actual repairs are always handled inside of C* and that this is just a way to schedule them. I think that data consistency management should be part of a database's functionality. There are already hints and read-repairs that are handled inside of C*, so why shouldn't repairs be handled that way as well?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14718168#comment-14718168 ] Malcolm commented on CASSANDRA-10070: - Operational simplicity is nice, however anything that increases the surface area of what Cassandra does increases the chances of bugs.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14718206#comment-14718206 ] Marcus Olsson commented on CASSANDRA-10070: --- As it is right now it would wait an hour after starting up (configurable) before starting to schedule repairs. At that point the repair priority would be based on when it last ran a repair and how often it can run (P = (H + 1) * bP, described above). So it should have the highest priority if it was supposed to repair during the time it was down. Right now there is no such option for that, so a manual repair would be required.
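As a minimal sketch, the priority formula P = (H + 1) * bP referenced above amounts to:

```python
def repair_priority(hours_overdue: int, base_priority: int) -> int:
    """P = (H + 1) * bP: H is the number of whole hours that have passed
    since the job could have run (based on min_delay), bP is the job's
    base priority. A job that has just become runnable (H = 0) still gets
    a non-zero priority, and priority grows linearly with lateness."""
    return (hours_overdue + 1) * base_priority
```

So a node that came back after a long outage would immediately carry a large H and out-prioritize nodes whose repairs are merely due, which is why it should win the competition for the lock first.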
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14718522#comment-14718522 ] Jim Meyer commented on CASSANDRA-10070: --- Would it be difficult to add an option like that? One of the advantages of building the scheduler into C* is that it could have insight into the state of the cluster and respond to node downtime. It could reduce the consistency gap between the hint window being exceeded and the next regularly scheduled repair for a critical table. Then one could set the hint window smaller, the regular schedule to once a week, and the recovery repair to queue a repair when downtime had exceeded the hint window. Separate question: is the 'parallelism' attribute scalable for large clusters? If I have a 1000 node cluster and want to allow up to 10% of my nodes to run repairs at the same time, how would I specify that? Would that be a system config param or a table level attribute?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14718527#comment-14718527 ] Jim Meyer commented on CASSANDRA-10070: --- +1 to including. This feature would help with multi-tenancy and extend the idea of tunable consistency, since different use cases will have different repair requirements. Individual applications could self-serve their repair frequency via the table properties instead of having an administrator guess what frequency is needed. It is a difficult and error-prone chore for an application developer to devise a reliable external mechanism for scheduling repairs. It often ends up as a simple cron job that blindly repairs all keyspaces once a day.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14719321#comment-14719321 ] Marcus Olsson commented on CASSANDRA-10070: --- Good point, I can take a look into how difficult it would be to implement something like that. The parallelism attribute is a mapping to the repair parallelism (parallel or sequential). I have thought about the possibility to run multiple repairs at once as well; I guess it would be a system level configuration. I think the attribute should state the number of parallel maintenance tasks to perform.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14717170#comment-14717170 ] sankalp kohli commented on CASSANDRA-10070: --- [~malcolm] With CASSANDRA-6434, it becomes more important to run repairs. It is not just to keep things in sync but to drop tombstones. I would vote for keeping this in C*. I am not sure if it is possible to keep it on the side as C* is a single process.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14717204#comment-14717204 ] Jon Haddad commented on CASSANDRA-10070: As an operator, +1 to including. Anything that reduces the surface area for user mistakes is a good thing.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14716248#comment-14716248 ] Malcolm commented on CASSANDRA-10070: - Is there any strong reason to make this part of the Cassandra codebase? All of this work can be expressed and handled in an external tool, keeping the Cassandra codebase focused more on storing data.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14713496#comment-14713496 ] Jim Meyer commented on CASSANDRA-10070: --- This sounds like a very useful feature. I'm wondering what the behavior will be when a node that has been down for a while comes back up. I assume it would see that it is overdue for some repairs and schedule them in a load friendly manner. Now suppose I have a table where consistency is very important. Would I be able to set table attributes to schedule a high priority repair if the node had been down longer than max_hint_window_in_ms, so that it can be made consistent as soon as possible? Or would that still need to be done manually?
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14710966#comment-14710966 ] Marcus Olsson commented on CASSANDRA-10070: --- *An explanation of the patch so far*
*Tables* Added a table property to specify that the table should be repaired automatically, the minimum delay between repairs, and some repair parameters. {noformat} CREATE TABLE example ( key text, value text, PRIMARY KEY(key) ) WITH repair_scheduling = {'enabled': true, 'min_delay': 86400, 'incremental': true, 'parallelism': 'sequential'}; {noformat} This would create a table that would get repaired at most once a day (86400 seconds between each repair) using incremental sequential repair. I added a package, scheduling, for all maintenance scheduling related classes. The main class in this package is the ScheduleManager, which performs the updating and running of the scheduled jobs. This package has two pluggable interfaces and some abstract classes. Which schedulers/policies to use is configurable in cassandra.yaml.
*Interfaces* - *IScheduler* - Used to create new scheduled jobs in the background. - *ISchedulePolicy* - Used to deny jobs from running based on some conditions.
*Implementations* - *RepairScheduler* - Implements the IScheduler interface and is responsible for checking the tables for repair scheduling options. It listens for schema changes and adds new ScheduledJobs for the ScheduleManager to run. - *FileSchedulePolicy* - Implements the ISchedulePolicy interface and uses a configuration file to define when scheduled jobs should be prevented from running. The policy is not enabled by default.
*Abstract classes* - *ScheduledJob* - The base class for all scheduled jobs, which contains functionality to calculate priority, when to run next, etc. - *ScheduledTask* - The base class for all tasks. The priority of the jobs is calculated as *P = (H + 1) \* bP*, where *P* is the priority, *H* is the number of hours that have passed since it could have been executed (based on min_delay) and *bP* is the base priority.
*Other* - *DistributedLock* - Used to get the run lock. - *JobConfiguration* - Common configuration for the scheduled jobs, like the minimum delay between jobs, base priority, enabled, and whether it should only run once. - *ScheduledJobQueue* - Dynamically prioritized queue of jobs based on their priority. - *ScheduledRepairJob* - Implementation of ScheduledJob that holds a list of repair tasks (repairs a single table). - *ScheduledRepairTask* - Implementation of ScheduledTask used to repair a single range for a table.
The DistributedLock uses two tables in the system_distributed keyspace: one for writing priorities, so that all nodes know about the others' priorities, and one for the lock itself, which is accessed by LWT. A node first writes its own priority to the table, then reads all nodes' priorities. If the node has the highest priority it will try to take the lock.
*Nodetool* Added the \-s (\-\-scheduled) and \-sh (\-\-scheduled-high) flags to repair, which indicate that it should be a scheduled (but only run once) repair and whether it should have the highest priority. The reason for using interfaces and abstract classes is to have flexibility for the user in adding their own schedulers, and perhaps having the possibility to schedule e.g. cleanups through nodetool.
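A minimal sketch of the two-step DistributedLock protocol described above (announce priority, then only the highest-priority node attempts the LWT lock); the dict and callback here stand in for the two system_distributed tables and are illustrative only:

```python
def try_acquire_repair_lock(node, my_priority, priorities, cas_lock):
    """Sketch of the DistributedLock flow: write our priority to the
    shared priority table, read everyone's, and only attempt the
    compare-and-set lock if no other node has announced a higher
    priority. `priorities` stands in for the priority table and
    `cas_lock` for the LWT-backed lock table; both are illustrative."""
    priorities[node] = my_priority              # step 1: announce our priority
    if my_priority < max(priorities.values()):  # step 2: someone needs it more
        return False
    return cas_lock(node)                       # step 3: LWT attempt on the lock
```

A node that loses the priority comparison never even attempts the LWT, which keeps contention on the lock table low.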
The branch is available here: https://github.com/emolsson/cassandra/commits/10070
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704457#comment-14704457 ] Marcus Olsson commented on CASSANDRA-10070: --- Working on a first draft of the scheduler, will hopefully have a patch set up in the next week. Automatic repair scheduling --- Key: CASSANDRA-10070 URL: https://issues.apache.org/jira/browse/CASSANDRA-10070 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Marcus Olsson Assignee: Marcus Olsson Priority: Minor Scheduling and running repairs in a Cassandra cluster is most often a required task, but this can both be hard for new users and it also requires a bit of manual configuration. There are good tools out there that can be used to simplify things, but wouldn't this be a good feature to have inside of Cassandra? To automatically schedule and run repairs, so that when you start up your cluster it basically maintains itself in terms of normal anti-entropy, with the possibility for manual configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696755#comment-14696755 ] Marcus Olsson commented on CASSANDRA-10070: --- I have previously done some work on something similar to this, although it was an external Java process that used JMX. There it divides all tables as _Jobs_ that is prioritized against each other in a consistent manner. A Job is then further divided into _Tasks_, where each task is responsible to repair a certain range for that table. A Job can be seen as a atomic unit, where the success of the Job was based on the success of all it's Tasks. It used LWT with TTL to create a lock on which node has the right to run repair right now. Since it used TTL, the lock would disappear in case the node holding the lock would die. It also used a repair history to be able to continue from where it left of when restarted. --- *Some ideas for the implementation:* *Core* By reusing some of the concepts it would be possible to create a pluggable interface that can be used to prioritize these Jobs on a node level. By inserting this priority(an simple integer?) into a distributed table it would be possible for other nodes to see which node has the highest priority to run a repair, to avoid starvation. Then before running the repair there could be another pluggable interface that can prevent repairs from starting under certain circumstances(e.g. node load). The automatic repair of a table could be enabled/disabled by a table property. *Nodetool* * Add possibility to enable/disable all automatic repair. * Add possibility to run a repair of a table when possible(that uses this distributed scheduling). 