[jira] [Comment Edited] (ACCUMULO-4353) Stabilize tablet assignment during transient failure
[ https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348409#comment-15348409 ]

Shawn Walker edited comment on ACCUMULO-4353 at 6/24/16 4:14 PM:

bq. Are you attempting to design a mechanism that could be used to avoid re-balancing and have the master keep assignments where they were previously, knowing that servers will come back into operation?

That is the idea, yes. There is definitely a tradeoff here.

bq. If this is really about trying to make rolling-restarts better, I'd encourage a look at ACCUMULO-1454.

As I mentioned before, I hadn't seen ACCUMULO-1454 before starting this. I've now looked at the discussion of that ticket. What I implemented was approximately what Christopher Tubbs and David Medinets were suggesting. I also read through Keith Turner's design proposal summary. I have some reservations about it:

* It requires that each planned restart involve tablet servers changing ports. While the recent changes to Accumulo to support a narrow port range during port search would make this more plausible, it might still prove difficult to establish firewall rules for Accumulo. (Sean Busby raises this issue in the discussion.)
* What happens if a tablet is split after migration starts? It seems to me there might be a race condition here which would lead to incomplete migration between sibling tablet servers. Do we block assignment during the rolling restart, too? That seems like a cure worse than the problem.
* Even barring those two concerns, I again raise the spectre of ops complexity. To transition a single server, I need to know (a) which port the "old" tserver was running on, and (b) which port the "new" tserver is running on. If I'm using some sort of dynamic port assignment (which I would need to do unless I pointed the "new" tserver at an entirely different configuration), it could be non-trivial to gather these pieces of information.

While the burden on the operator of a cluster of 5 tservers might not be significant, the burden on the operator of a cluster of 200 tservers might make this approach infeasible. And the non-triviality of determining the correct port migration mapping would also make the process difficult to robustly automate.

bq. While seeing a pull request accompanying the issue reported, It seems a bit premature to me to see code without some discussion on what the problems are and how best to solve them.

Ahh, my mistake then. As a new contributor to Accumulo, I still don't have a full grasp of the rules, either written or unwritten. My feeling from watching the list was that the primary modus operandi was to present a (fully implemented) solution along with a proposed problem, and then to discuss the merits of the solution.
[jira] [Comment Edited] (ACCUMULO-4353) Stabilize tablet assignment during transient failure
[ https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347595#comment-15347595 ]

Shawn Walker edited comment on ACCUMULO-4353 at 6/24/16 3:04 AM:

I did have rolling restarts as a primary motivation for doing this work, though a few other scenarios did come to mind as potential applications:

* tserver loses lock (possibly due to load), dies, and is restarted quickly via some external infrastructure, e.g. Puppet
* temporary network connectivity loss

I was thinking a `table.suspend.duration` on the order of 2-3 minutes might make sense for general purposes in a large cluster: long enough to catch most truly transient problems, yet short enough that many applications wouldn't be unduly impacted, particularly since any application already has to deal with a ~30 second wait before the master really notices that a tablet server is gone.

After all, Accumulo is ultimately a consistent+partition tolerant database, not an available+partition tolerant database. If availability is a user's top priority, other databases (e.g. Apache Cassandra) offer tradeoffs in that direction.

I hadn't seen ACCUMULO-1454; I'll take a closer look in the morning. One concern that I had with some rolling-restart ideas was a matter of ops complexity. In my (admittedly limited) experience, orchestrating a rolling restart that needs to do much more than "kill daemon, restart daemon" over a large cluster can be a huge headache.

> Stabilize tablet assignment during transient failure
>
> Key: ACCUMULO-4353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4353
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Shawn Walker
> Assignee: Shawn Walker
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When a tablet server dies, Accumulo attempts to reassign the tablets it was
> hosting as quickly as possible to maintain availability. If multiple tablet
> servers die in quick succession, such as from a rolling restart of the
> Accumulo cluster or a network partition, this behavior can cause a storm of
> reassignment and rebalancing, placing significant load on the master.
> To avert such load, Accumulo should be capable of maintaining a steady tablet
> assignment state in the face of transient tablet server loss. Instead of
> reassigning tablets as quickly as possible, Accumulo should await the
> return of a temporarily downed tablet server (for some configurable duration)
> before assigning its tablets to other tablet servers.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
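The waiting behavior proposed in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request or Accumulo's master: the class `SuspensionPolicy`, its `Action` enum, and the methods `decide` and `worstCaseWait` are hypothetical names, and the only identifier taken from the discussion is the `table.suspend.duration` setting, represented here by a plain `Duration`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: none of these names come from the Accumulo code base.
final class SuspensionPolicy {

    enum Action { KEEP_SUSPENDED, RETURN_TO_ORIGINAL, REASSIGN_ELSEWHERE }

    // Stand-in for the proposed table.suspend.duration setting.
    private final Duration suspendDuration;

    SuspensionPolicy(Duration suspendDuration) {
        if (suspendDuration.isNegative()) {
            throw new IllegalArgumentException("suspend duration must not be negative");
        }
        this.suspendDuration = suspendDuration;
    }

    // Decide what to do with a tablet whose server was noticed missing at suspendedAt.
    Action decide(Instant suspendedAt, boolean originalServerBack, Instant now) {
        if (originalServerBack) {
            // The server returned: hand the tablet back, so no rebalancing is needed.
            return Action.RETURN_TO_ORIGINAL;
        }
        Duration waited = Duration.between(suspendedAt, now);
        // Once the grace period has elapsed, stop waiting and assign elsewhere.
        return waited.compareTo(suspendDuration) >= 0
                ? Action.REASSIGN_ELSEWHERE
                : Action.KEEP_SUSPENDED;
    }

    // Upper bound on the wait for a tablet whose server never returns:
    // the time to notice the loss plus the grace period.
    Duration worstCaseWait(Duration detectionDelay) {
        return detectionDelay.plus(suspendDuration);
    }
}
```

With a two-minute grace period and the ~30 second detection delay mentioned above, `worstCaseWait` comes to two and a half minutes. A grace period of zero makes `decide` reassign immediately, which corresponds to the reassign-as-quickly-as-possible behavior the issue describes.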
[jira] [Comment Edited] (ACCUMULO-4353) Stabilize tablet assignment during transient failure
[ https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347266#comment-15347266 ]

marco polo edited comment on ACCUMULO-4353 at 6/23/16 9:46 PM:

Are you attempting to design a mechanism that could be used to avoid re-balancing and have the master keep assignments where they were previously, knowing that servers will come back into operation?

I only ask because I question why the load on the master is a problem. You will cause load since clients will persist for that length of time. Won't you increase the number of thrift connections waiting since you may not re-balance for some time?