Re: High Availability command line interface - future plans.
In the end I think it comes down to a philosophical difference. I believe in implementing systems from the bottom up, out of well-understood, simple-as-possible components with easily understood properties. I am aware that another approach is to start with a partially implemented primitive that represents a design goal and fill out its implementation until it meets that goal. In this discussion, ensure-ha seems to me to epitomise the second approach, and I do understand the arguments for it.

With reference to Mark's mention of the inmates running the asylum, I realise that, by that analogy, I am most certainly an inmate here. My ideas about what might make for a solid and straightforward tool to use are biased by my knowledge of the structure of the system.

William's response is clear, so ensure-ha it is. I'm afraid we're back where we started, but I've found this conversation useful and hope that others have too.

I don't have the will to bike-shed around the actual command we use; however, I strongly suggest that we go with something that makes sense to Jorge and Marco (and to our CTS folks), as they are our people on the ground, using this tool. It would have been great to have had feedback from the CTS folks (possibly the biggest current operational users of Juju?) for their views.

cheers,
rog.

On 11 November 2013 03:50, Tim Penhey tim.pen...@canonical.com wrote:

On 09/11/13 03:04, roger peppe wrote:

On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:

juju add-state-server --api-only-please-thanks

And if we want to allow a machine that runs the environment-manager workers but not the api server or mongo server (not actually an unlikely thing, given certain future possibilities), then add-state-server is a command that doesn't necessarily add a state server at all... That thought was the source of my doubt.

I think that it is reasonable to think of just the db and the api server from the user's point of view. The fact that we may run other workers alongside the api server is up to us, and not something we actually need to expose to people. Most of our users should have no problem at all understanding juju:db and juju:api (or whatever names we call them). That said, it's just a spelling. If there's general agreement on state-server, so be it - I'm very happy to move forward with that.

I cringe whenever I see "state" used anywhere. I would like us to move towards namespaced services with a common understanding, but I'm happy to have that significantly down the line. Just remember that whatever command we come up with, it needs to be easily explained to our new users. I like the idea of a special command that handles the HA-ness of juju, because it means we can give meaningful error messages when people do things not quite right (like adding just one more mongo db, thinking it is enough).

I don't have the will to bike-shed around the actual command we use; however, I strongly suggest that we go with something that makes sense to Jorge and Marco (and to our CTS folks), as they are our people on the ground, using this tool.

Cheers,
Tim

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
Re: High Availability command line interface - future plans.
On 09/11/13 03:04, roger peppe wrote:

On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:

juju add-state-server --api-only-please-thanks

And if we want to allow a machine that runs the environment-manager workers but not the api server or mongo server (not actually an unlikely thing, given certain future possibilities), then add-state-server is a command that doesn't necessarily add a state server at all... That thought was the source of my doubt.

I think that it is reasonable to think of just the db and the api server from the user's point of view. The fact that we may run other workers alongside the api server is up to us, and not something we actually need to expose to people. Most of our users should have no problem at all understanding juju:db and juju:api (or whatever names we call them). That said, it's just a spelling. If there's general agreement on state-server, so be it - I'm very happy to move forward with that.

I cringe whenever I see "state" used anywhere. I would like us to move towards namespaced services with a common understanding, but I'm happy to have that significantly down the line. Just remember that whatever command we come up with, it needs to be easily explained to our new users. I like the idea of a special command that handles the HA-ness of juju, because it means we can give meaningful error messages when people do things not quite right (like adding just one more mongo db, thinking it is enough).

I don't have the will to bike-shed around the actual command we use; however, I strongly suggest that we go with something that makes sense to Jorge and Marco (and to our CTS folks), as they are our people on the ground, using this tool.

Cheers,
Tim
Re: High Availability command line interface - future plans.
I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out.

First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black: we could have a juju ha-status command that just shows the state of HA. This is fundamentally different from changing the behavior and meaning of add-machine to know about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special-casing the removal of services that would kill the environment
* special-casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward.

Fourth, the spelling ensure-ha is probably not a very good idea: the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

  juju management --info
  juju management --add
  juju management --add --to 3
  juju management --remove-from

I know this is not following the add-machine format, but I think it would be better to migrate that to something more like this:

  juju machine --add

--Mark Ramm

On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

On 6 November 2013 20:07, Kapil Thangavelu kapil.thangav...@canonical.com wrote:

instead of adding more complexity and concepts, it would be ideal if we could reuse the primitives we already have. ie juju environments have three user exposed services, that users can add-unit / remove-unit etc. they have a juju prefix and therefore are omitted by default from status listing. That's a much simpler story to document: how do I scale my state server? juju add-unit juju-db... my provisioner? juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view. I've thought about it quite a bit. I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can take on managerial roles, but provide a veneer over the top which specially interprets service operations on the juju built-in services and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between the user's view of things and what's going on underneath. For instance, with a built-in service, can I:

- add a subordinate service to it?
- see the relevant log file in the usual place for a unit?
- see its charm metadata?
- join to its juju-info relation?

If it's a single service, how can its units span different series? (presumably it has got a charm URL, which includes the series) I fear that if we try this approach, the cracks show through and the result is a system that's hard to understand because too many things are not what they appear. And that's not even going into the plethora of special casing that this approach would require throughout the code.

2) is more attractive, as it's actually doing what's written on the label. But this has its own problems:

- it's a highly significant architectural change.
- juju managerial services are tightly tied into the operation of juju itself (not surprisingly). There are many chicken-and-egg problems here - we would be trying to use the system to support itself, and that could easily lead to deadlock as one part of the system tries to talk to another part of the system that relies on the
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen mark.ramm-christen...@canonical.com wrote:

I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out.

First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black: we could have a juju ha-status command that just shows the state of HA. This is fundamentally different from changing the behavior and meaning of add-machine to know about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special-casing the removal of services that would kill the environment
* special-casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward.

Fourth, the spelling ensure-ha is probably not a very good idea: the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

  juju management --info
  juju management --add
  juju management --add --to 3
  juju management --remove-from

Sounds good to me. Similar to how I was thinking of doing it originally, but segregating it from add-machine etc. should prevent adding cognitive overhead for users that don't care. Also, not so much leakage of internals, and no magic (a good thing!)

I know this is not following the add-machine format, but I think it would be better to migrate that to something more like this:

  juju machine --add

--Mark Ramm

On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

On 6 November 2013 20:07, Kapil Thangavelu kapil.thangav...@canonical.com wrote:

instead of adding more complexity and concepts, it would be ideal if we could reuse the primitives we already have. ie juju environments have three user exposed services, that users can add-unit / remove-unit etc. they have a juju prefix and therefore are omitted by default from status listing. That's a much simpler story to document: how do I scale my state server? juju add-unit juju-db... my provisioner? juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view. I've thought about it quite a bit. I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can take on managerial roles, but provide a veneer over the top which specially interprets service operations on the juju built-in services and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between the user's view of things and what's going on underneath. For instance, with a built-in service, can I:

- add a subordinate service to it?
- see the relevant log file in the usual place for a unit?
- see its charm metadata?
- join to its juju-info relation?

If it's a single service, how can its units span different series? (presumably it has got a charm URL, which includes the series) I fear that if we try this approach, the cracks show through and the result is a system that's hard to understand because too many things are not what they appear. And that's not even going into the plethora of special casing that this approach would require throughout the code.

2) is more attractive, as it's actually doing what's written on the label. But
Re: High Availability command line interface - future plans.
Given a bit of thought, the reasons that I proposed the subcommand remove-from rather than just remove are both obscure enough that I should have explained them, and wrong enough that I should not have proposed that syntax. I was thinking that remove always requires a machine ID and that add does not, which made them asymmetric enough to justify a different spelling, but a bit of further thinking leads me to think that this is already the case with add-unit and remove-unit, and therefore consistency is better than a new spelling.

On Fri, Nov 8, 2013 at 5:15 PM, Andrew Wilkins andrew.wilk...@canonical.com wrote:

On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen mark.ramm-christen...@canonical.com wrote:

I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out.

First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black: we could have a juju ha-status command that just shows the state of HA. This is fundamentally different from changing the behavior and meaning of add-machine to know about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special-casing the removal of services that would kill the environment
* special-casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward.

Fourth, the spelling ensure-ha is probably not a very good idea: the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

  juju management --info
  juju management --add
  juju management --add --to 3
  juju management --remove-from

Sounds good to me. Similar to how I was thinking of doing it originally, but segregating it from add-machine etc. should prevent adding cognitive overhead for users that don't care. Also, not so much leakage of internals, and no magic (a good thing!)

I know this is not following the add-machine format, but I think it would be better to migrate that to something more like this:

  juju machine --add

--Mark Ramm

On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

On 6 November 2013 20:07, Kapil Thangavelu kapil.thangav...@canonical.com wrote:

instead of adding more complexity and concepts, it would be ideal if we could reuse the primitives we already have. ie juju environments have three user exposed services, that users can add-unit / remove-unit etc. they have a juju prefix and therefore are omitted by default from status listing. That's a much simpler story to document: how do I scale my state server? juju add-unit juju-db... my provisioner? juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view. I've thought about it quite a bit. I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can take on managerial roles, but provide a veneer over the top which specially interprets service operations on the juju built-in services and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between the user's view of things and what's going on underneath. For instance, with a built-in service, can I:

- add a subordinate
Re: High Availability command line interface - future plans.
On 8 November 2013 08:47, Mark Canonical Ramm-Christensen mark.ramm-christen...@canonical.com wrote:

I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out.

First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black: we could have a juju ha-status command that just shows the state of HA. This is fundamentally different from changing the behavior and meaning of add-machine to know about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special-casing the removal of services that would kill the environment
* special-casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward.

Fourth, the spelling ensure-ha is probably not a very good idea: the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

  juju management --info
  juju management --add
  juju management --add --to 3
  juju management --remove-from

This seems like a reasonable approach in principle (it's essentially isomorphic to the --jobs approach AFAICS, which makes me happy). I have to say that I'm not keen on using flags to switch the basic behaviour of a command. The interaction between the flags can then become non-obvious (for example a --constraints flag might be appropriate with --add but not --remove-from). Ah, but your next message seems to go along with that.

So, to couch your proposal in terms that are consistent with the rest of the juju commands, here's how I see it could look, in terms of possible help output from the commands:

usage: juju add-management [options]

purpose:
Add Juju management functionality to a machine, or start a new machine with management functionality. Any Juju machine can potentially participate as a Juju manager - this command adds a new such manager.

Note that there should always be an odd number of active management machines, otherwise the Juju environment is potentially vulnerable to network partitioning. If a management machine fails, a new one should be started to replace it.

options:
--constraints (= )
    additional machine constraints. Ignored if --to is specified.
-e, --environment (= local)
    juju environment to operate in
--series (= )
    the Ubuntu series of the new machine. Ignored if --to is specified.
--to (=)
    the id of the machine to add management to. If this is not specified, a new machine is provisioned.

usage: juju remove-management [options] machine-id

purpose:
Remove Juju management functionality from the machine with the given id. The machine itself is not destroyed. Note that if there are fewer than three management machines remaining, the operation of the Juju environment will be vulnerable to the failure of a single machine. It is not possible to remove the last management machine.

options:
-e, --environment (= local)
    juju environment to operate in

As a start, we could implement only the add-management command, and not implement the --to flag. That would be sufficient for our HA deliverable, I believe. The other features could be added in time or according to customer demand.

I know this is not following the add-machine format, but I think it would be better to migrate that to something more like this:

  juju machine --add

If we are going to do that, I think we should probably change
Re: High Availability command line interface - future plans.
On 2013-11-08 14:15, roger peppe wrote:

On 8 November 2013 08:47, Mark Canonical Ramm-Christensen mark.ramm-christen...@canonical.com wrote:

I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out.

First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black: we could have a juju ha-status command that just shows the state of HA. This is fundamentally different from changing the behavior and meaning of add-machine to know about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special-casing the removal of services that would kill the environment
* special-casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward.

Fourth, the spelling ensure-ha is probably not a very good idea: the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

  juju management --info
  juju management --add
  juju management --add --to 3
  juju management --remove-from

This seems like a reasonable approach in principle (it's essentially isomorphic to the --jobs approach AFAICS, which makes me happy). I have to say that I'm not keen on using flags to switch the basic behaviour of a command. The interaction between the flags can then become non-obvious (for example a --constraints flag might be appropriate with --add but not --remove-from). Ah, but your next message seems to go along with that.

So, to couch your proposal in terms that are consistent with the rest of the juju commands, here's how I see it could look, in terms of possible help output from the commands:

usage: juju add-management [options]

purpose:
Add Juju management functionality to a machine, or start a new machine with management functionality. Any Juju machine can potentially participate as a Juju manager - this command adds a new such manager.

Note that there should always be an odd number of active management machines, otherwise the Juju environment is potentially vulnerable to network partitioning. If a management machine fails, a new one should be started to replace it.

I would probably avoid putting such an emphasis on "any machine can be a manager machine". But that is my personal opinion. (If you want HA you probably want it on dedicated nodes.)

options:
--constraints (= )
    additional machine constraints. Ignored if --to is specified.
-e, --environment (= local)
    juju environment to operate in
--series (= )
    the Ubuntu series of the new machine. Ignored if --to is specified.
--to (=)
    the id of the machine to add management to. If this is not specified, a new machine is provisioned.

usage: juju remove-management [options] machine-id

purpose:
Remove Juju management functionality from the machine with the given id. The machine itself is not destroyed. Note that if there are fewer than three management machines remaining, the operation of the Juju environment will be vulnerable to the failure of a single machine. It is not possible to remove the last management machine.

I would probably also remove the machine if the only thing on it was the management. Certainly that is how people want us to do juju remove-unit.

options:
-e, --environment (= local)
    juju environment to operate in
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel j...@arbash-meinel.com wrote:

I would probably avoid putting such an emphasis on "any machine can be a manager machine". But that is my personal opinion. (If you want HA you probably want it on dedicated nodes.)

Resource waste holds juju back for the small users. Being able to share a state server with other resources does sound attractive from that perspective. It may be the difference between running 3 machines or 6.

I would probably also remove the machine if the only thing on it was the management. Certainly that is how people want us to do juju remove-unit.

If there are other units on the same machine, we should definitely not remove the machine on remove-unit. The principle sounds the same with state servers.

The main problem with this is that it feels slightly too easy to add just 1 machine and then not actually have HA (mongo stops allowing writes if you have a 2-node cluster and lose one, right?)

+1

gustavo @ http://niemeyer.net
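The parenthetical question is worth answering concretely: a MongoDB replica set needs a strict majority of its voting members to keep a primary and accept writes, so a 2-node set that loses one node is down to 1 of 2 votes and goes read-only. The arithmetic, as a plain Python sketch (not juju code):

```python
def majority(n: int) -> int:
    """Votes needed for a replica set of n members to elect a primary."""
    return n // 2 + 1

def writable_after_failures(n: int, failed: int) -> bool:
    """True if a strict majority survives after `failed` members are lost."""
    return n - failed >= majority(n)

# A 2-node set loses writes when either node dies; 3 nodes tolerate one failure.
assert not writable_after_failures(2, 1)   # 1 survivor < majority(2) == 2
assert writable_after_failures(3, 1)       # 2 survivors >= majority(3) == 2
# A 4th node raises the majority to 3 without tolerating any extra failures.
assert not writable_after_failures(4, 2)
```

This is also why the thread keeps returning to odd server counts: going from 3 to 4 members buys no additional fault tolerance, which is the trap a bare "add one more state server" command would invite.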
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 6:34 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel j...@arbash-meinel.com wrote:

I would probably avoid putting such an emphasis on "any machine can be a manager machine". But that is my personal opinion. (If you want HA you probably want it on dedicated nodes.)

Resource waste holds juju back for the small users. Being able to share a state server with other resources does sound attractive from that perspective. It may be the difference between running 3 machines or 6.

If you only have 3 machines, do you really need HA from juju? You don't have HA from the machines that are actually *running your service*.

I would probably also remove the machine if the only thing on it was the management. Certainly that is how people want us to do juju remove-unit.

If there are other units on the same machine, we should definitely not remove the machine on remove-unit. The principle sounds the same with state servers.

The main problem with this is that it feels slightly too easy to add just 1 machine and then not actually have HA (mongo stops allowing writes if you have a 2-node cluster and lose one, right?)

+1

Yeah, same here. I still think we need a "turn on HA mode" command that'll bring you to 3 servers. It doesn't have to be the swiss army knife that we said before... just something to go from non-HA to a valid HA environment.
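A "turn on HA mode" command is essentially an idempotent top-up: count the live state servers and start just enough new ones to reach a valid odd-sized group. A toy sketch of that decision (hypothetical names, not the eventual juju implementation):

```python
def ensure_ha(alive: int, target: int = 3) -> int:
    """Return how many new state servers to start so the group reaches an
    odd-sized target; never scales down, and is a no-op once the target is met."""
    if target % 2 == 0:
        raise ValueError("target must be odd to preserve a usable majority")
    return max(0, target - alive)

assert ensure_ha(1) == 2   # fresh environment: bring up two more servers
assert ensure_ha(3) == 0   # already HA: running it again changes nothing
assert ensure_ha(2) == 1   # one of three died: replace exactly one
```

Framing the command around a desired end state rather than a delta is what sidesteps the "add just 1 machine and not actually have HA" failure mode, and it also makes re-running the command after a machine failure do the right thing.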
Re: High Availability command line interface - future plans.
These are *very* good points, Mark. Taking them to heart will definitely lead in a good direction for the overall feature development. It sounds like we should avoid using a management command for anything in juju, though. Most things in juju are about management one way or the other, so juju management becomes very unclear and hard to search for. Instead, the command might be named after what we've been calling them: juju add-state-server -n 2 For implementation convenience's sake, it would be okay to only ever accept -n 2 when this is first released. I can also imagine the behavior of this command resembling add-unit in a few aspects, since a state server is in fact code that just needs a home to run in. This may yield other common options across them, such as machine selection. On Fri, Nov 8, 2013 at 6:47 AM, Mark "Canonical" Ramm-Christensen mark.ramm-christen...@canonical.com wrote: I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out. First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See: http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model. Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black, we could have a juju ha-status command that just shows the state of HA.
This is fundamentally different than changing the behavior and meaning of add-machines to know about juju jobs and agents and forcing folks to think about that. Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about. And let's not kid ourselves, there are a bunch of new features in that proposal:

* Namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special casing the removal of services that would kill the environment
* special casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward. Fourth, the spelling ensure-ha is probably not a very good idea; the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

juju management --info
juju management --add
juju management --add --to 3
juju management --remove-from

I know this is not following the add-machine format, but I think it would be better to migrate that to something more like this:

juju machine --add

--Mark Ramm

On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote: On 6 November 2013 20:07, Kapil Thangavelu kapil.thangav...@canonical.com wrote: instead of adding more complexity and concepts, it would be ideal if we could reuse the primitives we already have. ie juju environments have three user exposed services, that users can add-unit / remove-unit etc.
they have a juju prefix and therefore are omitted by default from status listing. That's a much simpler story to document. how do i scale my state server.. juju add-unit juju-db... my provisioner juju add-unit juju-provisioner. I have a lot of sympathy with this point of view. I've thought about it quite a bit. I see two possibilities for implementing it: 1) Keep something like the existing architecture, where machine agents can take on managerial roles, but provide a veneer over the top which specially interprets service operations on the juju built-in services and translates them into operations on machine jobs. 2) Actually implement the various juju services as proper services. The difficulty I have with 1) is that there's a significant mismatch between the user's view of things and what's going on underneath. For instance, with a built-in service, can I: - add a subordinate service to it? - see the relevant log file in the usual place for a unit? - see its
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 9:39 AM, Nate Finch nate.fi...@canonical.com wrote: If you only have 3 machines, do you really need HA from juju? You don't have HA from your machines that are actually running your service. Why not? I have three machines... Yeah, same here. I still think we need a "turn on HA mode" command that'll bring you to 3 servers. It doesn't have to be the swiss army knife that we said before... just something to go from non-HA to valid HA environment. This looks fine: juju add-state-server -n 2 It's easy to error if current + n is not a good number. gustavo @ http://niemeyer.net
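[The "error if current + n is not a good number" check Gustavo mentions could be as small as this. A hypothetical sketch; the function name and messages are illustrative, not Juju's actual implementation:]

```python
def check_add_state_servers(current, n):
    """Hypothetical validation for an 'add-state-server -n <n>' command:
    reject totals that are below 3 or even, since those sizes add cost
    without adding fault tolerance."""
    total = current + n
    if total < 3:
        raise ValueError("%d state servers is not enough for HA" % total)
    if total % 2 == 0:
        raise ValueError("%d state servers is an even number; an odd-sized "
                         "cluster tolerates just as many failures with one "
                         "machine fewer" % total)
    return total
```

[So starting from a single state server, -n 2 succeeds with a total of 3, while -n 1 or -n 3 would be rejected.]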
Re: High Availability command line interface - future plans.
On 8 November 2013 11:31, Gustavo Niemeyer gust...@niemeyer.net wrote: These are *very* good points, Mark. Taking them to heart will definitely lead in a good direction for the overall feature development. It sounds like we should avoid using a management command for anything in juju, though. Most things in juju are about management one way or the other, so juju management becomes very unclear and hard to search for. Instead, the command might be named after what we've been calling them: juju add-state-server -n 2 I'm not sure that state-server is the right name here. For a start there are two kinds of state servers, mongo and API, which we may want to scale independently as they have totally different characteristics, and the management workers (provisioner, etc) also fall under the same umbrella. Management has been the best I've seen so far, though I do realise it is overly generic. Other suggestions? Are you suggesting that we also have destroy-state-server, BTW? It's easy to error if current + n is not a good number. That seems reasonable. Do you think this needs to be transactional? That is, if current is 2 and two people concurrently do add-state-server -n 1, should one of those requests necessarily fail? My inclination is we don't need to worry too much - but YMMV.
Re: High Availability command line interface - future plans.
On 8 November 2013 10:31, John Arbash Meinel j...@arbash-meinel.com wrote: On 2013-11-08 14:15, roger peppe wrote: On 8 November 2013 08:47, Mark "Canonical" Ramm-Christensen mark.ramm-christen...@canonical.com wrote: I have a few high level thoughts on all of this, but the key thing I want to say is that we need to get a meeting set up next week for the solution to get hammered out. First, conceptually, I don't believe the user model needs to match the implementation model. That way lies madness -- users care about the things they care about and should not have to understand how the system works to get something basic done. See: http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for reasons why I call this madness. For that reason I think the path of adding a --jobs flag to add-machine is not a move forward. It is exposing implementation detail to users and forcing them into a more complex conceptual model. Second, we don't have to boil the ocean all at once. An ensure-ha command that sets up additional server nodes is better than what we have now -- nothing. Nate is right, the box need not be black, we could have a juju ha-status command that just shows the state of HA. This is fundamentally different than changing the behavior and meaning of add-machines to know about juju jobs and agents and forcing folks to think about that. Third, I think it is possible to chart a course from ensure-ha as a shortcut (implemented first) to the type of syntax and feature set that Kapil is talking about.
And let's not kid ourselves, there are a bunch of new features in that proposal:

* Namespaces for services
* support for subordinates to state services
* logging changes
* lifecycle events on juju jobs
* special casing the removal of services that would kill the environment
* special casing the status to know about HA and warn for even state server nodes

I think we will be adding a new concept and some new syntax when we add HA to juju -- so the idea is just to make it easier for users to understand, and to allow a path forward to something like what Kapil suggests in the future. And I'm pretty solidly convinced that there is an incremental path forward. Fourth, the spelling ensure-ha is probably not a very good idea; the cracks in that system (like taking a -n flag, and dealing with failed machines) are already apparent. I think something like Nick's proposal for add-manager would be better, though I don't think that's quite right either. So, I propose we add one new idea for users -- a state-server. Then you'd have:

juju management --info
juju management --add
juju management --add --to 3
juju management --remove-from

This seems like a reasonable approach in principle (it's essentially isomorphic to the --jobs approach AFAICS, which makes me happy). I have to say that I'm not keen on using flags to switch the basic behaviour of a command. The interaction between the flags can then become non-obvious (for example a --constraints flag might be appropriate with --add but not --remove-from). Ah, but your next message seems to go along with that. So, to couch your proposal in terms that are consistent with the rest of the juju commands, here's how I see it could look, in terms of possible help output from the commands:

usage: juju add-management [options]
purpose:
  Add Juju management functionality to a machine, or start a new machine
  with management functionality. Any Juju machine can potentially
  participate as a Juju manager - this command adds a new such manager.
  Note that there should always be an odd number of active management
  machines, otherwise the Juju environment is potentially vulnerable to
  network partitioning. If a management machine fails, a new one should
  be started to replace it.

I would probably avoid putting such an emphasis on any machine can be a manager machine. But that is my personal opinion. (If you want HA you probably want it on dedicated nodes.)

options:
  --constraints (= )
    additional machine constraints. Ignored if --to is specified.
  -e, --environment (= local)
    juju environment to operate in
  --series (= )
    the Ubuntu series of the new machine. Ignored if --to is specified.
  --to (=)
    the id of the machine to add management to. If this is not specified,
    a new machine is provisioned.

usage: juju remove-management [options] machine-id
purpose:
  Remove Juju management functionality from the machine with the given
  id. The machine itself is not destroyed. Note that if there are fewer
  than three management machines remaining, the operation of the Juju
  environment will be vulnerable to the failure of a single machine. It
  is not possible to remove the last management machine.

I would probably also remove the machine if the only thing on it was the management. Certainly that is how people want us to do juju remove-unit. That seems
Re: High Availability command line interface - future plans.
On 8 November 2013 12:03, Gustavo Niemeyer gust...@niemeyer.net wrote: Splitting API and db at some point sounds sensible, but it may be easy and convenient to think about a state server as API+db for the time being. I'd prefer to start with a command name that implies that possibility; otherwise we'll end up either with a command that doesn't describe what it actually does, or more very similar commands where one could be sufficient. Hence my discomfort with add-state-server as a command name.
Re: High Availability command line interface - future plans.
We'll end up with a command that adds a state server, with a replica of the database and an API server. That's the notion of state server we've been using all along, and sounds quite reasonable, easy to explain and understand. On Fri, Nov 8, 2013 at 10:15 AM, roger peppe roger.pe...@canonical.com wrote: On 8 November 2013 12:03, Gustavo Niemeyer gust...@niemeyer.net wrote: Splitting API and db at some point sounds sensible, but it may be easy and convenient to think about a state server as API+db for the time being. I'd prefer to start with a command name that implies that possibility; otherwise we'll end up either with a command that doesn't describe what it actually does, or more very similar commands where one could be sufficient. Hence my discomfort with add-state-server as a command name. -- gustavo @ http://niemeyer.net
Re: High Availability command line interface - future plans.
juju add-state-server --api-only-please-thanks On Fri, Nov 8, 2013 at 11:43 AM, roger peppe roger.pe...@canonical.com wrote: On 8 November 2013 13:33, Gustavo Niemeyer gust...@niemeyer.net wrote: We'll end up with a command that adds a state server, with a replica of the database and an API server. That's the notion of state server we've been using all along, and sounds quite reasonable, easy to explain and understand. And when we want to split API and db, as you thought perhaps might be sensible at some point, what then? -- gustavo @ http://niemeyer.net
Re: High Availability command line interface - future plans.
On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote: juju add-state-server --api-only-please-thanks And if we want to allow a machine that runs the environment-manager workers but not the api server or mongo server (not actually an unlikely thing given certain future possibilities) then add-state-server is a command that doesn't necessarily add a state server at all... That thought was the source of my doubt. That said, it's just a spelling. If there's general agreement on state-server, so be it - I'm very happy to move forward with that.
Re: High Availability command line interface - future plans.
Reminds me of one of my favorite quotes: Knobs are distracting, confusing and annoying. Personally, I'd rather things be 90% good 100% of the time than see 90 knobs. - Brad Fitzpatrick on having more than one Go scheduler. https://groups.google.com/forum/#!msg/golang-dev/eu0WzsTtNPo/pcD-zS3JkTYJ On Fri, Nov 8, 2013 at 9:32 AM, Gustavo Niemeyer gust...@niemeyer.net wrote: [...] The fact you can organize things a thousand ways doesn't mean we should offer a thousand knobs. A state server is a good abstraction for "there are management routines running there". You can define what that means, as long as you don't let things fall down when N/2-1 machines fall down. gustavo @ http://niemeyer.net
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 12:04 PM, roger peppe roger.pe...@canonical.com wrote: On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote: juju add-state-server --api-only-please-thanks And if we want to allow a machine that runs the environment-manager workers but not the api server or mongo server (not actually an unlikely thing given certain future possibilities) then add-state-server is a command that doesn't necessarily add a state server at all... That thought was the source of my doubt. The fact you can organize things a thousand ways doesn't mean we should offer a thousand knobs. A state server is a good abstraction for "there are management routines running there". You can define what that means, as long as you don't let things fall down when N/2-1 machines fall down. gustavo @ http://niemeyer.net
Re: High Availability command line interface - future plans.
I'm concerned that we're (1) rehashing decisions made during the sprint and (2) deviating from requirements in doing so. In particular, abstracting HA away into management manipulations -- as roger notes, pretty much isomorphic to the jobs proposal -- doesn't give users HA so much as it gives them a limited toolkit with which they can more-or-less construct their own HA; in particular, allowing people to use an even number of state servers is strictly a bad thing [0], and I'm extremely suspicious of any proposal that opens that door. Of course, some will argue that mongo should be able to scale separately from the api servers and other management tasks, and this is a worthy goal; but in this context it sucks us down into the morass of exposing different types of management on different machines, and ends up approaching the jobs proposal still closer, in that it requires users to assimilate a whole load of extra terminology in order to perform a conceptually simple function. Conversely, ensure-ha (with possible optional --redundancy=N flag, defaulting to 1) is a simple model that can be simply explained: the command's sole purpose is to ensure that juju management cannot fail as a result of the simultaneous failure of <= N machines. It's a *user-level* construct that will always be applicable even in the context of a more sophisticated future language (no matter what's going on with this complicated management/jobs business, you can run that and be assured you'll end up with at least enough manager machines to fulfil the requirement you clearly stated in the command line). I haven't seen anything that makes me think that redesigning from scratch is in any way superior to refining what we already agreed upon; and it's distracting us from the questions of reporting and correcting manager failure when it occurs.
I assert the following series of arguments:

* users may discover at any time that they need to make an existing environment HA, so ensure-ha is *always* a reasonable user action
* users who *don't* need an HA environment can, by definition, afford to take the environment down and reconstruct it without HA if it becomes unimportant
* therefore, scaling management *down* is not the highest priority for us (but is nonetheless easily amenable to future control via the ensure-ha command -- just explicitly set a lower redundancy number)
* similarly, allowing users to *directly* destroy management machines enables exciting new failure modes that don't really need to exist
* the notion of HA is somewhat limited in worth when there's no way to make a vulnerable environment robust again
* the more complexity we shovel onto the user's plate, the less likely she is to resolve the situation correctly under stress
* the most obvious, and foolproof, command for repairing HA would be ensure-ha itself, which could very reasonably take it upon itself to replace manager nodes detected as down -- assuming a robust presence implementation, which we need anyway, this (1) works trivially for machines that die unexpectedly and (2) allows a backdoor for resolution of weird situations: the user can manually shutdown a misbehaving manager out-of-band, and run ensure-ha to cause a new one to be spun up in its place; once HA is restored, the old machine will no longer be a manager, no longer be indestructible, and can be cleaned up at leisure
* the notion is even more limited when you can't even tell when something goes wrong
* therefore, HA state should *at least* be clearly and loudly communicated in status
* but that's not very proactive, and I'd like to see a plan for how we're going to respond to these situations when we detect them
* the data accessible to a manager node is sensitive, and we shouldn't generally be putting manager nodes on dirty machines; but density is an important consideration, and I don't think it's confusing to allow preferred machines to be specified in ensure-ha, such that *if* management capacity needs to be added it will be put onto those machines before finding clean ones or provisioning new ones
* strawman syntax: juju ensure-ha --prefer-machines 11,37 to place any additional manager tasks that may be required on the supplied machines in order of preference -- but even this falls far behind the essential goal, which is make HA *easy* for our users.
* (ofc, we should continue not to put units onto manager machines by default, but allow them when forced with --to as before)

I don't believe that any of this precludes more sophisticated management of juju's internal functions *when* the need becomes pressing -- whether via jobs, or namespaced pseudo-services, or whatever -- but at this stage I think it is far better to expose the policies we're capable of supporting, and thus allow ourselves wiggle room to allow the mechanism to evolve, than to define a user-facing model that is, at best, a woolly reflection of an internal model that's likely to change as we explore the solution space in
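[William's --redundancy=N semantics translate directly into a target cluster size: to survive any N simultaneous failures, a majority must remain after N losses, i.e. 2N+1 managers. A sketch of that reading (illustrative names and behaviour, not real Juju code):]

```python
def managers_for_redundancy(n):
    """Smallest cluster that still holds a majority after any n failures."""
    return 2 * n + 1

def ensure_ha(live_managers, redundancy=1):
    """Hypothetical core of an 'ensure-ha' command: return how many new
    manager machines to provision. Managers detected as down are simply
    not counted in live_managers, so re-running the command after an
    out-of-band shutdown spins up replacements, as described above."""
    return max(managers_for_redundancy(redundancy) - live_managers, 0)
```

[A fresh single-server environment needs 2 new managers; re-running on a healthy 3-server environment is a no-op; after one of the three dies, re-running provisions exactly 1 replacement.]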
Re: High Availability command line interface - future plans.
Scaling jobs independently doesn't really get you much. If you need 7 machines of redundancy for mongo... why would you not just also want the API on all 7 machines? It's 100% upside... now your API is that much more redundant/scaled, and we already know the API and mongo run just fine together on a single machine. The only point at which it makes sense to break out of just make N copies of the whole state server is: 1. if you need to go beyond mongo's 12 node maximum, or 2. if you want to somehow have HA without using up N extra machines by putting bits and pieces on machines also hosting units. Neither of those seems like a critical thing we need to support in v1 of HA. And we should probably only try to do what is critical for v1. On Fri, Nov 8, 2013 at 11:00 AM, William Reade william.re...@canonical.com wrote: I'm concerned that we're (1) rehashing decisions made during the sprint and (2) deviating from requirements in doing so. In particular, abstracting HA away into management manipulations -- as roger notes, pretty much isomorphic to the jobs proposal -- doesn't give users HA so much as it gives them a limited toolkit with which they can more-or-less construct their own HA; in particular, allowing people to use an even number of state servers is strictly a bad thing [0], and I'm extremely suspicious of any proposal that opens that door. Of course, some will argue that mongo should be able to scale separately from the api servers and other management tasks, and this is a worthy goal; but in this context it sucks us down into the morass of exposing different types of management on different machines, and ends up approaching the jobs proposal still closer, in that it requires users to assimilate a whole load of extra terminology in order to perform a conceptually simple function.
Re: High Availability command line interface - future plans.
On Fri, Nov 8, 2013 at 7:31 PM, Gustavo Niemeyer gust...@niemeyer.net wrote: It sounds like we should avoid using a management command for anything in juju, though. Most things in juju are about management one way or the other, so juju management becomes very unclear and hard to search for. I'd also considered this spelling at one point in my doodling on CLI API yesterday:

juju ha setup --to [list, of, machines]  (creates 3 servers, optionally on the specified machines)
juju ha status  (tells me details about the state server status)
juju ha add-servers
Re: High Availability command line interface - future plans.
It doesn't feel like the difference between juju ensure-ha --prefer-machines 11,37 and juju add-state-server --to 11,37 is worth the amount of reasoning there. I'm clearly in favor of the latter, but I wouldn't argue so much for it. On Fri, Nov 8, 2013 at 2:00 PM, William Reade william.re...@canonical.com wrote: I'm concerned that we're (1) rehashing decisions made during the sprint and (2) deviating from requirements in doing so. In particular, abstracting HA away into management manipulations -- as roger notes, pretty much isomorphic to the jobs proposal -- doesn't give users HA so much as it gives them a limited toolkit with which they can more-or-less construct their own HA; in particular, allowing people to use an even number of state servers is strictly a bad thing [0], and I'm extremely suspicious of any proposal that opens that door. Of course, some will argue that mongo should be able to scale separately from the api servers and other management tasks, and this is a worthy goal; but in this context it sucks us down into the morass of exposing different types of management on different machines, and ends up approaching the jobs proposal still closer, in that it requires users to assimilate a whole load of extra terminology in order to perform a conceptually simple function. Conversely, ensure-ha (with possible optional --redundancy=N flag, defaulting to 1) is a simple model that can be simply explained: the command's sole purpose is to ensure that juju management cannot fail as a result of the simultaneous failure of <= N machines. It's a *user-level* construct that will always be applicable even in the context of a more sophisticated future language (no matter what's going on with this complicated management/jobs business, you can run that and be assured you'll end up with at least enough manager machines to fulfil the requirement you clearly stated in the command line).
I haven't seen anything that makes me think that redesigning from scratch is in any way superior to refining what we already agreed upon; and it's distracting us from the questions of reporting and correcting manager failure when it occurs. I assert the following series of arguments:

* users may discover at any time that they need to make an existing environment HA, so ensure-ha is *always* a reasonable user action
* users who *don't* need an HA environment can, by definition, afford to take the environment down and reconstruct it without HA if it becomes unimportant
* therefore, scaling management *down* is not the highest priority for us (but it is nonetheless easily amenable to future control via the ensure-ha command -- just explicitly set a lower redundancy number)
* similarly, allowing users to *directly* destroy management machines enables exciting new failure modes that don't really need to exist
* the notion of HA is of limited worth when there's no way to make a vulnerable environment robust again
* the more complexity we shovel onto the user's plate, the less likely she is to resolve the situation correctly under stress
* the most obvious, and foolproof, command for repairing HA would be ensure-ha itself, which could very reasonably take it upon itself to replace manager nodes detected as down -- assuming a robust presence implementation, which we need anyway, this (1) works trivially for machines that die unexpectedly and (2) allows a backdoor for resolution of weird situations: the user can manually shut down a misbehaving manager out-of-band and run ensure-ha to cause a new one to be spun up in its place; once HA is restored, the old machine will no longer be a manager, no longer be indestructible, and can be cleaned up at leisure
* the notion is even more limited when you can't even tell that something has gone wrong
* therefore, HA state should *at least* be clearly and loudly communicated in status
* but that's not very proactive, and I'd like to see a plan for how we're going to respond to these situations when we detect them
* the data accessible to a manager node is sensitive, and we shouldn't generally be putting manager nodes on dirty machines; but density is an important consideration, and I don't think it's confusing to allow preferred machines to be specified in ensure-ha, such that *if* management capacity needs to be added it will be put onto those machines before finding clean ones or provisioning new ones
* strawman syntax: juju ensure-ha --prefer-machines 11,37 to place any additional manager tasks that may be required on the supplied machines, in order of preference -- but even this falls far behind the essential goal, which is to make HA *easy* for our users
* (ofc, we should continue not to put units onto manager machines by default, but allow them when forced with --to as before)

I don't believe that any of this precludes more sophisticated management of juju's internal functions *when* the need becomes
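The self-repair behaviour argued for above -- ensure-ha replacing manager nodes detected as down, given a robust presence implementation -- amounts to a simple reconciliation loop. Here's a rough sketch in Go (juju's implementation language); all the types and names below are illustrative assumptions, not juju's actual API:

```go
// Hypothetical sketch of the "ensure" semantics discussed above: the
// command is idempotent, only ever scaling management up to the
// requested redundancy and replacing managers detected as down.
package main

import "fmt"

// Machine is a stand-in for the real machine record; only the fields
// the sketch needs are modelled.
type Machine struct {
	ID      int
	Manager bool
	Alive   bool
}

// EnsureHA returns the IDs of live manager machines that still count
// towards the desired redundancy, plus how many new manager machines
// must be provisioned. Dead managers simply aren't counted, so
// re-running the command after an out-of-band shutdown spins up
// replacements, as described in the thread.
func EnsureHA(machines []Machine, want int) (keep []int, provision int) {
	for _, m := range machines {
		if m.Manager && m.Alive {
			keep = append(keep, m.ID)
		}
	}
	if n := want - len(keep); n > 0 {
		provision = n
	}
	return keep, provision
}

func main() {
	// Machine 1 has died: ensure-ha keeps 0 and 2 and provisions one more.
	ms := []Machine{{0, true, true}, {1, true, false}, {2, true, true}}
	keep, n := EnsureHA(ms, 3)
	fmt.Println(keep, n)
}
```

Note that nothing here ever scales down or destroys a machine: as argued above, the command only converges upward, and cleanup of demoted machines is left to the user.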
Re: High Availability command line interface - future plans.
On 6 November 2013 20:07, Kapil Thangavelu kapil.thangav...@canonical.com wrote: instead of adding more complexity and concepts, it would be ideal if we could reuse the primitives we already have. ie juju environments have three user exposed services, that users can add-unit / remove-unit etc. they have a juju prefix and therefore are omitted by default from status listing. That's a much simpler story to document. how do i scale my state server.. juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view, and I've thought about it quite a bit. I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can take on managerial roles, but provide a veneer over the top which specially interprets service operations on the juju built-in services and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between the user's view of things and what's going on underneath. For instance, with a built-in service, can I:

- add a subordinate service to it?
- see the relevant log file in the usual place for a unit?
- see its charm metadata?
- join to its juju-info relation?

If it's a single service, how can its units span different series? (Presumably it has a charm URL, which includes the series.) I fear that if we try this approach, the cracks will show through and the result will be a system that's hard to understand because too many things are not what they appear. And that's not even going into the plethora of special casing that this approach would require throughout the code.

2) is more attractive, as it's actually doing what's written on the label. But this has its own problems:

- it's a highly significant architectural change;
- juju managerial services are tightly tied into the operation of juju itself (not surprisingly). There are many chicken-and-egg problems here - we would be trying to use the system to support itself, and that could easily lead to deadlock as one part of the system tries to talk to another part that relies on the first. I think it *might* be possible, but it's not gonna be easy and I suspect nasty gotchas at the end of a long development process;
- again, there are inevitably going to be many special cases throughout the code - for instance, how does a unit acquire the credentials it needs to talk to the API server?

It may be that a hybrid approach is possible - for example, implementing the workers as a service while still having mongo and the API server as machine workers. I think that's a reasonable evolutionary step from the approach I'm proposing.

The reasoning behind my proposed approach perhaps comes from the fact that (I'm almost ashamed to admit it) I'm a lazy programmer. I don't like creating mountains of code where a small amount will do almost as well. Adding the concept of jobs on machines maps very closely to the architecture that we have today. It is a single extra concept for the user to understand - all the other features (e.g. add-machine and destroy-machine) are already exposed.

I agree that in an ideal world we would scale juju meta-services just as we would scale normal services, but I think it's actually reasonable to have a special case here. Allowing the user to know that machines can take on juju managerial roles doesn't seem to be a huge ask. And we get just as much functionality with considerably less code, which seems like a significant win to me in terms of ongoing maintainability and agility for the future.

cheers,
rog.

PS apologies; my last cross-post, honest! followups to juju-dev@lists.ubuntu.com only.

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
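The "jobs on machines" concept above -- a machine agent taking on managerial roles as extra jobs alongside hosting units -- can be sketched as a small set of job flags. This is an illustrative sketch only; the type names, job names, and grouping are assumptions for the sake of the example, not juju's real code:

```go
// Sketch of machine jobs as a bitmask: a machine can host units,
// manage the environment, or both, and finer-grained managerial jobs
// can be added later without changing the user-visible model.
package main

import (
	"fmt"
	"strings"
)

type Job uint8

const (
	JobHostUnits     Job = 1 << iota // run service units
	JobManageEnviron                 // run the environment-manager workers
	JobServeAPI                      // run the API server
	JobManageState                   // run a mongo replica-set peer
)

// Manager groups the managerial jobs, which initially scale together
// but could later be granted independently, backwards-compatibly.
const Manager = JobManageEnviron | JobServeAPI | JobManageState

type Machine struct {
	ID   int
	Jobs Job
}

func (m Machine) Has(j Job) bool { return m.Jobs&j != 0 }

func (m Machine) String() string {
	var roles []string
	if m.Has(JobHostUnits) {
		roles = append(roles, "unit")
	}
	if m.Has(Manager) {
		roles = append(roles, "manager")
	}
	return fmt.Sprintf("machine/%d: %s", m.ID, strings.Join(roles, ","))
}

func main() {
	// Something like "juju add-machine --jobs manager" would add the
	// first; "--jobs manager,unit" the second.
	fmt.Println(Machine{ID: 1, Jobs: Manager})
	fmt.Println(Machine{ID: 2, Jobs: Manager | JobHostUnits})
}
```

The point of the bitmask shape is exactly the argument in the email: it is one extra concept layered on the existing machine model, rather than a parallel service abstraction.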
Re: High Availability command line interface - future plans.
The answer to "how does the user know how to X?" is the same as it always has been: documentation. Now, that's not to say that we still don't need to do some work to make it intuitive... but I think that for something that is complicated like HA, leaning on documentation a little more is ok. More inline:

On Wed, Nov 6, 2013 at 1:49 PM, roger peppe rogpe...@gmail.com wrote: The current plan is to have a single juju ensure-ha-state juju command. This would create new state server machines if there are less than the required number (currently 3). Taking that as given, I'm wondering what we should do in the future, when users require more than a single big On switch for HA. How does the user: a) know about the HA machines so the costs of HA are not hidden, and that the implications of particular machine failures are clear?

- As above, documentation about what it means when you see servers in juju status labelled as "Juju State Server" (or whatever).
- Have actual feedback from commands:

    $ juju bootstrap --high-availability
    Machines 0, 1, and 2 provisioned as juju server nodes.
    Juju successfully bootstrapped environment Foo in high availability mode.

  or

    $ juju bootstrap
    Machine 0 provisioned as juju server node.
    Juju successfully bootstrapped environment Foo.

    $ juju ensure-ha -n 7
    Enabling high availability mode with 7 juju servers.
    Machines 1, 2, 3, 4, 5, and 6 provisioned as additional Juju server nodes.

    $ juju ensure-ha -n 5
    Reducing number of Juju server nodes to 5.
    Machines 2 and 6 destroyed.

b) fix the system when a machine dies?

    $ juju destroy-machine 5
    Destroyed machine/5.
    Automatically replacing destroyed Juju server node.
    Machine/8 created as new Juju server node.

c) scale up the system to x thousand nodes?

Hopefully 12 machines is plenty of Juju servers for 5000 nodes. We will need to revisit this if it's not, but it seems like it should be plenty. As above, I think a simple -n is fine for both raising and lowering the number of state servers. If we get to the point of needing more than

d) scale down the system?

    $ juju disable-ha -y
    Destroyed machine/1 and machine/2.
    The Juju server node for environment Foo is machine/0.
    High availability mode disabled for Juju environment Foo.

--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju
Re: High Availability command line interface - future plans.
On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote:

The current plan is to have a single juju ensure-ha-state juju command. This would create new state server machines if there are less than the required number (currently 3). Taking that as given, I'm wondering what we should do in the future, when users require more than a single big On switch for HA. How does the user:

a) know about the HA machines so the costs of HA are not hidden, and that the implications of particular machine failures are clear?
b) fix the system when a machine dies?
c) scale up the system to x thousand nodes?
d) scale down the system?

For a), we could tag a machine in the status as a state server, and hope that the user knows what that means.

For b), the suggestion is that the user notices that a state server machine is non-responsive (as marked in status) and runs destroy-machine on it, which will notice that it's a state server machine and automatically start another one to replace it. Destroy-machine would refuse to work on a state server machine that seems to be alive.

For c), we could add a flag to ensure-ha-state suggesting a desired number of state-server nodes.

I'm not sure what the suggestion is for d), given that we refuse to destroy live state-server machines.

Although ensure-ha-state might be a fine way to turn on HA initially, I'm not entirely happy with expanding it to cover all the above cases. It seems to me like we're going to create a leaky abstraction that purports to be magic (just wave the HA wand!) and ends up being limiting, and in some cases confusing ("Huh? I asked to destroy that machine and there's another one just been created"). I believe that any user that's using HA will need to understand that some machines are running state servers, and when things fail, they will need to manage those machines individually (for example by calling destroy-machine).

I also think that the solution to c) is limiting, because there is actually no such thing as "a state server" - we have at least three independently scalable juju components (the database servers (mongodb), the API servers and the environment managers) with different scaling characteristics. I believe that in any sufficiently large environment, the user will not want to scale all of those at the same rate. For example, MongoDB will allow at most 12 members of a replica set, but a caching API server could potentially usefully scale up much higher than that. We could add more flags to ensure-ha-state (e.g. --state-server-count) but then we'd lack the capability to suggest which might be grouped with which.

PROPOSAL

My suggestion is that we go for a slightly less magic approach that provides the user with the tools to manage their own high availability setup, adding appropriate automation in time. I suggest that we let the user know that machines can run as juju server nodes, and provide them with the capability to *choose* which machines will run as server nodes and which can host units - that is, what *jobs* a machine will run.

Here's a possible proposal: we already have an add-machine command. We'd add a --jobs flag to allow the user to specify the jobs that the new machine(s) will run. Initially we might have just two jobs, manager and unit - the machine can either host service units, or it can manage the juju environment (including running the state server database), or both. In time we could add finer levels of granularity to allow separate scalability of juju server components, without losing backwards compatibility.

If the new machine is marked as a manager, it would run a mongo replica set peer. This *would* mean that it would be possible to have an even number of mongo peers, with the potential for a split vote if the nodes were partitioned evenly, and resulting database stasis. I don't *think* that would actually be a severe problem in practice. We would make juju status point out the potential problem very clearly, just as it should point out the potential problem if one of an existing odd-sized replica set dies. The potential problems are the same in both cases, and are straightforward for even a relatively naive user to avoid.

Thus, juju ensure-ha-state is almost equivalent to:

    juju add-machine --jobs manager -n 2

In my view, this command feels less magic than ensure-ha-state - the runtime implications (e.g. cost) of what's going on are easier for the user to understand, and it requires no new entities in a user's model of the system.

In addition to the new add-machine flag, we'd add a single new command, juju machine-jobs, which would allow the user to change the jobs associated with an existing machine. That could be a later addition - it's not necessary in the first cut.

With these primitives, I *think* the responsibilities of the system and the model presented to the user become clearer. Looking back to the original user questions:
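The even-replica-set concern raised in the proposal is plain majority arithmetic: a replica set of n peers needs floor(n/2)+1 votes, so an even set tolerates no more failures than the odd set one smaller, and an evenly split partition can leave no side with a majority. A quick standalone check (plain Go, no mongo involved):

```go
// Demonstrates why an even number of replica-set peers buys no extra
// failure tolerance: majority arithmetic only.
package main

import "fmt"

// majority is the minimum number of votes needed to elect a primary.
func majority(n int) int { return n/2 + 1 }

// tolerated is how many peers can fail while a majority survives.
func tolerated(n int) int { return n - majority(n) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("peers=%d majority=%d tolerates=%d\n",
			n, majority(n), tolerated(n))
	}
	// peers=3 and peers=4 both tolerate exactly 1 failure, and a 2/2
	// partition of the 4-peer set leaves neither side with a majority -
	// the "database stasis" the email warns about.
}
```

This is why the thread treats the even-sized state (one extra manager added, or one of an odd set dead) as a warning condition for juju status rather than a hard error: the set still functions, it just has no headroom.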
Re: High Availability command line interface - future plans.
+1 (million), this solution keeps coming up, and I still feel it is the right one.

On Thu, Nov 7, 2013 at 7:07 AM, Kapil Thangavelu kapil.thangav...@canonical.com wrote: On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote: [...]
Re: High Availability command line interface - future plans.
just my tuppence... Would it not be clearer to add an additional command to implement your proposal? E.g. add-manager, and possibly destroy/remove-manager. This could also support switches for later fine control, and would possibly be less open to misinterpretation than overloading the add-machine command?

Nick

On Wed, Nov 6, 2013 at 6:49 PM, roger peppe rogpe...@gmail.com wrote: [...]
Re: High Availability command line interface - future plans.
Oops, missed the end of a thought there. If we get to the point of needing more than 12 server nodes (not unfathomable), then we have to start doing some more work for our hyperscale customers, which will probably involve much more customization and require much more knowledge of the system.

I think one of the points of making HA simple is that we don't want people to have to learn how Juju works before they can deploy their own stuff in a robust manner. Keep the barrier of entry as low as possible. We can give general guidelines about how many Juju servers you need for N unit agents, and then people will know what to set N to when they do juju ensure-ha -n. I think most people will be happy knowing there are N servers out there, and if one goes down, another will take its place. They don't want to know about this job and that job. Just make it work and let me get on with my life. That's kind of the whole point of Juju, right?

On Wed, Nov 6, 2013 at 2:56 PM, Nate Finch nate.fi...@canonical.com wrote: [...]
Re: High Availability command line interface - future plans.
On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote: The current plan is to have a single juju ensure-ha-state juju command. This would create new state server machines if there are less than the required number (currently 3). Taking that as given, I'm wondering what we should do in the future, when users require more than a single big On switch for HA. How does the user: a) know about the HA machines so the costs of HA are not hidden, and that the implications of particular machine failures are clear? b) fix the system when a machine dies? c) scale up the system to x thousand nodes? d) scale down the system? For a), we could tag a machine in the status as a state server, and hope that the user knows what that means. For b) the suggestion is that the user notice that a state server machine is non-responsive (as marked in status) and runs destroy-machine on it, which will notice that it's a state server machine and automatically start another one to replace it. Destroy-machine would refuse to work on a state server machine that seems to be alive. For c) we could add a flag to ensure-ha-state suggesting a desired number of state-server nodes. I'm not sure what the suggestion is for d) given that we refuse to destroy live state-server machines. Although ensure-ha-state might be a fine way to turn on HA initially I'm not entirely happy with expanding it to cover all the above cases. It seems to me like we're going to create a leaky abstraction that purports to be magic (just wave the HA wand!) and ends up being limiting, and in some cases confusing (Huh? I asked to destroy that machine and there's another one just been created) I believe that any user that's using HA will need to understand that some machines are running state servers, and when things fail, they will need to manage those machines individually (for example by calling destroy-machine). 
I also think that the solution to c) is limiting, because there is actually no such thing as a state server - we have at least three independently scalable juju components (the database servers (mongodb), the API servers and the environment managers) with different scaling characteristics. I believe that in any sufficiently large environment, the user will not want to scale all of those at the same rate. For example MongoDB will allow at most 12 members of a replica set, but a caching API server could potentially usefully scale up much higher than that. We could add more flags to ensure-ha-state (e.g.--state-server-count) but we then we'd lack the capability to suggest which might be grouped with which. PROPOSAL My suggestion is that we go for a slightly less magic approach. that provides the user with the tools to manage their own high availability set up, adding appropriate automation in time. I suggest that we let the user know that machines can run as juju server nodes, and provide them with the capability to *choose* which machines will run as server nodes and which can host units - that is, what *jobs* a machine will run. Here's a possible proposal: We already have an add-machine command. We'd add a --jobs flag to allow the user to specify the jobs that the new machine(s) will run. Initially we might have just two jobs, manager and unit - the machine can either host service units, or it can manage the juju environment (including running the state server database), or both. In time we could add finer levels of granularity to allow separate scalability of juju server components, without losing backwards compatibility. If the new machine is marked as a manager, it would run a mongo replica set peer. This *would* mean that it would be possible to have an even number of mongo peers, with the potential for a split vote if the nodes were partitioned evenly, and resulting database stasis. I don't *think* that would actually be a severe problem in practice. 
We would make juju status point out the potential problem very clearly, just as it should point out the potential problem if one member of an existing odd-sized replica set dies. The potential problems are the same in both cases, and are straightforward for even a relatively naive user to avoid.

Thus, juju ensure-ha-state is almost equivalent to:

    juju add-machine --jobs manager -n 2

In my view, this command feels less magic than ensure-ha-state - the runtime implications (e.g. cost) of what's going on are easier for the user to understand, and it requires no new entities in a user's model of the system.

In addition to the new add-machine flag, we'd add a single new command, juju machine-jobs, which would allow the user to change the jobs associated with an existing machine. That could be a later addition - it's not necessary in the first cut.

With these primitives, I *think* the responsibilities of the system and the model presented to the user become clearer. Looking back to the original user questions:
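To make the proposed semantics concrete, here is a toy model of the --jobs scheme - purely illustrative pseudocode-in-Python, not Juju's actual (Go) implementation, and every name in it is an assumption for the sake of the sketch. It shows how ensure-ha-state would reduce to "add enough manager machines":

```python
# Toy model of the proposed add-machine --jobs scheme.
# All class and method names are illustrative, not Juju's real API.

JOBS = {"manager", "unit"}  # initial proposal: just two jobs

class Machine:
    def __init__(self, machine_id, jobs):
        assert jobs and set(jobs) <= JOBS, "unknown job"
        self.id = machine_id
        self.jobs = set(jobs)

class Environment:
    def __init__(self):
        self.machines = []

    def add_machine(self, jobs=("unit",), n=1):
        # roughly: juju add-machine --jobs <jobs> -n <n>
        for _ in range(n):
            self.machines.append(Machine(len(self.machines), jobs))

    def managers(self):
        return [m for m in self.machines if "manager" in m.jobs]

    def ensure_ha_state(self, target=3):
        # roughly: juju add-machine --jobs manager -n <missing>
        missing = target - len(self.managers())
        if missing > 0:
            self.add_machine(jobs=("manager",), n=missing)

env = Environment()
env.add_machine()        # machine 0: an ordinary unit host
env.ensure_ha_state()    # adds 3 manager machines (1, 2, 3)
env.ensure_ha_state()    # idempotent: target already met, adds nothing
```

The point of the sketch is that the HA switch is not a new primitive: it is expressible in terms of machines and jobs the user can already see and reason about.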
Re: High Availability command line interface - future plans.
On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:

So, I haven't been involved directly in a lot of the discussion, but my 2c is: +1 to juju ensure-ha. Users don't give a f*ck about how Juju achieves HA, they just want to know their data will survive a node outage. What Juju does under the covers to make that happen, what jobs are run on what nodes etc - that's for Juju to care about.

I'm not so sure about that. I expect there'll be users who want to know *exactly* how it works, because otherwise they won't feel they can trust it with their services. That's not to say that ensure-ha can't be trusted - just that some users will want to know what it's doing under the covers. Speculative, but based on past experience with banks, insurance companies, etc.

Another thing to consider is that one person's HA is not the next person's. I may want to disperse my state servers across multiple regions (were that supported); you might find this costs too much in inter-region traffic. What happens if I have a temporary outage in one region - where does ensure-ha automatically spin up a new one? What happens when the original comes back? Each of these is something people may want to do differently, because they each have different trade-offs. I'm not really keen on ensure-ha due to its magical nature, but if it's just a stop gap... I guess.

+1 to high-level, namespaced services (juju:api, juju:db etc). This is a step above ensure-ha for more advanced users, but one which still presents the solution space in terms any IS person involved in managing things like scalable web services understands - i.e. there's the concept of services which process requests, those which store data, and those which <insert role here>. If the volume of incoming requests is such that the load on the API servers is high while the database is still coping OK, juju add-unit juju:api -n 3 can be used to solve that efficiently, and vice versa.
So it's all about mapping what Juju does to terms and concepts already understood, and getting the level of abstraction correct so the solution is usable by the target audience. Anything that involves exposing things like jobs etc. is not the right way of looking at it, IMO.

I had suggested something very similar to what Roger suggested (add-machine --state) at SFO, but I can see the arguments against it. Overloading add-unit seems like a decent alternative.

Cheers, Andrew

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
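One concrete reason the juju: namespace idea needs a reserved separator character (a point Marco raises later in the thread) is that a prefix like "juju-" collides with ordinary charm names such as juju-gui, whereas ":" cannot appear in a charm name. A small hypothetical sketch - not real Juju code, all names are assumptions - of how a CLI might tell the two apart:

```python
# Hypothetical parsing of namespaced service names (juju:api, juju:db).
# Illustrative only; not Juju's actual name-resolution logic.

RESERVED_NAMESPACE = "juju"

def parse_service_name(name):
    """Split 'namespace:service' into (namespace, service).

    Names without a ':' are ordinary user services, even if they
    happen to start with 'juju-' (e.g. the juju-gui charm).
    """
    if ":" in name:
        namespace, service = name.split(":", 1)
        return namespace, service
    return None, name

def is_reserved(name):
    """True only for names explicitly in the reserved juju: namespace."""
    namespace, _ = parse_service_name(name)
    return namespace == RESERVED_NAMESPACE
```

Under this scheme is_reserved("juju:api") is True while is_reserved("juju-gui") is False, so user charms can never accidentally land in the reserved namespace.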
Re: High Availability command line interface - future plans.
On 07/11/13 15:00, Andrew Wilkins wrote:

On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:

So, I haven't been involved directly in a lot of the discussion, but my 2c is: +1 to juju ensure-ha. Users don't give a f*ck about how Juju achieves HA, they just want to know their data will survive a node outage. What Juju does under the covers to make that happen, what jobs are run on what nodes etc - that's for Juju to care about.

I'm not so sure about that. I expect there'll be users who want to know *exactly* how it works, because otherwise they won't feel they can trust it with their services. That's not to say that ensure-ha can't be trusted - just that some users will want to know what it's doing under the covers. Speculative, but based on past experience with banks, insurance companies, etc.

I think if we gave no feedback at all, then yes, this would feel like magic. However, I'd expect us to at least say what we are doing on the command line :-) I think ensure-ha is sufficient for a first cut, and a way to get HA on a running system. For the record, we discussed that the default behaviour for ensure-ha was to create three nodes running manager services. The user could override this by specifying -n 5 or -n 7, or some other odd number.

Another thing to consider is that one person's HA is not the next person's. I may want to disperse my state servers across multiple regions (were that supported); you might find this costs too much in inter-region traffic. What happens if I have a temporary outage in one region - where does ensure-ha automatically spin up a new one? What happens when the original comes back? Each of these is something people may want to do differently, because they each have different trade-offs.

I agree that support over regions is an important idea, but this is way outside the scope of this HA discussion.
AFAIK, our cross-region story is still all about cross-environment relations, not spanning regions with one environment.

Tim
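The insistence on odd counts (-n 5, -n 7) discussed above falls out of standard replica-set quorum arithmetic: an n-member set needs a strict majority (n // 2 + 1 votes) to elect a primary, so an even-sized set tolerates no more failures than the next smaller odd-sized one, while adding the risk of an even split with no majority at all. A short illustrative sketch of the arithmetic (not tied to any Juju or MongoDB API):

```python
def majority(n):
    """Votes needed for a primary election in an n-member replica set."""
    return n // 2 + 1

def failures_tolerated(n):
    """How many members can be lost while a majority can still vote."""
    return n - majority(n)

# 4 members tolerate the same single failure as 3, and 6 the same two
# failures as 5 - the extra even member buys nothing and can deadlock
# the vote under an even partition.
table = {n: (majority(n), failures_tolerated(n)) for n in range(1, 8)}
```

This is why exposing an even manager count is at best a no-op and at worst a liability, and why juju status would need to flag it clearly in Roger's proposal.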
Re: High Availability command line interface - future plans.
Hi guys,

I'm glad j...@lists.ubuntu.com got accidentally looped in, because otherwise I may not have caught wind of this. I can understand both sides of the discussion: one where we provide more magic and the users trust that it works, and the other where we leverage existing components and command structures of juju to provide this magic. I have to agree with Kapil's point about add-unit/remove-unit syntax for Juju HA.

Having had to teach and demonstrate juju to quite a few people now, juju is not an easy concept to grasp. Orchestration is really something that people are just now starting to think about in general, never mind how to wrap their heads around the concept and then, furthermore, how we envision that concept, which is distilled in our product - juju. At the end of the day I get it, we get it; it's easy for us because we're here building it, but for the people out there it's a whole new language.

If we start off by saying "Oh, hey there, just run this ensure-ha command and things will just be fantastic", that's fine, but once you open up that route it's going to be hard to back-pedal. We already teach "Oh, your service is popular? Just `juju add-unit service`" - magic will happen, units will fire, and you've scaled up. You've added an additional available unit and you're safer than you were before. To convey the same strategy when you want to safeguard Juju's bootstrap and make it a highly available service, the natural logic would be `juju add-unit`. In fact I was asked exactly this at a Charm School recently; I'm paraphrasing, but it was to some extent "Can I juju add-unit bootstrap?".
Since the majority of people seem to believe that adding and removing juju-specific services via a unique and reserved namespace is a great ultimate goal to have, not shooting for that first would simply introduce another awkward period in which we have this great feature but it's going to change soon - so videos, blog posts, and other content we produce to promote this sheer awesomeness becomes stale and out of date just as soon as the more permanent method of HA lands. For new users learning a language this just becomes another hurdle to overcome in order to be an expert, and one more reason to look at something other than Juju. Therefore, I (who really have no major say in this, simply because I'm not capable of helping produce a solution) believe it's best to work towards the ultimate goal now instead of having to build a stop gap just to say we have HA.

On a final note, if namespacing does become a thing, can we *please* use a unique character for the separation of namespace:service? A ":" would be fantastic, as calling something juju-db could very well be mistaken for, or deployed as, another service: `juju deploy some-random-thing juju-*` and now we have things sharing a special namespace that aren't actually special. (Like juju-gui - though juju-gui is quite special and awesome, it's not juju-core namespace special.)

Thanks for all the awesome work you all do. I look forward to a solution, whatever it may be, in the future!

Marco Ceppi

On Wed, Nov 6, 2013 at 9:22 PM, Tim Penhey tim.pen...@canonical.com wrote:

On 07/11/13 15:00, Andrew Wilkins wrote:

On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:

So, I haven't been involved directly in a lot of the discussion, but my 2c is: +1 to juju ensure-ha. Users don't give a f*ck about how Juju achieves HA, they just want to know their data will survive a node outage.