On 18 November 2014 12:23, Ian Booth <ian.bo...@canonical.com> wrote:
> On 17/11/14 15:47, Stuart Bishop wrote:
>> On 17 November 2014 07:13, Ian Booth <ian.bo...@canonical.com> wrote:
>>
>>> The new Juju Status work planned for this cycle will hopefully address
>>> the main concern about knowing when a deployed charm is fully ready to
>>> do the work for which it was installed, i.e. the current situation
>>> whereby a unit is marked as Started but is not ready. Charms are able
>>> to mark themselves as Busy and also set a status message to indicate
>>> they are churning and not ready to run. Charms can also indicate that
>>> they are Blocked and require manual intervention (e.g. a service needs
>>> a database and no relation has been established yet to provide the
>>> database), or Waiting (the database on which the service relies is
>>> busy but will resolve automatically when the database is available
>>> again).
>>
>> As long as the 'ready' state is managed by juju and not the unit, I'll
>> stand happily corrected :-) The focus I'd seen had been on the unit
>> declaring its own status, and there is no way for a unit to know that
>> it is ready because it has no way of knowing that, for example, there
>> are another 10 peer units being provisioned that will need to be
>> related.
>>
>
> You are correct that the initial scope of work is more about the unit,
> and less about the deployment as a whole. There are plans though to
> address the issue. We're throwing around the concept of a "goal state",
> which is conceptually akin to looking forward in time to be able to
> inform units what relations they will expect to participate in and what
> units will be deployed. There'd likely be something like a
> relation-goals hook tool (to complement relation-list and relation-ids),
> as well as hook(s) for when the goal state changes. There's ongoing work
> in the uniter by William to get the architecture right so this work can
> be considered.
> There's still a lot of value in the current Juju Status work, but as
> you point out, it's not the full story.

Ok. If there is a goal state, and I am able to wait until the goal state
is the actual state, then my needs (and amulet and juju-deployer needs)
will be met. It does seem a rather long-winded way of getting there
though. The question I have always needed juju to answer is 'are there
any hooks running or are there any hooks queued to run?'. I've always
assumed that juju must already know this (or it would be unable to
function), but it refuses to communicate this single bit of information
in any way.

>>> So although there are not currently plans to show the number of
>>> running hooks in the first phase of this work, mechanisms are being
>>> provided to allow charm authors to better communicate the state of
>>> their charms to give much clearer and more accurate feedback as to
>>> 1) when a charm is fully ready to do work, 2) if a charm is not ready
>>> to do work, why not.
>>
>> A charm declaring itself ready is part of the picture. What is more
>> important is when the system is ready. You don't want to start pumping
>> requests through your 'ready' webserver, only to have it torn away as
>> a new block device is mounted on your database when its storage-joined
>> hook is invoked and returned to 'ready' state again once the
>> storage-changed hook has completed successfully.
>>
>
> Also being thrown around is the concept of a new agent-state called
> "Idle", which would be used when there are no pending hooks to run.
> There are plans as

That would work too. If all units are in idle state, then the system has
reached a steady state and my question is answered.

> well for the next phase of the Juju status work to allow collaborating
> services to notify when they are busy, and mark relationships as down.
> So if the database had its storage-attached hook invoked, it would mark
> itself as Busy, mark its relation to the webserver as Down, thus
> allowing the webserver to put itself into Waiting. Or, if we are talking
> about the initial install phase, the database would not initially mark
> itself as Running until its declared storage requirements were met, so
> the webserver would go from Installing to Waiting and then to Running
> once the database became Running.

I'm not entirely sure how useful this feature is, given the inherent
race conditions with serialized hooks. Right now, you need to write
charms that gracefully cope with dependent services that have gone down
without notice. With this feature, you will need to write charms that
gracefully cope with dependent services that have gone down when the
notification hasn't reached you yet, or when the outage is for non-juju
reasons, like a network partition. The window of time waiting for hooks
to bubble through could easily be minutes when you have a simple chain
of services (eg. postgresql -> pgbouncer -> django -> haproxy -> apache
seems common enough).

Your example with storage is particularly interesting, as I was just
dealing with this yesterday in my rewrite of the Cassandra charm. The
existing mechanism in the charm is broken. If you add a new unit to the
service, it runs its install and configure hooks and is READY. It then
joins the peer relation, and is still READY. The peer units start
spewing data at it, as the replication ring is rebalanced. We now have a
race. Will the storage hooks fire in time? The new unit is unaware that
storage is due to be attached, and does not know that, unless the
storage is attached and the data migrated from local disk soon, the
local disk will fill and the unit will fall over.

To solve this with the current storage-broker subordinate, I could
require the operator to set a 'wait_for_block_storage' boolean in the
service configuration before deploy.
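
For what it's worth, the opt-in gate I have in mind is roughly the
following Python sketch. The function name, the config dict and the
storage_attached flag are all made up for illustration; only the
'wait_for_block_storage' option name comes from the idea above, and none
of this is existing charm code.

```python
def ready_to_accept_data(config, storage_attached):
    """Decide whether a new unit may start accepting replicated data.

    config: dict of service configuration values (hypothetical shape).
    storage_attached: True once the block storage has been mounted and
        any data has been migrated off local disk (hypothetical flag that
        the storage hooks would have to maintain).
    """
    # If the operator opted in, hold back until storage is really there.
    if config.get("wait_for_block_storage", False):
        return storage_attached
    # Otherwise behave as today: the unit is ready straight away and we
    # rely on the storage hooks firing before local disk fills.
    return True
```

The default of False keeps today's behaviour, which is exactly why the
operator has to know to flip it.
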
But requiring people to read and follow the documentation is an
error-prone solution :-( I'm wondering if I should simply not bother
fixing this race, and trust that the block storage broker hooks will be
invoked and completed before local disk is filled. I understand that
work is underway to replace the block storage broker so it won't be an
issue long term, or your goal state would be useful here if a unit can
ask questions like 'is storage going to be attached' or 'will peers be
joining me'.

--
Stuart Bishop <stuart.bis...@canonical.com>

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
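
The steady-state question raised in this message ('are there any hooks
running or queued?') could, if an "Idle" agent-state were added, be
checked by a test harness along these lines. This is only a sketch under
assumptions: the status dict is assumed to follow the services -> units
-> agent-state nesting, the "idle" value is the proposed state and not
something juju reports today, and get_status is a hypothetical callable
that the caller supplies to fetch current status. Subordinate units are
ignored for brevity.

```python
import time


def all_units_idle(status):
    """Return True if every unit in the status dict reports 'idle'."""
    for service in status.get("services", {}).values():
        for unit in (service.get("units") or {}).values():
            # "idle" is the proposed agent-state, assumed here.
            if unit.get("agent-state") != "idle":
                return False
    return True


def wait_until_idle(get_status, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll get_status() until all units are idle or the timeout expires.

    Returns True if a steady state was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if all_units_idle(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A tool like amulet or juju-deployer could call wait_until_idle() after
deploying and before running its checks, instead of guessing with fixed
sleeps.
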