On Thu, Mar 17, 2016 at 8:41 AM, Ian Booth <ian.bo...@canonical.com> wrote:

>
> Machines, services and units all now support recording status history. Two
> issues have come up:
>
> 1. https://bugs.launchpad.net/juju-core/+bug/1530840
>
> For units, especially in steady state, status history is spammed with
> update-status hook invocations which can obscure the hooks we really care
> about.
>
> 2. https://bugs.launchpad.net/juju-core/+bug/1557918
>
> We now have the concept of recording a machine provisioning status. This is
> great because it gives observability to what is happening as a node is being
> allocated in the cloud. With LXD, this feature has been used to give
> visibility to progress of the image downloads (finally, yay). But what
> happens is that the machine status history gets filled with lots of
> "Downloading x%" type messages.
>
> We have a pruner which caps the history to 100 entries per entity. But we
> need a way to deal with the spam, and what is displayed when the user asks
> for juju status-history.
>
> Options to solve bug 1
>
> A.
> Filter out duplicate status entries when presenting to the user, e.g. say
> "update-status (x43)". This still allows the circular buffer for that entity
> to fill with "spam" though. We could make the circular buffer size much
> larger. But there's still the UX issue where a user asks for the X most
> recent entries. What do we give them? The X most recent de-duped entries?
>
> B.
> If we go to record history and the most recent entry is the same as what we
> are about to record, just update the timestamp. For update-status, my view
> is we don't really care how many times the hook was run, but rather when
> was the last time it ran.
>

The problem is that the new entry isn't the same as the "last" message.
Going to the original paste:

TIME                    TYPE    STATUS          MESSAGE
26 Dec 2015 13:51:59Z   agent   idle
26 Dec 2015 13:56:57Z   agent   executing       running update-status hook
26 Dec 2015 13:56:59Z   agent   idle
26 Dec 2015 14:01:57Z   agent   executing       running update-status hook
26 Dec 2015 14:01:59Z   agent   idle

Which means there is a "running update-status" *and* an "idle" message. So
we can't just say "is the last message == this message". It would have to
look deeper in history, and how deep should we be looking? What happens if
a given charm does one more "status-set" during its update-status hook to
set the status of the unit to "still happy"? Then we would have 3 (agent
executing, unit happy, agent idle).
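
To make that concrete, here is a rough sketch of option B run against the
entries from the paste above. The Entry type and record() helper are
hypothetical names for illustration only, not Juju's actual code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    time: str
    status: str
    message: str = ""


def record(history, entry):
    """Option B: if the newest entry matches, only refresh its timestamp."""
    if history and (history[-1].status, history[-1].message) == (
        entry.status,
        entry.message,
    ):
        history[-1].time = entry.time
    else:
        history.append(entry)


# The agent entries from the paste: executing and idle alternate.
paste = [
    ("13:51:59", "idle", ""),
    ("13:56:57", "executing", "running update-status hook"),
    ("13:56:59", "idle", ""),
    ("14:01:57", "executing", "running update-status hook"),
    ("14:01:59", "idle", ""),
]

history = []
for time, status, message in paste:
    record(history, Entry(time, status, message))

# No entry ever matches its immediate predecessor, so nothing collapses.
print(len(history))  # 5
```

Only back-to-back identical entries would be merged, and the alternating
executing/idle pattern never produces any.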


> Options to solve bug 2
>
> A.
> Allow a flag when setting status to say "this status value is transient"
> and so it is recorded in status but not logged in history.
>
> B.
> Do not record machine provisioning status in history. It could be argued
> this info is more or less transient and once the machine comes up, we don't
> care so much about it anymore. It was introduced to give observability to
> machine allocation.
>

Isn't this the same as (A)? We need a way to say that *this* message should
be shown but not saved forever. Or are you saying that until a machine
comes up as "running" we shouldn't save any of the messages? I don't think
we want that, because when provisioning fails you want to know what steps
were achieved.


>
> Any other options?
> Opinions on preferred solutions?
>
> I really want to get this fixed before Juju 2.0
>

We could do a "log level" rather than just "transient or not", and that
would decide what would get displayed by default (so you can ask for
'update-status' messages but they wouldn't be shown by default). The
problem is that we want to keep status messages pruned at a sane level, and
with 2 entries for every 'update-status' call (one every 5 minutes), a
history of 100 is only 100/2*5/60 ~ 4 hours of history. If something
interesting happened yesterday, you're SOL.
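
Spelling out that arithmetic as a small sketch: the 5-minute interval is
read off the timestamps in the paste above, and window_hours() is a
made-up helper name.

```python
def window_hours(history_cap, entries_per_hook_run, hook_interval_minutes):
    """How many hours of history fit before the cap prunes older entries."""
    runs_retained = history_cap / entries_per_hook_run
    return runs_retained * hook_interval_minutes / 60


# 100 entries, 2 per update-status run (executing + idle), one run per 5 min.
print(f"{window_hours(100, 2, 5):.1f} hours")  # 4.2 hours

# A charm that adds one status-set per run shrinks the window further.
print(f"{window_hours(100, 3, 5):.1f} hours")  # 2.8 hours
```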

What if we added an "interesting lifetime" to status messages, so that
status-set could indicate how long the message would be preserved?
"update-status" and "idle" could be flagged as preserved for only 1 hour,
and "Downloading x%" could be flagged at, say, 5 minutes. Too complicated? It
certainly complicates the pruner (not terribly: when we record them we just
record an expiry time that is indexed, and the pruner just removes
everything that is past its expiry time).
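
A minimal sketch of the expiry idea, with hypothetical names (StatusEntry,
make_entry, prune_expired) and a plain list standing in for the indexed
store. Entries recorded without a lifetime are left for the existing
count-based cap to handle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StatusEntry:
    message: str
    recorded: datetime
    expires: Optional[datetime]  # None: no lifetime, only the count cap applies


def make_entry(message, now, lifetime=None):
    """Record the expiry time up front so the pruner only compares dates."""
    expires = now + lifetime if lifetime is not None else None
    return StatusEntry(message, now, expires)


def prune_expired(history, now):
    """Drop every entry whose expiry time has passed."""
    return [e for e in history if e.expires is None or e.expires > now]


start = datetime(2016, 3, 17, 8, 0)
history = [
    make_entry("Downloading 40%", start, timedelta(minutes=5)),
    make_entry("running update-status hook", start, timedelta(hours=1)),
    make_entry("installed", start),
]

kept = prune_expired(history, start + timedelta(minutes=30))
print([e.message for e in kept])
# ['running update-status hook', 'installed']
```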

Alternatively we could have some sort of UUID for messages to indicate that
"this message is actually similar to other messages with this UUID" and we
prune them based on that. (UUIDs get flagged with a different number of
messages to keep than the global 100 for otherwise untagged messages.)

"Transient" is the easiest to understand, but doesn't really solve bug #1.

If we think of the "UUID" version as something like a named "status pocket"
maybe it's actually tasteful. You'd have the "global" pocket that has our
default 100 most-recent messages, and then you can create any new pocket
that has a default of, say, 10 messages. So you would be doing:
 status-set --pocket hook-execution update-status
 status-set --pocket download Downloading X% done

That also lets charms do nice things at hook execution time when they're
downloading large resources, without spamming the status-history log.
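
As a sketch of how pockets might behave, assuming a cap of 100 for the
global pocket and 10 for any other pocket as suggested above. The
StatusHistory class and its methods are invented for illustration; none of
this exists in Juju.

```python
from collections import deque

GLOBAL_POCKET = "global"
POCKET_CAPS = {GLOBAL_POCKET: 100}  # explicit caps per pocket
DEFAULT_POCKET_CAP = 10             # cap for any pocket not listed above


class StatusHistory:
    def __init__(self):
        self._pockets = {}

    def record(self, message, pocket=GLOBAL_POCKET):
        """Append to the named pocket, creating it with its cap on first use."""
        if pocket not in self._pockets:
            cap = POCKET_CAPS.get(pocket, DEFAULT_POCKET_CAP)
            # A bounded deque discards its oldest item once it is full.
            self._pockets[pocket] = deque(maxlen=cap)
        self._pockets[pocket].append(message)

    def entries(self, pocket=GLOBAL_POCKET):
        return list(self._pockets.get(pocket, ()))


h = StatusHistory()
h.record("installing charm software")
for pct in range(100):
    h.record(f"Downloading {pct}% done", pocket="download")

print(len(h.entries("download")))  # 10
print(h.entries("download")[0])    # Downloading 90% done
print(h.entries())                 # ['installing charm software']
```

The download chatter only ever evicts other download messages, so the
global pocket keeps its entries. Showing one merged history would need
timestamps on each entry, which this sketch leaves out.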

It does complicate the model....

John
=:->
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev
