On Mon, Dec 3, 2018 at 7:31 PM Kody Kantor <kody.kan...@joyent.com> wrote:

> Hi folks,
>
> illumos-joyent includes per-zone VFS kstats that emulate a histogram by
> aggregating the latency of individual VFS operations into buckets.
>

Interesting! Thanks for putting this together.


> Here's a sample of what that looks like:
>
> [root@coke ~]# kstat -m zone_vfs
> module: zone_vfs                        instance: 0
> name:   global                          class:    zone_vfs
>         100ms_ops                       164
>         10ms_ops                        1734
>         10s_ops                         0
>         1s_ops                          23
>         ...
>         zonename                        global
>
> Using this kstat we can pretty easily see that there were 23 VFS
> operations issued by the global zone that took at least one second but not
> more than 10 seconds, for example. This seems like a useful pattern, and
> one that could be used in many places in the kernel. For those curious,
> this was implemented as part of OS-338 (https://smartos.org/bugview/OS-338,
> illumos-joyent commit 278c9a7a00cbf7a53131a08ae5e86c28de7d2c8e).
>
> If we'd like to use the zone_vfs module's histogram pattern elsewhere in
> the kernel we currently have to copy most of the code used by the zone_vfs
> kstat module. I'd like to add a minimal histogram abstraction to the kstat
> framework so that other kernel systems can easily create and use histograms
> to track latency of longer-running operations (in the order of tens to
> hundreds of milliseconds).
>
> Specifically, a kernel component would call a function like `(kstat_t *)
> kstat_histogram_create(module, instance, name)` which would create a kstat
> that has the same latency buckets as zone_vfs.
>

This is quite limiting. Generally:

 - Not all cases will have the same latency buckets
 - Not all cases will have the same number of buckets
 - Not all cases will be based on latency

On the last point, I can imagine some other use cases:

1. Distribution of network packet sizes. (I've seen network cards where the
hardware provides this information - eg bge.)

2. Distribution of I/O sizes

3. And I even wonder whether the kmem_alloc kstats might be better thought
of
as a set of per-operation kstats with histograms by size, rather than as
per-size
kstats each with a list of operations.

Then the kernel component would call `(void) kstat_histogram_observe(kstat,
> hrtime)` to increment the relevant latency buckets for the given kstat
> histogram and duration.
>

Would you ever want to increase the counter by a number other than 1?

Would you ever want to replace the counters wholesale?

What happens if you were to pass a random kstat (one that isn't a
histogram) in as the
first argument?

More generally, why put the binning logic into the kstat framework rather
than
devolve it to the individual module?


> Here's a simplified example of what I had in mind:
>
> kstat_t *mystat = kstat_histogram_create("testmodule", 0,
> "my_latency_stat");
> if (mystat != NULL) {
>     kstat_install(mystat);
> }
>
> htime_t start = gethrtime();
> do_work(); // a longer-running task
> hrtime_t duration = gethrtime() - start;
> kstat_histogram_observe(mystat, duration);
>
> which could look something like this on the front-end after a few rounds
> of calling do_work():
>
> $ kstat -m testmodule
> module: testmodule                      instance: 0
> name:   my_latency_stat                 class:    hist
> bucket_10ms                     3
> bucket_100ms                    2
> bucket_1s                       1
> bucket_10s                      0
> total_ms                        1210
> count                           3
> crtime                          138.923698981
> snaptime                        144.456348556
>

Looking at that, the data includes not just the buckets, but also ancillary
data
(the total_ms and count fields). As someone who writes tools to consume
this data, that implies a lot of additional logic, rather than just being
able to assume
that every data item is a bucket.

Furthermore, changing units in the names (from ms to s here) is also
undesirable -
it would be better if the field names sorted naturally, which is easiest if
the units
stay the same.

You've used "hist" as the class here. Generally class would be used to
select items
by logical functionality, not data layout.


> This would be more abstract than what's currently included in the kstat
> framework. My intent is that we could make something that's easy to use for
> the majority of users in the kernel. The above approach would trade
> flexibility for simplicity. Maybe there's a way to retain more flexibility
> without adding too much complexity.
>

I think the fundamental thing here is whether this is just a convenience
wrapper
for injecting data into what is otherwise a standard KSTAT_TYPE_NAMED kstat,
or whether there's something more fundamental - possibly even a new kstat
type,
that would be useful.


> The initial use case I'm considering for this would be to add a few
> histogram kstats for longer-running ZFS operations, like spa_sync,
> dmu_tx_delay, and zil_commit. For this use case I think the above design
> works well, but there are definitely drawbacks to such a simple design.
>
> Thoughts on this?
>

As a final thought, if you were to go with your initial design, then using
the name
"histogram" is inappropriate, as what you have is far more specific.

Thanks again,

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/T0d45cb43ed98ec4c-Mbce26a92057d1081921ce072
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription

Reply via email to