Richard Elling via illumos-discuss wrote:
>> - the data vdev (c1t5000CCA05C68D505d0) takes fewer bigger writes
>>   (~116KiB average block size) totaling 100.29MiB
>> - clearly, it isn't doubling the data written to the slog, though
>>   checksums create huge overhead on small writes
>> - the slog is much less efficient than the data vdev because of the
>>   smaller IO, no IO aggregation, and extra checksums
> Normally, there is aggregation, but since you don't see it, either the
> aggregation doesn't affect your load or your load is not sufficient to
> cause aggregation. In particular, we don't expect a single-threaded dd
> workload to benefit from ZIL aggregation.
With 32 writers, fio showed that aggregation definitely does happen once
the workload gains some parallelism.
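
For anyone reading along later: the actual test was fio, but the shape of
the load is roughly what a sketch like the one below would generate. It is
illustrative only; the path, thread count, and write counts are made up
rather than taken from our test. (Build with -lpthread and point it at a
scratch dataset.)

/*
 * Illustrative only -- the real test used fio. A minimal sketch of the
 * kind of parallel synchronous-write load that gives the ZIL something
 * to aggregate: N threads each issuing O_DSYNC writes to their own file
 * on the pool. The path and counts below are hypothetical.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	NTHREADS	32		/* like the 32 fio writers above */
#define	BLOCKSIZE	(64 * 1024)	/* 64KiB writes, as in the test */
#define	NWRITES		1024		/* writes per thread (arbitrary) */

static void *
writer(void *arg)
{
	char path[64];
	char *buf;
	int fd, i;

	/* hypothetical dataset mountpoint */
	(void) snprintf(path, sizeof (path), "/pool/synctest/w%ld",
	    (long)(uintptr_t)arg);
	if ((buf = malloc(BLOCKSIZE)) == NULL)
		return (NULL);
	(void) memset(buf, 0xab, BLOCKSIZE);

	/* O_DSYNC makes every write synchronous, so each one goes through
	 * the ZIL (and thus the slog) before the write returns. */
	if ((fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644)) == -1) {
		perror("open");
		free(buf);
		return (NULL);
	}
	for (i = 0; i < NWRITES; i++)
		(void) pwrite(fd, buf, BLOCKSIZE, (off_t)i * BLOCKSIZE);
	(void) close(fd);
	free(buf);
	return (NULL);
}

int
main(void)
{
	pthread_t tids[NTHREADS];
	long t;

	/* Many writers in flight at once is what lets ZIL commits batch
	 * multiple records together; a single-threaded dd never does. */
	for (t = 0; t < NTHREADS; t++)
		(void) pthread_create(&tids[t], NULL, writer, (void *)t);
	for (t = 0; t < NTHREADS; t++)
		(void) pthread_join(tids[t], NULL);
	return (0);
}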
>> 1. Why is 200MiB allocated in the slog for 100MiB of data? Shouldn't
>> it be bounded to data + checksums?
> No. The space is pre-allocated so we don't have to wait on aggregation
> in the critical path. So the aggregation size is a guess. These guesses
> are divided into zil_block_buckets. By default, for most illumos-based
> distros, there is a 36KB bucket for (32KB + 4KB) which, in theory, fits
> NFS workloads (though it really doesn't). The next biggest bucket is max
> size, or 132KB (again, for most distros). So for your 64KB blocks, we
> expect ZIL allocations to be 132KB, unless we know more about the
> distro.
In light of your comment and a brief read of the zil.c code, I think I
understand this better now. Thank you.
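
To make sure the math works out the way I think it does, here is a
simplified model of the bucket behavior as I understand it. It is not the
actual zil.c code (the link Richard gives below has the real thing), and
the bucket sizes are just the ones Richard describes above (36KB and a
~132KB max), which vary by distro.

#include <stdint.h>
#include <stdio.h>

/* Approximate bucket sizes from the discussion above, not from source. */
static const uint64_t zil_buckets_model[] = {
	36 * 1024,	/* 32KB + 4KB, the NFS-sized bucket */
	132 * 1024,	/* max log block size on most distros */
};

/* Pick the smallest bucket that fits a record of 'need' bytes. */
static uint64_t
zil_alloc_size_model(uint64_t need)
{
	for (size_t i = 0;
	    i < sizeof (zil_buckets_model) / sizeof (zil_buckets_model[0]);
	    i++) {
		if (need <= zil_buckets_model[i])
			return (zil_buckets_model[i]);
	}
	return (zil_buckets_model[1]);	/* clamp to the max bucket */
}

int
main(void)
{
	/*
	 * 100MiB written as 64KiB records is 1600 records. Each 64KiB
	 * record overflows the 36KB bucket, so each one gets a 132KB log
	 * block: 1600 * 132KiB ~= 206MiB, which matches the ~200MiB of
	 * slog allocations I asked about for ~100MiB of data.
	 */
	uint64_t records = (100ULL * 1024 * 1024) / (64 * 1024);
	uint64_t alloc = records * zil_alloc_size_model(64 * 1024);

	(void) printf("%llu records -> %.1f MiB allocated in the slog\n",
	    (unsigned long long)records, (double)alloc / (1024.0 * 1024.0));
	return (0);
}

Running that prints about 206MiB, which lines up with the roughly 2x
allocation I was asking about.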
>> 2. Can/should we change the 64KiB max block size for the slog to
>> better use high bandwidth slog devices?
> If you'd like to experiment, you can change the zil_block_buckets array.
> This can be done on a live system using mdb for illumos-based distros.
> See the code around:
> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zil.c#897
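
As I read it, the experiment amounts to substituting different sizes into
that table, whether by patching the live values with mdb as Richard
describes or by rebuilding from source. Just to make the idea concrete, a
hypothetical variant might look like the sketch below; the 68KB entry
sized to our 64KiB records is my own invention, not something from zil.c
or from Richard's suggestion.

#include <stdint.h>

/*
 * Not the illumos source -- a hypothetical sketch of the kind of bucket
 * table one might experiment with. The 68KB (64KB + 4KB) entry is made
 * up to match our 64KiB records; the other values follow Richard's
 * description above and vary by distro.
 */
uint64_t zil_block_buckets_experiment[] = {
	36 * 1024,	/* 32KB + 4KB: the stock NFS-sized bucket */
	68 * 1024,	/* hypothetical: 64KB + 4KB, sized to our records */
	132 * 1024,	/* the max-size bucket on most distros */
};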
With more writer threads via fio, I was able to get ~91% of the
manufacturer's throughput spec and ~79% of the manufacturer's IOPS spec
out of the slog devices. The remaining gap can probably be chalked up to
sync commands, checksums, and related latency.
>> Finally, I've read repeatedly that slog devices are always queue depth
>> 1 and I understand why. That said, with two slogs and two synchronous
>> writers to the pool, do we get qd=1 + qd=1 from slogs and writers
>> operating in parallel? With N slogs and N synchronous writers, do we
>> get N*(qd1) for total capability? Is there an upper bound to that
>> scaling mode (presuming it works that way) other than CPU time?
> AFAIK, there is no fixed upper bound, but there might be a practical
> limit lurking. I'm not aware of anyone trying to identify such limits.
With the additional writer threads via fio, I saw that aggregate 4KiB
random write IOPS to the slog devices increased only about 6% when adding
a second slog. However, with 1MiB random writes, there was a 91% increase
in throughput when adding a second slog, which exceeds my expectations.
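
For my own notes, one way to rationalize that difference is that 4KiB
commits are dominated by fixed per-commit latency, while 1MiB commits are
dominated by data transfer, so the extra device bandwidth mostly helps the
latter. The back-of-envelope sketch below uses made-up device numbers, not
our hardware's specs, and is a rough model rather than a measurement.

#include <stdio.h>

int
main(void)
{
	/* Hypothetical per-device numbers, purely for illustration. */
	double write_bw_mib_s = 1000.0;	/* streaming write bandwidth */
	double commit_lat_us = 60.0;	/* fixed per-commit cost (sync/flush) */

	/* 1MiB records: transfer time dominates, so throughput is
	 * bandwidth-bound and a second slog can nearly double it. */
	double xfer_1m_us = (1.0 / write_bw_mib_s) * 1e6;
	double per_op_1m_us = commit_lat_us + xfer_1m_us;
	(void) printf("1MiB: %.0f%% of each commit is data transfer\n",
	    100.0 * xfer_1m_us / per_op_1m_us);

	/* 4KiB records: the fixed commit cost dominates, so the extra
	 * bandwidth from a second slog barely moves the needle. */
	double xfer_4k_us = ((4.0 / 1024.0) / write_bw_mib_s) * 1e6;
	double per_op_4k_us = commit_lat_us + xfer_4k_us;
	(void) printf("4KiB: %.0f%% of each commit is data transfer\n",
	    100.0 * xfer_4k_us / per_op_4k_us);
	return (0);
}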
We'll probably scale by adding another storage box long before we
encounter a need for a third slog device, so anything beyond two slogs
was more a point of curiosity than of practical use.
The moral of the story is that slogs scale well under high-concurrency
loads and exhibit some odd performance characteristics under
low-concurrency loads.
The allocations are still a bit hinky under low-concurrency loads, but
understanding that the allocation size is just a guess, and that it
probably gets better under high-concurrency loads, I'm not too worried
about it. We'll just account for that in how much slog space we allocate.
Overall, I'm going to chalk this up in the win column.
Thanks for helping me to understand what I was seeing.
Sincerely,
Andrew Kinney