Richard Elling via illumos-discuss wrote:
On Nov 14, 2014, at 5:44 PM, Andrew Kinney <[email protected]> wrote:

Is there a known reason why I'm seeing double writes to the slog? Am I alone or 
are others also seeing the same data amplification for sync writes with a slog?
You are seeing allocations, not the same thing as writes. zpool iostat is not the best tool
for understanding performance, for this and other reasons. What do you measure at the
device itself?

Fair enough. What would be the best way to measure the quantity of data 
actually written to the device?

Most commonly used is:
        iostat -x
or
        iostat -xn
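
For a device-level cross-check alongside iostat, the DTrace io provider should also be able to sum the bytes actually issued to each device during a test window; a minimal sketch (aggregating by dev_statname):

        # sum bytes issued to each device until Ctrl-C
        dtrace -n 'io:::start { @bytes[args[1]->dev_statname] = sum(args[0]->b_bcount); }'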


test command:
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.346287 s, 303 MB/s

For the interval in which the test was done (no other activity on the pool), "iostat -Inx 30" shows:

    r/i    w/i   kr/i   kw/i wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rpool
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
    0.0 1600.0    0.0 108800.0  0.0  0.1    0.0    1.0   0   1 c5t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0  884.0    0.0 102702.0  0.0  0.0    0.0    0.8   0   2 c1t5000CCA05C68D505d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA05C681EB9d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B1007E5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B10C085d0
    0.0 2480.0    0.0 211502.0  4.7  0.1   56.8    0.9   2   3 testpool

Note that this is total for the interval, not per second.
---------------------------------------------------------

test command:
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.2314 s, 453 MB/s

After adding a second identical slog (c5t1d0, not mirrored) and repeating the test, "iostat -Inx 30" shows:

    r/i    w/i   kr/i   kw/i wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rpool
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
    0.0  800.0    0.0 54400.0  0.0  0.0    0.0    0.5   0   0 c5t0d0
    0.0  800.0    0.0 54400.0  0.0  0.0    0.0    0.5   0   0 c5t1d0
    0.0  896.0    0.0 102736.5  0.0  0.0    0.0    0.9   0   3 c1t5000CCA05C68D505d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA05C681EB9d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B1007E5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B10C085d0
    0.0 2492.0    0.0 211536.5  4.5  0.1   53.8    0.7   2   3 testpool

---------------------------------------------------------

test command
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=4K oflag=sync count=25600
25600+0 records in
25600+0 records out
104857600 bytes (105 MB) copied, 2.64324 s, 39.7 MB/s

In the degenerate case of 4KiB sync writes, we do get twice the data written to the slogs, but apparently only because of per-record overhead (checksums/headers and padding, as far as I can tell):

    r/i    w/i   kr/i   kw/i wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rpool
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
    0.0 12800.0    0.0 102400.0  0.0  0.0    0.0    0.0   0   1 c5t0d0
    0.0 12800.0    0.0 102400.0  0.0  0.0    0.0    0.0   0   1 c5t1d0
    0.0  942.0    0.0 103493.0  0.0  0.0    0.0    0.7   0   2 c1t5000CCA05C68D505d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA05C681EB9d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B1007E5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t5000CCA03B10C085d0
    0.0 26538.0    0.0 308293.0  4.3  0.1    4.9    0.1   2   6 testpool
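
Working the numbers from that run (my reading of the iostat output above):

        102400 KiB / 12800 writes = 8 KiB written per 4 KiB record
        2 slogs x 102400 KiB = 204800 KiB ~= 200 MiB written for 100 MiB of data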

---------------------------------------------------------

Some interesting data here:

- for the original test, the log device (c5t0d0) takes 1600 68KiB writes (64KiB data, 4KiB checksum?) totaling 106.25MiB
- the data vdev (c1t5000CCA05C68D505d0) takes fewer, bigger writes (~116KiB average block size) totaling 100.29MiB
- clearly, it isn't doubling the data written to the slog, though checksums create huge overhead on small writes
- the slog is much less efficient than the data vdev because of the smaller IO, no IO aggregation, and extra checksums
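
For reference, the arithmetic behind those figures:

        slog c5t0d0:   108800 KiB / 1600 writes  = 68 KiB per write   (108800 KiB ~= 106.25 MiB)
        data vdev:     102702 KiB /  884 writes ~= 116 KiB per write  (102702 KiB ~= 100.29 MiB)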


While this does assuage my initial concern about writing the data to the slog twice, it does raise a couple questions:

1. Why is 200MiB allocated in the slog for 100MiB of data? Shouldn't it be bounded to data + checksums?
2. Can/should we change the 64KiB max block size for the slog to better use high-bandwidth slog devices?


We expect most of our synchronous writes will be under 64KiB, but these particular devices really hit their stride at 1MiB blocks, so it would be nice if we could raise the maximum IO size to the slog to 1MiB. For the rare synchronous IO above 64KiB, that would improve both performance and efficiency.
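
If the governing limit turns out to be a ZFS tunable (I'm assuming something like zfs_immediate_write_sz, though I haven't confirmed it is the knob that caps the log write size rather than the indirect-write threshold), inspecting and bumping it would presumably be the usual illumos routine:

        # read the current value (64-bit decimal); assumes the symbol exists in the running kernel
        echo 'zfs_immediate_write_sz/E' | mdb -k

        # a persistent change would go in /etc/system, e.g. (hypothetical 1MiB value):
        #   set zfs:zfs_immediate_write_sz = 0x100000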

Finally, I've read repeatedly that slog devices are always queue depth 1, and I understand why. That said, with two slogs and two synchronous writers to the pool, do we get qd=1 + qd=1, with the slogs and writers operating in parallel? With N slogs and N synchronous writers, do we get N*(qd=1) of total capability? Is there an upper bound to that scaling (presuming it works that way) other than CPU time?
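
If nobody knows offhand, I can try to measure it: run multiple sync writers in parallel and watch per-slog actv/throughput from another terminal. A rough sketch (output filenames are placeholders):

        # two concurrent sync writers against the pool
        dd if=/testpool/randomfile.deleteme of=/testpool/new1.deleteme bs=1M oflag=sync count=100 &
        dd if=/testpool/randomfile.deleteme of=/testpool/new2.deleteme bs=1M oflag=sync count=100 &
        wait
        # meanwhile, in another shell: iostat -xn 1 | egrep 'c5t0d0|c5t1d0'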

Sincerely,
Andrew Kinney



