On 15/08/2013 15:34, Udo Grabowski (IMK) wrote:
Hi, as a sidenote to the recent discussion on [developer] [zfs] "zpool should not accept devices with a larger (incompatible) ashift": we just had the case that adding an SSD (Sun F20) to a traditional ashift=9 pool (40 disks, 8 x raidz1 (5 disks) vdevs, striped) results in a 4k-blocked ashift=12 SSD (oi_151a7) used as log and cache (each a 2-disk stripe). The machine is a Sun X4540 (48 2TB SATA disks on 6 LSI SAS1068E JBOD controllers), mpt driver. About 800 processes on 96 machines fetch and write data (in smaller 10k chunks) over a 10 GbE (ixgbe) network, plus a few local processes usually reading 10k files at ~12 MB/s. This config worked with more than 100 MB/s sustained inflow on OpenSolaris 2009.06 with an ashift=9 log/cache (the same F20, two disks mirrored; the disk is marked as battery-nvcache enabled in sd.conf).

The question is: does this new configuration harm pool performance? What we saw over the last two days was a heavy impact on write performance that brought especially zfs_dirent_unlock to a grinding halt, regardless of whether the load came in over NFSv4 or locally (see below). We also saw long times in txg_hold_open (incomplete txgs?). Users could not write in time; even a simple remove of an empty directory took minutes. Nevertheless, the zpool-* workers were reading and writing heavily, driving the disks to over 200 IOPS (cheaper desktop variants, not that powerful...) constantly, but mean read/write was only ~12-15 MB/s. So a major drawback compared to the good old OSOL 2009.06. Nothing peculiar otherwise: disks were in even use, 8-45 ms, no outliers or FMA entries, just unbelievably slow; even the zpool iostat output was hard to watch, one line every two seconds instead of 50 lines per second...

Today we removed log and cache, exported the pool, rebooted the machine and reimported the pool, but left out log/cache. And, surprise surprise, performance is back in the hundreds of MB/s read/write, and the longest lockstat entries are down by a factor of ten, from seconds to tenths of seconds (see picked list below)!

...... [long lockstat statistics deleted] ......
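For anyone who wants to verify the same situation on their own pool: the ashift each vdev actually got can be read back with zdb. A minimal sketch only; the pool name "tank" and the device path are placeholders, not our real names:

  # ashift recorded per vdev in the cached pool configuration
  # ("tank" is a placeholder pool name)
  zdb -C tank | grep ashift

  # or read it straight off one device's label, e.g. an F20 module
  # (device path is hypothetical)
  zdb -l /dev/dsk/c4t1d0s0 | grep ashift

In a setup like ours the raidz vdevs should come back with ashift: 9, while the freshly added 4k F20 devices show ashift: 12.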
Anyone any ideas? Experimented again:

- If I export, boot and import with 4k log and cache (512B pool), things run smoothly until some point (this time nearly 4 days of mostly continuous write load); then suddenly write performance goes down and does not recover (a few MB/s on a 1 GB/s capable pool over a 10 Gb/s net).
- If I then remove log and cache, the pool recovers after some time.
- If I add back the cache, performance drops again to low levels; removed, it recovers again.
- If I add back the 4k log without cache, the pool gets its speed back.

So as a workaround for that problem, I dropped the cache and configured the F20 in "logzilla" mode (4-device stripe); this gives at least decent NFS write performance (a command sketch follows after the log excerpt below). It seems that a 4k cache attached to a 512B pool will lead to a problem after a while: zpool iostat shows that the cache crawls at a continuous 300-500 KB/s per device and does not recover from that work mode.

The interesting thing is that the trigger seemed to be a DNS hiccup; I got a couple of these enigmatic TLI transport errors at the same time the performance dropped:

Aug 20 02:14:24 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 791759 daemon.error] t_rcvrel(file descriptor 240/transport tcp) Resource temporarily unavailable
Aug 20 04:35:06 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 396295 daemon.error] t_rcvrel(file descriptor 252/transport tcp) TLI error 17
Aug 20 04:42:36 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 396295 daemon.error] t_rcvrel(file descriptor 240/transport tcp) TLI error 17
Aug 20 05:08:06 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 791759 daemon.error] t_rcvrel(file descriptor 246/transport tcp) Resource temporarily unavailable
Aug 20 05:14:30 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 396295 daemon.error] t_rcvrel(file descriptor 244/transport tcp) TLI error 17
Aug 20 05:43:04 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 396295 daemon.error] t_rcvrel(file descriptor 248/transport tcp) TLI error 17
Aug 20 07:08:24 imksunth4 /usr/lib/nfs/nfsd[1548]: [ID 396295 daemon.error] t_rcvrel(file descriptor 231/transport tcp) TLI error 17
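For reference, roughly the commands behind the workaround mentioned above, as a sketch only; "tank" and the cXtYd0 names are placeholders, not our actual pool or F20 module names:

  # drop the two 4k cache devices and the old 2-device log
  # (pool and device names are hypothetical)
  zpool remove tank c4t3d0 c4t4d0
  zpool remove tank c4t1d0 c4t2d0

  # re-add all four F20 flash modules as a striped log ("logzilla" mode)
  zpool add tank log c4t1d0 c4t2d0 c4t3d0 c4t4d0

  # then watch per-device traffic to see whether anything stalls again
  zpool iostat -v tank 5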
Not sure if the DNS hiccup was actually the trigger, a symptom of another problem, or just pure coincidence...

--
Dr. Udo Grabowski
Inst. f. Meteorology a. Climate Research, IMK-ASF-SAT
www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology  http://www.kit.edu
Postfach 3640, 76021 Karlsruhe, Germany
T: (+49) 721 608-26026  F: -926026
