Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
I read the whole thread, and it sounds like the write cache should always be disabled, since in the worst case the performance stays the same(?). That's based on this discussion. I will test some WD4002FYYZ drives, which don't mention "media cache".

Kevin

On Tue, 13 Nov 2018 at 09:27, Виталий Филиппов <vita...@yourcmc.ru> wrote:
> This may be the explanation:
> https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and
> Other manufacturers may have started to do the same, I suppose.
> --
> With best regards,
> Vitaliy Filippov
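If that conclusion holds, applying it host-wide is a one-liner per disk. Below is a minimal sketch that disables the volatile write cache on rotational drives only, discovering them via sysfs (the device naming is an assumption about your layout; verify the list before running this on production hosts). Note that on many drives `hdparm -W 0` does not survive a power cycle, so it belongs in a boot-time script or udev rule.

  #!/bin/bash
  # Disable the volatile write cache on all rotational (spinning) disks.
  # SSDs are skipped: queue/rotational is 0 for them.
  for dev in /sys/block/sd*; do
      name=$(basename "$dev")
      if [ "$(cat "$dev/queue/rotational")" = "1" ]; then
          echo "Disabling write cache on /dev/$name"
          hdparm -W 0 "/dev/$name"
      fi
  done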
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Looks like it; the Toshiba drives I use seem to have their own version of that, which would explain the same kind of results.

On Tue, 13 Nov 2018 at 4:26 PM, Виталий Филиппов wrote:
> This may be the explanation:
> https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and
> Other manufacturers may have started to do the same, I suppose.
> --
> With best regards,
> Vitaliy Filippov
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
This may be the explanation:

https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and

Other manufacturers may have started to do the same, I suppose.

--
With best regards,
Vitaliy Filippov
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
> Even more weird then; what drives are in the other cluster?

Desktop Toshiba and Seagate Constellation 7200 rpm.

As I understand it by now, the main impact is on SSD+HDD clusters. An enabled HDD write cache causes the kernel to send flush requests to it (when the write cache is disabled, the kernel doesn't bother with that), and this probably affects something else and causes some extra waits on the SSD journal (although that's strange and looks like a bug to me). I tried to check latencies in `ceph daemon osd.xx perf dump`, and both kv_commit_lat and commit_lat decreased ~10 times when I disabled the HDD write cache (although both are SSD-related, as I understand it).

Maybe your HDDs are connected via some RAID controller, and when you disable the cache it doesn't really get disabled; the kernel just stops issuing flush requests and makes some writes unsafe?

--
With best regards,
Vitaliy Filippov
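For anyone who wants to repeat that check, the two counters can be pulled straight out of the OSD admin socket. A sketch, assuming jq is installed and osd.0 is an OSD local to the host (the exact counter paths can differ slightly between Ceph releases):

  # Average commit latencies in seconds, before and after toggling the cache:
  ceph daemon osd.0 perf dump | jq '.bluestore.commit_lat.avgtime, .bluestore.kv_commit_lat.avgtime'

The avgtime fields are running averages since the OSD started, so restart the OSD (or compare deltas of sum/avgcount) to see the effect of a cache change cleanly.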
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Mixture of Toshiba drives here, all enterprise rated, 128-256 MB cache. I have tried turning the write cache on and off a few times across the cluster using hdparm, and every time I can see a huge change from on (40 ms average) to off (1-3 ms average).

Vitaliy, what drives are you using? Maybe a particular brand / firmware?

On Sun, Nov 11, 2018 at 8:54 PM Marc Roos wrote:
> WD Red here
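A quick way to run that on/off comparison yourself (the device name is an example; hdparm with -W and no argument reads the current setting back):

  hdparm -W 0 /dev/sdc   # disable the volatile write cache
  hdparm -W /dev/sdc     # verify: should report "write-caching = 0 (off)"
  # ... run the benchmark, then re-enable and repeat:
  hdparm -W 1 /dev/sdc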
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
WD Red here

-----Original Message-----
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 13:47
To: Vitaliy Filippov
Cc: Marc Roos; ceph-users
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

> Even more weird then; what drives are in the other cluster?
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Even more weird then; what drives are in the other cluster?

On Sun, 11 Nov 2018 at 7:19 PM, Vitaliy Filippov wrote:
> It seems not; I've just tested it on another small cluster with HDDs only, and there was no change.
>
>> Does it make sense to test disabling this on an HDD-only cluster?
>
> --
> With best regards,
> Vitaliy Filippov
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
> Does it make sense to test disabling this on an HDD-only cluster?

It seems not; I've just tested it on another small cluster with HDDs only, and there was no change.

--
With best regards,
Vitaliy Filippov
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
I just did a very, very short test and don't see any difference with this cache on or off, so I am leaving it on for now.

-----Original Message-----
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 11:43
To: Marc Roos
Cc: ceph-users; vitalif
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

> Don't have any SSDs in the cluster to test. Also, without knowing the exact reason why enabling it has such a negative effect, I wouldn't be sure whether the same would apply to SSDs.
> [...]
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Don't have any SSDs in the cluster to test. Also, without knowing the exact reason why enabling it has such a negative effect, I wouldn't be sure whether the same would apply to SSDs.

On Sun, 11 Nov 2018 at 6:41 PM, Marc Roos wrote:
> Does it make sense to test disabling this on an HDD-only cluster?
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Does it make sense to test disabling this on an HDD-only cluster?

-----Original Message-----
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 6:24
To: vita...@yourcmc.ru
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

> I've just worked out I had the same issue; I've been trying to work out the cause for the past few days!
> [...]
Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
I've just worked out I had the same issue; I've been trying to work out the cause for the past few days!

However, I am using brand new enterprise Toshiba drives with a 256 MB write cache, and was seeing I/O wait peaks of 40% even during a small write operation to Ceph, with commit/apply latencies of 40 ms+.

I just went through and disabled the write cache on each drive, then ran a few tests: exactly the same write performance, but I/O wait under 1% and commit/apply latencies of 1-3 ms max.

Something somewhere definitely doesn't seem to like the write cache being enabled on the disks. This is an EC pool on the latest Mimic version.

On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov wrote:
> A weird thing happens in my test cluster made from desktop hardware.
> [...]
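For reference, the commit/apply latencies quoted above are easy to watch while toggling the cache. Two standard views (nothing cluster-specific assumed beyond an admin keyring and the sysstat package):

  # Per-OSD commit and apply latency in milliseconds, as tracked by the monitors:
  ceph osd perf

  # Node-level I/O wait (%iowait) and per-disk await/util while the test runs:
  iostat -x 1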
[ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times
Hi,

A weird thing happens in my test cluster made from desktop hardware: the command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases single-thread write iops (i.e. reduces latency) 7 times!

It is a 3-node cluster with Ryzen 2700 CPUs and, in each host, 3x SATA 7200 rpm HDDs, 1x SATA desktop SSD for the system and ceph-mon, and 1x SATA server SSD for block.db/WAL. Hosts are linked by 10 Gbit Ethernet (not the fastest one, though; average RTT according to flood-ping is 0.098 ms). Ceph and OpenNebula are installed on the same hosts; OSDs are prepared with ceph-volume and BlueStore with default options. The SSDs have capacitors ('power-loss protection'), and their write cache has been turned off since the very beginning (hdparm -W 0 /dev/sdb). They're quite old, but each of them is capable of delivering ~22000 iops in journal mode (fio -sync=1 -direct=1 -iodepth=1 -bs=4k -rw=write).

However, the RBD single-threaded random-write benchmark originally gave awful results: when testing with `fio -ioengine=libaio -size=10G -sync=1 -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60 -filename=./testfile` from inside a VM, the result was only 58 iops on average (17 ms latency). This was not what I expected from an HDD+SSD setup.

But today I tried to play with the cache settings for the data disks, and I was really surprised to discover that just disabling the HDD write cache (hdparm -W 0 /dev/sdX for all HDD devices) increases single-threaded performance ~7 times! The result from the same VM (without even rebooting it) is iops=405, avg lat=2.47 ms. That's an order of magnitude faster, and in fact 2.5 ms seems like an expected number.

As I understand it, 4k writes are always deferred at the default setting of prefer_deferred_size_hdd=32768, which means they should only get written to the journal device before the OSD acks the write operation.

So my question is: WHY? Why does the HDD write cache affect commit latency with the WAL on an SSD?

I would also appreciate it if anybody with a similar setup (HDD+SSD with desktop SATA controllers or an HBA) could test the same thing...

--
With best regards,
Vitaliy Filippov
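To make the two benchmarks in this post easy to repeat, here they are as a small script. The device and file paths are examples, and a runtime cap has been added to the first test so it doesn't write the whole disk; the second test should be run from inside a VM backed by the cluster.

  #!/bin/bash
  # 1) Journal-mode SSD test: sequential 4k sync writes at queue depth 1.
  #    WARNING: writes directly to /dev/sdb -- use a scratch device.
  fio -name=journal-test -filename=/dev/sdb -sync=1 -direct=1 \
      -iodepth=1 -bs=4k -rw=write -runtime=30 -time_based

  # 2) Single-threaded RBD random-write test, run inside the VM:
  fio -ioengine=libaio -size=10G -sync=1 -direct=1 -name=test \
      -bs=4k -iodepth=1 -rw=randwrite -runtime=60 -filename=./testfile

Comparing the iops and latency of test 2 with the HDD write cache on and off should reproduce the ~7x difference reported here.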