Re: priocscan vs fcfs
On Thu, Dec 03, 2015 at 07:51:28PM -0500, Thor Lancelot Simon wrote:
>
> What strategy are you using to sort inside RAIDframe?  This is a property
> of the RAID set; you can see it with raidctl I believe for autoconfigured
> sets.
>
> I wouldn't be terribly surprised to see bad interactions between
> priocscan and RAIDframe's scan or cscan strategies.  What about the "fifo"
> strategy?

It has been the fifo queuing method (queue size: 100) since 2010, when I
created the set (based on the NetBSD RAIDframe guide).

> It's important, though, to understand that there are *very* few pure bulk
> read applications in this world (except single-stream video) and very
> few pure bulk write applications except database logs.  That means that
> single-stream pure read or write tests are really pretty awful predictors
> of disk performance for real workloads.

I also tried a 5-stream dd test, based on one of your previous mails:

    https://mail-index.netbsd.org/netbsd-users/2014/12/01/msg015503.html

All five streams got ~10MB/s each (bs=16k), more or less consistent with
the bonnie++ output.  I should retry that with priocscan.

> The "priocscan" strategy, in particular, limits pure read/pure write
> performance *by design* in order to achieve lower latency under real
> world mixed workloads.

Intuitively, I'd always expect some penalty from any kind of queueing,
but for a simple sequential write, the numbers are just way off.  These
disks can sustain 100MB/s sequential reads and I assume sequential
writes are not much worse (unfortunately I can't test that now), so
35MB/s seems broken and even 55MB/s looks kind of low.
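A non-destructive way to sanity-check that 100MB/s figure, assuming the
rwd0d raw-device name seen elsewhere in this thread, is a plain
sequential read of the raw disk, along these lines:

    # read 1GB straight off the raw device, bypassing filesystem and RAIDframe
    dd if=/dev/rwd0d of=/dev/null bs=1m count=1024

Since it only reads, this is safe on a live set; the write side has no
equally safe equivalent.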
Re: priocscan vs fcfs
On Fri, Dec 04, 2015 at 10:33:15AM +0000, Stephen Borrill wrote:
>
> Am I missing something here? Your figures suggest that Input (i.e. reading)
> is pretty much the same, but it is Output (i.e. writing) that has higher
> throughput

You're right, I mixed it up.  Write throughput is what I meant.
Re: priocscan vs fcfs
On Thu, 3 Dec 2015, Petar Bogdanovic wrote:

> On Wed, Dec 02, 2015 at 08:23:28PM +0000, Michael van Elst wrote:
> >
> > That's probably why setting the queues all to fcfs is the best
> > for you.
>
> Not as dramatic as Emile's numbers but significantly higher read
> throughput:

Am I missing something here? Your figures suggest that Input (i.e. reading)
is pretty much the same, but it is Output (i.e. writing) that has higher
throughput.

> # for i in disksort fcfs priocscan; do for j in wd0 wd1 raid0; do dkctl $j strategy $i; done; bonnie++ -d /tmp -m $i -s 4g -n 0 -u build -f -D; done
> /dev/rwd0d: disksort -> disksort
> /dev/rwd1d: disksort -> disksort
> /dev/rraid0d: disksort -> disksort
> (...)
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> disksort         4G           33550  13 25694   8           64922  13 238.2  11
> Latency                        1023ms     309ms               161ms     4111ms
>
> /dev/rwd0d: disksort -> fcfs
> /dev/rwd1d: disksort -> fcfs
> /dev/rraid0d: disksort -> fcfs
> (...)
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> fcfs             4G           55254  22 27584   9           64438  13 244.5   9
> Latency                         844ms     331ms               233ms     6095ms
>
> /dev/rwd0d: fcfs -> priocscan
> /dev/rwd1d: fcfs -> priocscan
> /dev/rraid0d: fcfs -> priocscan
> (...)
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> priocscan        4G           32649  13 17028   6           60478  13 247.6   8
> Latency                        1983ms    1167ms               236ms     6198ms
>
> That's on a simple RAID1 setup:
>
> # dmesg | grep SAM
> wd0:
> wd1:
>
> # mount | grep raid
> /dev/raid0a on / type ffs (log, NFS exported, local)
Re: priocscan vs fcfs
On Fri, Dec 04, 2015 at 12:13:06PM +0100, Petar Bogdanovic wrote:
>
> I also tried a 5-stream dd test, based on one of your previous mails:
>
>     https://mail-index.netbsd.org/netbsd-users/2014/12/01/msg015503.html
>
> All five streams got ~10MB/s each (bs=16k), more or less consistent with
> the bonnie++ output.
>
> I should retry that with priocscan.

~5.5MB/s per stream:

$ for dev in $(sysctl -n hw.disknames); do sudo dkctl $dev strategy priocscan; done
/dev/rcd0d: fcfs -> priocscan
/dev/rwd0d: fcfs -> priocscan
/dev/rwd1d: fcfs -> priocscan
/dev/rraid0d: fcfs -> priocscan

$ date ; for i in 0 1 2 3 4; do dd if=/dev/zero of=test$i bs=16k count=1048576 & done ; wait ; date
Fri Dec  4 12:19:07 CET 2015
[1] 9484
[2] 9716
[3] 6843
[4] 9129
[5] 9250
1048576+0 records in
1048576+0 records out
17179869184 bytes transferred in 3047.215 secs (5637892 bytes/sec)
1048576+0 records in
1048576+0 records out
17179869184 bytes transferred in 3069.532 secs (5596901 bytes/sec)
1048576+0 records in
1048576+0 records out
17179869184 bytes transferred in 3082.440 secs (5573464 bytes/sec)
1048576+0 records in
1048576+0 records out
17179869184 bytes transferred in 3092.244 secs (5555793 bytes/sec)
1048576+0 records in
1048576+0 records out
17179869184 bytes transferred in 3103.526 secs (5535596 bytes/sec)
Fri Dec  4 13:10:50 CET 2015
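To watch how such a run is spread across the member disks in real time,
something like the following works -- a sketch, see iostat(8) for the
exact flags:

    # per-device throughput, refreshed every second, while the streams run
    iostat -d -w 1 wd0 wd1 raid0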
Re: priocscan vs fcfs
On Wed, Dec 02, 2015 at 08:23:28PM +0000, Michael van Elst wrote:
>
> That's probably why setting the queues all to fcfs is the best
> for you.

Not as dramatic as Emile's numbers but significantly higher read
throughput:

# for i in disksort fcfs priocscan; do for j in wd0 wd1 raid0; do dkctl $j strategy $i; done; bonnie++ -d /tmp -m $i -s 4g -n 0 -u build -f -D; done
/dev/rwd0d: disksort -> disksort
/dev/rwd1d: disksort -> disksort
/dev/rraid0d: disksort -> disksort
(...)
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
disksort         4G           33550  13 25694   8           64922  13 238.2  11
Latency                        1023ms     309ms               161ms     4111ms

/dev/rwd0d: disksort -> fcfs
/dev/rwd1d: disksort -> fcfs
/dev/rraid0d: disksort -> fcfs
(...)
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
fcfs             4G           55254  22 27584   9           64438  13 244.5   9
Latency                         844ms     331ms               233ms     6095ms

/dev/rwd0d: fcfs -> priocscan
/dev/rwd1d: fcfs -> priocscan
/dev/rraid0d: fcfs -> priocscan
(...)
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
priocscan        4G           32649  13 17028   6           60478  13 247.6   8
Latency                        1983ms    1167ms               236ms     6198ms

That's on a simple RAID1 setup:

# dmesg | grep SAM
wd0:
wd1:

# mount | grep raid
/dev/raid0a on / type ffs (log, NFS exported, local)
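dkctl can also report the active strategy when invoked without a
strategy name, which is handy for confirming what a given bonnie++ pass
actually used -- a sketch, per dkctl(8):

    # query without changing; prints the device's current strategy
    dkctl wd0 strategy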
Re: priocscan vs fcfs
On Thu, Dec 03, 2015 at 07:43:10PM +0100, Petar Bogdanovic wrote:
> On Wed, Dec 02, 2015 at 08:23:28PM +0000, Michael van Elst wrote:
> >
> > That's probably why setting the queues all to fcfs is the best
> > for you.
>
> Not as dramatic as Emile's numbers but significantly higher read
> throughput:

What strategy are you using to sort inside RAIDframe?  This is a property
of the RAID set; you can see it with raidctl I believe for autoconfigured
sets.

I wouldn't be terribly surprised to see bad interactions between
priocscan and RAIDframe's scan or cscan strategies.  What about the "fifo"
strategy?

It's important, though, to understand that there are *very* few pure bulk
read applications in this world (except single-stream video) and very
few pure bulk write applications except database logs.  That means that
single-stream pure read or write tests are really pretty awful predictors
of disk performance for real workloads.

The "priocscan" strategy, in particular, limits pure read/pure write
performance *by design* in order to achieve lower latency under real
world mixed workloads.  I would not be so quick to discard it -- though
I would not use it on something like an SSD, and I am skeptical it would
perform well when stacked on one of RAIDframe's elevator-sort variants.

Thor
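Following up on the raidctl pointer above -- a minimal sketch, assuming
the raid0 device name used elsewhere in the thread: raidctl can emit the
configuration an autoconfigured set corresponds to, and its queue
section carries the RAIDframe-internal strategy and queue size (e.g.
"fifo 100", as confirmed later in the thread):

    # print the effective RAIDframe configuration, including the
    # "START queue" section with the internal queuing strategy
    raidctl -G raid0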
priocscan vs fcfs
Hi,

I've never been really happy with my NetBSD RAIDframe NAS; I never got
the speed it was supposed to deliver, even with the right alignment,
RAID layout etc.  Today I dug into `dkctl(8)' while checking whether the
read and write caches were enabled, and I came across the "strategy"
command.

Long story short, changing from the priocscan to the fcfs strategy
multiplied my NAS's write speed by 6!  I changed the strategy for all
disk members:

# dkctl wd0 strategy fcfs
# dkctl wd1 strategy fcfs
# dkctl wd2 strategy fcfs

and also for the RAIDframe device:

# dkctl raid0 strategy fcfs
/dev/rraid0d: priocscan -> fcfs

as changing it only for the disk members was apparently
counter-productive.  And there we go, from a 40/50MB/s write average to
a stunning 200 to 300MB/s, which is more like what the disks can
theoretically do.

Could anyone with some background on these strategies explain what's
behind the curtain?  I couldn't really find precise documentation on
this matter...

Thanks,

Emile `iMil' Heitor
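Two related dkctl bits, as a hedged sketch: the cache check mentioned
above, and a loop that switches every configured disk in one go (the
same hw.disknames pattern shows up later in this thread):

    # show whether the drive's read and write caches are enabled
    dkctl wd0 getcache

    # switch the members and the RAID device all at once
    for dev in $(sysctl -n hw.disknames); do
            dkctl $dev strategy fcfs
    done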
Re: priocscan vs fcfs
i...@home.imil.net ("Emile `iMil' Heitor") writes:

> as changing it only for the disk members was apparently
> counter-productive.  And there we go, from a 40/50MB/s write average to
> a stunning 200 to 300MB/s, which is more like what the disks can
> theoretically do.

> Could anyone with some background on these strategies explain what's
> behind the curtain?  I couldn't really find precise documentation on
> this matter...

disksort  - execute requests in order of increasing block numbers, then
            continue with the lowest block number.  I.e. do a one-way
            scan over the disk.

fcfs      - execute requests in the order they were issued.

priocscan - the filesystem tags each I/O request (buffer) with a
            priority.  Priorities are mapped to multiple queues; the
            highest-priority queue is executed first, and each queue is
            executed in block-number order up to a fixed number of
            requests (a burst) to prevent the lower-priority queues
            from starving.

"Block number order" can be just "cylinder number order", depending on
the disk driver.

Buffer priority is time-noncritical (low), time-limited (medium) and
time-critical (high).  Usually synchronous operations and the journal
are time-critical, everything else is time-limited, and time-noncritical
isn't used.  Accessing the raw device is also time-critical.

So, disksort is the traditional method for filesystems on dumb disk
devices.  fcfs can be good for smart disks with caches and their own
queuing, and for streaming a raw disk.  priocscan tries to optimize for
concurrent filesystem accesses better than disksort does.

fcfs is also the "neutral" queue for drivers stacking on top of each
other: the queue sorting should really only be done at one level.  But
RAIDframe is more complicated because it does its own queuing and
sorting outside of this schema, in particular when it has to
read-modify-write stripe sets for small I/O.

That's probably why setting the queues all to fcfs is the best for you.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
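The latency/throughput trade-off described above can be made visible
with a rough mixed-workload test -- a sketch only, with arbitrary paths,
sizes and offsets: run a sequential writer through the filesystem
(time-limited priority) while timing a few reads from the raw device
(time-critical priority).  Under priocscan the raw reads should jump
the queue; under fcfs or disksort they wait their turn.

    # background sequential writer (4GB through the filesystem)
    dd if=/dev/zero of=/tmp/seqwrite bs=64k count=65536 &

    # small timed reads from the raw device at scattered offsets
    # (skip is in units of bs, i.e. 16k blocks)
    for off in 100000 500000 900000; do
            time dd if=/dev/rwd0d of=/dev/null bs=16k count=64 skip=$off
    done

    wait
    rm -f /tmp/seqwrite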
Re: priocscan vs fcfs
On Wed, 2 Dec 2015, Michael van Elst wrote:

> fcfs is also the "neutral" queue for drivers stacking on top of each
> other: the queue sorting should really only be done at one level.  But
> RAIDframe is more complicated because it does its own queuing and
> sorting outside of this schema, in particular when it has to
> read-modify-write stripe sets for small I/O.
>
> That's probably why setting the queues all to fcfs is the best for you.

Thanks a lot for this clear analysis Michael.  It confirms the results I
witnessed earlier: I ran a couple of benchmarks, including bonnie++ and
iozone, and the latter shows a ratio of x5 in favor of the fcfs strategy
for every type of operation.

For those interested, the iozone spreadsheet output is available here
(OOo / LibreOffice):

https://home.imil.net/tmp/coruscant-iozone-priocscan.ods
https://home.imil.net/tmp/coruscant-iozone-fsfc.ods

For each subset, the first column is the amount of data written (from
64K to 4M) and the first row is the block size.

Emile `iMil' Heitor
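For anyone wanting to reproduce: the spreadsheets look like the output
of iozone's Excel-report mode.  A plausible invocation -- flags per
iozone(1); the output file name and the 4MB cap are assumptions based on
the description above -- would be:

    # auto mode up to 4MB files; -R produces an Excel-style report and
    # -b writes it to a spreadsheet file LibreOffice can open
    iozone -a -g 4m -R -b coruscant-iozone-fcfs.xls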