Edward Ned Harvey wrote:

In this email, when I say PERC, I really mean either a PERC or any other hardware RAID controller with a battery-backed (BBU) WriteBack cache.

For future server purchases, I want to know which is faster: (a) a bunch of hard disks with a PERC and WriteBack enabled, or (b) a bunch of hard disks plus one SSD for the ZIL. Unfortunately, I don't have an SSD available for testing, so here is what I was able to do:

I measured the write speed of the naked disks (PERC set to WriteThrough). Results were around 350 ops/sec.

I measured the write speed with the WriteBack enabled. Results were around 1250 ops/sec.
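(For anyone who wants to reproduce this: the per-vdev op rates can be watched live while a test runs with zpool iostat; the pool name below is just a placeholder.)

    zpool iostat -v tank 5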

So right from the start, we can see there’s a huge performance boost by enabling the WriteBack. Even for large sequential writes, buffering allows the disks to operate much more continuously. The next question is how it compares against the SSD ZIL.

Since I don't have an SSD available, I created a ram device and put the ZIL there. This is not a measure of the speed I would get with an SSD; rather, it is a level of performance that no SSD could possibly exceed, so it serves to establish an upper bound. If the upper and lower bounds are near each other, then we have a good estimate of the speed with an SSD ZIL; but if they are far apart, we haven't learned much.
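For reference, the ram-device ZIL setup goes roughly like this; the pool name and size are placeholders, not necessarily what I used, and this only makes sense for benchmarking, since a ZIL that vanishes on power loss defeats its purpose:

    ramdiskadm -a zilram 2g                   # creates /dev/ramdisk/zilram (size is an example)
    zpool add tank log /dev/ramdisk/zilram    # attach it as a separate log device
    # ... run the write test ...
    zpool remove tank /dev/ramdisk/zilram     # requires a build with log device removal
    ramdiskadm -d zilram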

With the ZIL in RAM, results were around 1700 ops/sec.

This upper bound is very far from the lower bound, so it doesn't pin down what an SSD would actually do. But it still provides some useful knowledge. The take-home points are:

· There's a lot of performance to be gained by accelerating the ZIL: potentially up to about 5x faster than naked disks (roughly 1700 vs. 350 ops/sec).

· The WriteBack RAID controller achieves a lot of this performance increase: about 3.5x acceleration (roughly 1250 vs. 350 ops/sec).

· I don’t know how much an SSD would help. I don’t know if it’s better, the same, or worse than the PERC. I don’t know if the combination of PERC and SSD together would go faster than either one individually.

I have a hypothesis: I think the best configuration will be a PERC with WriteBack enabled on all the spinning hard drives, plus an SSD for the ZIL, with the PERC set to WriteThrough on that SSD. This has yet to be proven or disproven.
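In zpool terms, the hypothesized layout would be something like the following; the device names and the raidz layout are only placeholders, with the c1 disks exported by the PERC in WriteBack mode and the c2 SSD exported in WriteThrough mode:

    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        log c2t0d0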


Cache on controllers is almost always battery-backed DRAM (NVRAM having made an exit from the scene a while ago). As such, it has fabulous latency and throughput compared to anything else. From a performance standpoint, it will beat an SSD ZIL on a 1-for-1 basis. However, it's almost never the case that you find an HBA cache even a fraction of the size of a small SSD. So, what happens when you flood the HBA with more I/O than the on-board cache can handle? It reduces performance back to the level of NO cache.

From everything I've seen, an SSD wins simply because it's 20-100x the size. HBAs almost never have more than 512MB of cache, and even fancy SAN boxes generally have 1-2GB max. So, HBAs are subject to being overwhelmed with heavy I/O. The SSD ZIL has a much better chance of being able to weather a heavy I/O period without being filled. Thus, SSDs are better at "average" performance - they provide a relatively steady performance profile, whereas HBA cache is very spiky.

The other real advantage of SSD ZIL is that it covers the entire pool. Most larger pools spread their disks over multiple controllers, each of which must have a cache in order for the whole pool to perform evenly.

If you know you're going to be doing a very intermittent or modest level of I/O, and that I/O is likely to fit within an HBA's cache, then it will outperform an SSD. For a continuous heavy load, or for extremely spiky loads (considerably in excess of the HBA's cache), an SSD will win handily. The key here is the hard drives: the faster they are, the faster the HBA will be able to empty its cache to disk, and the lower the likelihood that it will get overwhelmed by new I/O.

All that said, I'm pretty sure most I/O patterns heavily favor SSD ZIL over HBA cache.


Note that the ZIL applies only to synchronous I/O, while HBA cache is used for all writes. Also note that HBA cache can be used (or shared) as a read cache as well, depending on the HBA setup. And a good (SLC) SSD can handle 50,000 IOPS until it's filled, which takes a very long time relative to an HBA cache.
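One quick way to see the synchronous-only part of that (on a box with DTrace) is to count ZIL commits while a workload runs; zil_commit() fires for fsync/O_DSYNC-style writes but not for ordinary buffered writes. Something along these lines, with Ctrl-C printing the count, should do it:

    dtrace -n 'fbt::zil_commit:entry { @commits = count(); }'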


------
I've always wondered what the benefit would be (and how hard it would be to add to ZFS) of having an async write cache for ZFS - that is, ZFS currently buffers async writes in RAM until it decides it has aggregated enough of them to flush to disk. I think it would be interesting to see what would happen if an async SSD cache were available, since the write pattern is "large, streaming", which means that the same devices useful for L2ARC would perform well as an async write cache. In essence, use the async-write SSD as an extra-large buffer, in the same way that L2ARC on SSD supplements main memory for the read cache.
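As a side note, the cadence of those flushes (transaction group syncs) is easy to watch today, for example by tracing spa_sync(), the routine that writes each transaction group out to disk (assuming DTrace is available):

    dtrace -n 'fbt::spa_sync:entry { printf("txg sync at %Y", walltimestamp); }'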

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

