Edward Ned Harvey wrote:

In this email, when I say PERC, I really mean either a PERC or any other hardware RAID controller with a battery-backed (BBU) WriteBack cache.

For future server purchases, I want to know which is faster: (a) a bunch of hard disks with a PERC and WriteBack enabled, or (b) a bunch of hard disks plus one SSD for the ZIL. Unfortunately, I don't have an SSD available for testing, so here is what I was able to do:

I measured the write speed of the naked disks (PERC set to WriteThrough). Results were around 350 ops/sec.

I measured the write speed with the WriteBack enabled. Results were around 1250 ops/sec.
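(For anyone who wants to reproduce this: the per-vdev op rates can be watched live while a test runs with zpool iostat; the pool name below is just a placeholder.)

    zpool iostat -v tank 5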

So right from the start, we can see there’s a huge performance boost by enabling the WriteBack. Even for large sequential writes, buffering allows the disks to operate much more continuously. The next question is how it compares against the SSD ZIL.

Since I don't have an SSD available, I created a ram device and put the ZIL there. This is not a measure of the speed I would get with an SSD; rather, it is a level of performance that no SSD could possibly exceed, so it serves to establish an upper bound. If the upper and lower bounds are near each other, then we have a good estimate of the speed with an SSD ZIL; but if they are far apart, we haven't learned much.
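For reference, the ram-device ZIL setup goes roughly like this; the pool name and size are placeholders, not necessarily what I used, and this only makes sense for benchmarking, since a ZIL that vanishes on power loss defeats its purpose:

    ramdiskadm -a zilram 2g                   # creates /dev/ramdisk/zilram (size is an example)
    zpool add tank log /dev/ramdisk/zilram    # attach it as a separate log device
    # ... run the write test ...
    zpool remove tank /dev/ramdisk/zilram     # requires a build with log device removal
    ramdiskadm -d zilram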

With the ZIL in RAM, results were around 1700 ops/sec.

This upper bound is very far from the lower bound, so it doesn't pin down what an SSD would actually do. But it still provides some useful knowledge. The take-home points are:

· There's a lot of performance to be gained by accelerating the ZIL: potentially up to about 5x faster than naked disks (roughly 1700 vs. 350 ops/sec).

· The WriteBack RAID controller achieves a lot of this performance increase: about 3.5x acceleration (roughly 1250 vs. 350 ops/sec).

· I don’t know how much an SSD would help. I don’t know if it’s better, the same, or worse than the PERC. I don’t know if the combination of PERC and SSD together would go faster than either one individually.

I have a hypothesis: I think the best configuration will be a PERC with WriteBack enabled on all the spinning hard drives, plus an SSD for the ZIL, with the PERC set to WriteThrough on that SSD. This has yet to be proven or disproven.
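In zpool terms, the hypothesized layout would be something like the following; the device names and the raidz layout are only placeholders, with the c1 disks exported by the PERC in WriteBack mode and the c2 SSD exported in WriteThrough mode:

    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        log c2t0d0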


Cache on controllers is almost always battery-backed DRAM (NVRAM having made an exit from the scene a while ago). As such, it has fabulous latency and throughput compared to anything else. From a performance standpoint, it will beat an SSD ZIL on a 1-for-1 basis. However, it's almost never the case that you find an HBA cache even a fraction of the size of a small SSD. So, what happens when you flood the HBA with more I/O than the on-board cache can handle? It reduces performance back to the level of NO cache.

From everything I've seen, an SSD wins simply because it's 20-100x the size. HBAs almost never have more than 512MB of cache, and even fancy SAN boxes generally have 1-2GB max. So, HBAs are subject to being overwhelmed with heavy I/O. The SSD ZIL has a much better chance of being able to weather a heavy I/O period without being filled. Thus, SSDs are better at "average" performance - they provide a relatively steady performance profile, whereas HBA cache is very spiky.

The other real advantage of SSD ZIL is that it covers the entire pool. Most larger pools spread their disks over multiple controllers, each of which must have a cache in order for the whole pool to perform evenly.

If you know you're going to be doing a very intermittent or modest level of I/O, and that I/O is likely to fit within an HBA's cache, then it will outperform an SSD. For a continuous heavy load, or for extremely spiky loads (considerably in excess of the HBA's cache), an SSD will win handily. The key here is the hard drives: the faster they are, the faster the HBA will be able to empty its cache to disk, and the lower the likelihood that it will get overwhelmed by new I/O.

All that said, I'm pretty sure most I/O patterns heavily favor SSD ZIL over HBA cache.


Note that the ZIL applies only to synchronous I/O, while HBA cache is used for all writes. Also note that HBA cache can be used (or shared) as a read cache as well, depending on the HBA setup. And a good (SLC) SSD can handle 50,000 IOPS until it's filled, which takes a very long time relative to an HBA cache.
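One quick way to see the synchronous-only part of that (on a box with DTrace) is to count ZIL commits while a workload runs; zil_commit() fires for fsync/O_DSYNC-style writes but not for ordinary buffered writes. Something along these lines, with Ctrl-C printing the count, should do it:

    dtrace -n 'fbt::zil_commit:entry { @commits = count(); }'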


------
I've always wondered what the benefit would be (and how hard it would be to add to ZFS) of having an async write cache for ZFS - that is, ZFS currently buffers async writes in RAM until it decides it has aggregated enough of them to flush to disk. I think it would be interesting to see what would happen if an async SSD cache were available, since the write pattern is "large, streaming", which means that the same devices useful for L2ARC would perform well as an async write cache. In essence, use the async-write SSD as an extra-large buffer, in the same way that L2ARC on SSD supplements main memory for the read cache.
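As a side note, the cadence of those flushes (transaction group syncs) is easy to watch today, for example by tracing spa_sync(), the routine that writes each transaction group out to disk (assuming DTrace is available):

    dtrace -n 'fbt::spa_sync:entry { printf("txg sync at %Y", walltimestamp); }'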

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

