On Wed, Oct 24, 2007 at 06:05:48PM -0400, Bill Sommerfeld wrote:
>
> I'm curious about the failure semantics.
>
> - is everything on a cache device checksummed? where are the checksums
>   kept? host memory? indirect blocks on the cache device? both?
Checksummed in memory.

> - Are the contents of the cache device retained across reboot or across
>   export/import cycles, or does it always start out empty/cold?

It always starts out cold. We have talked about the possibility of
storing it persistently on disk, but that is beyond the scope of the
initial implementation.

> - What happens if a cache device fails during operation?

Requests for cached data go straight to disk, and performance will
suffer. The current implementation doesn't integrate with the existing
ZFS FMA framework, but this will be addressed later as part of future
FMA updates.

> - What happens if it's missing at boot or during an import? (It sounds
>   like we should be able to drive on with reduced performance).

It will show up as faulted, but it will not prevent the pool from being
imported or used, identical to the way inactive spares behave today.

> Other thoughts:
>
> 1) Applicability: I'd assume that cache devices only start to get
> interesting when main memory is already maxed out or if cache storage is
> fast enough but much cheaper per bit than main memory.

Yes, this only makes sense if your working set size exceeds the size of
the in-core ARC. As you point out, this may be limited by physical
capacity or (more likely) price/performance.

> 2) It seems like the properties that make a device suitable as a cache
> device largely overlap with the properties that make a device suitable
> as a dedicated intent log. While members of the system performance
> hotrod association will as always want to tweak every available tunable,
> it would be nice if an administrator didn't have to statically partition
> a limited amount of expensive fast storage between cache and log if ZFS
> could do a reasonable job of allocating space dynamically based on the
> workload...

This is not necessarily the case. Log device performance is based solely
on write latency; read performance doesn't matter.
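For concreteness, the cache and log roles discussed above are configured
as distinct vdev types with zpool(1M). A sketch of the commands (pool
and device names are hypothetical; this assumes an L2ARC-capable build):

```shell
# Add a dedicated cache (L2ARC) device to an existing pool. If it later
# fails or is missing at import, the pool stays usable; reads for
# uncached data simply go to the main pool disks.
zpool add tank cache c4t0d0

# Add a separate intent-log device; this role is judged almost entirely
# on write latency.
zpool add tank log c4t1d0

# Cache devices hold no unique data, so one can be removed at any time.
zpool remove tank c4t0d0
```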
For a cache device, the most important attribute is read latency, though
read bandwidth and write bandwidth do matter. For example, commodity
flash SSDs are reasonably close to being usable as cache devices, but
are a long way from being practical log devices. That being said, it is
obviously possible to construct a device that would be suitable for both
tasks. One could use NVRAM for your read cache, but it would be pretty
expensive ;-)

You can always partition such a device into two slices and allocate them
as you see fit. Considering that the size of log devices will typically
be quite small (<8G) and cache devices will typically be quite large
(>128G), it's hard to imagine a case where squeezing that extra 8G out
of the device would be worth the (incredibly large) added software
complexity.

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
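The static-partitioning approach mentioned above can be sketched as
follows, assuming the fast device has already been sliced (e.g. with
format(1M)) into a small slice for the log and a large slice for the
cache; the device and slice names are hypothetical:

```shell
# Slice 0 (~8G) serves as the intent log, slice 1 (the remainder,
# typically 128G or more) as the cache device, all on one fast disk.
zpool add tank log c4t0d0s0
zpool add tank cache c4t0d0s1
```

This gives the administrator the split manually, at the cost of picking
the slice sizes up front, rather than having ZFS rebalance the space
dynamically.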
