On Wed, Oct 24, 2007 at 06:05:48PM -0400, Bill Sommerfeld wrote:
>
> I'm curious about the failure semantics.
>
> - is everything on a cache device checksummed? where are the checksums
>   kept? host memory? indirect blocks on the cache device? both?
Checksummed in memory.

> - Are the contents of the cache device retained across reboot or across
>   export/import cycles, or does it always start out empty/cold?

It always starts out cold. We have talked about the possibility of
storing it persistently on disk, but that is beyond the scope of the
initial implementation.

> - What happens if a cache device fails during operation?

Requests for cached data go straight to disk, and performance will
suffer. The current implementation doesn't integrate with the existing
ZFS FMA framework, but this will be addressed later as part of future
FMA updates.

> - What happens if it's missing at boot or during an import? (It sounds
>   like we should be able to drive on with reduced performance).

It will show up as faulted, but it will not prevent the pool from being
imported or used, identical to the way inactive spares behave today.

> Other thoughts:
>
> 1) Applicability: I'd assume that cache devices only start to get
> interesting when main memory is already maxed out or if cache storage is
> fast enough but much cheaper per bit than main memory.

Yes, this only makes sense if your working set size exceeds the size of
the in-core ARC. As you point out, this may be limited by physical
capacity or (more likely) price/performance.

> 2) It seems like the properties that make a device suitable as a cache
> device largely overlap with the properties that make a device suitable
> as a dedicated intent log. While members of the system performance
> hotrod association will as always want to tweak every available tunable,
> it would be nice if an administrator didn't have to statically partition
> a limited amount of expensive fast storage between cache and log if ZFS
> could do a reasonable job of allocating space dynamically based on the
> workload...

This is not necessarily the case. Log device performance is based solely
on write latency; read performance doesn't matter.
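For concreteness, the cache and log roles discussed above are configured
as distinct vdev types with zpool(1M). A sketch of the commands (pool
and device names are hypothetical; this assumes an L2ARC-capable build):

```shell
# Add a dedicated cache (L2ARC) device to an existing pool. If it later
# fails or is missing at import, the pool stays usable; reads for
# uncached data simply go to the main pool disks.
zpool add tank cache c4t0d0

# Add a separate intent-log device; this role is judged almost entirely
# on write latency.
zpool add tank log c4t1d0

# Cache devices hold no unique data, so one can be removed at any time.
zpool remove tank c4t0d0
```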
For a cache device, the most important attribute is read latency, though
read bandwidth and write bandwidth do matter. For example, commodity
flash SSDs are reasonably close to being usable as cache devices, but
are a long way from being practical log devices. That being said, it is
obviously possible to construct a device that would be suitable for both
tasks. One could use NVRAM for your read cache, but it would be pretty
expensive ;-)

You can always partition such a device into two slices and allocate them
as you see fit. Considering that the size of log devices will typically
be quite small (<8G) and cache devices will typically be quite large
(>128G), it's hard to imagine a case where squeezing that extra 8G out
of the device would be worth the (incredibly large) added software
complexity.

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
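The static-partitioning approach mentioned above can be sketched as
follows, assuming the fast device has already been sliced (e.g. with
format(1M)) into a small slice for the log and a large slice for the
cache; the device and slice names are hypothetical:

```shell
# Slice 0 (~8G) serves as the intent log, slice 1 (the remainder,
# typically 128G or more) as the cache device, all on one fast disk.
zpool add tank log c4t0d0s0
zpool add tank cache c4t0d0s1
```

This gives the administrator the split manually, at the cost of picking
the slice sizes up front, rather than having ZFS rebalance the space
dynamically.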
