Crash Recovery in the Disk Cache

Nick Hurley Mon, 11 Jun 2012 14:42:56 -0700

All,

As part of our grand plans for the disk cache, the next big area we
are going to look at is crash recovery, specifically how to avoid
trashing the cache unless we're relatively certain that it's needed.


Right now, we have one dirty bit for the entire cache. This bit is set
when the disk cache is opened, and cleared when the disk cache is
closed. When we go to open the disk cache, if the bit is already set,
then we assume that the disk cache is corrupt, and trash the whole
thing and start from scratch. This approach is, to say the least,
probably a bit overcautious.
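
To make the current behavior concrete, here is a minimal sketch in Python. It is not the actual cache code: the marker file name and the function names are hypothetical, and a marker file stands in for the dirty bit.

```python
import os
import shutil

DIRTY_MARKER = "_DIRTY_"  # hypothetical marker file standing in for the dirty bit


def open_cache(cache_dir):
    """Open the cache; return True if the previous contents were kept."""
    marker = os.path.join(cache_dir, DIRTY_MARKER)
    kept = True
    if os.path.exists(marker):
        # The bit was never cleared, so the last session did not close
        # cleanly. Assume corruption and start from scratch.
        shutil.rmtree(cache_dir)
        kept = False
    os.makedirs(cache_dir, exist_ok=True)
    # Set the dirty bit for the duration of this session.
    with open(marker, "w"):
        pass
    return kept


def close_cache(cache_dir):
    """Clean shutdown: clear the dirty bit."""
    os.remove(os.path.join(cache_dir, DIRTY_MARKER))
```

Any session that ends without calling close_cache() costs the whole cache on the next open, whether or not anything was actually damaged.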

The plan is to get to the point where we can experience an unclean
shutdown and assume the disk cache is just fine until we have proof
otherwise. At the point when we determine something is amiss in the
cache, we would either delete the entire cache (as we do now), or (if
possible in whatever future scheme we come up with) delete just the
affected portion of the cache, leaving the rest intact.

So far, I've come up with three possible ways of having better crash
recovery. In no particular order, they are:

-Keep some sort of hash of contents per entry (similar to how Chrome
does things)
*This could conceivably allow us to only delete entries that are known corrupt
*This doesn't (necessarily) require any wild changes to the current
on-disk format (it could be kept as a metadata field on all entries)
*At first glance, the implementation seems relatively straightforward
(adding the standard caveats here about how nothing is ever nearly as
simple as one thinks, etc)

-Use some sort of journaling and/or append-only structure to store the
cache, with some markers for consistency per entry (or some other
subdivision of the data)
*The consistency markers are still undetermined
*There are a lot of open implementations of filesystems using these
kinds of structures, so lots of good reference material
*I have a sneaking suspicion that even with lots of good reference
material, this would not exactly be a simple implementation

-Use leveldb to store the cache
*We are still unsure whether leveldb uses fsync to ensure consistency (this
could be a deal-breaker)
*Import of leveldb has already been started:
https://bugzilla.mozilla.org/show_bug.cgi?id=leveldb
*However, we're unsure of how much more effort would be required to
get leveldb fully imported before we could start using it
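
For the per-entry hash idea, a rough sketch of what checking and pruning could look like, in Python. Everything here is an assumption for illustration: the in-memory dict layout, the "sha256" metadata key, the function names, and the choice of SHA-256 itself. It is not a description of how Chrome or the existing cache does it.

```python
import hashlib


def make_entry(data: bytes) -> dict:
    """Store the content hash as a metadata field alongside the data."""
    return {"data": data, "meta": {"sha256": hashlib.sha256(data).hexdigest()}}


def is_intact(entry: dict) -> bool:
    """An entry is trusted only if its data still matches the stored hash."""
    return hashlib.sha256(entry["data"]).hexdigest() == entry["meta"].get("sha256")


def prune_corrupt(cache: dict) -> list:
    """Delete only the entries that fail verification; return their keys."""
    bad = [key for key, entry in cache.items() if not is_intact(entry)]
    for key in bad:
        del cache[key]
    return bad
```

In practice the check would more likely run lazily when an entry is read, rather than as a full scan at startup, so that an unclean shutdown does not add a large cost to the next launch.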

Of course, these are not the only pros/cons with any of these ideas,
just the main ones that come to mind as I'm writing this email.
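
For the append-only idea, one possible shape for a per-entry consistency marker is a length plus checksum header in front of each record. The sketch below (Python) assumes a CRC32 marker and a hypothetical record layout purely for illustration; as noted above, the actual markers are still undetermined.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record; existing bytes in the log are never rewritten."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log: bytes) -> list:
    """Replay the log, keeping every record up to the first bad one."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: stop, keep what came before
        records.append(payload)
        pos = start + length
    return records
```

With this layout, a crash in the middle of a write loses at most the record being written; everything appended before it is still recoverable.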

What I would like everyone's input on is:

(1) More big pros/cons of the above ideas which may help swing the
decision one way or another
(2) More ideas on how to avoid trashing everything when we don't have
to (may be variations on the above, or other ideas that I haven't
thought of or have forgotten since the necko team discussed this in
person in February)

I also plan to point some known-smart people who may not subscribe to
this list at this post to try to get as many smart people thinking
about this as we can.

-Nick
_______________________________________________
dev-tech-network mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-network
