All,

As part of our grand plans for the disk cache, the next big area we are going to look at is crash recovery: specifically, how to avoid trashing the cache unless we're reasonably certain that it's necessary.

Right now, we have one dirty bit for the entire cache. This bit is set when the disk cache is opened and cleared when it is closed. When we go to open the disk cache, if the bit is already set, we assume that the disk cache is corrupt, trash the whole thing, and start from scratch. This approach is, to say the least, probably a bit overcautious.

The plan is to get to the point where we can experience an unclean shutdown and assume the disk cache is just fine until we have proof otherwise. Once we determine that something is amiss in the cache, we would either delete the entire cache (as we do now) or, if whatever future scheme we come up with allows it, delete just the affected portion of the cache and leave the rest intact.

So far, I've come up with three possible ways of getting better crash recovery. In no particular order, they are:

-Keep some sort of hash of the contents of each entry (similar to how Chrome does things)
  *This could conceivably allow us to delete only the entries that are known to be corrupt
  *This doesn't (necessarily) require any wild changes to the current on-disk format (the hash could be kept as a metadata field on every entry)
  *At first glance, the implementation seems relatively straightforward (adding the standard caveats here about how nothing is ever nearly as simple as one thinks, etc.)
-Use some sort of journaling and/or append-only structure to store the cache, with consistency markers per entry (or per some other subdivision of the data)
  *What the consistency markers would look like is still undetermined
  *There are a lot of open-source filesystem implementations using these kinds of structures, so there is plenty of good reference material
  *I have a sneaking suspicion that, even with lots of good reference material, this would not exactly be a simple implementation
-Use leveldb to store the cache
  *We are still unsure whether leveldb uses fsync to ensure consistency (this could be a deal-breaker)
  *Import of leveldb has already been started:
    https://bugzilla.mozilla.org/show_bug.cgi?id=leveldb
  *However, we're unsure how much more effort would be required to get leveldb fully imported before we could start using it

Of course, these are not the only pros and cons of these ideas, just the main ones that came to mind as I was writing this email. What I would like everyone's input on is:

(1) More big pros/cons of the above ideas that may help swing the decision one way or another
(2) More ideas on how to avoid trashing everything when we don't have to (these may be variations on the above, or ideas that I haven't thought of or have forgotten since the necko team discussed this in person in February)

I also plan to point some known-smart people who may not subscribe to this list at this post, so that we get as many smart people thinking about this as we can.

-Nick
_______________________________________________
dev-tech-network mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-network
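As a rough illustration of the first idea (a hash per entry), here is a minimal Python sketch. It assumes a hypothetical entry layout in which a CRC-32 of the entry's data is stored as one more metadata field; CRC-32 is used only because it is cheap, and any hash would do. The names CacheEntry, write_entry, entry_is_valid, recover and the "hash" metadata key are made up for this example and are not part of the current cache code. The point it shows is that a mismatch condemns only the entry it belongs to, not the whole cache.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Hypothetical in-memory view of one cache entry: data plus metadata."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


def write_entry(key: str, data: bytes) -> CacheEntry:
    # Store a CRC-32 of the data as an ordinary metadata field, so the
    # on-disk format only needs room for one extra key/value pair.
    return CacheEntry(key, data, {"hash": zlib.crc32(data)})


def entry_is_valid(entry: CacheEntry) -> bool:
    # A missing hash (e.g. a crash before the metadata was written) or a hash
    # that doesn't match the data (damaged data) both mean "corrupt".
    stored = entry.metadata.get("hash")
    return stored is not None and stored == zlib.crc32(entry.data)


def recover(entries: dict) -> list:
    """Remove only the entries that fail verification; keep the rest.

    Returns the keys of the removed entries.
    """
    corrupt = [key for key, entry in entries.items() if not entry_is_valid(entry)]
    for key in corrupt:
        del entries[key]
    return corrupt


# Usage: two entries, then one of them gets a single flipped bit.
cache = {k: write_entry(k, v) for k, v in [("a", b"hello"), ("b", b"world")]}
cache["b"].data = b"worlD"  # 'd' and 'D' differ by exactly one bit
print(recover(cache))  # ['b']
print(sorted(cache))   # ['a']
```

Whether verification happens eagerly at startup, as recover does here, or lazily when an entry is opened is a separate choice; the lazy version would avoid reading the whole cache after every unclean shutdown.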

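For the second idea, one common shape for a per-record consistency marker is a length prefix plus a checksum on every appended record. The Python sketch below assumes a hypothetical record format of a 4-byte length, a 4-byte CRC-32, then the payload; append_record and replay_log are illustrative names only. On startup it replays records in order and stops at the first one that is incomplete or fails its checksum, assuming that because the writer only ever appends, an unclean shutdown should at worst leave a damaged tail. It deliberately ignores when and how the data is fsynced, which a real design would have to settle.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC-32 of the payload,
# both stored as little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one framed record to an in-memory log.

    A real cache would write this to a file (and decide when to fsync).
    """
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def replay_log(log: bytes) -> tuple:
    """Replay records in order, stopping at the first torn or corrupt one.

    Returns (payloads, valid_length). Everything past valid_length is
    suspect and would be truncated before any new records are appended.
    """
    payloads = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # header made it to disk, but the payload is incomplete
        payload = bytes(log[start:end])
        if zlib.crc32(payload) != crc:
            break  # payload is all there but doesn't match its checksum
        payloads.append(payload)
        offset = end
    return payloads, offset


# Usage: three appends, then a crash partway through writing the last one.
log = bytearray()
for p in (b"entry-1", b"entry-2", b"entry-3"):
    append_record(log, p)
torn = log[:-3]
records, valid = replay_log(torn)
print(records)  # [b'entry-1', b'entry-2']
del torn[valid:]  # drop the damaged tail before appending again
```

Stopping at the first bad record, rather than skipping it, is the conservative choice here: once a length field can't be trusted, the offsets of everything after it can't be trusted either.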