[email protected] wrote:
> Author: cmpilato
> Date: Tue Jun 22 15:40:00 2010
> New Revision: 956921
>
> URL: http://svn.apache.org/viewvc?rev=956921&view=rev
> Log:
> Correct the FSFS structure documentation for lock storage, which
> doesn't appear to match the implementation.
>
> If a repository has a locked file /A/D/G/rho, there will be a
> serialized hash file for that path (as an MD5 digest,
> ".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
> that digest file will be directly referenced from the digest files for
> /A/D/G, /A/D, /A and /. The documentation implies that the digest
> file for /A/D/G/rho will only be referenced by a digest file for
> /A/D/G (which is then referenced by the digest file for /A/D, which
> itself is referenced by the digest for /A, etc.)
By the way, I think this was an accident in the implementation. A reading of the code leads you to believe that the original intent was to essentially mirror the FS path structure. I think a single mistake (the failure to update a stringbuf_t with a new value on every iteration) resulted in the behavior we have today. I can't decide if this is a happy accident or a bug we should address.

It actually seems to make some of the common queries much faster than they would otherwise be, but at the potential cost of disk usage and memory consumption. I mean, in a ginormous repository with 10,000 locked files, there's a serialized hash file (or maybe right many of them) with thousands of entries in it. Makes finding those thousands of entries really fast, after you've parsed the file and loaded that thousands-of-entries-having hash into memory.

--
C. Michael Pilato <[email protected]>
CollabNet <> www.collab.net <> Distributed Development On Demand
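
To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not the Subversion C code: the function names (digest_name, ancestors, index_flat, index_mirrored) are made up for illustration, and an in-memory dict of sets stands in for the on-disk digest files. It assumes, per the log message above, that a digest file is named by the MD5 of the path.

```python
import hashlib
import posixpath


def digest_name(path):
    """Hex MD5 of a repository path, standing in for its digest file name."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def ancestors(path):
    """Yield each ancestor directory of an absolute path, nearest first, ending at '/'."""
    while path != "/":
        path = posixpath.dirname(path)
        yield path


def index_flat(locked_paths):
    """Observed behavior: every ancestor references the locked path's digest directly."""
    children = {}
    for locked in locked_paths:
        for parent in ancestors(locked):
            children.setdefault(parent, set()).add(digest_name(locked))
    return children


def index_mirrored(locked_paths):
    """Documented behavior: each directory references only its immediate child."""
    children = {}
    for locked in locked_paths:
        child = locked
        for parent in ancestors(locked):
            children.setdefault(parent, set()).add(digest_name(child))
            child = parent  # the per-iteration update the flat variant never makes
    return children


if __name__ == "__main__":
    # Hypothetical repository with many locked files under one directory.
    locks = ["/A/file%d" % i for i in range(10000)]
    print(len(index_flat(locks)["/"]), "entries for / (flat)")
    print(len(index_mirrored(locks)["/"]), "entries for / (mirrored)")
```

In the flat variant the entry set for / grows with the total number of locks beneath it, which is the disk and memory cost described above. In the mirrored variant it grows only with the number of immediate children that contain locks, but finding every lock under a directory means walking down through the intermediate digests.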

