Philip Martin wrote:
> Neels J Hofmeyr <ne...@elego.de> writes:
>
>> THE PRISTINE STORE
>> ==================
>>
>> The pristine store is a local cache of complete content of files that are
>> known to be in the repository. It is hashed by a checksum of that content
>> (SHA1).
>
> I'm not sure whether you are planning one table per pristine store or
> one table per working copy, but I think it's one per pristine store.
> Obviously it makes no difference until pristine stores can be
> shared (and it might be one per directory in the short term depending
> on when we stop being one database per directory).
Thanks for that. This is the tip of an iceberg called 'a pristine store does
not equal a working copy [root]'.

The question is how to store the PRISTINE table (also see below) once it
serves various working copies. Will we have a separate SQLite db store, and
create a new file system entity called 'pristine store' that the user can
place anywhere, like a working copy?

We could also keep pristine store and working copy welded together, so that
one working copy can use the pristine store of another working copy, and that
a 'pristine store' that isn't used as a working copy is just a --depth=empty
checkout of any folder URL of that repository. It practically has the same
effect as completely separating pristine stores from working copies (there is
another SQLite store somewhere else), but we can just re-use the WC API, no
need to have a separate pristine *store* API (create new store, contact local
store database, indicate a store location, checking presence given a
location, etc.).

>
>> SOME IMPLEMENTATION INSIGHTS
>> ============================
>>
>> There is a PRISTINE table in the SQLite database with columns
>> (checksum, md5_checksum, size, refcount)
>>
>> The pristine contents are stored in the local filesystem in a pristine file,
>> which may or may not be compressed (opaquely hidden behind the pristines
>> API).
>> The goal is to be able to have a pristine store per working copy, per user as
>> well as system-wide, and to configure each working copy as to which pristine
>> store(s) it should use for reading/writing.
>>
>> There is a canonical way of getting a given CHECKSUM's pristine file name for
>> a given working copy without contacting the WC database (static function
>> get_pristine_fname()).
>>
>> When interacting with the pristine store, we want to, as appropriate, check
>> for (combos of):
>>   db-presence    - presence in the PRISTINE table with noted file size > 0
>>   file-presence  - pristine file presence
>>   stat-match     - PRISTINE table's size and mtime match file system
>>   checksum-match - validity of data in the file against the checksum
>>
>> file-presence is gotten for free from a successful stat-match (fstat),
>> checksum-match (fopen) and unchecked read of the file (fopen).
>>
>> How fast we consider things:
>>   db-presence    - very fast to moderately fast (in case of "empty db cache")
>>   file-presence  - slow (fstat or fopen)
>>   stat-match     - slow (fstat plus SQLite query)
>>   checksum-match - super slow (reading, checksumming)
>
> I'm prepared to believe a database query can be faster than stat when
> the inode cache is cold, but what about when the inode cache is hot?

Also thanks for this! I don't know that much about database/file system
benchmarks, let alone on different platforms. My initial classifications are
mostly guessing, mixed with provocative prodding to wake up more experienced
devs ;)

I'm also not really aware how expensive it is to calculate a checksum while
reading a stream for other purposes. How much cpu time does it add if the
file I/O would happen anyway? Is it negligible?

I guess we'll ultimately have to just try out what performs best.

> If the database query requires even one system call then it could well
> be slower. Multiple processes accessing a working copy, or writing to
> the pristine store, might bias this further towards stat being faster.
> If we decide to share the pristine store between several working
> copies then a shared database could become a bottleneck.
>
> [...]
>
>> Use case "need": "I want to use this pristine's content, definitely."
>> ---------------
>> pseudocode:
>>   pristine_check(&present, checksum, _usable)        (3)
>>   if !present:
>>     get_pristine_from_repos(checksum, ra)            (9)
>>   pristine_read(&stream, checksum)                   (6)
>>
>> (3) check for _usable:
>>     - db-presence
>>       - if the checksum is not present in the table, return that it is not
>>         present (don't check for file existence as well).
>>     - stat-match (includes file-presence)
>>       - if the checksum is present in the table but file is bad/not there,
>>         bail, asking user to 'svn cleanup --pristines' (or sth.)
>>
>> (9) See use case "fetch". After this, either the pristine file is ready for
>>     reading, or "fetch" has bailed already.
>>
>> (6) fopen()
>
>
> I think this is the most important case from a performance point of
> view. This is what 'svn status' et al. use, and it's important for
> GUIs as a lot of the "feel" depends on how fast a process can query
> the metadata.

Agreed.

> If we were to do away with the PRISTINE table, then we would not have
> to worry about it becoming a bottleneck. We don't need the existence
> check if we are just about to open the file, since opening the file
> proves that it exists.

<rant>Yes, I meant that, semantically, there has to be an existence check.
You're right that it is gotten for free from opening the file. It's still
important to note where the antenna sits that detects non-existence.</rant>

> We obviously have the checksum already, from
> the BASE/WORKING table, so we only need the PRISTINE table for the
> size/mtime. Perhaps we could store those in the BASE/WORKING table
> and eliminate the PRISTINE table, or is this too much of a layering
> violation? The pristine store is then just a sharded directory, into
> which we move files and from which we read files.
-1

While we could store size&mtime in the BASE/WORKING tables, this causes size
and mtime to be stored multiple times (wherever a pristine is referenced) and
involves editing multiple entries when a pristine is removed/added due to
high-water-mark or repair. That would be nothing less than horrible.

Taking one step away from that, each working copy could have a dedicated
table that stores size and mtime only once. Then we still face the situation
that size and mtime are stored multiple times (once per working copy), and
if a central pristine store is restructured, every working copy has to be
updated. Bad idea.

Instead, we could not store size and mtime at all! :)

They are merely half-checks for validity. During normal operation, size and
mtime should never change, because we don't open write streams to pristines.
If anyone messes with the pristine store accidentally, we would pick it up
with the size, or if that stayed the same, with the mtime. But we can pick up
all cases of bitswaps/disk failure *only* by verifying *full checksum
validity*!

So, while checking size and mtime gives a sense of basic sanity, it is really
just a puny excuse for not checking full checksum validity. If we really care
about correctness of pristines, *every* read of a pristine should verify the
checksum along the way. (That would include always reading the complete
pristine, even if just a few lines in the middle are needed.)

* neels dreams of disks that hardware-checksum on-the-fly

If I further follow my dream of us emulating such hardware, we would store
checksums for sub-chunks of each pristine, so that we can read small sections
of pristines, being sure that the given section is correct without having to
read the whole pristine.

Whoa, look where you got me now! ;)

I think it's a very valid question.
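To make the sub-chunk idea concrete, here is a minimal sketch in Python (not Subversion code). Everything in it is an assumption for illustration: the names chunk_checksums() and read_verified(), the fixed 4096-byte chunk size, and keeping the per-chunk SHA-1 sums in a plain list rather than in any real store.

```python
import hashlib
import io

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for this sketch


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return the SHA-1 hex digest of each fixed-size chunk of data."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def read_verified(stream, sums, offset, length, chunk_size=CHUNK_SIZE):
    """Read length bytes at offset, verifying only the chunks that are touched."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    stream.seek(first * chunk_size)
    out = bytearray()
    for idx in range(first, last + 1):
        chunk = stream.read(chunk_size)
        # A chunk index past the recorded sums, or a digest mismatch, is an error.
        if idx >= len(sums) or hashlib.sha1(chunk).hexdigest() != sums[idx]:
            raise ValueError("pristine chunk %d is corrupt" % idx)
        out += chunk
    start = offset - first * chunk_size
    return bytes(out[start:start + length])
```

The trade-off shows up directly: a read only pays for the chunks it touches, but corruption in a chunk that is never read goes unnoticed until something reads it.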
Chuck the mtime and size, thus get rid of the PRISTINE table, thus do away
with checking for any inconsistency between table and file system, also do
away with possible database bottlenecks, and reduce the location of the
pristine store to a mere local abspath. We have the checksum, we have the
filename.

Checking mtime and length protects against accidental editing of the pristine
files. But any malicious or hw-failure corruption can in fact be *protected*
by keeping mtime and length intact! ("hey, we checked it, it must be
correct.")

Let's play through a corrupted pristine (with unchanged mtime/length). This is
just theoretical...

Commit modification:
- User makes a checkout / revert / update that uses a locally corrupted
  pristine. The corrupted pristine thus sits in the WC.
- User makes a text mod
- User commits
- Client/network layer communicate the *delta* between the local pristine and
  the local mod to the repository, and the checksum of the modified text.
- Repos applies the delta to the intact pristine it has in *its* store.
- Repos finds the resulting checksum to be *different* from the client's
  checksum, because the underlying pristine was corrupt.
--> Yay! No need to do *ANY* local verification at all!!

Of course, in case the client/network layer decide to send the full text
instead of a delta, the corruption is no longer detected. :(

Merge and commit:
- User makes a merge that uses a locally corrupted pristine.
- The merge *delta* applied to the working copy is incorrect.
- User does not note the corruption (e.g. via --accept=mine-full)
- User commits
- Repos accepts the changes based on the corrupted pristine that was used to
  get the merge delta, because it can't tell the difference from a normal
  modification.
--> My goodness, merge needs to check pristine validity on each read, as if it
    wasn't slow enough.

But as discussed above, even if merge checked mtime and length, it would not
necessarily detect disk failure and crafted malicious corruption.
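The point that intact mtime and length can hide corruption can be tried out with a small Python sketch. It is only an illustration under assumptions: stat_match(), checksum_match() and demo() are hypothetical names, the "pristine" is a temp file, and the corruption is simulated by flipping one byte and restoring the old timestamps with os.utime().

```python
import hashlib
import os
import tempfile


def stat_match(path, size, mtime_ns):
    """Cheap check: recorded size and mtime still match the file system."""
    st = os.stat(path)
    return st.st_size == size and st.st_mtime_ns == mtime_ns


def checksum_match(path, sha1_hex):
    """Expensive check: re-read the whole file and compare its SHA-1."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest() == sha1_hex


def demo():
    """Corrupt a file while keeping size and mtime; return both check results."""
    content = b"pristine text\n"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pristine")
        with open(path, "wb") as f:
            f.write(content)
        st = os.stat(path)
        sha1_hex = hashlib.sha1(content).hexdigest()

        # Overwrite one byte in place (length unchanged), then put the
        # original timestamps back, as a crafted corruption might.
        with open(path, "r+b") as f:
            f.write(b"P")
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

        return (stat_match(path, st.st_size, st.st_mtime_ns),
                checksum_match(path, sha1_hex))
```

In this setup demo() reports that the stat-match still passes while the checksum-match fails, which is the "hey, we checked it, it must be correct" trap described above.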
Thanks, Philip.

I'm now challenging the need to store mtime and length, and proposing that we
do more checksumming instead. The checksumming overhead could be smaller than
the database bottleneck slew.

For future optimisation, I'm also suggesting that checksums for small chunks
of each pristine should be stored in addition, while pristines are still
indexed by the full checksum. (Which may imply a db again :/ , but that db
would only be hit if we're trying to save time by reading just a small bit of
the pristine.)

Everyone, please prove me wrong!

Thanks,
~Neels