I'll work on reviewing this stuff. I believe there are quite a few details that need to be worked out (like exact presence values).
Cheers,
-g

On Tue, Aug 10, 2010 at 12:18, Julian Foad <julian.f...@wandisco.com> wrote:
> Any responses would be greatly appreciated.
>
> - Julian
>
>
> On Tue, 2010-08-03, Julian Foad wrote:
>> On Mon, 2010-07-12, Erik Huelsmann wrote:
>> > After lots of discussion regarding the way NODE_DATA/4th tree should
>> > be working, I'm now ready to post a summary of the progress. In my
>> > last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
>> > stated why we need this; this post is about the conclusion of what
>> > needs to happen. Also included are the first steps there.
>> >
>> >
>> > With the advent of NODE_DATA, we distinguish node values specifically
>> > related to BASE nodes, those specifically related to "current" WORKING
>> > nodes and those which are to be maintained for multiple levels of
>> > WORKING nodes (not only the "current" view) (the latter category is
>> > most often also shared with BASE).
>> >
>> > The respective tables will hold the columns shown below.
>> >
>> >
>> > -------------------------
>> > TABLE WORKING_NODE (
>> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>> >   local_relpath  TEXT NOT NULL,
>> >   parent_relpath  TEXT,
>> >   moved_here  INTEGER,
>> >   moved_to  TEXT,
>> >   original_repos_id  INTEGER REFERENCES REPOSITORY (id),
>> >   original_repos_path  TEXT,
>> >   original_revnum  INTEGER,
>> >   translated_size  INTEGER,
>> >   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>> >   keep_local  INTEGER,
>> >
>> >   PRIMARY KEY (wc_id, local_relpath)
>> >   );
>> >
>> > CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
>> > --------------------------------
>> >
>> > The moved_* and original_* columns are typical examples of "WORKING
>> > fields only maintained for the visible WORKING nodes": the original_*
>> > and moved_* fields are inherited from the operation root by all
>> > children part of the operation.
>> > The operation root will be the visible
>> > change on its own level, meaning it'll have rows both in the
>> > WORKING_NODE and NODE_DATA tables. The fact that these columns are not
>> > in the NODE_DATA table means that tree changes are not preserved
>> > across overlapping changes. This is fully compatible with what we do
>> > today: changes to higher levels destroy changes to lower levels.
>> >
>> > The translated_size and last_mod_time columns exist in WORKING_NODE
>> > and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
>> > they exist in BASE_NODE is a bit of a hack: it's to prevent creation
>> > of WORKING_NODE data for every file which has keyword expansion or eol
>> > translation properties set: these columns serve only to optimize
>> > working copy scanning for changes and as such only relate to the
>> > visible WORKING_NODEs.
>> >
>>
>> Can we come up with an English description of what each table will now
>> represent?
>>
>> "The BASE_NODE table lists the existing node-revs in the repository that
>> comprise the mixed-revision tree that was most recently updated/switched
>> to or checked out. (The kind and content of these nodes is not here;
>> see the NODE_DATA table.)"
>>
>> > TABLE BASE_NODE (
>> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>> >   local_relpath  TEXT NOT NULL,
>> >   repos_id  INTEGER REFERENCES REPOSITORY (id),
>> >   repos_relpath  TEXT,
>>
>> We need a revision number column here to go along with repos_id and
>> relpath to make a valid node-rev reference, don't we?
>>
>> >   parent_relpath  TEXT,
>>
>> (While we're reorganising, can we move that "parent_relpath" column to
>> adjacent to "local_relpath"?)
>>
>> >   translated_size  INTEGER,
>> >   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>> >   dav_cache  BLOB,
>> >   incomplete_children  INTEGER,
>> >   file_external  TEXT,
>> >
>> >   PRIMARY KEY (wc_id, local_relpath)
>> >   );
>> >
>>
>> "The NODE_DATA table records the kind and shallow content (props, text,
>> link target) of each node in the WC. It includes both the nodes that
>> comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
>> WC and also all nodes that are part of a copied or moved tree but
>> currently shadowed by a replacement performed inside that tree.
>>
>> At least one row exists for each WC path, including paths with no change
>> and all paths affected by a tree change (add, delete, etc.). If the
>> same path is affected by multiple levels of tree change - a replacement
>> inside a copied directory, for example - then multiple rows exist with
>> different 'op_depth' values."
>>
>> > TABLE NODE_DATA (
>> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>> >   local_relpath  TEXT NOT NULL,
>> >   op_depth  INTEGER NOT NULL,
>> >   presence  TEXT NOT NULL,
>> >   kind  TEXT NOT NULL,
>> >   checksum  TEXT,
>> >   changed_rev  INTEGER,
>> >   changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
>> >   changed_author  TEXT,
>>
>> The changed_* columns can only belong to a node-rev that exists in the
>> repository. What node-rev do they belong to and why aren't they
>> alongside the node-rev details?
>>
>> (The changed_* columns convey essentially a rev number and two of the
>> rev-props associated with that revnum that can be used in keyword
>> expansions. We should consider representing that information in a more
>> general form, both to avoid tying the DB format to the choice of those
>> two particular revprops, and to avoid the redundancy of storing these
>> same date and author values N times.)
>>
>>
>> >   depth  TEXT,
>> >   symlink_target  TEXT,
>> >   properties  BLOB,
>>
>> (While we're rearranging, can we group the node-content fields together:
>> kind, properties, checksum, symlink_target?)
>>
>> >   PRIMARY KEY (wc_id, local_relpath, oproot)
>>
>> s/oproot/op_depth/?
>>
>> >   );
>> >
>> > CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
>> >
>> >
>> > Which leaves the NODE_DATA structure above. The op_depth column
>> > contains the depth of the node - relative to the wc root - on which
>> > the operation was run which caused the creation of the given NODE_DATA
>> > node. In the final scheme (based on single-db), the value will be 0
>> > for base and a positive integer for WORKING related data.
>>
>> Let's assume single-db. By the last sentence, I understand: For each
>> BASE_NODE row there is a corresponding NODE_DATA row with 'op_depth' = 0;
>> for every node brought in by a tree operation (copy, move, add) to an
>> immediate child of the WC root there is a NODE_DATA row with 'op_depth' =
>> 1; for every child of a child ... 2; and so on.
>>
>>
>> - Julian
>>
>>
>> > In order to be able to implement NODE_DATA even without having a fully
>> > functional SINGLE_DB yet, a transitional node numbering scheme needs
>> > to be devised. The following numbers will apply: BASE == 0,
>> > WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
>> >
>> >
>> > Other transitioning related remarks:
>> >
>> >  * Conditionally protected experimental sections, just like with
>> > SINGLE_DB
>> >  * Initial implementation will simply replace the current
>> > functionality of the 2 tables, from there we can work our way through
>> > whatever needs doing.
>> >  * Am I forgetting any others?
>> >
>> > Bye,
>> >
>> > Erik.
>>
>
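
To make the op_depth layering described above concrete, here is a minimal sketch using Python's stdlib sqlite3 module. It is not the wc-ng implementation. It assumes a trimmed-down NODE_DATA table keyed on (wc_id, local_relpath, op_depth), as Julian's s/oproot/op_depth/ suggests; it assumes the "visible" row for a path is the one with the highest op_depth; and it uses 'normal' purely as a placeholder presence value, since the real presence values are still to be worked out. The helper names (make_db, layers, visible_op_depth) and the sample paths are hypothetical.

```python
import sqlite3

# Trimmed-down NODE_DATA: only the columns needed to show the layering.
# Assumption: the primary key uses op_depth (not "oproot").
SCHEMA = """
CREATE TABLE NODE_DATA (
  wc_id          INTEGER NOT NULL,
  local_relpath  TEXT NOT NULL,
  op_depth       INTEGER NOT NULL,
  presence       TEXT NOT NULL,
  kind           TEXT NOT NULL,
  PRIMARY KEY (wc_id, local_relpath, op_depth)
);
"""


def make_db():
    """Build an in-memory DB with a BASE tree, a copy, and a replacement."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # BASE nodes: op_depth 0.
        (1, "A", 0, "normal", "dir"),
        (1, "A/f", 0, "normal", "file"),
        # Copy rooted at "B", an immediate child of the WC root: op_depth 1.
        (1, "B", 1, "normal", "dir"),
        (1, "B/f", 1, "normal", "file"),
        # Replacement of "B/f" inside the copy, rooted one level deeper.
        (1, "B/f", 2, "normal", "file"),
    ]
    conn.executemany("INSERT INTO NODE_DATA VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def layers(conn, wc_id, local_relpath):
    """Return every op_depth recorded for a path, lowest first."""
    cur = conn.execute(
        "SELECT op_depth FROM NODE_DATA"
        " WHERE wc_id = ? AND local_relpath = ? ORDER BY op_depth",
        (wc_id, local_relpath),
    )
    return [row[0] for row in cur]


def visible_op_depth(conn, wc_id, local_relpath):
    """Return the highest op_depth for a path, or None if it has no rows.

    Assumption: the highest layer is the one that is currently visible.
    """
    row = conn.execute(
        "SELECT MAX(op_depth) FROM NODE_DATA"
        " WHERE wc_id = ? AND local_relpath = ?",
        (wc_id, local_relpath),
    ).fetchone()
    return row[0]
```

With this data, "B/f" has two rows (op_depth 1 from the copy and op_depth 2 from the replacement), so the copied node stays recorded while it is shadowed. Under the transitional numbering Erik gives for the pre-single-db case, the same idea would hold with only the values 0, 1 and 2.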