I did draw some sort of chart of NODE_DATA in wc-metadata.sql...

[[[
For illustration, with a scenario like this:

  # (0)
  svn rm foo
  svn cp ^/moo foo   # (1)
  svn rm foo/bar
  touch foo/bar
  svn add foo/bar    # (2)

, these are the NODE_DATA for the path foo/bar (before single-db, the
numbering of op_depth is still a bit different):

  (0)  BASE_NODE ----->  NODE_DATA (op_depth == 0)
  (1)                    NODE_DATA (op_depth == 1)       ( <----_ )
  (2)                    NODE_DATA (op_depth == 2)   <----- WORKING_NODE

0 is the original data for foo/bar before 'svn rm foo' (if it existed).

1 is the data for foo/bar copied in from ^/moo/bar. (There would also be a
WORKING_NODE for the path foo, with original_* pointing at ^/moo.)

2 is the to-be-committed data for foo/bar, created by 'svn add foo/bar'.

An 'svn revert foo/bar' would remove the NODE_DATA of (2) (and possibly
rewire the WORKING_NODE to represent a child of the operation (1)). So
foo/bar would be a copy of ^/moo/bar again.
]]]

So there's always at most one working node and at most one base node per
path. While working node rows get "overwritten" with operations done on
child paths, the intermediate node_data are kept until commit or revert...
That's about all I know of it.

Been thinking on those last_modtime and translated_size columns that are
kept in both BASE_NODE and WORKING_NODE. I believe this duplication shows
that detecting modifications in the local file system is a different
concept from base/working.
<purist>We should have a separate table for that instead of doing
"is-there-a-working-node?-no?-then-look-in-the-base"</purist>
<hacker>we do that kind of looking up anyway and having a little 'if' to
select the proper modtime/size doesn't hurt much</hacker>
(Am not having a strong opinion, just brainstorming...)

On 2010-08-10 18:18, Julian Foad wrote:
> Any responses would be greatly appreciated.

I've had so many family and summer events in the past weeks... soon I'll
start forgetting my passwords!
~Holiday-Neels

>
> - Julian
>
>
> On Tue, 2010-08-03, Julian Foad wrote:
>> On Mon, 2010-07-12, Erik Huelsmann wrote:
>>> After lots of discussion regarding the way NODE_DATA/4th tree should
>>> be working, I'm now ready to post a summary of the progress. In my
>>> last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
>>> stated why we need this; this post is about the conclusion of what
>>> needs to happen. Also included are the first steps there.
>>>
>>>
>>> With the advent of NODE_DATA, we distinguish node values specifically
>>> related to BASE nodes, those specifically related to "current" WORKING
>>> nodes and those which are to be maintained for multiple levels of
>>> WORKING nodes (not only the "current" view) (the latter category is
>>> most often also shared with BASE).
>>>
>>> The respective tables will hold the columns shown below.
>>>
>>>
>>> -------------------------
>>> TABLE WORKING_NODE (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   parent_relpath  TEXT,
>>>   moved_here  INTEGER,
>>>   moved_to  TEXT,
>>>   original_repos_id  INTEGER REFERENCES REPOSITORY (id),
>>>   original_repos_path  TEXT,
>>>   original_revnum  INTEGER,
>>>   translated_size  INTEGER,
>>>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   keep_local  INTEGER,
>>>
>>>   PRIMARY KEY (wc_id, local_relpath)
>>>   );
>>>
>>> CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
>>> --------------------------------
>>>
>>> The moved_* and original_* columns are typical examples of "WORKING
>>> fields only maintained for the visible WORKING nodes": the original_*
>>> and moved_* fields are inherited from the operation root by all
>>> children part of the operation. The operation root will be the visible
>>> change on its own level, meaning it'll have rows both in the
>>> WORKING_NODE and NODE_DATA tables. The fact that these columns are not
>>> in the WORKING_NODE table means that tree changes are not preserved
>>> across overlapping changes. This is fully compatible with what we do
>>> today: changes to higher levels destroy changes to lower levels.
>>>
>>> The translated_size and last_mod_time columns exist in WORKING_NODE
>>> and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
>>> they exist in BASE_NODE is a bit of a hack: it's to prevent creation
>>> of WORKING_NODE data for every file which has keyword expansion or eol
>>> translation properties set: these columns serve only to optimize
>>> working copy scanning for changes and as such only relate to the
>>> visible WORKING_NODEs.
>>>
>>
>> Can we come up with an English description of what each table will now
>> represent?
>>
>> "The BASE_NODE table lists the existing node-revs in the repository that
>> comprise the mixed-revision tree that was most recently updated/switched
>> to or checked out. (The kind and content of these nodes is not here;
>> see the NODE_DATA table.)"
>>
>>> TABLE BASE_NODE (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   repos_id  INTEGER REFERENCES REPOSITORY (id),
>>>   repos_relpath  TEXT,
>>
>> We need a revision number column here to go along with repos_id and
>> relpath to make a valid node-rev reference, don't we?
>>
>>>   parent_relpath  TEXT,
>>
>> (While we're reorganising, can we move that "parent_relpath" column to
>> adjacent to "local_relpath"?)
>>
>>>   translated_size  INTEGER,
>>>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   dav_cache  BLOB,
>>>   incomplete_children  INTEGER,
>>>   file_external  TEXT,
>>>
>>>   PRIMARY KEY (wc_id, local_relpath)
>>>   );
>>>
>>
>> "The NODE_DATA table records the kind and shallow content (props, text,
>> link target) of each node in the WC. It includes both the nodes that
>> comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
>> WC and also all nodes that are part of a copied or moved tree but
>> currently shadowed by a replacement performed inside that tree.
>>
>> At least one row exists for each WC path, including paths with no change
>> and all paths affected by a tree change (add, delete, etc.). If the
>> same path is affected by multiple levels of tree change - a replacement
>> inside a copied directory, for example - then multiple rows exist with
>> different 'op_depth' values."
>>
>>> TABLE NODE_DATA (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   op_depth  INTEGER NOT NULL,
>>>   presence  TEXT NOT NULL,
>>>   kind  TEXT NOT NULL,
>>>   checksum  TEXT,
>>>   changed_rev  INTEGER,
>>>   changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   changed_author  TEXT,
>>
>> The changed_* columns can only belong to a node-rev that exists in the
>> repository. What node-rev do they belong to and why aren't they
>> alongside the node-rev details?
>>
>> (The changed_* columns convey essentially a rev number and two of the
>> rev-props associated with that revnum that can be used in keyword
>> expansions. We should consider representing that information in a more
>> general form, both to avoid tying the DB format to the choice of those
>> two particular revprops, and to avoid the redundancy of storing these
>> same data and author values N times.)
>>
>>
>>>   depth  TEXT,
>>>   symlink_target  TEXT,
>>>   properties  BLOB,
>>
>> (While we're rearranging, can we group the node-content fields together:
>> kind, properties, checksum, symlink_target?)
>>
>>>   PRIMARY KEY (wc_id, local_relpath, oproot)
>>
>> s/oproot/op_depth/?
>>
>>>   );
>>>
>>> CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
>>>
>>>
>>> Which leaves the NODE_DATA structure above. The op_depth column
>>> contains the depth of the node - relative to the wc root - on which
>>> the operation was run which caused the creation of the given NODE_DATA
>>> node. In the final scheme (based on single-db), the value will be 0
>>> for base and a positive integer for WORKING related data.
>>
>> Let's assume single-db. By the last sentence, I understand: For each
>> BASE_NODE row there is a corresponding NODE_DATA row with 'op_root' = 0;
>> for every node brought in by a tree operation (copy, move, add) to an
>> immediate child of the WC root there is a NODE_DATA row with 'op_root' =
>> 1; for every child of a child ... 2; and so on.
>>
>>
>> - Julian
>>
>>
>>> In order to be able to implement NODE_DATA even without having a fully
>>> functional SINGLE_DB yet, a transitional node numbering scheme needs
>>> to be devised. The following numbers will apply: BASE == 0,
>>> WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
>>>
>>>
>>> Other transitioning related remarks:
>>>
>>> * Conditional-protected experimental sections, just like with
>>> SINGLE_DB
>>> * Initial implementation will simply replace the current
>>> functionality of the 2 tables, from there we can work our way through
>>> whatever needs doing.
>>> * Am I forgetting any others?
>>>
>>> Bye,
>>>
>>> Erik.
>>
>>
>
>
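
PS: to make the foo/bar chart at the top of this mail a bit more concrete,
here is a rough sketch in Python with sqlite3. It is purely illustrative and
not actual wc_db code. The table is a trimmed NODE_DATA (only a few of the
proposed columns), the 'note' column and the helper names make_demo_db,
visible_layer and revert_top_layer are made up for the demo, and the rule
"the visible node is the row with the highest op_depth" is just my reading
of the chart, assuming the single-db numbering (0 for BASE, positive for
WORKING).

```python
import sqlite3


def make_demo_db():
    """Build an in-memory DB holding the three foo/bar layers from the chart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE NODE_DATA (
          wc_id         INTEGER NOT NULL,
          local_relpath TEXT NOT NULL,
          op_depth      INTEGER NOT NULL,
          presence      TEXT NOT NULL,
          kind          TEXT NOT NULL,
          note          TEXT,  -- hypothetical column, for this demo only
          PRIMARY KEY (wc_id, local_relpath, op_depth)
        )
        """
    )
    # One row per layer: (0) BASE, (1) the copy of ^/moo, (2) the re-add.
    conn.executemany(
        "INSERT INTO NODE_DATA VALUES (1, 'foo/bar', ?, 'normal', 'file', ?)",
        [
            (0, "BASE data, before 'svn rm foo'"),
            (1, "copied in from ^/moo/bar"),
            (2, "created by 'svn add foo/bar'"),
        ],
    )
    return conn


def visible_layer(conn, relpath):
    """Return (op_depth, note) of the row with the highest op_depth."""
    return conn.execute(
        "SELECT op_depth, note FROM NODE_DATA"
        " WHERE wc_id = 1 AND local_relpath = ?"
        " ORDER BY op_depth DESC LIMIT 1",
        (relpath,),
    ).fetchone()


def revert_top_layer(conn, relpath):
    """Drop the topmost WORKING layer (op_depth > 0); BASE is never touched."""
    conn.execute(
        "DELETE FROM NODE_DATA"
        " WHERE wc_id = 1 AND local_relpath = ? AND op_depth > 0"
        "   AND op_depth = (SELECT MAX(op_depth) FROM NODE_DATA"
        "                   WHERE wc_id = 1 AND local_relpath = ?)",
        (relpath, relpath),
    )


conn = make_demo_db()
print(visible_layer(conn, "foo/bar"))  # layer (2), the 'svn add'
revert_top_layer(conn, "foo/bar")      # roughly 'svn revert foo/bar'
print(visible_layer(conn, "foo/bar"))  # layer (1), the copy of ^/moo/bar
```

The point is only that the intermediate rows stay around, so removing the
top one makes foo/bar a copy of ^/moo/bar again. Whatever rewiring of
WORKING_NODE a real revert needs is left out entirely.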