Re: NODE_DATA (2nd iteration)

Neels J Hofmeyr Thu, 12 Aug 2010 05:29:23 -0700

I did draw some sort of chart of NODE_DATA in wc-metadata.sql...

[[[
   For illustration, with a scenario like this:


     # (0)
     svn rm foo
     svn cp ^/moo foo   # (1)
     svn rm foo/bar
     touch foo/bar
     svn add foo/bar    # (2)

   , these are the NODE_DATA for the path foo/bar (before single-db, the
   numbering of op_depth is still a bit different):

   (0)  BASE_NODE ----->  NODE_DATA (op_depth == 0)
   (1)                    NODE_DATA (op_depth == 1) ( <----_ )
   (2)                    NODE_DATA (op_depth == 2)   <----- WORKING_NODE

   0 is the original data for foo/bar before 'svn rm foo' (if it existed).
   1 is the data for foo/bar copied in from ^/moo/bar. (There would also be
     a WORKING_NODE for the path foo, with original_* pointing at ^/moo.)
   2 is the to-be-committed data for foo/bar, created by 'svn add foo/bar'.

   An 'svn revert foo/bar' would remove the NODE_DATA of (2) (and possibly
   rewire the WORKING_NODE to represent a child of the operation (1)).
   So foo/bar would be a copy of ^/moo/bar again.
]]]

So there's always at most one working node and at most one base node per
path. While working node rows get "overwritten" with operations done on
child paths, the intermediate node_data are kept until commit or revert...
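
To make the layering a bit more concrete, here is a rough sketch (not actual
wc_db code) of the foo/bar scenario from the chart. It uses a cut-down
NODE_DATA table with only the columns needed to show the idea, and the
chart's 0/1/2 op_depth numbering. The helper names visible_op_depth and
revert_top_layer are made up for this example, as are the presence values
and the assumption that "visible" simply means "highest op_depth".

```python
import sqlite3

# Cut-down stand-in for NODE_DATA; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE NODE_DATA (
        wc_id          INTEGER NOT NULL,
        local_relpath  TEXT NOT NULL,
        op_depth       INTEGER NOT NULL,
        presence       TEXT NOT NULL,
        kind           TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    )
""")

# One row per layer for foo/bar, numbered as in the chart (illustrative values).
rows = [
    (1, "foo/bar", 0, "normal", "file"),  # (0) original data, from BASE
    (1, "foo/bar", 1, "normal", "file"),  # (1) copied in from ^/moo/bar
    (1, "foo/bar", 2, "normal", "file"),  # (2) created by 'svn add foo/bar'
]
conn.executemany("INSERT INTO NODE_DATA VALUES (?, ?, ?, ?, ?)", rows)


def visible_op_depth(conn, wc_id, relpath):
    """Return the highest op_depth recorded for a path, or None if no rows."""
    row = conn.execute(
        "SELECT MAX(op_depth) FROM NODE_DATA"
        " WHERE wc_id = ? AND local_relpath = ?",
        (wc_id, relpath),
    ).fetchone()
    return row[0]


def revert_top_layer(conn, wc_id, relpath):
    """Drop the topmost non-BASE layer, exposing the layer beneath it."""
    top = visible_op_depth(conn, wc_id, relpath)
    if top is not None and top > 0:
        conn.execute(
            "DELETE FROM NODE_DATA"
            " WHERE wc_id = ? AND local_relpath = ? AND op_depth = ?",
            (wc_id, relpath, top),
        )


depth_before = visible_op_depth(conn, 1, "foo/bar")
revert_top_layer(conn, 1, "foo/bar")   # roughly 'svn revert foo/bar'
depth_after = visible_op_depth(conn, 1, "foo/bar")
```

After the revert, the layer from (1) is the topmost one again, i.e. foo/bar
is back to being a copy of ^/moo/bar, and the op_depth == 0 row is untouched.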

That's about all I know of it.

Been thinking about those last_mod_time and translated_size columns that are
kept in both BASE_NODE and WORKING_NODE. I believe this duplication shows
that detecting modifications in the local file system is a different concept
from base/working. <purist>We should have a separate table for that instead
of doing "is-there-a-working-node?-no?-then-look-in-the-base"</purist>
<hacker>we do that kind of looking up anyway and having a little 'if' to
select the proper modtime/size doesn't hurt much</hacker>

(Am not having a strong opinion, just brainstorming...)
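
For what it's worth, the <hacker> variant would amount to something like the
following sketch. It is only an illustration, assuming stripped-down
BASE_NODE and WORKING_NODE tables that carry just the two columns in
question; the function name recorded_size_and_mtime and the sample rows are
invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BASE_NODE (
        wc_id            INTEGER NOT NULL,
        local_relpath    TEXT NOT NULL,
        translated_size  INTEGER,
        last_mod_time    INTEGER,
        PRIMARY KEY (wc_id, local_relpath)
    );
    CREATE TABLE WORKING_NODE (
        wc_id            INTEGER NOT NULL,
        local_relpath    TEXT NOT NULL,
        translated_size  INTEGER,
        last_mod_time    INTEGER,
        PRIMARY KEY (wc_id, local_relpath)
    );
""")

# Made-up sample data: a.txt has only a base node, b.txt has both.
conn.execute("INSERT INTO BASE_NODE VALUES (1, 'a.txt', 10, 1000)")
conn.execute("INSERT INTO BASE_NODE VALUES (1, 'b.txt', 20, 2000)")
conn.execute("INSERT INTO WORKING_NODE VALUES (1, 'b.txt', 25, 2500)")


def recorded_size_and_mtime(conn, wc_id, relpath):
    """Is there a working node? No? Then look in the base."""
    for table in ("WORKING_NODE", "BASE_NODE"):
        row = conn.execute(
            f"SELECT translated_size, last_mod_time FROM {table}"
            " WHERE wc_id = ? AND local_relpath = ?",
            (wc_id, relpath),
        ).fetchone()
        if row is not None:
            return row
    return None  # path is not recorded in either table
```

The <purist> variant would instead keep these two columns in one separate
table keyed on (wc_id, local_relpath), so the lookup needs no fallback.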


On 2010-08-10 18:18, Julian Foad wrote:
> Any responses would be greatly appreciated.

I've had so many family and summer events in the past weeks... soon I'll
start forgetting my passwords!

~Holiday-Neels

> 
> - Julian
> 
> 
> On Tue, 2010-08-03, Julian Foad wrote:
>> On Mon, 2010-07-12, Erik Huelsmann wrote:
>>> After lots of discussion regarding the way NODE_DATA/4th tree should
>>> be working, I'm now ready to post a summary of the progress. In my
>>> last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
>>> stated why we need this; this post is about the conclusion of what
>>> needs to happen. Also included are the first steps there.
>>>
>>>
>>> With the advent of NODE_DATA, we distinguish node values specifically
>>> related to BASE nodes, those specifically related to "current" WORKING
>>> nodes and those which are to be maintained for multiple levels of
>>> WORKING nodes (not only the "current" view) (the latter category is
>>> most often also shared with BASE).
>>>
>>> The respective tables will hold the columns shown below.
>>>
>>>
>>> -------------------------
>>> TABLE WORKING_NODE (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   parent_relpath  TEXT,
>>>   moved_here  INTEGER,
>>>   moved_to  TEXT,
>>>   original_repos_id  INTEGER REFERENCES REPOSITORY (id),
>>>   original_repos_path  TEXT,
>>>   original_revnum  INTEGER,
>>>   translated_size  INTEGER,
>>>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   keep_local  INTEGER,
>>>
>>>   PRIMARY KEY (wc_id, local_relpath)
>>>   );
>>>
>>> CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
>>> --------------------------------
>>>
>>> The moved_* and original_* columns are typical examples of "WORKING
>>> fields only maintained for the visible WORKING nodes": the original_*
>>> and moved_* fields are inherited from the operation root by all
>>> children part of the operation. The operation root will be the visible
>>> change on its own level, meaning it'll have rows both in the
>>> WORKING_NODE and NODE_DATA tables. The fact that these columns are not
>>> in the NODE_DATA table means that tree changes are not preserved
>>> across overlapping changes. This is fully compatible with what we do
>>> today: changes to higher levels destroy changes to lower levels.
>>>
>>> The translated_size and last_mod_time columns exist in WORKING_NODE
>>> and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
>>> they exist in BASE_NODE is a bit of a hack: it's to prevent creation
>>> of WORKING_NODE data for every file which has keyword expansion or eol
>>> translation properties set: these columns serve only to optimize
>>> working copy scanning for changes and as such only relate to the
>>> visible WORKING_NODEs.
>>>
>>
>> Can we come up with an English description of what each table will now
>> represent?
>>
>> "The BASE_NODE table lists the existing node-revs in the repository that
>> comprise the mixed-revision tree that was most recently updated/switched
>> to or checked out.  (The kind and content of these nodes is not here;
>> see the NODE_DATA table.)"
>>
>>>  TABLE BASE_NODE (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   repos_id  INTEGER REFERENCES REPOSITORY (id),
>>>   repos_relpath  TEXT,
>>
>> We need a revision number column here to go along with repos_id and
>> relpath to make a valid node-rev reference, don't we?
>>
>>>   parent_relpath  TEXT,
>>
>> (While we're reorganising, can we move that "parent_relpath" column to
>> adjacent to "local_relpath"?)
>>
>>>   translated_size  INTEGER,
>>>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   dav_cache  BLOB,
>>>   incomplete_children  INTEGER,
>>>   file_external  TEXT,
>>>
>>>   PRIMARY KEY (wc_id, local_relpath)
>>>   );
>>>
>>
>> "The NODE_DATA table records the kind and shallow content (props, text,
>> link target) of each node in the WC.  It includes both the nodes that
>> comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
>> WC and also all nodes that are part of a copied or moved tree but
>> currently shadowed by a replacement performed inside that tree.
>>
>> At least one row exists for each WC path, including paths with no change
>> and all paths affected by a tree change (add, delete, etc.).  If the
>> same path is affected by multiple levels of tree change - a replacement
>> inside a copied directory, for example - then multiple rows exist with
>> different 'op_depth' values."
>>
>>> TABLE NODE_DATA (
>>>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>>>   local_relpath  TEXT NOT NULL,
>>>   op_depth  INTEGER NOT NULL,
>>>   presence  TEXT NOT NULL,
>>>   kind  TEXT NOT NULL,
>>>   checksum  TEXT,
>>>   changed_rev  INTEGER,
>>>   changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
>>>   changed_author  TEXT,
>>
>> The changed_* columns can only belong to a node-rev that exists in the
>> repository.  What node-rev do they belong to and why aren't they
>> alongside the node-rev details?
>>
>> (The changed_* columns convey essentially a rev number and two of the
>> rev-props associated with that revnum that can be used in keyword
>> expansions.  We should consider representing that information in a more
>> general form, both to avoid tying the DB format to the choice of those
>> two particular revprops, and to avoid the redundancy of storing these
>> same date and author values N times.)
>>
>>
>>>   depth  TEXT,
>>>   symlink_target  TEXT,
>>>   properties  BLOB,
>>
>> (While we're rearranging, can we group the node-content fields together:
>> kind, properties, checksum, symlink_target?)
>>
>>>   PRIMARY KEY (wc_id, local_relpath, oproot)
>>
>> s/oproot/op_depth/?
>>
>>>   );
>>>
>>> CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
>>>
>>>
>>> Which leaves the NODE_DATA structure above. The op_depth column
>>> contains the depth of the node - relative to the wc root - on which
>>> the operation was run which caused the creation of the given NODE_DATA
>>> node.  In the final scheme (based on single-db), the value will be 0
>>> for base and a positive integer for WORKING related data.
>>
>> Let's assume single-db.  By the last sentence, I understand: For each
>> BASE_NODE row there is a corresponding NODE_DATA row with 'op_depth' = 0;
>> for every node brought in by a tree operation (copy, move, add) to an
>> immediate child of the WC root there is a NODE_DATA row with 'op_depth' =
>> 1; for every child of a child ... 2; and so on.
>>
>>
>> - Julian
>>
>>
>>> In order to be able to implement NODE_DATA even without having a fully
>>> functional SINGLE_DB yet, a transitional node numbering scheme needs
>>> to be devised. The following numbers will apply: BASE == 0,
>>> WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
>>>
>>>
>>> Other transitioning related remarks:
>>>
>>>  * Conditionally protected experimental sections, just like with
>>> SINGLE_DB
>>>  * Initial implementation will simply replace the current
>>> functionality of the 2 tables, from there we can work our way through
>>> whatever needs doing.
>>>  * Am I forgetting any others?
>>>
>>> Bye,
>>>
>>> Erik.
>>
>>
> 
>
