Given all the responses in the thread, I'd say we're moving to the single table for BASE and WORKING node recording. There was a flurry of activity from me yesterday and this morning regarding NODE_DATA: that was just me flushing my queue of patches.
The work isn't completely irrelevant, as it identifies the spots where the NODES table will be introduced, just like NODE_DATA had to. Today, I'll go to draw up the NODES table and move over the queries which had already been modified for NODE_DATA over to the NODES design. I hope to get a very long way today already. If it's not done today, then I expect it to be able to finish it this week. Anyone wanting to join in: let's chat on IRC. Bye, Erik. On Thu, Sep 2, 2010 at 11:34 PM, Erik Huelsmann <ehu...@gmail.com> wrote: > > > As described by Julian earlier this month, Julian, Philip and I observed > that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in > common. Notably, by introducing the NODE_DATA table, most fields from > BASE_NODE and WORKING_NODE already moved to a common table. > > The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the > side of WORKING_NODE are the 2 cache fields 'translated_size' and > 'last_mod_time'. Apart from those two, there are the indexing fields wc_id, > local_relpath and parent_relpath. > > In the end we're storing *lots* of bytes (wc_id, local_relpath and > parent_relpath) to store 2 64-bit values. > > On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path > and revision. The NODE_DATA table already has the fields original_repos_id, > original_repos_path and original_revision. When op_depth == 0, these are > guaranteed to be empty (null), since they are for working nodes with > copy/move source information. Renaming the three fields in NODE_DATA to > repos_id, repos_path and revision, generalizing their use to include > op_depth == 0 [ofcourse nicely documented in the table docs], BASE_NODE > would be reduced to a store of the dav_cache, translated_size and > last_mod_time fields. > > By subsuming translated_size and last_mod_time into NODE_DATA, neither > WORKING_NODE nor BASE_NODE will need to store these values anymore. This > eliminates the entire reason of existence of WORKING_NODE. BASE_NODE then > only stores dav_cache. Here too, it's probably more efficient (in size) to > store dav_cache in NODE_DATA to prevent repeated storage of wc_id, > local_relpath and parent_relpath in BASE_NODE. > > In addition to the eliminated storage overhead, we'd be making things a > little less complex for ourselves: UPDATE, INSERT and DELETE queries would > be operating only on a single table, removing the need to split updates > across multiple statements. > > > This week, I was discussing this change with Greg on IRC. We both have the > feeling this should work out well. The proposal here is to switch > (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES. > > > Comments? Fears? Enhancements? > > > Bye, > > > Erik. >