Re: FS Node API

Ivan Zhakov Sat, 09 Jan 2016 05:39:58 -0800

On 9 January 2016 at 03:48, Stefan Fuhrmann <[email protected]> wrote:
>
> Hi Ivan & interested parties,
>
Hi Stefan!


See my response inline.

> Excellent of you starting with the API work!  For all
> who don't know yet:  Ivan and I talked about this idea
> in Berlin and agreed that it sounds like a good idea.
> I had played with some prototypic implementation FSX
> and learned a bit about the its pitfalls.
>
> Below there is a list of questions that I had when
> reviewing the first few commits.  There are also sections
> on what I have in mind and partly in code for the new API.
>
> -- Stefan^2.
>
>
> Questions / Feedback to fs-node-api as of r1723819
> --------------------------------------------------
>
> [Not commenting on the implementation since that depends
>  on points in the other sections.]
>
> * BRANCH-README: "The branch is NOT going to be merged to
>   trunk; I'm going to re-implement everything in trunk ..."
>
>   - Why implementing it a second time (assuming the new
>     API turns out to be useful)?
>   - Do you expect the code to be "too prototypic" to be
>     used for production?  If so, any specific indication
>     for that?
>   - I think having the feature developed on a branch makes
>     sense because the switch-over to the new API will be
>     very noisy.  Otherwise, it could be done on /trunk
>     directly - and be reverted again if necessary.
>
I don't like developing new features in branches, so I consider this
branch as experiment to explore the FS node API idea and make
important decisions. It's not a big problem to re-implement all this
on trunk using multiple incremental changes with detailed log messages
and proper docstrings.


> * svn_fs.h
>
>   - Maybe, move the new API functions to new section.
>     They will not always be 1:1 replacements.
>
Sure, but for simplicity I just revving existing functions to accept
svn_fs_node_t instead ROOT+PATH.

> * "svn_fs_node_t does not depends on svn_fs_root object created from"
>
>   - I'm fine with that.  Any specific reason behind this
>     other than that node_t should be be self-contained
>     within fs_t and result_pool?
>
libsvn_repos/reporter.c open and closes svn_fs_root_t dynamically and
it will be very challenging to rewrite to maintain dependency between
svn_fs_root_t and svn_fs_node_t.

> * "svn_fs_node_t always reference existing node in tree (?)"
>
>   - Within a transaction, nodes of type svn_node_none
>     make sense, e.g. after a deletion.  Footnote: in-txn
>     nodes do have a number of update / invalidation issues,
>     which makes having separate vtables instances for
>     in-txn and in-rev nodes a good idea.
>   - I'm not too sure we should allow creating nodes for
>     any random path but I guess we must at least have
>     the option to do so b/c of symmetry.
>

There are two possible design approaches:
1. svn_fs_node_t is just combination of ROOT+PATH and may point to
some never existent node.
2. svn_fs_node_t is always point to some existing node in tree. Yes,
we may end up with dangling node in case of TXN root nodes, but it's
out of scope.

I decided to go with (2) at this time.

> * svn_fs_open_node
>
>   - Could we name this svn_fs_node or svn_fs_get_node?
>     "open" suggests that there are resources involved
>     which require an implicit and sometimes explicit
>     "close".
>
This is intentional name: svn_fs_open_node() actually do some work to
check whether node exists.

> * svn_fs_file_length
>
>   - I was thinking of naming that svn_fs_node_length.
>     It could error out on non-files as it does today
>     or return some informative value like the representation
>     size.  Caller logic needs to handle the "not a file"
>     case anyway, so it could use the new info on its own
>     discretion.
>
Makes sense, but for the branch I wanted to keep API like similar as
possible for easy switching callers.

> * svn_fs_dir_entries2
>
>   - Again, this could simply be svn_fs_sub_nodes.
I kept existing semantic just to making porting to new API easier. We
can discuss this improvement later.

>   - The result should be an array of node_t *, sorted
>     by their names.  This is what the storage layer
>     delivers in FS* and it is what the repos layer
>     would like to scan during reporting.
I think it's completely ortogonal to FS node API, so decided to move
this task out of scope of the branch. Btw current update reporter
implementation uses hashes very intensively.

> Additional Goals
> ----------------
>
> * Use 2-pool paradigm.
>
Yes, I already using 2-pool paradigm in all revv functions.

> * Where appropriate, make the API more orthogonal and
>   consistent (e.g. svn_fs_node_make instead of
>   svn_fs_node_make_file and svn_fs_node_make_dir).
>
I think it's of scope at least for experimental branch. Personally I'm
fine with current naming and would like to avoid just flavoring
changes.

>
> Observations in my Experiments
> ------------------------------
>
> * Despite calling it "node API" it actually concerned with
>   efficiently navigating directories.  So, always represent
>   the node using 2 structs: the actual node plus a (usually
>   shared) struct containing all the context info - representing
>   the directory.
>
> * The parameters for svn_fs_node_make, svn_fs_node_copy etc.
>   should take pure paths to describe the target instead of
>   parent-node + local-name.
>
> * Some function API function names clash with existing ones,
>   i.e. need to be revved (svn_fs_node_history3,
>   svn_fs_node_relation2 and everything prop).
>
> * Txn nodes and revision nodes should use 2 different
>   instances of the vtable.  Error handling for operations
>   that either apply to txn nodes or revision nodes only
>   can then be moved to lib_fs by checking for NULL pointers
>   in the vtable.
>
What is purpose of different vtables for txn and revision nodes?

>
> Proposal for an Implementation Strategy
> ---------------------------------------
>
> * node_compat.*
>   - Provide a default / fallback implementation in lib_fs
>     for backends that don't have native node API support.
>   - Blueprint for backend code, i.e. ensure we can have
>     a clean structure
>   - Implement API (roughly) one function at a time.
>
> * Ensure semantic equivalence to old API and high test
>   coverage from the start
>   - Implement node_compat in terms of the old backend API
>   - Implement the old FS API in terms of the new API,
>     i.e. route all calls through node_compat.*
>   - Have #define that controls whether the old API is run
>     natively or gets diverted through node_compat.*
>   - If the backend vtable entry is NULL for an old API
>     function, divert it through node_compat.*
I've already added '#if 0' code for that. See svn_fs_open_node().
>
> * Switch API users over to the new API
>   - This is what makes the branch worthwhile.
>   - Deprecate the old API only after migrating most callers
>
> * Implement the node API in FSFS
>   - Take node_compat and only replace the vtable calls
>     with direct function calls.
>   - Add direct addressing info (noderev IDs) to the structs.
I don't understand this point. Why we cannot use pointer to dag_node_t
for svn_fs_node_t implemention in FSFS?

>   - Switch functions over to "native" code.
>   - NULL the old API vtable entries to enforce the new
>     backend logic.
>
> * Final review and merger into /trunk
>   - Report processing as in "diff --summary" is a good
>     performance indicator.
My first goal is to switch 'svn ls -v' to new API and see how it works.



-- 
Ivan Zhakov

Re: FS Node API

Reply via email to