Re: Status of the branch diff-optimizations-bytes branch

Johan Corveleyn Tue, 25 Jan 2011 18:32:00 -0800

On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <[email protected]> wrote:
> Johan (and other interested parties),
> I've been following some of the commits to the
> diff-optimizations-branch with interest.  While I've not reviewed them
> for technical merit, it appears that others have, and that there is
> good work going on in the branch.


Thanks for your interest.

It might be good to get a little bit more review on the whole, I
think. A lot of people have read parts of the code, but if I remember
correctly most (if not all) of them have said things like "I haven't
reviewed it in detail, but here are some suggestions, feedback, ...".

> I'm wondering what the potential
> for merging back to trunk is.  This is the TODO section from the
> BRANCH-README:
>
> TODO:
>
>  * eliminate identical prefix                 [DONE]
>  * eliminate identical suffix                 [DONE]
>  * make diff3 and diff4 use datasources_open  [DONE]
>     (this may eliminate the need for datasource_open, and the flag
>      datasource_opened in token.c#svn_diff__get_tokens)
>  * implement (part of) increment_pointers and
>    decrement_pointers with a macro            [DONE]
>     (offers another 10% speedup)
>  * integrate (some of the) low-level optimizations for prefix/suffix
>    scanning, proposed by stefan2 [2]          []
>  * revv svn_diff.h#svn_diff_fns_t             []
>
> It looks like, for the most part, any destabilizing functionality is
> completed, and what remains are simply optimizations.  This
> optimization work can probably be performed on trunk, and if so, we
> should merge the branch back and do the cleanup work there.  The only
> bit on the current list of stuff is revving svn_diff_fns_t.  Can that
> work be parallelized?

Yes, you are correct. Most of the essential work is done. Further
optimizations can just as well be done on trunk.

Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
admit that I don't really know (yet) how to go about that. Very early
during the branch work, danielsh pointed out that I modified this
public struct (vtable for reading data from datasources), so it should
be revved. Is it listed/documented somewhere what should be done for
that (community guide perhaps)?

(one slightly related note to this: I introduced the function
datasources_open, to open the multiple datasources at once (as opposed
to the original datasource_open, which only opens one datasource). But
maybe the new function should be renamed to datasource_open_all or
datasources_open_all or something, to make it stand out a little bit
more).

> I'm not trying to rush the work, nor do I think it is essential for
> 1.7, but it feels like a pretty good performance increase, and one
> that we shouldn't have any problem shipping.  (If there are currently
> API uncertainties, than it would be good to wait until 1.7.x branches,
> though.)

Yes, I think it's quite ready to be merged, and could provide a very
significant performance increase (for diff, merge and blame).

The API is stable now, I think, except maybe for the name of the
datasources_open function (see above). If we decide to go (for
optimizations reasons) for specialized prefix/suffix scanning
functions for 2, 3 or 4 datasources, I think it's best to keep the
generic datasources_open API (with an array of datasources), and only
split up between specialized versions in the implementation.

> Anyway, I just really like the concept / implementation of the branch,
> and I'm excited to get it into the hands of users.  Thoughts?
>
> -Hyrum
>

Cheers,
-- 
Johan

Re: Status of the branch diff-optimizations-bytes branch

Reply via email to