On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote: > Johan (and other interested parties), > I've been following some of the commits to the > diff-optimizations-branch with interest. While I've not reviewed them > for technical merit, it appears that others have, and that there is > good work going on in the branch.
Thanks for your interest. It might be good to get a little bit more review on the whole, I think. A lot of people have read parts of the code, but if I remember correctly most (if not all) of them have said things like "I haven't reviewed it in detail, but here are some suggestions, feedback, ...". > I'm wondering what the potential > for merging back to trunk is. This is the TODO section from the > BRANCH-README: > > TODO: > > * eliminate identical prefix [DONE] > * eliminate identical suffix [DONE] > * make diff3 and diff4 use datasources_open [DONE] > (this may eliminate the need for datasource_open, and the flag > datasource_opened in token.c#svn_diff__get_tokens) > * implement (part of) increment_pointers and > decrement_pointers with a macro [DONE] > (offers another 10% speedup) > * integrate (some of the) low-level optimizations for prefix/suffix > scanning, proposed by stefan2 [2] [] > * revv svn_diff.h#svn_diff_fns_t [] > > It looks like, for the most part, any destabilizing functionality is > completed, and what remains are simply optimizations. This > optimization work can probably be performed on trunk, and if so, we > should merge the branch back and do the cleanup work there. The only > bit on the current list of stuff is revving svn_diff_fns_t. Can that > work be parallelized? Yes, you are correct. Most of the essential work is done. Further optimizations can just as well be done on trunk. Revving svn_diff_fns_t: what do you mean with parallelizing it? I must admit that I don't really know (yet) how to go about that. Very early during the branch work, danielsh pointed out that I modified this public struct (vtable for reading data from datasources), so it should be revved. Is it listed/documented somewhere what should be done for that (community guide perhaps)? (one slightly related note to this: I introduced the function datasources_open, to open the multiple datasources at once (as opposed to the original datasource_open, which only opens one datasource). But maybe the new function should be renamed to datasource_open_all or datasources_open_all or something, to make it stand out a little bit more). > I'm not trying to rush the work, nor do I think it is essential for > 1.7, but it feels like a pretty good performance increase, and one > that we shouldn't have any problem shipping. (If there are currently > API uncertainties, than it would be good to wait until 1.7.x branches, > though.) Yes, I think it's quite ready to be merged, and could provide a very significant performance increase (for diff, merge and blame). The API is stable now, I think, except maybe for the name of the datasources_open function (see above). If we decide to go (for optimizations reasons) for specialized prefix/suffix scanning functions for 2, 3 or 4 datasources, I think it's best to keep the generic datasources_open API (with an array of datasources), and only split up between specialized versions in the implementation. > Anyway, I just really like the concept / implementation of the branch, > and I'm excited to get it into the hands of users. Thoughts? > > -Hyrum > Cheers, -- Johan