Re: svn commit: r997203 - in /subversion/trunk: ./ subversion/include/ subversion/libsvn_client/ subversion/libsvn_diff/ subversion/libsvn_fs_fs/ subversion/libsvn_ra_svn/ subversion/libsvn_repos/ sub
On 15.09.2010 09:56, Hyrum K. Wright wrote: On Wed, Sep 15, 2010 at 9:45 AM, Stefan Sperlings...@elego.de wrote: On Wed, Sep 15, 2010 at 06:52:07AM -, hwri...@apache.org wrote: == --- subversion/trunk/subversion/include/svn_string.h (original) +++ subversion/trunk/subversion/include/svn_string.h Wed Sep 15 06:52:06 2010 @@ -253,6 +253,15 @@ svn_stringbuf_chop(svn_stringbuf_t *str, void svn_stringbuf_fillchar(svn_stringbuf_t *str, unsigned char c); +/** Append a single character @a byte onto @a targetstr. + * + * reallocs if necessary. @a targetstr is affected, nothing else is. + * @since New in 1.7. + */ +void +svn_stringbuf_appendbyte(svn_stringbuf_t *targetstr, + char byte); + The docstring should list advantages svn_stringbuf_appendbyte(buf, c) has over svn_stringbuf_appendbytes(buf,c, 1). We need to understand where the performance benefits really come from. IIRC the benefit was dependent on the optimizer in the compiler to some extent, is this correct? I don't think that's completely correct, but I'll leave it to Stefan F. to determine that. Since this function has now been merged to trunk, updates to the doc string should be done on trunk. But I feel comfortable allowing Stefan to make these docstring edits and commit without prior review, instead of requiring him to mail the patch to the list. -Hyrum (Almost) done so in r1001413. Because I had to move one statement to make the rationals easier to explain, I didn't commit it on the trunk directly. It's a very minor change but I didn't feel comfortable to put an approved by: Hyrum in the commit message ... -- Stefan^2.
Re: object-model: Wrapping Subversion C-structs in C++
Hyrum K. Wright hyrum_wright_at_mail.utexas.edu mailto:hyrum_wright_at_mail.utexas.edu?Subject=Re:%20object-model:%20Wrapping%20Subversion%20C-structs%20in%20C%2B%2B wrote: For the C++ folks out there, I've got a question about an approach to take on the object-model branch. At issue is how to wrap the various C structures returned to callers, particularly in a backward compatible manner. Currently, I'm looking at svn_wc_notify_t *. As I see it, there are a few options: 1) Just wrap the pointer to the C struct as a member of the wrapper class. Pros: Easy to implement; lightweight constructor. Cons: Getters would need to translate to C++ types; would need to implement a copy constructor which deep copies the C struct; would also introduce pools, since creating and duplicating C structs requires them. 2) Wrap each C struct member individually Pros: C-C++ complexity is constrained to the constructor, everything else is C++ types Cons: Hard to extend for future compatibility 3) Just pass the C-struct pointer around; don't even bother with a class Pros: Dead simple. Cons: Requires more memory management thought by consumers; not C++-y enough; may introduce wrapping difficulties. I'd like to come up with something consistent, which would be used throughout the C++ bindings. I'm also interested in a solution which ensures the C++ bindings can be used as the basis for other object-oriented bindings models (Python, Perl, etc.) Thoughts? One issue that has not been talked about in this thread is strong typing. If you remember the problems with Johan's diff / blame optimizations, the reason behind it was a confusion of type semantics. Some ints were line numbers, others were file offsets. But there was / is no formal way to tell them apart. Since you decided to use templates in your code, I thought I would give it a try and design a simple template class that allows you to define any number of int-like types that are mutually distinct and require explicit conversion. It would be nice to have the C++ wrappers use these types instead of plain ints etc. in their signatures. -- Stefan^2. // TypedInts.cpp : Defines the entry point for the console application. // #include stdafx.h // extend that enum to define further types / kinds of integers enum IntegerTypes { itLineNumber, itFileOffset, itRevision }; // a type selection utility struct // (X, Y, true) - X // (X, Y, false) - Y templateclass First, class Second, bool get_first struct SelectType { }; templateclass First, class Second struct SelectTypeFirst, Second, true { typedef typename First type; }; templateclass First, class Second struct SelectTypeFirst, Second, false { typedef typename Second type; }; // a utility struct mapping a (potentially unsigned) integer // to the corresponding signed integer. templateclass T struct DiffType { typedef typename T type; }; template struct DiffTypeunsigned char { typedef char type; }; template struct DiffTypeunsigned short { typedef short type; }; template struct DiffTypeunsigned { typedef int type; }; template struct DiffTypeunsigned long { typedef long type; }; template struct DiffTypeunsigned long long { typedef long long type; }; // A typed integer: // V .. base integer type (e.g. unsigned) // T .. formal classification, i.e. this actually separates the int types // diff_type .. if true, this is the difference type //if false, this is the absolute value type //TypedInt(X,Y,false) - TypedInt(X,Y,false) - TypedInt(X,Y,true) // // The arithmetics, conversions and getters have been carefully // designed that only meaningful combinations of arguments are // valid and everything else will be rejected by the compiler. // // In optimized code, this class does not impose any runtime overhead // over plain use of built-in types. templateclass V, IntegerTypes T, bool diff_type class TypedInt { public: // encapsulated int types typedef typename V absolute_value_t; typedef typename DiffTypeV::type diff_value_t; typedef typename SelectTypediff_value_t, absolute_value_t, diff_type::type value_t; // typed integers typedef typename TypedIntV, T, false absolute_t; typedef typename TypedIntV, T, true diff_t; typedef typename TypedIntV, T, diff_type this_t; // expose template parameters enum {type = T, is_diff = diff_type}; // construction, auto-conversion TypedInt() : value() { } TypedInt(value_t value) : value(value) { } // data access value_t get() { return value; } const value_t get() const { return value; } value_t* operator() { return value; } const value_t* operator() const { return value; } // assignment TypedInt operator=(value_t rhs)
Re: Do we want 'svn patch' to be able to add empty files?
On Wed, Sep 01, 2010 at 06:37:08PM +0100, Julian Foad wrote: Daniel Näslund wrote: The question is: Do we want 'svn patch to be able to add empty files. If add an empty file is a well defined operation in Git-diff syntax, then yes, it would be good to support it. As well as delete this empty file and any other valid Subversion operations that aren't yet supported. Support for deleting empty files was added in r1001493. There's a lot of other interresting stuff in this thread (adding support for dirs in the git format and properly applying symlinks). Will take a stab at those in the close future. Daniel
Re: [RFC] Preprocessor flags to target code to a specific API version
On Sat, Sep 25, 2010 at 21:07, Dani Church dchu...@cheri.shyou.org wrote: ... You're certainly right, it is a big patch-- updating all of the SVN_DEPRECATED macros is around a 100KB patch. I went through and updated them all based on the @deprecated tag version, and for the functions that were deprecated by a similarly-named one (like svn_client_info = svn_client_info2) or that were related to a deprecated type (like svn_client_commit_item_create = svn_client_commit_item3_create) I also checked it against the version where the newer function or type was actually introduced, and updated the doc comment where there was a mismatch. It's still possible that some functions got the wrong SVN_DEPRECATED tag, but that's hardly the end of the world; it just means that sometimes the compiler won't spit out the right deprecation warnings if you've set SVN_TARGET_API. I also looked into the other idea I mentioned, tuning the API to give errors when you use a function or a type that was introduced in a later API than the one you're targeting. I used ctags and a whole bunch of text processing to determine which functions, types, defines, and members were introduced in which API version, and put #if SVN_TARGET_API = x ... #endif blocks around all of them. I can't say that the notation was absolutely perfect, but there is this: gcc will compile all of the headers without error using any SVN_TARGET_API from 0 to 7, and using 7 (or omitting it entirely) produces a set of preprocessed headers that are identical to the originals, so at the very least anyone not using API targeting won't be affected by the changes. The patch for this is a lot bigger, about 500KB. I may have gone a bit overboard. Anyway, I have a couple patches here if anyone wants to look at them, but they're probably a bit large to attach to an email. What should I do with them? First off, if you could somehow extract the patch for fixing the comments' @deprecated statements, that would be awesome. Fixing those is a no-brainer and introduces no extra complexity. Switching from USE_SVN_API to SVN_TARGET_API is good. I was going to suggest a Proper SVN_ prefix, and woot! You did that already. Wrapping declarations with #if SVN_TARGET_API is a non-starter, I believe. That sounds like it would obfuscate the headers a bit too much. In your original email, you mentioned something about gcc poison which made it sound like you could flag certain declarations to gcc as bad to use. I just looked up that poison stuff, and I don't see how we can magically work that into the SVN_DEPRECATED macros that occur in each declaration (since you have to list the function name in there). The closest that I could see would be something like: SVN_DEPRECATED_5 svn_error_t * svn_wc_some_function(arguments, more argments); SVN_MAYBE_POISON_5(svn_wc_some_function); Where the latter macro expands to: _Pragma(GCC poison svn_wc_some_function); (I think that is the right syntax) But again, that will add even more cruft around the declarations which I'm not sure will be acceptable (gotta see what people think). It is also possible to simply deprecate the future APIs rather than declare them poisoned, although it means adding an SVN_DEPRECATED_? to the true API. Or maybe we could add SVN_CURRENT (or somesuch) to the current APIs, which become deprecated if you target an older API. Cheers, -g
Re: [RFC] Preprocessor flags to target code to a specific API version
Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400: SVN_DEPRECATED_5 svn_error_t * svn_wc_some_function(arguments, more argments); SVN_MAYBE_POISON_5(svn_wc_some_function); Where the latter macro expands to: _Pragma(GCC poison svn_wc_some_function); (I think that is the right syntax) But again, that will add even more cruft around the declarations which I'm not sure will be acceptable (gotta see what people think). Adding the SVN_MAYBE_POISON(svn_wc_some_function) can be done mechanically (by a script that reads the @deprecated comments and transforms the header files accordingly) --- instead of editing all headers now, we could provide a knob for people to run this script during 'make install'.
Re: svnrdump tool
Ramkumar Ramachandra wrote on Sun, Sep 26, 2010 at 01:48:18 +0530: Hi Daniel, Daniel Shahaf writes: Daniel Shahaf wrote on Sat, Sep 25, 2010 at 16:49:08 +0200: 0:/tmp/svn% $svnrdump dump file://$PWD/r1/trunk/A/B 21 | grep Node-path Node-path: trunk Node-path: trunk/A Node-path: trunk/A/B Node-path: trunk/A/B/E Node-path: trunk/A/B/E/alpha Node-path: trunk/A/B/E/beta Node-path: trunk/A/B/F Node-path: trunk/A/B/lambda Nice :-) Is it? If I try to dump a subdir, then I /shouldn't/ get a Create /trunk entry in the dumpfile... instead, it should be the user's business to create that directory in the target repository out of band, before loading the dump into it. svnrdump behaves exactly like svnsync in this manner. svnsync allows you to sync a subdir of a repository (i.e., svnsync $REPOS/trunk/A/B $MIRROR ), but does it also create /trunk/A/B in the mirror? But for now I still think that svnrdump invocation I quoted above shouldn't have outputted a 'create /trunk' entry in the dumpfile :-). @all, what do you think? See the tests only_trunk_dump, only_trunk_A_with_changes_dump, and descend_into_replace_dump that I imported from svnsync. They all exercise this subdirectory feature. Maybe the tests are not exhaustive? -- Ram
[PATCH] fix @deprecated tags in header comments
This patch is a fix for some of the @deprecated tags in the Doxygen comments. It standardizes them so that the phrasing Provided for backward compatibility with the 1.x API refers to the LAST version for which the function was valid rather than the first. [[[ * subversion/include/svn_fs.h, subversion/include/svn_diff.h, subversion/include/svn_mergeinfo.h, subversion/include/svn_repos.h, subversion/include/svn_io.h, subversion/include/svn_wc.h, subversion/include/svn_client.h: Fix @deprecated tags in Doxygen comments. ]]] Index: subversion/include/svn_fs.h === --- subversion/include/svn_fs.h (revision 1001135) +++ subversion/include/svn_fs.h (working copy) @@ -533,7 +533,7 @@ /** * Same as svn_fs_access_add_lock_token2(), but with @a path set to value 1. * - * @deprecated Provided for backward compatibility with the 1.1 API. + * @deprecated Provided for backward compatibility with the 1.5 API. */ SVN_DEPRECATED Index: subversion/include/svn_diff.h === --- subversion/include/svn_diff.h (revision 1001135) +++ subversion/include/svn_diff.h (working copy) @@ -537,7 +537,7 @@ /** Similar to svn_diff_file_output_unified3(), but with @a relative_to_dir * set to NULL and @a show_c_function to false. * - * @deprecated Provided for backwards compatibility with the 1.3 API. + * @deprecated Provided for backwards compatibility with the 1.4 API. */ SVN_DEPRECATED svn_error_t * Index: subversion/include/svn_mergeinfo.h === --- subversion/include/svn_mergeinfo.h (revision 1001135) +++ subversion/include/svn_mergeinfo.h (working copy) @@ -226,7 +226,7 @@ /** Like svn_mergeinfo_remove2, but always considers inheritance. * - * @deprecated Provided for backward compatibility with the 1.5 API. + * @deprecated Provided for backward compatibility with the 1.6 API. */ SVN_DEPRECATED svn_error_t * @@ -310,7 +310,7 @@ /** Like svn_mergeinfo_intersect2, but always considers inheritance. * - * @deprecated Provided for backward compatibility with the 1.5 API. + * @deprecated Provided for backward compatibility with the 1.6 API. */ SVN_DEPRECATED svn_error_t * Index: subversion/include/svn_repos.h === --- subversion/include/svn_repos.h (revision 1001135) +++ subversion/include/svn_repos.h (working copy) @@ -2556,7 +2556,7 @@ * Similar to svn_repos_load_fs2(), but with @a use_pre_commit_hook and * @a use_post_commit_hook always @c FALSE. * - * @deprecated Provided for backward compatibility with the 1.0 API. + * @deprecated Provided for backward compatibility with the 1.1 API. */ SVN_DEPRECATED svn_error_t * Index: subversion/include/svn_io.h === --- subversion/include/svn_io.h (revision 1001135) +++ subversion/include/svn_io.h (working copy) @@ -289,7 +289,7 @@ /** Like svn_io_open_unique_file2, but can't delete on pool cleanup. * - * @deprecated Provided for backward compatibility with the 1.0 API + * @deprecated Provided for backward compatibility with the 1.3 API * * @note In 1.4 the API was extended to require either @a f or * @a unique_name_p (the other can be NULL). Before that, both were @@ -1395,6 +1395,7 @@ * structures instead of svn_io_dirent2_t and with only a single pool. * * @since New in 1.3. + * @deprecated Provided for backward compatibility with the 1.6 API. */ SVN_DEPRECATED svn_error_t * Index: subversion/include/svn_wc.h === --- subversion/include/svn_wc.h (revision 1001135) +++ subversion/include/svn_wc.h (working copy) @@ -3315,7 +3315,7 @@ /** * Similar to svn_wc_walk_entries2(), but without cancellation support. * - * @deprecated Provided for backward compatibility with the 1.0 API. + * @deprecated Provided for backward compatibility with the 1.1 API. */ SVN_DEPRECATED svn_error_t * @@ -4126,7 +4126,7 @@ * instead of #svn_wc_status_func3_t. * * @since New in 1.5. - * @deprecated Provided for backward compatibility with the 1.4 API. + * @deprecated Provided for backward compatibility with the 1.5 API. */ SVN_DEPRECATED svn_error_t * @@ -4822,7 +4822,7 @@ * Similar to svn_wc_resolved_conflict2(), but takes an * svn_wc_notify_func_t and doesn't have cancellation support. * - * @deprecated Provided for backward compatibility with the 1.0 API. + * @deprecated Provided for backward compatibility with the 1.1 API. */ SVN_DEPRECATED svn_error_t * @@ -4995,7 +4995,7 @@ * * @since New in 1.5. * - * @deprecated Provided for backwards compatibility with the 1.5 API. + * @deprecated Provided for backwards compatibility with the 1.6 API. */ SVN_DEPRECATED svn_error_t * @@ -5825,6 +5825,7 @@
Re: [RFC] Preprocessor flags to target code to a specific API version
Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400: Wrapping declarations with #if SVN_TARGET_API is a non-starter, I believe. That sounds like it would obfuscate the headers a bit too much. In your original email, you mentioned something about gcc poison which made it sound like you could flag certain declarations to gcc as bad to use. What advantages does 'poison' have over, you know, attempting to compile an application against an old set of headers? gcc -I https://svn.apache.org/repos/asf/subversion/tags/1.4.0/subversion/include/
Re: [RFC] Preprocessor flags to target code to a specific API version
On Sun, 26 Sep 2010, Greg Stein wrote: First off, if you could somehow extract the patch for fixing the comments' @deprecated statements, that would be awesome. Fixing those is a no-brainer and introduces no extra complexity. Done, I sent that out in a separate email. Wrapping declarations with #if SVN_TARGET_API is a non-starter, I believe. That sounds like it would obfuscate the headers a bit too much. In some senses, I'd say yes-- it does make the headers more verbose, certainly. But I also feel there are cases where it actually clarifies things a bit, like when structs get extended; IMHO seeing struct svn_some_type { int x; int y; #if SVN_TARGET_API = 2 int x2; int y2; #endif #if SVN_TARGET_API = 5 void *baton; #endif } makes it very clear which parts of the struct were introduced in which versions. Yes, the information is duplicated in the Doxygen comments, but I feel this draws the eye more easily. That being said, it is a half-megabyte patch, and even given the overhead of unified diff format, that's a whole bunch of extra lines of code. In your original email, you mentioned something about gcc poison which made it sound like you could flag certain declarations to gcc as bad to use. I just looked up that poison stuff, and I don't see how we can magically work that into the SVN_DEPRECATED macros that occur in each declaration (since you have to list the function name in there). The closest that I could see would be something like: SVN_DEPRECATED_5 svn_error_t * svn_wc_some_function(arguments, more argments); SVN_MAYBE_POISON_5(svn_wc_some_function); Where the latter macro expands to: _Pragma(GCC poison svn_wc_some_function); (I think that is the right syntax) But again, that will add even more cruft around the declarations which I'm not sure will be acceptable (gotta see what people think). I thought about that, and that's pretty much what SVN_POISON does for GCC; however, I believe that if you poison an identifier, it's best not to have it appear anywhere in the included source. Also, the way I implemented the SVN_POISON() macro uses a different semantic for non-GCC compilers; it generates a function prototype that looks like this: svn_api_mismatch *svn_wc_some_function(svn_api_mismatch *); Where svn_api_mismatch is an opaque struct defined at the same time as SVN_POISON. That makes it so that any calls to the function will give argument type warnings and make some amount of sense. It is also possible to simply deprecate the future APIs rather than declare them poisoned, although it means adding an SVN_DEPRECATED_? to the true API. Or maybe we could add SVN_CURRENT (or somesuch) to the current APIs, which become deprecated if you target an older API. I think this may in fact end up being the best idea; I'd actually suggest a slightly different format for the SVN_DEPRECATED_? macros than I originally used. Rather than noting the API version in which a function was deprecated, instead note the versions in which it was active, like the following: SVN_API_SINCE_5 svn_error_t * svn_client_update3([...]); SVN_API_SINCE_2 SVN_API_UNTIL_4 svn_error_t * svn_client_update2([...]); SVN_API_UNTIL_1 svn_error_t * svn_client_update([...]); This has the benefit of being directly in line with the @since and @deprecated tags in the Doxygen comments. Anyway, that's probably more than enough for this email, I think. Is it any surprise that I don't have any issue with verbose header files? :) -Dani Church
Re: [RFC] Preprocessor flags to target code to a specific API version
On Sun, 26 Sep 2010, Daniel Shahaf wrote: Adding the SVN_MAYBE_POISON(svn_wc_some_function) can be done mechanically (by a script that reads the @deprecated comments and transforms the header files accordingly) --- instead of editing all headers now, we could provide a knob for people to run this script during 'make install'. I toyed around with that kind of idea, but, having just done a lot of partially-automated text processing on these files, I know just how hard it is to get it right-- I wouldn't want to leave that to an automated script. Also, I'm not a big fan of having to mangle the header files before they're usable-- I'd rather be able to point my GCC include path straight at my working copy. -Dani Church
Re: [RFC] Preprocessor flags to target code to a specific API version
On Sun, 26 Sep 2010, Daniel Shahaf wrote: Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400: Wrapping declarations with #if SVN_TARGET_API is a non-starter, I believe. That sounds like it would obfuscate the headers a bit too much. In your original email, you mentioned something about gcc poison which made it sound like you could flag certain declarations to gcc as bad to use. What advantages does 'poison' have over, you know, attempting to compile an application against an old set of headers? gcc -I https://svn.apache.org/repos/asf/subversion/tags/1.4.0/subversion/include/ The advantage is that if you're working on an open-source project for which the policy is we want to support any version of libsvn_client from 1.4 onwards, you can make sure that no one submits code that breaks the policy by associating the codebase itself with the 1.4 API, rather than having to tell everyone to download the 1.4.0 headers and change their include path. There are a couple other reasons why I like the idea, but pinning a project to a given API is the big one. -Dani Church
Re: place of svnrdump
On 2010-09-25 14:43, Daniel Shahaf wrote: Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530: Agreed, these modules should not be part of the core. However, in the case of Subversion, there absolutely NO way to get/ back up the revision history data* [5]. svnsync. On a side note, svnsync happens to be relatively slow. I tried to svnsync the ASF repos once (for huge test data). The slowness of svnsync made it practically unfeasible to pull off. I ended up downloading a zipped dump and 'svnadmin load'ing that dump. Even with a zipped dump already downloaded, 'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions. (And someone rebooted that box after two days, halfway through, grr. Took some serious hacking to finish up without starting over.) So, that experience tells me that svnsync and svnadmin dump/load aren't close to optimal, for example compared to a straight download of 34 gigs that the ASF repos is... Anything that could speed up a remote dump/load process would probably be good -- while I don't know any details about svnrdump. My two cents: Rephrasing everything into the dump format and back blows up both data size and ETA. Maybe a remote backup mechanism could even break loose from discrete revision boundaries during transfer/load... In contrast, the speed of a remote 'svn log' just amazes me. It's pretty darn fast to get all the commit logs of a repos. So between that and getting the rev content as well there's some big speed loss. Heh, that's my reply to a single-word statement ;) ~Neels P.S.: If the whole ASF repos were a single Git WC, how long would that take to pull? (Given that Git tends to take up much more space than a Subversion repos, I wonder.) signature.asc Description: OpenPGP digital signature
[WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster
Hi devs, As discussed in [1], here is a patch that makes svn_diff_diff (libsvn_diff/diff.c) skip the identical prefix and suffix of the original and modified files, before starting the LCS (longest common subsequence) algorithm on the non-matching part. This makes diff a lot faster (especially for large files), thereby also speeding up blame in environments where the client is the bottleneck (with a fast enough server and network, blame is constrained by the speed of diff on the client side). This patch still has some rough edges (see below), hence the WIP marker. Nevertheless, it works most of the time, and it can demonstrate the speed improvement that can be expected. I will continue working on the rough edges, but in the meantime any feedback/review/thoughts are very much appreciated. The rough edges: 1) While scanning through the identical prefix, I have to count the number of lines (to have correct line offsets for the rest of the algorithm). To do that, I currently count the number of \n's, so it won't work for files with \r eol-style. Not so difficult to fix (similar to how diff_file.c#datasource_get_next_token does it), but I haven't done it yet. 2) Two tests still fail: - blame_tests.py 2: annotate a binary file - blame_tests.py 7: blame with different eol styles Maybe this is because of 1), I'll have to investigate. 3) It's only implemented for 2 files. I'd like to generalize this for an array of datasources, so it can also be used for diff3 and diff4. 4) As a small hack, I had to add a flag datasource_opened to token.c#svn_diff__get_tokens, so I could have different behavior for regular diff vs. diff3 and diff4. If 3) gets implemented, this hack is no longer needed. 5) I've struggled with making the code readable/understandable. It's likely that there is still a lot of room for improvement. I also probably need to document it some more. 6) I've not paid too much attention to low level optimizations, so here also there's probably room for improvement, which may be significant for the critical loops. 7) There are two warnings left conversion from 'apr_off_t' to 'int', in diff_file.c#find_identical_suffix. To completely eliminate this, I should probably make all chunks of type apr_off_t instead of int (but I have not done that yet, because the original implementation also used int for the chunk in the svn_diff__file_baton_t struct). Should I do this (also in the baton struct)? Or should I use an explicit cast somewhere? 8) A bigger problem: the output of diff/blame is sometimes different from the original implementation. It's always syntactically correct, but sometimes the less unique lines are taken differently (like when a new function is added, and diff thinks the new block starts from the closing brace of the previous function, ... stuff like that). This happens only because of the identical-suffix-scanning (it doesn't happen if I only enable the identical-prefix-scanning). I think I know the cause of this change in behavior: I completely eliminate the suffix from the LCS-building algorithm. And that probably causes it to sometimes come up with another longest common subsequence. Right now I don't know how to solve this completely. A workaround might be to add a certain number of suffix-lines to the non-matching-block, so they can be part of the LCS-searching. But probably one can always come up with examples where the number of suffix-lines is not enough. I'll have to look into this some more. Ideas welcome :-). Log message: [[[ Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster. * subversion/include/svn_diff.h (svn_diff_fns_t): Added new function types datasources_open and get_prefix_lines to the vtable. * subversion/libsvn_diff/diff_memory.c (datasources_open): New function (does nothing). (get_prefix_lines): New function (does nothing). (svn_diff__mem_vtable): Added new functions datasources_open and get_prefix_lines. * subversion/libsvn_diff/diff_file.c (svn_diff__file_baton_t): Added members prefix_lines, suffix_start_chunk[4] and suffix_offset_in_chunk. (increment_pointer_or_chunk, decrement_pointer_or_chunk): New functions. (find_identical_prefix, find_identical_suffix): New fuctions. (datasources_open): New function, to open both datasources and find their identical prefix and suffix. (get_prefix_lines): New function. (datasource_get_next_token): Stop at start of identical suffix. (svn_diff__file_vtable): Added new functions datasources_open and get_prefix_lines. * subversion/libsvn_diff/diff.h (svn_diff__get_tokens): Added argument datasource_opened, to indicate that the datasource was already opened. * subversion/libsvn_diff/token.c (svn_diff__get_tokens): Added argument datasource_opened. Only open the datasource if datasource_opened is FALSE. Set the starting offset of the position list to the number of prefix lines. * subversion/libsvn_diff/diff.c (svn_diff__diff): Add a chunk of