Re: svn commit: r997203 - in /subversion/trunk: ./ subversion/include/ subversion/libsvn_client/ subversion/libsvn_diff/ subversion/libsvn_fs_fs/ subversion/libsvn_ra_svn/ subversion/libsvn_repos/ sub

2010-09-26 Thread Stefan Fuhrmann


 On 15.09.2010 09:56, Hyrum K. Wright wrote:

On Wed, Sep 15, 2010 at 9:45 AM, Stefan Sperlings...@elego.de  wrote:

On Wed, Sep 15, 2010 at 06:52:07AM -, hwri...@apache.org wrote:

==
--- subversion/trunk/subversion/include/svn_string.h (original)
+++ subversion/trunk/subversion/include/svn_string.h Wed Sep 15 06:52:06 2010
@@ -253,6 +253,15 @@ svn_stringbuf_chop(svn_stringbuf_t *str,
  void
  svn_stringbuf_fillchar(svn_stringbuf_t *str, unsigned char c);

+/** Append a single character @a byte onto @a targetstr.
+ *
+ * reallocs if necessary. @a targetstr is affected, nothing else is.
+ * @since New in 1.7.
+ */
+void
+svn_stringbuf_appendbyte(svn_stringbuf_t *targetstr,
+ char byte);
+

The docstring should list advantages svn_stringbuf_appendbyte(buf, c)
has over svn_stringbuf_appendbytes(buf,c, 1). We need to understand
where the performance benefits really come from. IIRC the benefit was
dependent on the optimizer in the compiler to some extent, is this correct?

I don't think that's completely correct, but I'll leave it to Stefan
F. to determine that.

Since this function has now been merged to trunk, updates to the doc
string should be done on trunk.  But I feel comfortable allowing
Stefan to make these docstring edits and commit without prior review,
instead of requiring him to mail the patch to the list.

-Hyrum


(Almost) done so in r1001413. Because I had to move
one statement to make the rationals easier to explain,
I didn't commit it on the trunk directly. It's a very minor
change but I didn't feel comfortable to put an approved
by: Hyrum in the commit message ...

-- Stefan^2.



Re: object-model: Wrapping Subversion C-structs in C++

2010-09-26 Thread Stefan Fuhrmann
 Hyrum K. Wright hyrum_wright_at_mail.utexas.edu 
mailto:hyrum_wright_at_mail.utexas.edu?Subject=Re:%20object-model:%20Wrapping%20Subversion%20C-structs%20in%20C%2B%2B 
wrote:



For the C++ folks out there, I've got a question about an approach to
take on the object-model branch. At issue is how to wrap the various
C structures returned to callers, particularly in a backward
compatible manner. Currently, I'm looking at svn_wc_notify_t *. As I
see it, there are a few options:

1) Just wrap the pointer to the C struct as a member of the wrapper 
class.

Pros: Easy to implement; lightweight constructor.
Cons: Getters would need to translate to C++ types; would need to
implement a copy constructor which deep copies the C struct; would
also introduce pools, since creating and duplicating C structs
requires them.

2) Wrap each C struct member individually
Pros: C-C++ complexity is constrained to the constructor,
everything else is C++ types
Cons: Hard to extend for future compatibility

3) Just pass the C-struct pointer around; don't even bother with a class
Pros: Dead simple.
Cons: Requires more memory management thought by consumers; not
C++-y enough; may introduce wrapping difficulties.

I'd like to come up with something consistent, which would be used
throughout the C++ bindings. I'm also interested in a solution which
ensures the C++ bindings can be used as the basis for other
object-oriented bindings models (Python, Perl, etc.)

Thoughts? 


One issue that has not been talked about in this thread
is strong typing. If you remember the problems with
Johan's diff / blame optimizations, the reason behind
it was a confusion of type semantics. Some ints were
line numbers, others were file offsets. But there was / is
no formal way to tell them apart.

Since you decided to use templates in your code, I
thought I would give it a try and design a simple template
class that allows you to define any number of int-like
types that are mutually distinct and require explicit
conversion.

It would be nice to have the C++ wrappers use these
types instead of plain ints etc. in their signatures.

-- Stefan^2.

// TypedInts.cpp : Defines the entry point for the console application.
//

#include stdafx.h

// extend that enum to define further types / kinds of integers

enum IntegerTypes
{
itLineNumber,
itFileOffset,
itRevision
};

// a type selection utility struct
//   (X, Y, true)  - X
//   (X, Y, false) - Y

templateclass First, class Second, bool get_first
struct SelectType
{
};

templateclass First, class Second
struct SelectTypeFirst, Second, true
{
typedef typename First type;
};

templateclass First, class Second
struct SelectTypeFirst, Second, false
{
typedef typename Second type;
};

// a utility struct mapping a (potentially unsigned) integer 
// to the corresponding signed integer.

templateclass T
struct DiffType
{
typedef typename T type;
};

template
struct DiffTypeunsigned char
{
typedef char type;
};

template
struct DiffTypeunsigned short
{
typedef short type;
};

template
struct DiffTypeunsigned
{
typedef int type;
};

template
struct DiffTypeunsigned long
{
typedef long type;
};

template
struct DiffTypeunsigned long long
{
typedef long long type;
};

// A typed integer:
//   V .. base integer type (e.g. unsigned)
//   T .. formal classification, i.e. this actually separates the int types
//   diff_type .. if true, this is the difference type
//if false, this is the absolute value type
//TypedInt(X,Y,false) - TypedInt(X,Y,false) - 
TypedInt(X,Y,true)
//
// The arithmetics, conversions and getters have been carefully
// designed that only meaningful combinations of arguments are
// valid and everything else will be rejected by the compiler.
//
// In optimized code, this class does not impose any runtime overhead 
// over plain use of built-in types.

templateclass V, IntegerTypes T, bool diff_type
class TypedInt
{
public:

// encapsulated int types

typedef typename V absolute_value_t;
typedef typename DiffTypeV::type diff_value_t;
typedef typename SelectTypediff_value_t, 
absolute_value_t, 
diff_type::type value_t;

// typed integers

typedef typename TypedIntV, T, false absolute_t;
typedef typename TypedIntV, T, true diff_t;
typedef typename TypedIntV, T, diff_type this_t;

// expose template parameters

enum {type = T, is_diff = diff_type};

// construction, auto-conversion

TypedInt()
: value()
{
}

TypedInt(value_t value)
: value(value)
{
}

// data access

value_t get()
{
return value;
}

const value_t get() const
{
return value;
}

value_t* operator()
{
return value;
}

const value_t* operator() const
{
return value;
}

// assignment

TypedInt operator=(value_t rhs)

Re: Do we want 'svn patch' to be able to add empty files?

2010-09-26 Thread Daniel Näslund
On Wed, Sep 01, 2010 at 06:37:08PM +0100, Julian Foad wrote:
 Daniel Näslund wrote:
  The question is: Do we want 'svn patch to be able to add empty files.

 If add an empty file is a well defined operation in Git-diff syntax,
 then yes, it would be good to support it.  As well as delete this empty
 file and any other valid Subversion operations that aren't yet
 supported.

Support for deleting empty files was added in r1001493. 

There's a lot of other interresting stuff in this thread (adding support
for dirs in the git format and properly applying symlinks). Will take a
stab at those in the close future.

Daniel


Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Greg Stein
On Sat, Sep 25, 2010 at 21:07, Dani Church dchu...@cheri.shyou.org wrote:
...
 You're certainly right, it is a big patch-- updating all of the
 SVN_DEPRECATED macros is around a 100KB patch.  I went through and updated
 them all based on the @deprecated tag version, and for the functions that
 were deprecated by a similarly-named one (like svn_client_info =
 svn_client_info2) or that were related to a deprecated type (like
 svn_client_commit_item_create = svn_client_commit_item3_create) I also
 checked it against the version where the newer function or type was actually
 introduced, and updated the doc comment where there was a mismatch.  It's
 still possible that some functions got the wrong SVN_DEPRECATED tag, but
 that's hardly the end of the world; it just means that sometimes the
 compiler won't spit out the right deprecation warnings if you've set
 SVN_TARGET_API.

 I also looked into the other idea I mentioned, tuning the API to give errors
 when you use a function or a type that was introduced in a later API than
 the one you're targeting.  I used ctags and a whole bunch of text processing
 to determine which functions, types, defines, and members were introduced in
 which API version, and put
 #if SVN_TARGET_API = x ... #endif blocks around all of them.  I can't say
 that the notation was absolutely perfect, but there is this: gcc will
 compile all of the headers without error using any SVN_TARGET_API from 0 to
 7, and using 7 (or omitting it entirely) produces a set of preprocessed
 headers that are identical to the originals, so at the very least anyone not
 using API targeting won't be affected by the changes.  The patch for this is
 a lot bigger, about 500KB.

 I may have gone a bit overboard.

 Anyway, I have a couple patches here if anyone wants to look at them, but
 they're probably a bit large to attach to an email.  What should I do with
 them?

First off, if you could somehow extract the patch for fixing the
comments' @deprecated statements, that would be awesome. Fixing those
is a no-brainer and introduces no extra complexity.

Switching from USE_SVN_API to SVN_TARGET_API is good. I was going to
suggest a Proper SVN_ prefix, and woot! You did that already.

Wrapping declarations with #if SVN_TARGET_API is a non-starter, I
believe. That sounds like it would obfuscate the headers a bit too
much. In your original email, you mentioned something about gcc
poison which made it sound like you could flag certain declarations
to gcc as bad to use. I just looked up that poison stuff, and I
don't see how we can magically work that into the SVN_DEPRECATED
macros that occur in each declaration (since you have to list the
function name in there). The closest that I could see would be
something like:

SVN_DEPRECATED_5
svn_error_t *
svn_wc_some_function(arguments,
more argments);
SVN_MAYBE_POISON_5(svn_wc_some_function);

Where the latter macro expands to: _Pragma(GCC poison
svn_wc_some_function);  (I think that is the right syntax)

But again, that will add even more cruft around the declarations which
I'm not sure will be acceptable (gotta see what people think).

It is also possible to simply deprecate the future APIs rather than
declare them poisoned, although it means adding an SVN_DEPRECATED_? to
the true API. Or maybe we could add SVN_CURRENT (or somesuch) to the
current APIs, which become deprecated if you target an older API.

Cheers,
-g


Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Daniel Shahaf
Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400:
 SVN_DEPRECATED_5
 svn_error_t *
 svn_wc_some_function(arguments,
 more argments);
 SVN_MAYBE_POISON_5(svn_wc_some_function);
 
 Where the latter macro expands to: _Pragma(GCC poison
 svn_wc_some_function);  (I think that is the right syntax)
 
 But again, that will add even more cruft around the declarations which
 I'm not sure will be acceptable (gotta see what people think).

Adding the SVN_MAYBE_POISON(svn_wc_some_function) can be done
mechanically (by a script that reads the @deprecated comments and
transforms the header files accordingly) --- instead of editing all
headers now, we could provide a knob for people to run this script
during 'make install'.


Re: svnrdump tool

2010-09-26 Thread Daniel Shahaf
Ramkumar Ramachandra wrote on Sun, Sep 26, 2010 at 01:48:18 +0530:
 Hi Daniel,
 
 Daniel Shahaf writes:
  Daniel Shahaf wrote on Sat, Sep 25, 2010 at 16:49:08 +0200:
 0:/tmp/svn% $svnrdump dump file://$PWD/r1/trunk/A/B 21 | grep 
   Node-path
 Node-path: trunk
 Node-path: trunk/A
 Node-path: trunk/A/B
 Node-path: trunk/A/B/E
 Node-path: trunk/A/B/E/alpha
 Node-path: trunk/A/B/E/beta
 Node-path: trunk/A/B/F
 Node-path: trunk/A/B/lambda
   
   Nice :-)
   
  
  Is it?  If I try to dump a subdir, then I /shouldn't/ get a Create
  /trunk entry in the dumpfile... instead, it should be the user's
  business to create that directory in the target repository out of band,
  before loading the dump into it.
 
 svnrdump behaves exactly like svnsync in this manner.

svnsync allows you to sync a subdir of a repository (i.e.,
  svnsync $REPOS/trunk/A/B $MIRROR
), but does it also create /trunk/A/B in the mirror?

But for now I still think that svnrdump invocation I quoted above
shouldn't have outputted a 'create /trunk' entry in the dumpfile :-).
@all, what do you think?

 See the tests
 only_trunk_dump, only_trunk_A_with_changes_dump, and
 descend_into_replace_dump that I imported from svnsync. They all
 exercise this subdirectory feature. Maybe the tests are not
 exhaustive?
 
 -- Ram


[PATCH] fix @deprecated tags in header comments

2010-09-26 Thread Dani Church
This patch is a fix for some of the @deprecated tags in the Doxygen 
comments.  It standardizes them so that the phrasing Provided for 
backward compatibility with the 1.x API refers to the LAST version for 
which the function was valid rather than the first.


[[[
* subversion/include/svn_fs.h,
  subversion/include/svn_diff.h,
  subversion/include/svn_mergeinfo.h,
  subversion/include/svn_repos.h,
  subversion/include/svn_io.h,
  subversion/include/svn_wc.h,
  subversion/include/svn_client.h:
  Fix @deprecated tags in Doxygen comments.
]]]
Index: subversion/include/svn_fs.h
===
--- subversion/include/svn_fs.h (revision 1001135)
+++ subversion/include/svn_fs.h (working copy)
@@ -533,7 +533,7 @@
 /**
  * Same as svn_fs_access_add_lock_token2(), but with @a path set to value 1.
  *
- * @deprecated Provided for backward compatibility with the 1.1 API.
+ * @deprecated Provided for backward compatibility with the 1.5 API.
  */
 
 SVN_DEPRECATED
Index: subversion/include/svn_diff.h
===
--- subversion/include/svn_diff.h   (revision 1001135)
+++ subversion/include/svn_diff.h   (working copy)
@@ -537,7 +537,7 @@
 /** Similar to svn_diff_file_output_unified3(), but with @a relative_to_dir
  * set to NULL and @a show_c_function to false.
  *
- * @deprecated Provided for backwards compatibility with the 1.3 API.
+ * @deprecated Provided for backwards compatibility with the 1.4 API.
  */
 SVN_DEPRECATED
 svn_error_t *
Index: subversion/include/svn_mergeinfo.h
===
--- subversion/include/svn_mergeinfo.h  (revision 1001135)
+++ subversion/include/svn_mergeinfo.h  (working copy)
@@ -226,7 +226,7 @@
 
 /** Like svn_mergeinfo_remove2, but always considers inheritance.
  *
- * @deprecated Provided for backward compatibility with the 1.5 API.
+ * @deprecated Provided for backward compatibility with the 1.6 API.
  */
 SVN_DEPRECATED
 svn_error_t *
@@ -310,7 +310,7 @@
 
 /** Like svn_mergeinfo_intersect2, but always considers inheritance.
  *
- * @deprecated Provided for backward compatibility with the 1.5 API.
+ * @deprecated Provided for backward compatibility with the 1.6 API.
  */
 SVN_DEPRECATED
 svn_error_t *
Index: subversion/include/svn_repos.h
===
--- subversion/include/svn_repos.h  (revision 1001135)
+++ subversion/include/svn_repos.h  (working copy)
@@ -2556,7 +2556,7 @@
  * Similar to svn_repos_load_fs2(), but with @a use_pre_commit_hook and
  * @a use_post_commit_hook always @c FALSE.
  *
- * @deprecated Provided for backward compatibility with the 1.0 API.
+ * @deprecated Provided for backward compatibility with the 1.1 API.
  */
 SVN_DEPRECATED
 svn_error_t *
Index: subversion/include/svn_io.h
===
--- subversion/include/svn_io.h (revision 1001135)
+++ subversion/include/svn_io.h (working copy)
@@ -289,7 +289,7 @@
 
 /** Like svn_io_open_unique_file2, but can't delete on pool cleanup.
  *
- * @deprecated Provided for backward compatibility with the 1.0 API
+ * @deprecated Provided for backward compatibility with the 1.3 API
  *
  * @note In 1.4 the API was extended to require either @a f or
  *   @a unique_name_p (the other can be NULL).  Before that, both were
@@ -1395,6 +1395,7 @@
  * structures instead of svn_io_dirent2_t and with only a single pool.
  *
  * @since New in 1.3.
+ * @deprecated Provided for backward compatibility with the 1.6 API.
  */
 SVN_DEPRECATED
 svn_error_t *
Index: subversion/include/svn_wc.h
===
--- subversion/include/svn_wc.h (revision 1001135)
+++ subversion/include/svn_wc.h (working copy)
@@ -3315,7 +3315,7 @@
 /**
  * Similar to svn_wc_walk_entries2(), but without cancellation support.
  *
- * @deprecated Provided for backward compatibility with the 1.0 API.
+ * @deprecated Provided for backward compatibility with the 1.1 API.
  */
 SVN_DEPRECATED
 svn_error_t *
@@ -4126,7 +4126,7 @@
  * instead of #svn_wc_status_func3_t.
  *
  * @since New in 1.5.
- * @deprecated Provided for backward compatibility with the 1.4 API.
+ * @deprecated Provided for backward compatibility with the 1.5 API.
  */
 SVN_DEPRECATED
 svn_error_t *
@@ -4822,7 +4822,7 @@
  * Similar to svn_wc_resolved_conflict2(), but takes an
  * svn_wc_notify_func_t and doesn't have cancellation support.
  *
- * @deprecated Provided for backward compatibility with the 1.0 API.
+ * @deprecated Provided for backward compatibility with the 1.1 API.
  */
 SVN_DEPRECATED
 svn_error_t *
@@ -4995,7 +4995,7 @@
  *
  * @since New in 1.5.
  *
- * @deprecated Provided for backwards compatibility with the 1.5 API.
+ * @deprecated Provided for backwards compatibility with the 1.6 API.
  */
 SVN_DEPRECATED
 svn_error_t *
@@ -5825,6 +5825,7 @@
  

Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Daniel Shahaf
Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400:
 Wrapping declarations with #if SVN_TARGET_API is a non-starter, I
 believe. That sounds like it would obfuscate the headers a bit too
 much. In your original email, you mentioned something about gcc
 poison which made it sound like you could flag certain declarations
 to gcc as bad to use.

What advantages does 'poison' have over, you know, attempting to compile
an application against an old set of headers?

gcc -I 
https://svn.apache.org/repos/asf/subversion/tags/1.4.0/subversion/include/


Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Dani Church

On Sun, 26 Sep 2010, Greg Stein wrote:


First off, if you could somehow extract the patch for fixing the
comments' @deprecated statements, that would be awesome. Fixing those
is a no-brainer and introduces no extra complexity.


Done, I sent that out in a separate email.


Wrapping declarations with #if SVN_TARGET_API is a non-starter, I
believe. That sounds like it would obfuscate the headers a bit too
much.


In some senses, I'd say yes-- it does make the headers more verbose,
certainly.  But I also feel there are cases where it actually clarifies
things a bit, like when structs get extended; IMHO seeing

struct svn_some_type {
  int x;
  int y;

#if SVN_TARGET_API = 2
  int x2;
  int y2;
#endif

#if SVN_TARGET_API = 5
  void *baton;
#endif
}

makes it very clear which parts of the struct were introduced in which
versions.  Yes, the information is duplicated in the Doxygen comments,
but I feel this draws the eye more easily.

That being said, it is a half-megabyte patch, and even given the
overhead of unified diff format, that's a whole bunch of extra lines of
code.


In your original email, you mentioned something about gcc
poison which made it sound like you could flag certain declarations
to gcc as bad to use. I just looked up that poison stuff, and I
don't see how we can magically work that into the SVN_DEPRECATED
macros that occur in each declaration (since you have to list the
function name in there). The closest that I could see would be
something like:

SVN_DEPRECATED_5
svn_error_t *
svn_wc_some_function(arguments,
   more argments);
SVN_MAYBE_POISON_5(svn_wc_some_function);

Where the latter macro expands to: _Pragma(GCC poison
svn_wc_some_function);  (I think that is the right syntax)

But again, that will add even more cruft around the declarations which
I'm not sure will be acceptable (gotta see what people think).


I thought about that, and that's pretty much what SVN_POISON does for
GCC; however, I believe that if you poison an identifier, it's best not
to have it appear anywhere in the included source.  Also, the way I
implemented the SVN_POISON() macro uses a different semantic for non-GCC
compilers; it generates a function prototype that looks like this:

svn_api_mismatch *svn_wc_some_function(svn_api_mismatch *);

Where svn_api_mismatch is an opaque struct defined at the same time as
SVN_POISON.  That makes it so that any calls to the function will give
argument type warnings and make some amount of sense.


It is also possible to simply deprecate the future APIs rather than
declare them poisoned, although it means adding an SVN_DEPRECATED_? to
the true API. Or maybe we could add SVN_CURRENT (or somesuch) to the
current APIs, which become deprecated if you target an older API.


I think this may in fact end up being the best idea; I'd actually
suggest a slightly different format for the SVN_DEPRECATED_? macros than
I originally used.  Rather than noting the API version in which a
function was deprecated, instead note the versions in which it was
active, like the following:

SVN_API_SINCE_5
svn_error_t *
svn_client_update3([...]);

SVN_API_SINCE_2
SVN_API_UNTIL_4
svn_error_t *
svn_client_update2([...]);

SVN_API_UNTIL_1
svn_error_t *
svn_client_update([...]);

This has the benefit of being directly in line with the @since and
@deprecated tags in the Doxygen comments.

Anyway, that's probably more than enough for this email, I think.  Is it
any surprise that I don't have any issue with verbose header files? :)

-Dani Church


Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Dani Church

On Sun, 26 Sep 2010, Daniel Shahaf wrote:


Adding the SVN_MAYBE_POISON(svn_wc_some_function) can be done
mechanically (by a script that reads the @deprecated comments and
transforms the header files accordingly) --- instead of editing all
headers now, we could provide a knob for people to run this script
during 'make install'.


I toyed around with that kind of idea, but, having just done a lot of 
partially-automated text processing on these files, I know just how hard 
it is to get it right-- I wouldn't want to leave that to an automated 
script.  Also, I'm not a big fan of having to mangle the header files 
before they're usable-- I'd rather be able to point my GCC include path 
straight at my working copy.


-Dani Church


Re: [RFC] Preprocessor flags to target code to a specific API version

2010-09-26 Thread Dani Church

On Sun, 26 Sep 2010, Daniel Shahaf wrote:


Greg Stein wrote on Sun, Sep 26, 2010 at 16:01:53 -0400:

Wrapping declarations with #if SVN_TARGET_API is a non-starter, I
believe. That sounds like it would obfuscate the headers a bit too
much. In your original email, you mentioned something about gcc
poison which made it sound like you could flag certain declarations
to gcc as bad to use.


What advantages does 'poison' have over, you know, attempting to compile
an application against an old set of headers?

gcc -I 
https://svn.apache.org/repos/asf/subversion/tags/1.4.0/subversion/include/


The advantage is that if you're working on an open-source project for 
which the policy is we want to support any version of libsvn_client from 
1.4 onwards, you can make sure that no one submits code that breaks the 
policy by associating the codebase itself with the 1.4 API, rather than 
having to tell everyone to download the 1.4.0 headers and change their 
include path.  There are a couple other reasons why I like the idea, but 
pinning a project to a given API is the big one.


-Dani Church


Re: place of svnrdump

2010-09-26 Thread Neels J Hofmeyr
On 2010-09-25 14:43, Daniel Shahaf wrote:
 Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530:
 Agreed, these modules should not be part of the core. However, in the
 case of Subversion, there absolutely NO way to get/ back up the
 revision history data* [5].
 
 svnsync.

On a side note, svnsync happens to be relatively slow. I tried to svnsync
the ASF repos once (for huge test data). The slowness of svnsync made it
practically unfeasible to pull off. I ended up downloading a zipped dump and
'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions.
(And someone rebooted that box after two days, halfway through, grr. Took
some serious hacking to finish up without starting over.)

So, that experience tells me that svnsync and svnadmin dump/load aren't
close to optimal, for example compared to a straight download of 34 gigs
that the ASF repos is... Anything that could speed up a remote dump/load
process would probably be good -- while I don't know any details about svnrdump.

My two cents: Rephrasing everything into the dump format and back blows up
both data size and ETA. Maybe a remote backup mechanism could even break
loose from discrete revision boundaries during transfer/load...

In contrast, the speed of a remote 'svn log' just amazes me. It's pretty
darn fast to get all the commit logs of a repos. So between that and getting
the rev content as well there's some big speed loss.

Heh, that's my reply to a single-word statement ;)

~Neels


P.S.: If the whole ASF repos were a single Git WC, how long would that take
to pull? (Given that Git tends to take up much more space than a Subversion
repos, I wonder.)



signature.asc
Description: OpenPGP digital signature


[WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-09-26 Thread Johan Corveleyn
Hi devs,

As discussed in [1], here is a patch that makes svn_diff_diff
(libsvn_diff/diff.c) skip the identical prefix and suffix of the
original and modified files, before starting the LCS (longest common
subsequence) algorithm on the non-matching part.

This makes diff a lot faster (especially for large files), thereby
also speeding up blame in environments where the client is the
bottleneck (with a fast enough server and network, blame is
constrained by the speed of diff on the client side).

This patch still has some rough edges (see below), hence the WIP
marker. Nevertheless, it works most of the time, and it can
demonstrate the speed improvement that can be expected. I will
continue working on the rough edges, but in the meantime any
feedback/review/thoughts are very much appreciated.

The rough edges:
1) While scanning through the identical prefix, I have to count the
number of lines (to have correct line offsets for the rest of the
algorithm). To do that, I currently count the number of \n's, so it
won't work for files with \r eol-style. Not so difficult to fix
(similar to how diff_file.c#datasource_get_next_token does it), but I
haven't done it yet.

2) Two tests still fail:
- blame_tests.py 2: annotate a binary file
- blame_tests.py 7: blame with different eol styles
Maybe this is because of 1), I'll have to investigate.

3) It's only implemented for 2 files. I'd like to generalize this for
an array of datasources, so it can also be used for diff3 and diff4.

4) As a small hack, I had to add a flag datasource_opened to
token.c#svn_diff__get_tokens, so I could have different behavior for
regular diff vs. diff3 and diff4. If 3) gets implemented, this hack is
no longer needed.

5) I've struggled with making the code readable/understandable. It's
likely that there is still a lot of room for improvement. I also
probably need to document it some more.

6) I've not paid too much attention to low level optimizations, so
here also there's probably room for improvement, which may be
significant for the critical loops.

7) There are two warnings left conversion from 'apr_off_t' to 'int',
in diff_file.c#find_identical_suffix. To completely eliminate this, I
should probably make all chunks of type apr_off_t instead of int
(but I have not done that yet, because the original implementation
also used int for the chunk in the svn_diff__file_baton_t struct).
Should I do this (also in the baton struct)? Or should I use an
explicit cast somewhere?

8) A bigger problem: the output of diff/blame is sometimes different
from the original implementation. It's always syntactically correct,
but sometimes the less unique lines are taken differently (like when
a new function is added, and diff thinks the new block starts from the
closing brace of the previous function, ... stuff like that). This
happens only because of the identical-suffix-scanning (it doesn't
happen if I only enable the identical-prefix-scanning).

I think I know the cause of this change in behavior: I completely
eliminate the suffix from the LCS-building algorithm. And that
probably causes it to sometimes come up with another longest common
subsequence. Right now I don't know how to solve this completely. A
workaround might be to add a certain number of suffix-lines to the
non-matching-block, so they can be part of the LCS-searching. But
probably one can always come up with examples where the number of
suffix-lines is not enough. I'll have to look into this some more.
Ideas welcome :-).

Log message:
[[[
Make svn_diff_diff skip identical prefix and suffix to make diff and blame
faster.

* subversion/include/svn_diff.h
  (svn_diff_fns_t): Added new function types datasources_open and
   get_prefix_lines to the vtable.

* subversion/libsvn_diff/diff_memory.c
  (datasources_open): New function (does nothing).
  (get_prefix_lines): New function (does nothing).
  (svn_diff__mem_vtable): Added new functions datasources_open and
   get_prefix_lines.

* subversion/libsvn_diff/diff_file.c
  (svn_diff__file_baton_t): Added members prefix_lines, suffix_start_chunk[4]
   and suffix_offset_in_chunk.
  (increment_pointer_or_chunk, decrement_pointer_or_chunk): New functions.
  (find_identical_prefix, find_identical_suffix): New fuctions.
  (datasources_open): New function, to open both datasources and find their
   identical prefix and suffix.
  (get_prefix_lines): New function.
  (datasource_get_next_token): Stop at start of identical suffix.
  (svn_diff__file_vtable): Added new functions datasources_open and
   get_prefix_lines.

* subversion/libsvn_diff/diff.h
  (svn_diff__get_tokens): Added argument datasource_opened, to indicate that
   the datasource was already opened.

* subversion/libsvn_diff/token.c
  (svn_diff__get_tokens): Added argument datasource_opened. Only open the
   datasource if datasource_opened is FALSE. Set the starting offset of the
   position list to the number of prefix lines.

* subversion/libsvn_diff/diff.c
  (svn_diff__diff): Add a chunk of