RE: svn commit: r1054229 - /subversion/trunk/subversion/libsvn_subr/opt.c

2011-01-01 Thread Kamesh Jayachandran
-  SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2010 The Apache Software 
Foundation.\n
+  SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2011 The Apache Software 
Foundation.\n

Won't this be 2010-2011?

With regards
Kamesh Jayachandran




Re: svn commit: r1054229 - /subversion/trunk/subversion/libsvn_subr/opt.c

2011-01-01 Thread Hyrum K Wright
On Sat, Jan 1, 2011 at 10:51 AM, Kamesh Jayachandran kam...@collab.net wrote:
-  SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2010 The Apache Software 
Foundation.\n
+  SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2011 The Apache Software 
Foundation.\n

 Won't this be 2010-2011?

Nope.

-Hyrum


diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

2011-01-01 Thread Johan Corveleyn
[ Taking a privately-started discussion with danielsh to the list, in
case others have inspiration/insight about this. Question at hand: I'm
having trouble making diff3 work with prefix/suffix scanning of the
diff-optimizations-bytes branch. Any feedback is highly appreciated
:-). ]

On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
[...snip...]
 In diff3 with prefix enabled, I was seeing failures like this:

 EXPECTED:
 line1
 
 line2
 ===
 line2 modified


 ACTUAL:
 
 line1
 line2
 ===
 line1
 line2 modified


 so I assumed that when the prefix code gobbled the shared prefix.  How
 to fix it from there was purely guesswork on my part (and I haven't
 implemented it).

 These failures would go away if I prepended a line0 left and line0
 right to the file.

Yes, I know. That's indeed because the prefix-scanning code gobbled up
the identical prefix. That problem is also there in diff2.

In diff2, I fixed this by simply pre-pending a piece of common diff
to the diff linked list, in the function diff.c#svn_diff__diff.

From r1037371:
[[[
--- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
(original)
+++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
Sun Nov 21 02:36:07 2010
@@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
  svn_diff_t *diff;
  svn_diff_t **diff_ref = diff;

+  if (want_common  (original_start  1))
+{
+  /* we have a prefix to skip */
+  (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+  (*diff_ref)-type = svn_diff__type_common;
+  (*diff_ref)-original_start = 0;
+  (*diff_ref)-original_length = original_start - 1;
+  (*diff_ref)-modified_start = 0;
+  (*diff_ref)-modified_length = modified_start - 1;
+  (*diff_ref)-latest_start = 0;
+  (*diff_ref)-latest_length = 0;
+
+  diff_ref = (*diff_ref)-next;
+}
+
  while (1)
{
  if (original_start  lcs-position[0]-offset
]]]

Naively, I wanted to do the same for diff3 (along with some other
tricks, to make the algorithm aware of the prefix_lines), but it
doesn't work. See the patch in attachment.

I mean: it works partly: diff-diff3-test.exe passes with the attached
patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
merge_tests.py (61) don't. I haven't studied the test failures in
detail (I probably should, but I currently can't wrap my head around
those tests).

I'm now pondering another approach, to pre-pend an lcs piece to the
lcs linked list, and let the rest of the algorithm do its work as
normal. Inspiration for this came when I was reading the code of
diff3.c#svn_diff__resolve_conflict, where a similar trick is used
around line 121:
[[[
/* If there were matching symbols at the start of
 * both sequences, record that fact.
 */
if (common_length  0)
  {
lcs = apr_palloc(subpool, sizeof(*lcs));
lcs-next = NULL;
lcs-position[0] = start_position[0];
lcs-position[1] = start_position[1];
lcs-length = common_length;

lcs_ref = lcs-next;
  }
]]]

Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
I'm already passing the prefix_lines as a new parameter. If that would
work out, I could probably remove the special common-diff-injecting
code in diff.c#svn_diff__diff.

Thoughts?

It will take me some time/experimentation to work out the details, but
I think that should work ...

Cheers,
-- 
Johan
Index: subversion/libsvn_diff/diff3.c
===
--- subversion/libsvn_diff/diff3.c  (revision 1041582)
+++ subversion/libsvn_diff/diff3.c  (working copy)
@@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff,
 {
   svn_diff__tree_t *tree;
   svn_diff__position_t *position_list[3];
+  svn_diff_datasource_e datasource[] = {svn_diff_datasource_original,
+svn_diff_datasource_modified,
+svn_diff_datasource_latest};
   svn_diff__lcs_t *lcs_om;
   svn_diff__lcs_t *lcs_ol;
   apr_pool_t *subpool;
   apr_pool_t *treepool;
+  apr_off_t prefix_lines = 0;
 
   *diff = NULL;
 
@@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff,
 
   svn_diff__tree_create(tree, treepool);
 
+  SVN_ERR(vtable-datasources_open(diff_baton, prefix_lines, datasource, 3));
+
   SVN_ERR(svn_diff__get_tokens(position_list[0],
tree,
diff_baton, vtable,
svn_diff_datasource_original,
-   FALSE,
-   0,
+   TRUE,
+   prefix_lines,
subpool));
 
   SVN_ERR(svn_diff__get_tokens(position_list[1],
tree,
diff_baton, vtable,
svn_diff_datasource_modified,
-   

My take on the diff-optimizations-bytes branch

2011-01-01 Thread Stefan Fuhrmann

Hi Johan,

Thursday night I did something stupid and had a look at  how
svn blame could be made faster based on the HEAD code in
your branch.

One night and most of the following day later, I think I made it
a few percent faster now. Some of the code I committed directly
to /trunk and you need to pull these changes into your branch
to compile the attached patch.

Feel free to test it, take it apart and morph it any way you like --
I know the patch isn't very pretty in the first place. I tested the
patch on x64 LINUX and would like to hear whether it at least
somewhat improved performance on your system for your
svn blame config.xml use-case.

-- Stefan^2.

[[[
Speed-up of datasource_open() and its sub-routines
by a series of optimizations:

* allocate the file[] array on stack instead of heap
  (all members become directly addressible through
  the stack pointer because all static sub-functions
  will actually be in-lined)
* minor re-arragements in arithmetic expressions to
  maximize reuse of results (e.g. in INCREMENT_POINTERS)
* move hot spots to seperate functions and provide a
  specialized version for file_len == 2
  (many loops collapse to a single execution, other
  values can be hard-coded, etc.)
* use seperate loops to process data within a chunk
  so we don't need to call INCREMENT_POINTERS  friends
  that frequently
* scan / compare strings in machine-word granularity
  instead of bytes
]]]

Index: subversion/libsvn_diff/diff_file.c
===
--- subversion/libsvn_diff/diff_file.c  (revision 1054044)
+++ subversion/libsvn_diff/diff_file.c  (working copy)
@@ -46,6 +46,7 @@
 
 #include private/svn_utf_private.h
 #include private/svn_eol_private.h
+#include private/svn_adler32.h
 
 /* A token, i.e. a line read from a file. */
 typedef struct svn_diff__file_token_t
@@ -258,10 +259,10 @@
  \
 for (i = 0; i  files_len; i++)  \
 {\
-  if (all_files[i]-curp  all_files[i]-endp - 1)   \
-all_files[i]-curp++;\
+  if (all_files[i].curp + 1  all_files[i].endp) \
+all_files[i].curp++; \
   else   \
-SVN_ERR(increment_chunk(all_files[i], pool));\
+SVN_ERR(increment_chunk(all_files[i], pool));   \
 }\
   } while (0)
 
@@ -276,10 +277,10 @@
  \
 for (i = 0; i  files_len; i++)  \
 {\
-  if (all_files[i]-curp  all_files[i]-buffer) \
-all_files[i]-curp--;\
+  if (all_files[i].curp  all_files[i].buffer)   \
+all_files[i].curp--; \
   else   \
-SVN_ERR(decrement_chunk(all_files[i], pool));\
+SVN_ERR(decrement_chunk(all_files[i], pool));   \
 }\
   } while (0)
 
@@ -349,12 +350,12 @@
  * the file (this can happen while scanning backwards). This is the case if
  * one of them has chunk == -1. */
 static svn_boolean_t
-is_one_at_bof(struct file_info *file[], apr_size_t file_len)
+is_one_at_bof(struct file_info file[], apr_size_t file_len)
 {
   apr_size_t i;
 
   for (i = 0; i  file_len; i++)
-if (file[i]-chunk == -1)
+if (file[i].chunk == -1)
   return TRUE;
 
   return FALSE;
@@ -363,53 +364,152 @@
 /* Check whether one of the FILEs has its pointers at EOF (this is the case if
  * one of them has curp == endp (this can only happen at the last chunk)) */
 static svn_boolean_t
-is_one_at_eof(struct file_info *file[], apr_size_t file_len)
+is_one_at_eof(struct file_info file[], apr_size_t file_len)
 {
   apr_size_t i;
 
   for (i = 0; i  file_len; i++)
-if (file[i]-curp == file[i]-endp)
+if (file[i].curp == file[i].endp)
   return TRUE;
 
   return FALSE;
 }
 
-/* Find the prefix which is identical between all elements of the FILE array.
- * Return the number of prefix lines in PREFIX_LINES.  REACHED_ONE_EOF will be
- * set to TRUE if one of the FILEs reached its end while scanning prefix,
- * i.e. at least one file consisted entirely of prefix.  Otherwise, 
- * REACHED_ONE_EOF is set to FALSE.
- *
- * After this function is 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-01 Thread Daniel Shahaf
[ trimming CC ]

Danny Trebbien wrote on Sun, Dec 26, 2010 at 15:39:05 -0800:
 Attached are version 3 of the patch and corresponding log message.
 Also attached is a TGZ archive of two Subversion repository dumps that
 are used by a new test case.

 [[[
 Add a command line option (--source-encoding) to the svnsync init, sync, and
 copy-revprops subcommands that allows the user to specify the character
 encoding of translatable properties from the source repository. This is needed
 to allow svnsync to sync some older Subversion repositories that have
 properties that were not encoded in UTF-8.
 
 As discussed at:
   http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
   http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
   http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
 
 Around half of the changes are to rename the normalized count variables so
 that it is more clear that the counters only count line ending normalizations
 and not re-encoding normalizations. The other half of the changes are cosmetic
 (adding comments or trimming trailing whitespace in lines) or exist to pass 
 the
 argument to --source-encoding through to the functions that need it (mainly
 normalize_string() in subversion/svnsync/sync.c).
 

+1.

Refresh my memory please: did we decide in some thread that the
recodings should be counted too?

 * subversion/svnsync/main.c
   (svnsync__opt) Add svnsync_opt_source_encoding.
   (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of
 acceptable options for the init, sync, and copy-revprops subcommands.
   (svnsync_options): Add a description of the --source-encoding option.
   (opt_baton_t, subcommand_baton_t): Add the SOURCE_ENCODING field.
   (log_properties_normalized): Rename the NORMALIZED_REV_PROPS_COUNT parameter
 to LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT
 parameter to LE_NORMALIZED_NODE_PROPS_COUNT.
   (copy_revprops): Add the PROP_ENCODING parameter. Rename the 
 NORMALIZED_COUNT
 parameter to LE_NORMALIZED_COUNT.
   (make_subcommand_baton): Set the SOURCE_ENCODING field of the resulting
 subcommand_baton_t object to the value of SOURCE_ENCODING from the
 opt_baton_t object.
   (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started,
replay_rev_finished): Pass SOURCE_ENCODING to svnsync_* functions and
 copy_revprops().
   (replay_baton_t): Rename the NORMALIZED_REV_PROPS_COUNT field to
 LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT field to
 LE_NORMALIZED_NODE_PROPS_COUNT.
   (main): Handle the case when the command line option is --source-encoding.
 Set the SOURCE_ENCODING field of the opt_baton_t object to either OPT_ARG
 or NULL.
 
 * subversion/svnsync/sync.c
   (normalize_string): Rename WAS_NORMALIZED to WAS_LE_NORMALIZED to make clear
 that the counter only counts line ending normalizations. Add the ENCODING
 parameter. Handle the case when ENCODING is not NULL by calling
 svn_subst_translate_string2(). Switch to the two pools (result  scratch
 pools) pattern. Use memchr() instead of strchr() because the length of
 (*STR)-data is known.
   (svnsync_normalize_revprops): Rename NORMALIZED_COUNT to
 NORMALIZED_LE_COUNT. Add the ENCODING parameter. Move the call to
 apr_hash_set() outside of the if block (if (le_normalized)), as the
 property value may have been changed because of re-encoding even when no
 line ending was normalized.

The sentence Move.* is too much detail.  It's probably fine to just
omit it --- there is no functional change, and the details of the hunk
need no introduction.

 * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump
   A dump of a repository with the following features:
   1. The log message (`svn:log` revision property) of revision 1 has CRLF line
  endings.
   2. The log message of revision 2 has CR line endings.
   3. Revision 3 introduces an `svn:ignore` node property with CRLF line 
 endings.
   4. Revision 4 introduces a custom node property, `x:related-to`, with CRLF
  line endings.
 

I don't understand

* What is the difference between copy_bad_line_endings() and
  copy_bad_line_endings2()?

* How does copy_bad_line_endings2() relate to the subject --- which is
  non-UTF-8 log messages, as opposed to non-LF-only log messages?

  i.e., why aren't you adding a dump where the 'before' has an
  ISO-8859-1 property, and the 'after' has that property recoded to
  UTF-8?

 * 
 subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.expected.dump
   A dump of the expected result of using svnsync to sync
   copy-bad-line-endings2.dump.
 ]]]

 Index: subversion/svnsync/sync.h
 ===
 --- subversion/svnsync/sync.h (revision 1052903)
 +++ subversion/svnsync/sync.h (working copy)
 @@ -51,19 +56,25 @@