RE: svn commit: r1054229 - /subversion/trunk/subversion/libsvn_subr/opt.c
- SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2010 The Apache Software Foundation.\n + SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2011 The Apache Software Foundation.\n Won't this be 2010-2011? With regards Kamesh Jayachandran
Re: svn commit: r1054229 - /subversion/trunk/subversion/libsvn_subr/opt.c
On Sat, Jan 1, 2011 at 10:51 AM, Kamesh Jayachandran kam...@collab.net wrote: - SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2010 The Apache Software Foundation.\n + SVN_ERR(svn_cmdline_fputs(_(Copyright (C) 2011 The Apache Software Foundation.\n Won't this be 2010-2011? Nope. -Hyrum
diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning
[ Taking a privately-started discussion with danielsh to the list, in case others have inspiration/insight about this. Question at hand: I'm having trouble making diff3 work with prefix/suffix scanning of the diff-optimizations-bytes branch. Any feedback is highly appreciated :-). ] On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: [...snip...] In diff3 with prefix enabled, I was seeing failures like this: EXPECTED: line1 line2 === line2 modified ACTUAL: line1 line2 === line1 line2 modified so I assumed that when the prefix code gobbled the shared prefix. How to fix it from there was purely guesswork on my part (and I haven't implemented it). These failures would go away if I prepended a line0 left and line0 right to the file. Yes, I know. That's indeed because the prefix-scanning code gobbled up the identical prefix. That problem is also there in diff2. In diff2, I fixed this by simply pre-pending a piece of common diff to the diff linked list, in the function diff.c#svn_diff__diff. From r1037371: [[[ --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c (original) +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c Sun Nov 21 02:36:07 2010 @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs, svn_diff_t *diff; svn_diff_t **diff_ref = diff; + if (want_common (original_start 1)) +{ + /* we have a prefix to skip */ + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); + + (*diff_ref)-type = svn_diff__type_common; + (*diff_ref)-original_start = 0; + (*diff_ref)-original_length = original_start - 1; + (*diff_ref)-modified_start = 0; + (*diff_ref)-modified_length = modified_start - 1; + (*diff_ref)-latest_start = 0; + (*diff_ref)-latest_length = 0; + + diff_ref = (*diff_ref)-next; +} + while (1) { if (original_start lcs-position[0]-offset ]]] Naively, I wanted to do the same for diff3 (along with some other tricks, to make the algorithm aware of the prefix_lines), but it doesn't work. See the patch in attachment. I mean: it works partly: diff-diff3-test.exe passes with the attached patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and merge_tests.py (61) don't. I haven't studied the test failures in detail (I probably should, but I currently can't wrap my head around those tests). I'm now pondering another approach, to pre-pend an lcs piece to the lcs linked list, and let the rest of the algorithm do its work as normal. Inspiration for this came when I was reading the code of diff3.c#svn_diff__resolve_conflict, where a similar trick is used around line 121: [[[ /* If there were matching symbols at the start of * both sequences, record that fact. */ if (common_length 0) { lcs = apr_palloc(subpool, sizeof(*lcs)); lcs-next = NULL; lcs-position[0] = start_position[0]; lcs-position[1] = start_position[1]; lcs-length = common_length; lcs_ref = lcs-next; } ]]] Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where I'm already passing the prefix_lines as a new parameter. If that would work out, I could probably remove the special common-diff-injecting code in diff.c#svn_diff__diff. Thoughts? It will take me some time/experimentation to work out the details, but I think that should work ... Cheers, -- Johan Index: subversion/libsvn_diff/diff3.c === --- subversion/libsvn_diff/diff3.c (revision 1041582) +++ subversion/libsvn_diff/diff3.c (working copy) @@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff, { svn_diff__tree_t *tree; svn_diff__position_t *position_list[3]; + svn_diff_datasource_e datasource[] = {svn_diff_datasource_original, +svn_diff_datasource_modified, +svn_diff_datasource_latest}; svn_diff__lcs_t *lcs_om; svn_diff__lcs_t *lcs_ol; apr_pool_t *subpool; apr_pool_t *treepool; + apr_off_t prefix_lines = 0; *diff = NULL; @@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff, svn_diff__tree_create(tree, treepool); + SVN_ERR(vtable-datasources_open(diff_baton, prefix_lines, datasource, 3)); + SVN_ERR(svn_diff__get_tokens(position_list[0], tree, diff_baton, vtable, svn_diff_datasource_original, - FALSE, - 0, + TRUE, + prefix_lines, subpool)); SVN_ERR(svn_diff__get_tokens(position_list[1], tree, diff_baton, vtable, svn_diff_datasource_modified, -
My take on the diff-optimizations-bytes branch
Hi Johan, Thursday night I did something stupid and had a look at how svn blame could be made faster based on the HEAD code in your branch. One night and most of the following day later, I think I made it a few percent faster now. Some of the code I committed directly to /trunk and you need to pull these changes into your branch to compile the attached patch. Feel free to test it, take it apart and morph it any way you like -- I know the patch isn't very pretty in the first place. I tested the patch on x64 LINUX and would like to hear whether it at least somewhat improved performance on your system for your svn blame config.xml use-case. -- Stefan^2. [[[ Speed-up of datasource_open() and its sub-routines by a series of optimizations: * allocate the file[] array on stack instead of heap (all members become directly addressible through the stack pointer because all static sub-functions will actually be in-lined) * minor re-arragements in arithmetic expressions to maximize reuse of results (e.g. in INCREMENT_POINTERS) * move hot spots to seperate functions and provide a specialized version for file_len == 2 (many loops collapse to a single execution, other values can be hard-coded, etc.) * use seperate loops to process data within a chunk so we don't need to call INCREMENT_POINTERS friends that frequently * scan / compare strings in machine-word granularity instead of bytes ]]] Index: subversion/libsvn_diff/diff_file.c === --- subversion/libsvn_diff/diff_file.c (revision 1054044) +++ subversion/libsvn_diff/diff_file.c (working copy) @@ -46,6 +46,7 @@ #include private/svn_utf_private.h #include private/svn_eol_private.h +#include private/svn_adler32.h /* A token, i.e. a line read from a file. */ typedef struct svn_diff__file_token_t @@ -258,10 +259,10 @@ \ for (i = 0; i files_len; i++) \ {\ - if (all_files[i]-curp all_files[i]-endp - 1) \ -all_files[i]-curp++;\ + if (all_files[i].curp + 1 all_files[i].endp) \ +all_files[i].curp++; \ else \ -SVN_ERR(increment_chunk(all_files[i], pool));\ +SVN_ERR(increment_chunk(all_files[i], pool)); \ }\ } while (0) @@ -276,10 +277,10 @@ \ for (i = 0; i files_len; i++) \ {\ - if (all_files[i]-curp all_files[i]-buffer) \ -all_files[i]-curp--;\ + if (all_files[i].curp all_files[i].buffer) \ +all_files[i].curp--; \ else \ -SVN_ERR(decrement_chunk(all_files[i], pool));\ +SVN_ERR(decrement_chunk(all_files[i], pool)); \ }\ } while (0) @@ -349,12 +350,12 @@ * the file (this can happen while scanning backwards). This is the case if * one of them has chunk == -1. */ static svn_boolean_t -is_one_at_bof(struct file_info *file[], apr_size_t file_len) +is_one_at_bof(struct file_info file[], apr_size_t file_len) { apr_size_t i; for (i = 0; i file_len; i++) -if (file[i]-chunk == -1) +if (file[i].chunk == -1) return TRUE; return FALSE; @@ -363,53 +364,152 @@ /* Check whether one of the FILEs has its pointers at EOF (this is the case if * one of them has curp == endp (this can only happen at the last chunk)) */ static svn_boolean_t -is_one_at_eof(struct file_info *file[], apr_size_t file_len) +is_one_at_eof(struct file_info file[], apr_size_t file_len) { apr_size_t i; for (i = 0; i file_len; i++) -if (file[i]-curp == file[i]-endp) +if (file[i].curp == file[i].endp) return TRUE; return FALSE; } -/* Find the prefix which is identical between all elements of the FILE array. - * Return the number of prefix lines in PREFIX_LINES. REACHED_ONE_EOF will be - * set to TRUE if one of the FILEs reached its end while scanning prefix, - * i.e. at least one file consisted entirely of prefix. Otherwise, - * REACHED_ONE_EOF is set to FALSE. - * - * After this function is
Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)
[ trimming CC ] Danny Trebbien wrote on Sun, Dec 26, 2010 at 15:39:05 -0800: Attached are version 3 of the patch and corresponding log message. Also attached is a TGZ archive of two Subversion repository dumps that are used by a new test case. [[[ Add a command line option (--source-encoding) to the svnsync init, sync, and copy-revprops subcommands that allows the user to specify the character encoding of translatable properties from the source repository. This is needed to allow svnsync to sync some older Subversion repositories that have properties that were not encoded in UTF-8. As discussed at: http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020 http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518 http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550 Around half of the changes are to rename the normalized count variables so that it is more clear that the counters only count line ending normalizations and not re-encoding normalizations. The other half of the changes are cosmetic (adding comments or trimming trailing whitespace in lines) or exist to pass the argument to --source-encoding through to the functions that need it (mainly normalize_string() in subversion/svnsync/sync.c). +1. Refresh my memory please: did we decide in some thread that the recodings should be counted too? * subversion/svnsync/main.c (svnsync__opt) Add svnsync_opt_source_encoding. (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of acceptable options for the init, sync, and copy-revprops subcommands. (svnsync_options): Add a description of the --source-encoding option. (opt_baton_t, subcommand_baton_t): Add the SOURCE_ENCODING field. (log_properties_normalized): Rename the NORMALIZED_REV_PROPS_COUNT parameter to LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT parameter to LE_NORMALIZED_NODE_PROPS_COUNT. (copy_revprops): Add the PROP_ENCODING parameter. Rename the NORMALIZED_COUNT parameter to LE_NORMALIZED_COUNT. (make_subcommand_baton): Set the SOURCE_ENCODING field of the resulting subcommand_baton_t object to the value of SOURCE_ENCODING from the opt_baton_t object. (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started, replay_rev_finished): Pass SOURCE_ENCODING to svnsync_* functions and copy_revprops(). (replay_baton_t): Rename the NORMALIZED_REV_PROPS_COUNT field to LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT field to LE_NORMALIZED_NODE_PROPS_COUNT. (main): Handle the case when the command line option is --source-encoding. Set the SOURCE_ENCODING field of the opt_baton_t object to either OPT_ARG or NULL. * subversion/svnsync/sync.c (normalize_string): Rename WAS_NORMALIZED to WAS_LE_NORMALIZED to make clear that the counter only counts line ending normalizations. Add the ENCODING parameter. Handle the case when ENCODING is not NULL by calling svn_subst_translate_string2(). Switch to the two pools (result scratch pools) pattern. Use memchr() instead of strchr() because the length of (*STR)-data is known. (svnsync_normalize_revprops): Rename NORMALIZED_COUNT to NORMALIZED_LE_COUNT. Add the ENCODING parameter. Move the call to apr_hash_set() outside of the if block (if (le_normalized)), as the property value may have been changed because of re-encoding even when no line ending was normalized. The sentence Move.* is too much detail. It's probably fine to just omit it --- there is no functional change, and the details of the hunk need no introduction. * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump A dump of a repository with the following features: 1. The log message (`svn:log` revision property) of revision 1 has CRLF line endings. 2. The log message of revision 2 has CR line endings. 3. Revision 3 introduces an `svn:ignore` node property with CRLF line endings. 4. Revision 4 introduces a custom node property, `x:related-to`, with CRLF line endings. I don't understand * What is the difference between copy_bad_line_endings() and copy_bad_line_endings2()? * How does copy_bad_line_endings2() relate to the subject --- which is non-UTF-8 log messages, as opposed to non-LF-only log messages? i.e., why aren't you adding a dump where the 'before' has an ISO-8859-1 property, and the 'after' has that property recoded to UTF-8? * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.expected.dump A dump of the expected result of using svnsync to sync copy-bad-line-endings2.dump. ]]] Index: subversion/svnsync/sync.h === --- subversion/svnsync/sync.h (revision 1052903) +++ subversion/svnsync/sync.h (working copy) @@ -51,19 +56,25 @@