Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-28 Thread Gavin Baumanis
Thanks Hyrum,

Appreciate the update.
I'll remove it from my "watch" list.


On 28/03/2011, at 11:58 PM, Hyrum K Wright wrote:

> Actually, it did.  Mike committed it in r1084335, and reported that
> fact in this thread on Mar 22.
> 
> Best,
> -Hyrum
> 
> On Sun, Mar 27, 2011 at 7:27 PM, Gavin Baumanis  
> wrote:
>> Ping.
>> This patch submission has received no comments.
>> 
>> 
>> 
>> On 22/03/2011, at 8:27 AM, Danny Trebbien wrote:
>> 
>>> On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
>>>  wrote:
 Danny Trebbien  writes:
> Attached are the patch, the log message, and the two TGZ archives of
> DUMP files (for the tests).
 
 The patch is missing.
 
 --
 Philip
>>> 
>>> Sorry.  Attached is the patch with a .txt extension.
>>> 
>> 
>> 

Gavin "Beau" Baumanis
E: gav...@thespidernet.com




Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-28 Thread Hyrum K Wright
Actually, it did.  Mike committed it in r1084335, and reported that
fact in this thread on Mar 22.

Best,
-Hyrum

On Sun, Mar 27, 2011 at 7:27 PM, Gavin Baumanis  wrote:
> Ping.
> This patch submission has received no comments.
>
>
>
> On 22/03/2011, at 8:27 AM, Danny Trebbien wrote:
>
>> On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
>>  wrote:
>>> Danny Trebbien  writes:
 Attached are the patch, the log message, and the two TGZ archives of
 DUMP files (for the tests).
>>>
>>> The patch is missing.
>>>
>>> --
>>> Philip
>>
>> Sorry.  Attached is the patch with a .txt extension.
>> 
>
>


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-27 Thread Gavin Baumanis
Ping.
This patch submission has received no comments.



On 22/03/2011, at 8:27 AM, Danny Trebbien wrote:

> On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
>  wrote:
>> Danny Trebbien  writes:
>>> Attached are the patch, the log message, and the two TGZ archives of
>>> DUMP files (for the tests).
>> 
>> The patch is missing.
>> 
>> --
>> Philip
> 
> Sorry.  Attached is the patch with a .txt extension.
> 



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-22 Thread C. Michael Pilato
On 03/22/2011 08:41 PM, Danny Trebbien wrote:
>> Finished committing this in r1084335, though pieces of it were committed in
>> prior revisions as they weren't strictly part of this logical change.
>>
>> --
>> C. Michael Pilato 
>> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 
> 
> Thank you so much!!
> 
> Issue 3817 (http://subversion.tigris.org/issues/show_bug.cgi?id=3817)
> can be closed now.

Done.

-- 
C. Michael Pilato 
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-22 Thread Danny Trebbien
> Finished committing this in r1084335, though pieces of it were committed in
> prior revisions as they weren't strictly part of this logical change.
>
> --
> C. Michael Pilato 
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Thank you so much!!

Issue 3817 (http://subversion.tigris.org/issues/show_bug.cgi?id=3817)
can be closed now.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-22 Thread C. Michael Pilato
On 03/22/2011 02:24 PM, C. Michael Pilato wrote:
> On 03/21/2011 05:27 PM, Danny Trebbien wrote:
>> On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
>>  wrote:
>>> Danny Trebbien  writes:
 Attached are the patch, the log message, and the two TGZ archives of
 DUMP files (for the tests).
>>>
>>> The patch is missing.
>>>
>>> --
>>> Philip
>>
>> Sorry.  Attached is the patch with a .txt extension.
> 
> I'm looking at the patch now.

Finished committing this in r1084335, though pieces of it were committed in
prior revisions as they weren't strictly part of this logical change.

-- 
C. Michael Pilato 
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-22 Thread C. Michael Pilato
On 03/21/2011 05:27 PM, Danny Trebbien wrote:
> On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
>  wrote:
>> Danny Trebbien  writes:
>>> Attached are the patch, the log message, and the two TGZ archives of
>>> DUMP files (for the tests).
>>
>> The patch is missing.
>>
>> --
>> Philip
> 
> Sorry.  Attached is the patch with a .txt extension.

I'm looking at the patch now.

-- 
C. Michael Pilato 
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-21 Thread Danny Trebbien
On Mon, Mar 21, 2011 at 10:20 AM, Philip Martin
 wrote:
> Danny Trebbien  writes:
>> Attached are the patch, the log message, and the two TGZ archives of
>> DUMP files (for the tests).
>
> The patch is missing.
>
> --
> Philip

Sorry.  Attached is the patch with a .txt extension.
Index: subversion/svnsync/sync.h
===
--- subversion/svnsync/sync.h   (revision 1082876)
+++ subversion/svnsync/sync.h   (working copy)
@@ -33,15 +33,20 @@ extern "C" {
 #include "svn_delta.h"
 
 
-/* Normalize the line ending style of the values of properties in REV_PROPS
- * that "need translation" (according to svn_prop_needs_translation(),
- * currently all svn:* props) so that they contain only LF (\n) line endings.
- * The number of properties that needed normalization is returned in
+/* Normalize the encoding and line ending style of the values of properties
+ * in REV_PROPS that "need translation" (according to
+ * svn_prop_needs_translation(), which is currently all svn:* props) so that
+ * they are encoded in UTF-8 and contain only LF (\n) line endings.
+ *
+ * The number of properties that needed line ending normalization is returned 
in
  * *NORMALIZED_COUNT.
+ *
+ * No re-encoding is performed if SOURCE_PROP_ENCODING is NULL.
  */
 svn_error_t *
 svnsync_normalize_revprops(apr_hash_t *rev_props,
int *normalized_count,
+   const char *source_prop_encoding,
apr_pool_t *pool);
 
 
@@ -51,15 +56,21 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
  * the commit.  TO_URL is the URL of the root of the repository into
  * which the commit is being made.
  *
+ * If SOURCE_PROP_ENCODING is NULL, then property values are presumed to be
+ * encoded in UTF-8 and are not re-encoded. Otherwise, the property values are
+ * presumed to be encoded in SOURCE_PROP_ENCODING, and are normalized to UTF-8.
+ *
  * As the sync editor encounters property values, it might see the need to
- * normalize them (to LF line endings). Each carried out normalization adds 1
- * to the *NORMALIZED_NODE_PROPS_COUNTER (for notification).
+ * normalize them (re-encode and/or change to LF line endings). Each 
carried-out
+ * line ending normalization adds 1 to the *NORMALIZED_NODE_PROPS_COUNTER
+ * (for notification).
  */
 svn_error_t *
 svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
 void *wrapped_edit_baton,
 svn_revnum_t base_revision,
 const char *to_url,
+const char *source_prop_encoding,
 svn_boolean_t quiet,
 const svn_delta_editor_t **editor,
 void **edit_baton,
Index: subversion/svnsync/main.c
===
--- subversion/svnsync/main.c   (revision 1082876)
+++ subversion/svnsync/main.c   (working copy)
@@ -61,6 +61,7 @@ enum svnsync__opt {
   svnsync_opt_sync_password,
   svnsync_opt_config_dir,
   svnsync_opt_config_options,
+  svnsync_opt_source_prop_encoding,
   svnsync_opt_disable_locking,
   svnsync_opt_version,
   svnsync_opt_trust_server_cert,
@@ -105,8 +106,9 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "the destination repository by any method other than 'svnsync'.\n"
  "In other words, the destination repository should be a read-only\n"
  "mirror of the source repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
-svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_prop_encoding, 'q',
+svnsync_opt_allow_non_empty, svnsync_opt_disable_locking,
+svnsync_opt_steal_lock } },
 { "synchronize", synchronize_cmd, { "sync" },
   N_("usage: svnsync synchronize DEST_URL [SOURCE_URL]\n"
  "\n"
@@ -118,8 +120,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "source URL.  Specifying SOURCE_URL is recommended in particular\n"
  "if untrusted users/administrators may have write access to the\n"
  "DEST_URL repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_prop_encoding, 'q',
+svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
 { "copy-revprops", copy_revprops_cmd, { 0 },
   N_("usage:\n"
  "\n"
@@ -139,8 +141,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "DEST_URL repository.\n"
  "\n"
  "Form 2 is deprecated syntax, equivalent to specifying 
\"-rREV[:REV2]\".\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', 'r', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_prop_encoding, 'q', 'r',
+svnsync_opt_disable_locking, svnsync_opt_steal_lock

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-21 Thread Philip Martin
Danny Trebbien  writes:
> Attached are the patch, the log message, and the two TGZ archives of
> DUMP files (for the tests).

The patch is missing.

-- 
Philip


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1.1)

2011-03-21 Thread Danny Trebbien
I know that everyone is incredibly busy preparing 1.7, but would
someone please review and commit this patch?  I don't want to see it
slip to 1.8.

Attached are the patch, the log message, and the two TGZ archives of
DUMP files (for the tests).  They are the same as before, except that
I diff'ed against revision 1082876 and included message IDs of
referenced threads in the log message.


On Mon, Feb 21, 2011 at 1:31 PM, Gavin Beau Baumanis
 wrote:
>
> Hi Everyone,
>
> I have logged this submission in the issue tracker.
>
> #3817 refers.
> http://subversion.tigris.org/issues/show_bug.cgi?id=3817
[[[
Add a command line option (--source-prop-encoding) to the svnsync init, sync,
and copy-revprops subcommands that allows the user to specify the character
encoding of translatable properties from the source repository. This is needed
to allow svnsync to sync some older Subversion repositories that have
properties that were not encoded in UTF-8.

As discussed at:
  http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
Message-ID: loom.20100909t000542...@post.gmane.org
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
Message-ID: aanlktimyfxpkos9okcd3tv8tdjznhb6y0awbxnt+r...@mail.gmail.com
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
Message-ID: aanlktik11jljxnnqgbunzwenyjj5_yutut0wksbey...@mail.gmail.com
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/124731
Message-ID: aanlktikjd3bwbigqwkjdeakub49-wjawszj8iyzae...@mail.gmail.com
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125057
Message-ID: aanlktin48eyejig4tyvptovelygr1rfxk-_tkjhwo...@mail.gmail.com
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125783
Message-ID: AANLkTimjj48xt4s=zOqC6GxoPPEdLpfSB9YmBY=7b...@mail.gmail.com

Most of the changes exist to pass the argument to --source-prop-encoding through
to the functions that need it (mainly normalize_string() in
subversion/svnsync/sync.c).

* subversion/svnsync/main.c
  (svnsync__opt) Add svnsync_opt_source_encoding.
  (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of
acceptable options for the init, sync, and copy-revprops subcommands.
  (svnsync_options): Add a description of the --source-prop-encoding option.
  (opt_baton_t, subcommand_baton_t): Add the SOURCE_PROP_ENCODING field.
  (log_properties_normalized): Document the QUIET parameter.
  (copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass it through to
svnsync_normalize_revprops().
  (make_subcommand_baton): Set the SOURCE_PROP_ENCODING field of the resulting
subcommand_baton_t object to the value of SOURCE_PROP_ENCODING from the
opt_baton_t object.
  (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started,
   replay_rev_finished): Pass SOURCE_PROP_ENCODING to svnsync_* functions and
copy_revprops().
  (main): Handle the case when the command line option is
--source-prop-encoding. Set the SOURCE_PROP_ENCODING field of the
opt_baton_t object to either OPT_ARG or NULL.

* subversion/svnsync/sync.c
  (normalize_string): Add the SOURCE_PROP_ENCODING parameter. Always call
svn_subst_translate_string2(). Switch to the "two pools" (result & scratch
pools) pattern.
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass the
value through to normalize_string().
  (edit_baton_t): Add the SOURCE_PROP_ENCODING field.
  (change_file_prop, change_dir_prop): Pass SOURCE_PROP_ENCODING from the edit
baton to normalize_string().
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter.

* subversion/svnsync/sync.h
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Update
the documentation of the function.
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter. Update the
documentation of the function.

* subversion/tests/cmdline/svnrdump_tests.py
  (copy_bad_line_endings2_dump): New test case.
  (test_list): Add copy_bad_line_endings2_dump.

* subversion/tests/cmdline/svnsync_tests.py
  (run_sync): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync synchronize`.
  (run_copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Build up the
command line arguments to `svnsync copy-revprops`.
  (run_init): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync initialize`.
  (setup_and_sync): Add the SOURCE_PROP_ENCODING parameter. Pass the value
through to run_init(), run_sync(), and run_copy_revprops().
  (run_test): Add the SOURCE_PROP_ENCODING parameter. Pass the value through to
setup_and_sync().
  (copy_bad_line_endings2): New test case.
  (copy_bad_encoding): New test case.
  (identity_copy): New test case.
  (test_list): Add copy_bad_line_endings2, copy_bad_encoding, and identity_copy.

* subversion/tests/cmdline/svnrsync_tests_data/copy-bad-line-endings

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1)

2011-02-21 Thread Gavin Beau Baumanis
Hi Everyone,

I have logged this submission in the issue tracker.

#3817 refers.
http://subversion.tigris.org/issues/show_bug.cgi?id=3817


On 21/02/2011, at 3:31 AM, Danny Trebbien wrote:

> On Mon, Feb 14, 2011 at 12:38 AM, Daniel Shahaf  
> wrote:
>> Danny Trebbien wrote on Sun, Feb 13, 2011 at 20:20:45 -0800:
>>> Attached are the corrected patch and log message. The new patch
>>> incorporates this fix:
>>> https://github.com/dtrebbien/subversion/commit/190f876b52626be6b30fe4e5a311c113fd87e589
>> 
>> Is there a github link that shows the equivalent of 'svn log --diff' of
>> this patch?  It's now about 700 lines, so having such a link would be
>> very helpful to reviewers.
>> 
>> Daniel
> 
> GitHub has a special "compare branches" URL, but you have to know the
> SHA1 of two commits:  my latest non-merge commit and the last commit
> of "upstream/trunk" that I merged into my master branch before that
> commit.  Currently those are 190f876b52626be6b30fe4e5a311c113fd87e589
> and c3e62d94c79a91176e2faab5bf6032bc070d5bc4, respectively (I
> determined this with the `gitk` tool, which draws the commit DAG
> [directed acyclic graph]).  The corresponding "compare" URL on GitHub
> is:  
> .
> If you clone my git repo, then this roughly corresponds to the output
> of `git diff c3e62d94c7...190f876b52`.
> 
> That lists all of my changes to my branch.  I'm not sure that it is
> going to be helpful because I haven't maintained separate branches for
> each of my proposed changes to Subversion trunk, so everything is kind
> of mixed in.



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1)

2011-02-20 Thread Danny Trebbien
On Mon, Feb 14, 2011 at 12:38 AM, Daniel Shahaf  wrote:
> Danny Trebbien wrote on Sun, Feb 13, 2011 at 20:20:45 -0800:
>> Attached are the corrected patch and log message. The new patch
>> incorporates this fix:
>> https://github.com/dtrebbien/subversion/commit/190f876b52626be6b30fe4e5a311c113fd87e589
>
> Is there a github link that shows the equivalent of 'svn log --diff' of
> this patch?  It's now about 700 lines, so having such a link would be
> very helpful to reviewers.
>
> Daniel

GitHub has a special "compare branches" URL, but you have to know the
SHA1 of two commits:  my latest non-merge commit and the last commit
of "upstream/trunk" that I merged into my master branch before that
commit.  Currently those are 190f876b52626be6b30fe4e5a311c113fd87e589
and c3e62d94c79a91176e2faab5bf6032bc070d5bc4, respectively (I
determined this with the `gitk` tool, which draws the commit DAG
[directed acyclic graph]).  The corresponding "compare" URL on GitHub
is:  
.
 If you clone my git repo, then this roughly corresponds to the output
of `git diff c3e62d94c7...190f876b52`.

That lists all of my changes to my branch.  I'm not sure that it is
going to be helpful because I haven't maintained separate branches for
each of my proposed changes to Subversion trunk, so everything is kind
of mixed in.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1)

2011-02-14 Thread Daniel Shahaf
Danny Trebbien wrote on Sun, Feb 13, 2011 at 20:20:45 -0800:
> Attached are the corrected patch and log message. The new patch
> incorporates this fix:
> https://github.com/dtrebbien/subversion/commit/190f876b52626be6b30fe4e5a311c113fd87e589

Is there a github link that shows the equivalent of 'svn log --diff' of
this patch?  It's now about 700 lines, so having such a link would be
very helpful to reviewers.

Daniel


[PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5.1)

2011-02-13 Thread Danny Trebbien
>> +def identity_copy(sbox):
>> +  "copy UTF-8 svn:* props identically"
>> +  orig_lc_all = locale.setlocale(locale.LC_ALL)
>> +  other_locales = [ "English.1252", "German.1252", "French.1252", 
>> "en_US.ISO-8859-1", "en_GB.ISO-8859-1", "de_DE.ISO-8859-1" ]
>> +  for other_locale in other_locales:
>> +    try:
>> +      locale.setlocale(locale.LC_ALL, other_locale)
>> +      break
>> +    except:
>> +      pass
>
> Don't you need to check that at least one of the six locales was
> set successfully?

My bad. Yes I should.

Attached are the corrected patch and log message. The new patch
incorporates this fix:
https://github.com/dtrebbien/subversion/commit/190f876b52626be6b30fe4e5a311c113fd87e589
[[[
Add a command line option (--source-prop-encoding) to the svnsync init, sync,
and copy-revprops subcommands that allows the user to specify the character
encoding of translatable properties from the source repository. This is needed
to allow svnsync to sync some older Subversion repositories that have
properties that were not encoded in UTF-8.

As discussed at:
  http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/124731
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125057
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125783

Most of the changes exist to pass the argument to --source-prop-encoding through
to the functions that need it (mainly normalize_string() in
subversion/svnsync/sync.c).

* subversion/svnsync/main.c
  (svnsync__opt) Add svnsync_opt_source_encoding.
  (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of
acceptable options for the init, sync, and copy-revprops subcommands.
  (svnsync_options): Add a description of the --source-prop-encoding option.
  (opt_baton_t, subcommand_baton_t): Add the SOURCE_PROP_ENCODING field.
  (log_properties_normalized): Document the QUIET parameter.
  (copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass it through to
svnsync_normalize_revprops().
  (make_subcommand_baton): Set the SOURCE_PROP_ENCODING field of the resulting
subcommand_baton_t object to the value of SOURCE_PROP_ENCODING from the
opt_baton_t object.
  (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started,
   replay_rev_finished): Pass SOURCE_PROP_ENCODING to svnsync_* functions and
copy_revprops().
  (main): Handle the case when the command line option is
--source-prop-encoding. Set the SOURCE_PROP_ENCODING field of the
opt_baton_t object to either OPT_ARG or NULL.

* subversion/svnsync/sync.c
  (normalize_string): Add the SOURCE_PROP_ENCODING parameter. Always call
svn_subst_translate_string2(). Switch to the "two pools" (result & scratch
pools) pattern.
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass the
value through to normalize_string().
  (edit_baton_t): Add the SOURCE_PROP_ENCODING field.
  (change_file_prop, change_dir_prop): Pass SOURCE_PROP_ENCODING from the edit
baton to normalize_string().
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter.

* subversion/svnsync/sync.h
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Update
the documentation of the function.
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter. Update the
documentation of the function.

* subversion/tests/cmdline/svnrdump_tests.py
  (copy_bad_line_endings2_dump): New test case.
  (test_list): Add copy_bad_line_endings2_dump.

* subversion/tests/cmdline/svnsync_tests.py
  (run_sync): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync synchronize`.
  (run_copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Build up the
command line arguments to `svnsync copy-revprops`.
  (run_init): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync initialize`.
  (setup_and_sync): Add the SOURCE_PROP_ENCODING parameter. Pass the value
through to run_init(), run_sync(), and run_copy_revprops().
  (run_test): Add the SOURCE_PROP_ENCODING parameter. Pass the value through to
setup_and_sync().
  (copy_bad_line_endings2): New test case.
  (copy_bad_encoding): New test case.
  (identity_copy): New test case.
  (test_list): Add copy_bad_line_endings2, copy_bad_encoding, and identity_copy.

* subversion/tests/cmdline/svnrsync_tests_data/copy-bad-line-endings2.dump
  A dump of a repository with the following features:
  1. The log message (`svn:log` revision property) of revision 1 has CRLF line
 endings.
  2. The log message of revision 2 has CR line endings.
  3. Revision 3 introduces an `svn:ignore` node property with CRLF line endings.
  4. Revision 4 introduces a custom node proper

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5)

2011-02-13 Thread Daniel Shahaf
Danny Trebbien wrote on Sat, Feb 12, 2011 at 18:17:30 -0800:
> +def identity_copy(sbox):
> +  "copy UTF-8 svn:* props identically"
> +  orig_lc_all = locale.setlocale(locale.LC_ALL)
> +  other_locales = [ "English.1252", "German.1252", "French.1252", 
> "en_US.ISO-8859-1", "en_GB.ISO-8859-1", "de_DE.ISO-8859-1" ]
> +  for other_locale in other_locales:
> +try:
> +  locale.setlocale(locale.LC_ALL, other_locale)
> +  break
> +except:
> +  pass

Don't you need to check that at least one of the six locales was
set successfully?

> +  run_test(sbox, "copy-bad-encoding.expected.dump",
> +   exp_dump_file_name="copy-bad-encoding.expected.dump",
> +   bypass_prop_validation=True)
> +
> +  locale.setlocale(locale.LC_ALL, orig_lc_all)
> +
>  #--
>  
>  def delete_svn_props(sbox):
> @@ -912,6 +958,9 @@ test_list = [ None,
>info_synchronized,
>info_not_synchronized,
>copy_bad_line_endings,
> +  copy_bad_line_endings2,
> +  copy_bad_encoding,
> +  identity_copy,
>delete_svn_props,
>commit_a_copy_of_root,
>descend_into_replace,



[PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 5)

2011-02-12 Thread Danny Trebbien
Attached are version 5 of the patch and log message.  The six DUMP
files are the same, and TGZ archives of these files can be downloaded
from Gmane:  
http://cache.gmane.org//gmane/comp/version-control/subversion/devel/125057-001.bin
and 
http://cache.gmane.org//gmane/comp/version-control/subversion/devel/125057-002.bin

There was a slight problem with v. 4 of the patch that had something
to do with the concern raised in a message from January 10, 2011:
.
 The issue was that `svnsync` would fail to sync Subversion
repositories containing UTF-8-encoded svn:* props if the
system-default language encoding is not UTF-8.  The fix was to not
pass NULL for ENCODING to svn_subst_translate_string2():
https://github.com/dtrebbien/subversion/commit/76b6856818553b9bc62a3c7c7fd2aa084ed1a3e1

Version 5 of the patch also adds a test that `svnsync` can sync
repositories containing UTF-8-encoded svn:* props even if the
system-default language encoding is not UTF-8.
[[[
Add a command line option (--source-prop-encoding) to the svnsync init, sync,
and copy-revprops subcommands that allows the user to specify the character
encoding of translatable properties from the source repository. This is needed
to allow svnsync to sync some older Subversion repositories that have
properties that were not encoded in UTF-8.

As discussed at:
  http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/124731
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125057

Most of the changes exist to pass the argument to --source-prop-encoding through
to the functions that need it (mainly normalize_string() in
subversion/svnsync/sync.c).

* subversion/svnsync/main.c
  (svnsync__opt) Add svnsync_opt_source_encoding.
  (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of
acceptable options for the init, sync, and copy-revprops subcommands.
  (svnsync_options): Add a description of the --source-prop-encoding option.
  (opt_baton_t, subcommand_baton_t): Add the SOURCE_PROP_ENCODING field.
  (log_properties_normalized): Document the QUIET parameter.
  (copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass it through to
svnsync_normalize_revprops().
  (make_subcommand_baton): Set the SOURCE_PROP_ENCODING field of the resulting
subcommand_baton_t object to the value of SOURCE_PROP_ENCODING from the
opt_baton_t object.
  (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started,
   replay_rev_finished): Pass SOURCE_PROP_ENCODING to svnsync_* functions and
copy_revprops().
  (main): Handle the case when the command line option is
--source-prop-encoding. Set the SOURCE_PROP_ENCODING field of the
opt_baton_t object to either OPT_ARG or NULL.

* subversion/svnsync/sync.c
  (normalize_string): Add the SOURCE_PROP_ENCODING parameter. Always call
svn_subst_translate_string2(). Switch to the "two pools" (result & scratch
pools) pattern.
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Pass the
value through to normalize_string().
  (edit_baton_t): Add the SOURCE_PROP_ENCODING field.
  (change_file_prop, change_dir_prop): Pass SOURCE_PROP_ENCODING from the edit
baton to normalize_string().
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter.

* subversion/svnsync/sync.h
  (svnsync_normalize_revprops): Add the SOURCE_PROP_ENCODING parameter. Update
the documentation of the function.
  (svnsync_get_sync_editor): Add the SOURCE_PROP_ENCODING parameter. Update the
documentation of the function.

* subversion/tests/cmdline/svnrdump_tests.py
  (copy_bad_line_endings2_dump): New test case.
  (test_list): Add copy_bad_line_endings2_dump.

* subversion/tests/cmdline/svnsync_tests.py
  (run_sync): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync synchronize`.
  (run_copy_revprops): Add the SOURCE_PROP_ENCODING parameter. Build up the
command line arguments to `svnsync copy-revprops`.
  (run_init): Add the SOURCE_PROP_ENCODING parameter. Build up the command line
arguments to `svnsync initialize`.
  (setup_and_sync): Add the SOURCE_PROP_ENCODING parameter. Pass the value
through to run_init(), run_sync(), and run_copy_revprops().
  (run_test): Add the SOURCE_PROP_ENCODING parameter. Pass the value through to
setup_and_sync().
  (copy_bad_line_endings2): New test case.
  (copy_bad_encoding): New test case.
  (identity_copy): New test case.
  (test_list): Add copy_bad_line_endings2, copy_bad_encoding, and identity_copy.

* subversion/tests/cmdline/svnrsync_tests_data/copy-bad-line-endings2.dump
  A dump of a repository with the following features:
  1. T

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-02-06 Thread Gavin Beau Baumanis
Ping. This patch submission is yet to receive any new comments.


On 26/01/2011, at 3:29 AM, Danny Trebbien wrote:

>> Hi Danny.
>> 
>> I committed the patch you refer to, on which this one depends, in
>> r1063320.  However, I'm not likely to get around to reviewing this patch
>> any time soon.  It would be great if someone else could take it.
>> 
>> - Julian
> 
> Okay. That's fine. Thank you for the update.



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-01-25 Thread Danny Trebbien
> Hi Danny.
>
> I committed the patch you refer to, on which this one depends, in
> r1063320.  However, I'm not likely to get around to reviewing this patch
> any time soon.  It would be great if someone else could take it.
>
> - Julian

Okay. That's fine. Thank you for the update.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-01-25 Thread Julian Foad
On Fri, 2011-01-14, Danny Trebbien wrote:
> Attached are version 4 of the patch and log message. Also attached are
> two TGZ archives containing the six DUMP files that the new svnsync
> and svnrdump tests depend upon. These TGZ archives can be extracted
> directly into your working copy of trunk.
> 
> Note that this patch depends on
> ,
> which must be committed first.

Hi Danny.

I committed the patch you refer to, on which this one depends, in
r1063320.  However, I'm not likely to get around to reviewing this patch
any time soon.  It would be great if someone else could take it.

- Julian




Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-01-23 Thread Danny Trebbien
Ping. I haven't yet received feedback on this patch.

On Fri, Jan 14, 2011 at 1:01 PM, Danny Trebbien  wrote:
> Attached are version 4 of the patch and log message. Also attached are
> two TGZ archives containing the six DUMP files that the new svnsync
> and svnrdump tests depend upon. These TGZ archives can be extracted
> directly into your working copy of trunk.
>
> Note that this patch depends on
> ,
> which must be committed first.


[PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-01-14 Thread Danny Trebbien
Attached are version 4 of the patch and log message. Also attached are
two TGZ archives containing the six DUMP files that the new svnsync
and svnrdump tests depend upon. These TGZ archives can be extracted
directly into your working copy of trunk.

Note that this patch depends on
,
which must be committed first.
Index: subversion/svnsync/sync.h
===
--- subversion/svnsync/sync.h   (revision 1058560)
+++ subversion/svnsync/sync.h   (working copy)
@@ -33,15 +33,20 @@ extern "C" {
 #include "svn_delta.h"
 
 
-/* Normalize the line ending style of the values of properties in REV_PROPS
- * that "need translation" (according to svn_prop_needs_translation(),
- * currently all svn:* props) so that they contain only LF (\n) line endings.
- * The number of properties that needed normalization is returned in
+/* Normalize the encoding and line ending style of the values of properties
+ * in REV_PROPS that "need translation" (according to
+ * svn_prop_needs_translation(), which is currently all svn:* props) so that
+ * they are encoded in UTF-8 and contain only LF (\n) line endings.
+ *
+ * The number of properties that needed line ending normalization is returned 
in
  * *NORMALIZED_COUNT.
+ *
+ * No re-encoding is performed if SOURCE_PROP_ENCODING is NULL.
  */
 svn_error_t *
 svnsync_normalize_revprops(apr_hash_t *rev_props,
int *normalized_count,
+   const char *source_prop_encoding,
apr_pool_t *pool);
 
 
@@ -51,15 +56,21 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
  * the commit.  TO_URL is the URL of the root of the repository into
  * which the commit is being made.
  *
+ * If SOURCE_PROP_ENCODING is NULL, then property values are presumed to be
+ * encoded in UTF-8 and are not re-encoded. Otherwise, the property values are
+ * presumed to be encoded in SOURCE_PROP_ENCODING, and are normalized to UTF-8.
+ *
  * As the sync editor encounters property values, it might see the need to
- * normalize them (to LF line endings). Each carried out normalization adds 1
- * to the *NORMALIZED_NODE_PROPS_COUNTER (for notification).
+ * normalize them (re-encode and/or change to LF line endings). Each 
carried-out
+ * line ending normalization adds 1 to the *NORMALIZED_NODE_PROPS_COUNTER
+ * (for notification).
  */
 svn_error_t *
 svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
 void *wrapped_edit_baton,
 svn_revnum_t base_revision,
 const char *to_url,
+const char *source_prop_encoding,
 svn_boolean_t quiet,
 const svn_delta_editor_t **editor,
 void **edit_baton,
Index: subversion/svnsync/main.c
===
--- subversion/svnsync/main.c   (revision 1058560)
+++ subversion/svnsync/main.c   (working copy)
@@ -61,6 +61,7 @@ enum svnsync__opt {
   svnsync_opt_sync_password,
   svnsync_opt_config_dir,
   svnsync_opt_config_options,
+  svnsync_opt_source_prop_encoding,
   svnsync_opt_disable_locking,
   svnsync_opt_version,
   svnsync_opt_trust_server_cert,
@@ -105,8 +106,9 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "the destination repository by any method other than 'svnsync'.\n"
  "In other words, the destination repository should be a read-only\n"
  "mirror of the source repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
-svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_prop_encoding, 'q',
+svnsync_opt_allow_non_empty, svnsync_opt_disable_locking,
+svnsync_opt_steal_lock } },
 { "synchronize", synchronize_cmd, { "sync" },
   N_("usage: svnsync synchronize DEST_URL [SOURCE_URL]\n"
  "\n"
@@ -118,8 +120,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "source URL.  Specifying SOURCE_URL is recommended in particular\n"
  "if untrusted users/administrators may have write access to the\n"
  "DEST_URL repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_prop_encoding, 'q',
+svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
 { "copy-revprops", copy_revprops_cmd, { 0 },
   N_("usage:\n"
  "\n"
@@ -139,8 +141,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "DEST_URL repository.\n"
  "\n"
  "Form 2 is deprecated syntax, equivalent to specifying 
\"-rREV[:REV2]\".\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', 'r', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEF

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-14 Thread Julian Foad
On Fri, 2011-01-14 at 09:16 -0800, Danny Trebbien wrote:
> >>> -  SVN_ERR(normalize_string(&value, &was_normalized, pool));
> >>> +  SVN_ERR(normalize_string(&value, &was_normalized, 
> >>> eb->prop_encoding,
> >>> +   pool, pool));
> >>>if (was_normalized)
> >>> -(*(eb->normalized_node_props_counter))++;
> >>> +(*(eb->le_normalized_node_props_counter))++;
> >>>  }
> >>
> >> This kind of code can be simplified (removing one level of indirection)
> >> by making the baton hold the actual counter instead of a pointer to it.
> >
> > Okay.
> 
> Actually, I take that back. The pointer to the count comes from the
> "public API" function svnsync_get_sync_editor(). I don't think that
> this indirection can be removed without adding another API function to
> retrieve the count. Something like:
> 
> int
> svnsync_get_edit_baton_normalization_count(void *edit_baton)
> {
>   return ((edit_baton_t*) edit_baton)->normalized_node_props_count;
> }

Oh... I was assuming there was some bit of code here in this file that
would be called at the end of all the processing, that could copy the
final count from the baton to the place where the caller wanted it.  If
that's not the case - if there's no place here where we could do that -
then I suggest we leave it as it is.  This was just a drive-by comment
and not a good reason to change the API.

- Julian


> Maybe this change can be done in another patch as the patch is getting
> large again.




Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-14 Thread Danny Trebbien
>>> -      SVN_ERR(normalize_string(&value, &was_normalized, pool));
>>> +      SVN_ERR(normalize_string(&value, &was_normalized, eb->prop_encoding,
>>> +                               pool, pool));
>>>        if (was_normalized)
>>> -        (*(eb->normalized_node_props_counter))++;
>>> +        (*(eb->le_normalized_node_props_counter))++;
>>>      }
>>
>> This kind of code can be simplified (removing one level of indirection)
>> by making the baton hold the actual counter instead of a pointer to it.
>
> Okay.

Actually, I take that back. The pointer to the count comes from the
"public API" function svnsync_get_sync_editor(). I don't think that
this indirection can be removed without adding another API function to
retrieve the count. Something like:

int
svnsync_get_edit_baton_normalization_count(void *edit_baton)
{
  return ((edit_baton_t*) edit_baton)->normalized_node_props_count;
}

Maybe this change can be done in another patch as the patch is getting
large again.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-10 Thread Daniel Shahaf
Danny Trebbien wrote on Mon, Jan 10, 2011 at 08:55:35 -0800:
> >> Perhaps, though, I should use svn_subst_translate_string2() with the
> >> encoding set to "UTF-8". It would greatly simplify normalize_string(),
> >
> > +1.  Just call it with encoding=NULL; using the same API in both
> > branches makes life easier for the next person to revv that API.
> 
> One thing that worries me about using NULL for ENCODING is that
> svn_subst_translate_string2() calls svn_utf_cstring_to_utf8() on the
> string in that case, implying some sort of re-encoding from the native
> narrow string encoding to UTF-8. Couldn't this be problematic?

svn_subst_translate_string2()'s doc string explicitly blesses passing
encoding=NULL, so it's not of our concern in svnsync.

I agree, however, that it might be a bug in svn_subst_translate_string2().)
Feel free to check how svn_subst_translate_string() handled
encoding=NULL and, if needed, submit a patch to
svn_subst_translate_string2().


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-10 Thread Danny Trebbien
>> Perhaps, though, I should use svn_subst_translate_string2() with the
>> encoding set to "UTF-8". It would greatly simplify normalize_string(),
>
> +1.  Just call it with encoding=NULL; using the same API in both
> branches makes life easier for the next person to revv that API.

One thing that worries me about using NULL for ENCODING is that
svn_subst_translate_string2() calls svn_utf_cstring_to_utf8() on the
string in that case, implying some sort of re-encoding from the native
narrow string encoding to UTF-8. Couldn't this be problematic?


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-09 Thread Daniel Shahaf
Danny Trebbien wrote on Fri, Jan 07, 2011 at 13:03:18 -0800:
> > Refresh my memory please: did we decide in some thread that the
> > recodings should be counted too?
> 
> At one point Stefan asked me if I was interested:
> http://article.gmane.org/gmane.comp.version-control.subversion.devel/122540
> 

Okay.  I no longer remember what was decided (too many threads about
this), but I'll respond to your points below.

> >> * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump
> >>   A dump of a repository with the following features:
> >>   1. The log message (`svn:log` revision property) of revision 1 has CRLF 
> >> line
> >>      endings.
> >>   2. The log message of revision 2 has CR line endings.
> >>   3. Revision 3 introduces an `svn:ignore` node property with CRLF line 
> >> endings.
> >>   4. Revision 4 introduces a custom node property, `x:related-to`, with 
> >> CRLF
> >>      line endings.
> >>
> >
> > I don't understand
> >
> > * What is the difference between copy_bad_line_endings() and
> >  copy_bad_line_endings2()?
> 
> The difference is that copy_bad_line_endings2 tests that non-LF line
> endings in a non-translatable property (one that does not begin with
> `svn:`) are not affected.

Okay.  (I overlooked that detail.)

> I simply ported one of the three test
> repositories that I posted to the Developers' mailing list a while ago
> to the cmdline tests architecture.
> 
> > Why don't you call svn_utf_cstring_to_utf8(), like is done in almost
> > every other use of opt_arg?
> 
> Well, here's my reasoning for not reencoding into UTF-8: as opposed to
> strings such as the username and password, which must be represented
> independent of the local machine (so, reencoded into a standard
> character encoding), the representation of the value of the "source
> encoding" option is tied to the local machine's charset conversion
> faculties. Let's say that it's iconv for simplification. The charset
> of the values of the "to encoding" and "from encoding" parameters are
> system-dependent, so we don't know what charset they are encoded in.
> If somehow someone managed to install a charset with an unusual name
> (not representable in ASCII) and the system iconv implementation
> expects ISO-8859-1 strings for the to/from encoding values, then that
> someone might not be able to pick the special charset if we reencode
> the to/from encoding values into UTF-8. Presumably however, the user
> would know how to pass the ISO-8859-1 representation of the name of
> that charset to the program as one of the elements of ARGV, so pass it
> to iconv unmodified.
> 

Can you name a charset for which this is not a theoretical concern?

To put it succintly, it seems to me you're just worrying too much here.

Also: consistency is good.  If you're going to special-case one --option's
parsing from all others, you need a VERY good reason.

> >> +      *str = new_str;
> >> +    }
> >> +
> >> +  /* Only detect inconsistent line ending style. This is accomplished by 
> >> simply
> >> +     looking for carriage return (\r) characters. */
> >> +  else if (memchr((*str)->data, (unsigned char) '\r', (*str)->len) != 
> >> NULL)
> >
> > I see the reason for the s/strchr/memchr/ in the log message: because
> > the length is known.  I suppose the underlying motivation is to make the
> > condition slightly faster?
> 
> And to support NUL characters ('\0').
> 

I'm not sure that NUL is actually allowed in these propvalues, but
I don't feel strongly about this.

> >> +    {
> >>        /* Found some. Normalize. */
> >>        const char* cstring = NULL;
> >>        SVN_ERR(svn_subst_translate_cstring2((*str)->data, &cstring,
> >>                                             "\n", TRUE,
> >>                                             NULL, FALSE,
> >> -                                           pool));
> >
> > Should this call be to svn_subst_translate_string2()?
> >                                          ^^
> 

(I just noticed that translate_string2() and translate_cstring2() aren't
actually related; one is for keywords and one not.  Unfortunate naming.)

> I didn't call svn_subst_translate_string2() because this is the else
> condition for "source encoding is not NULL". In the documentation for
> the function, I added that if source encoding is not NULL, then no
> reencoding is performed because the string is presumed to be encoded
> in UTF-8.
> 
> Perhaps, though, I should use svn_subst_translate_string2() with the
> encoding set to "UTF-8". It would greatly simplify normalize_string(),

+1.  Just call it with encoding=NULL; using the same API in both
branches makes life easier for the next person to revv that API.

> but might do more work if converting UTF-8 to UTF-8 is not a no-op.

Do you know about the UTF-8 multiple-normalization-forms issues?
Because if not, you're just worrying too much again :-)

> I'm not sure if svn_utf_cstring_to_utf8_ex2() recognizes this special
> case.
> 
> > Most svnsync tests are use

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-07 Thread Danny Trebbien
> Seems to me it would be good for the name of the command-line option to
> include "property", e.g. "--prop-source-encoding".  I note that this
> option is not intended to be used frequently so I don't imagine it's a
> problem to have a long name.

How about "--source-prop-encoding"?

>> > Index: subversion/svnsync/sync.h
>> > ===
>> > svnsync_normalize_revprops(apr_hash_t *rev_props,
>> > -                           int *normalized_count,
>> > +                           int *normalized_le_count,
>> > +                           const char *encoding,
>> >                             apr_pool_t *pool);
>> >  svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
>> >                          void *wrapped_edit_baton,
>> >                          svn_revnum_t base_revision,
>> >                          const char *to_url,
>> > +                        const char *prop_encoding,
>> >                          svn_boolean_t quiet,
>> >                          const svn_delta_editor_t **editor,
>> >                          void **edit_baton,
>> > -                        int *normalized_node_props_counter,
>> > +                        int *le_normalized_node_props_counter,
>
> Can you use the same names for the counter parameters?  (I realize one
> is an output and the other is an in-out, but that doesn't appear to be
> the reason why they differ.)  Also we normally use "EOL" for "end of
> line"; I haven't noticed the use of "LE" before.

Okay.

Note that I will be reverting the renames for now, but the variables
will have the same name in a different patch to rename them.

> Not directly related to this change, but ...
>
>> -  int *normalized_node_props_counter;  /* Where to count normalizations? */
>> +  int *le_normalized_node_props_counter;  /* Where to count line ending
>> +                                             normalizations? */
>>  } edit_baton_t;
>>
>> @@ -407,9 +425,10 @@ change_file_prop(void *file_baton,
>>    if (svn_prop_needs_translation(name))
>>      {
>>        svn_boolean_t was_normalized;
>
> It would be a good idea to rename this variable also, otherwise this is
> a bit confusing.

I agree.

>> -      SVN_ERR(normalize_string(&value, &was_normalized, pool));
>> +      SVN_ERR(normalize_string(&value, &was_normalized, eb->prop_encoding,
>> +                               pool, pool));
>>        if (was_normalized)
>> -        (*(eb->normalized_node_props_counter))++;
>> +        (*(eb->le_normalized_node_props_counter))++;
>>      }
>
> This kind of code can be simplified (removing one level of indirection)
> by making the baton hold the actual counter instead of a pointer to it.

Okay.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-07 Thread Danny Trebbien
Hi Daniel and Julian,

My apologies for the delay in responding.

> Refresh my memory please: did we decide in some thread that the
> recodings should be counted too?

At one point Stefan asked me if I was interested:
http://article.gmane.org/gmane.comp.version-control.subversion.devel/122540

>>   (svnsync_normalize_revprops): Rename NORMALIZED_COUNT to
>>     NORMALIZED_LE_COUNT. Add the ENCODING parameter. Move the call to
>>     apr_hash_set() outside of the if block (if (le_normalized)), as the
>>     property value may have been changed because of re-encoding even when no
>>     line ending was normalized.
>
> The sentence "Move.*" is too much detail.  It's probably fine to just
> omit it --- there is no functional change, and the details of the hunk
> need no introduction.

Okay.

>> * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump
>>   A dump of a repository with the following features:
>>   1. The log message (`svn:log` revision property) of revision 1 has CRLF 
>> line
>>      endings.
>>   2. The log message of revision 2 has CR line endings.
>>   3. Revision 3 introduces an `svn:ignore` node property with CRLF line 
>> endings.
>>   4. Revision 4 introduces a custom node property, `x:related-to`, with CRLF
>>      line endings.
>>
>
> I don't understand
>
> * What is the difference between copy_bad_line_endings() and
>  copy_bad_line_endings2()?

The difference is that copy_bad_line_endings2 tests that non-LF line
endings in a non-translatable property (one that does not begin with
`svn:`) are not affected. I simply ported one of the three test
repositories that I posted to the Developers' mailing list a while ago
to the cmdline tests architecture.

> * How does copy_bad_line_endings2() relate to the subject --- which is
>  non-UTF-8 log messages, as opposed to non-LF-only log messages?
>
>  i.e., why aren't you adding a dump where the 'before' has an
>  ISO-8859-1 property, and the 'after' has that property recoded to
>  UTF-8?

I haven't done this yet. I still need to do this.

> I wonder: it might be a good idea to split those renames to their own
> patch, so as to not hide the needle (meat of the change) in a haystack
> of hunks.

Yes, probably. I will revert these changes and save for later.

> No whitespace-only hunks in a patch that makes a functional change,
> please.  Thank you.

Okay.

>>    apr_array_header_t *config_options = NULL;
>> +  opt_baton.source_encoding = NULL;
>>    apr_allocator_t *allocator;
>
> Oops, you can't have a declaration (of ALLOCATOR) after statement
> (non-declaration assignment) in C89.
>
> Also, we memset(&opt_baton, 0), so this NULL may not be necessary.
>
> (all 0-v-NULL discussions aside, I'd leave it in only if we init to NULL
> the other pointer members of the struct)

In a way, the pointer fields of the opt baton are all initialized to
NULL if the corresponding command line options are not specified
because there are local variables for each that are initialized to
NULL, and these are copied into the opt baton later on in main(). I
think that I will switch to this pattern (i.e. define `const char
*source_encoding = NULL;` and copy it into the opt baton later).

I vaguely recall using that pattern, but switching to the current
pattern for some reason.

>>
>>    if (svn_cmdline_init("svnsync", stderr) != EXIT_SUCCESS)
>> @@ -1831,6 +1854,10 @@ main(int argc, const char *argv[])
>>                return svn_cmdline_handle_exit_error(err, pool, "svnsync: ");
>>              break;
>>
>> +          case svnsync_opt_source_encoding:
>> +            opt_baton.source_encoding = apr_pstrdup(pool, opt_arg);
>> +            break;
>
> Why do you need to pstrdup() it?  Doesn't it have the proper lifetime
> already?

Oh, I guess it does. I used it because I was thinking that maybe the
string value could be modified by something, but I see now that the
program arguments are passed as an array of const C-strings.

> Why don't you call svn_utf_cstring_to_utf8(), like is done in almost
> every other use of opt_arg?

Well, here's my reasoning for not reencoding into UTF-8: as opposed to
strings such as the username and password, which must be represented
independent of the local machine (so, reencoded into a standard
character encoding), the representation of the value of the "source
encoding" option is tied to the local machine's charset conversion
faculties. Let's say that it's iconv for simplification. The charset
of the values of the "to encoding" and "from encoding" parameters are
system-dependent, so we don't know what charset they are encoded in.
If somehow someone managed to install a charset with an unusual name
(not representable in ASCII) and the system iconv implementation
expects ISO-8859-1 strings for the to/from encoding values, then that
someone might not be able to pick the special charset if we reencode
the to/from encoding values into UTF-8. Presumably however, the user
would know how to pass the ISO-8859-1 representation of 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-05 Thread Julian Foad
On Sun, 2011-01-02, Daniel Shahaf wrote:
> Danny Trebbien wrote on Sun, Dec 26, 2010 at 15:39:05 -0800:
> > Attached are version 3 of the patch and corresponding log message.
> > Also attached is a TGZ archive of two Subversion repository dumps that
> > are used by a new test case.

> > [[[
> > Add a command line option (--source-encoding) to the svnsync init, sync, and
> > copy-revprops subcommands that allows the user to specify the character
> > encoding of translatable properties from the source repository. This is 
> > needed
> > to allow svnsync to sync some older Subversion repositories that have
> > properties that were not encoded in UTF-8.
[...]

+1 to what danielsh said.

Seems to me it would be good for the name of the command-line option to
include "property", e.g. "--prop-source-encoding".  I note that this
option is not intended to be used frequently so I don't imagine it's a
problem to have a long name.

> > Index: subversion/svnsync/sync.h
> > ===
> > svnsync_normalize_revprops(apr_hash_t *rev_props,
> > -   int *normalized_count,
> > +   int *normalized_le_count,
> > +   const char *encoding,
> > apr_pool_t *pool); 
> >  svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
> >  void *wrapped_edit_baton,
> >  svn_revnum_t base_revision,
> >  const char *to_url,
> > +const char *prop_encoding,
> >  svn_boolean_t quiet,
> >  const svn_delta_editor_t **editor,
> >  void **edit_baton,
> > -int *normalized_node_props_counter,
> > +int *le_normalized_node_props_counter,

Can you use the same names for the counter parameters?  (I realize one
is an output and the other is an in-out, but that doesn't appear to be
the reason why they differ.)  Also we normally use "EOL" for "end of
line"; I haven't noticed the use of "LE" before.


Not directly related to this change, but ...

> -  int *normalized_node_props_counter;  /* Where to count normalizations? */
> +  int *le_normalized_node_props_counter;  /* Where to count line ending
> + normalizations? */
>  } edit_baton_t;
> 
> @@ -407,9 +425,10 @@ change_file_prop(void *file_baton,
>if (svn_prop_needs_translation(name))
>  {
>svn_boolean_t was_normalized;

It would be a good idea to rename this variable also, otherwise this is
a bit confusing.

> -  SVN_ERR(normalize_string(&value, &was_normalized, pool));
> +  SVN_ERR(normalize_string(&value, &was_normalized, eb->prop_encoding,
> +   pool, pool));
>if (was_normalized)
> -(*(eb->normalized_node_props_counter))++;
> +(*(eb->le_normalized_node_props_counter))++;
>  }

This kind of code can be simplified (removing one level of indirection)
by making the baton hold the actual counter instead of a pointer to it.

- Julian




Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2011-01-01 Thread Daniel Shahaf
[ trimming CC ]

Danny Trebbien wrote on Sun, Dec 26, 2010 at 15:39:05 -0800:
> Attached are version 3 of the patch and corresponding log message.
> Also attached is a TGZ archive of two Subversion repository dumps that
> are used by a new test case.

> [[[
> Add a command line option (--source-encoding) to the svnsync init, sync, and
> copy-revprops subcommands that allows the user to specify the character
> encoding of translatable properties from the source repository. This is needed
> to allow svnsync to sync some older Subversion repositories that have
> properties that were not encoded in UTF-8.
> 
> As discussed at:
>   http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
>   http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
>   http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
> 
> Around half of the changes are to rename the "normalized count" variables so
> that it is more clear that the counters only count line ending normalizations
> and not re-encoding normalizations. The other half of the changes are cosmetic
> (adding comments or trimming trailing whitespace in lines) or exist to pass 
> the
> argument to --source-encoding through to the functions that need it (mainly
> normalize_string() in subversion/svnsync/sync.c).
> 

+1.

Refresh my memory please: did we decide in some thread that the
recodings should be counted too?

> * subversion/svnsync/main.c
>   (svnsync__opt) Add svnsync_opt_source_encoding.
>   (svnsync_cmd_table): Add svnsync_opt_source_encoding to the list of
> acceptable options for the init, sync, and copy-revprops subcommands.
>   (svnsync_options): Add a description of the --source-encoding option.
>   (opt_baton_t, subcommand_baton_t): Add the SOURCE_ENCODING field.
>   (log_properties_normalized): Rename the NORMALIZED_REV_PROPS_COUNT parameter
> to LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT
> parameter to LE_NORMALIZED_NODE_PROPS_COUNT.
>   (copy_revprops): Add the PROP_ENCODING parameter. Rename the 
> NORMALIZED_COUNT
> parameter to LE_NORMALIZED_COUNT.
>   (make_subcommand_baton): Set the SOURCE_ENCODING field of the resulting
> subcommand_baton_t object to the value of SOURCE_ENCODING from the
> opt_baton_t object.
>   (do_initialize, do_synchronize, do_copy_revprops, replay_rev_started,
>replay_rev_finished): Pass SOURCE_ENCODING to svnsync_* functions and
> copy_revprops().
>   (replay_baton_t): Rename the NORMALIZED_REV_PROPS_COUNT field to
> LE_NORMALIZED_REV_PROPS_COUNT and the NORMALIZED_NODE_PROPS_COUNT field to
> LE_NORMALIZED_NODE_PROPS_COUNT.
>   (main): Handle the case when the command line option is --source-encoding.
> Set the SOURCE_ENCODING field of the opt_baton_t object to either OPT_ARG
> or NULL.
> 
> * subversion/svnsync/sync.c
>   (normalize_string): Rename WAS_NORMALIZED to WAS_LE_NORMALIZED to make clear
> that the counter only counts line ending normalizations. Add the ENCODING
> parameter. Handle the case when ENCODING is not NULL by calling
> svn_subst_translate_string2(). Switch to the "two pools" (result & scratch
> pools) pattern. Use memchr() instead of strchr() because the length of
> (*STR)->data is known.
>   (svnsync_normalize_revprops): Rename NORMALIZED_COUNT to
> NORMALIZED_LE_COUNT. Add the ENCODING parameter. Move the call to
> apr_hash_set() outside of the if block (if (le_normalized)), as the
> property value may have been changed because of re-encoding even when no
> line ending was normalized.

The sentence "Move.*" is too much detail.  It's probably fine to just
omit it --- there is no functional change, and the details of the hunk
need no introduction.

> * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump
>   A dump of a repository with the following features:
>   1. The log message (`svn:log` revision property) of revision 1 has CRLF line
>  endings.
>   2. The log message of revision 2 has CR line endings.
>   3. Revision 3 introduces an `svn:ignore` node property with CRLF line 
> endings.
>   4. Revision 4 introduces a custom node property, `x:related-to`, with CRLF
>  line endings.
> 

I don't understand

* What is the difference between copy_bad_line_endings() and
  copy_bad_line_endings2()?

* How does copy_bad_line_endings2() relate to the subject --- which is
  non-UTF-8 log messages, as opposed to non-LF-only log messages?

  i.e., why aren't you adding a dump where the 'before' has an
  ISO-8859-1 property, and the 'after' has that property recoded to
  UTF-8?

> * 
> subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.expected.dump
>   A dump of the expected result of using svnsync to sync
>   copy-bad-line-endings2.dump.
> ]]]

> Index: subversion/svnsync/sync.h
> ===
> --- subversion/svnsync/sync.h (revision 1052903)
> +++ s

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 3)

2010-12-26 Thread Danny Trebbien
Attached are version 3 of the patch and corresponding log message.
Also attached is a TGZ archive of two Subversion repository dumps that
are used by a new test case.
Index: subversion/svnsync/sync.h
===
--- subversion/svnsync/sync.h   (revision 1052903)
+++ subversion/svnsync/sync.h   (working copy)
@@ -33,15 +33,20 @@ extern "C" {
 #include "svn_delta.h"
 
 
-/* Normalize the line ending style of the values of properties in REV_PROPS
- * that "need translation" (according to svn_prop_needs_translation(),
- * currently all svn:* props) so that they contain only LF (\n) line endings.
- * The number of properties that needed normalization is returned in
- * *NORMALIZED_COUNT.
+/* Normalize the encoding and line ending style of the values of properties
+ * in REV_PROPS that "need translation" (according to
+ * svn_prop_needs_translation(), which is currently all svn:* props) so that
+ * they are encoded in UTF-8 and contain only LF (\n) line endings.
+ *
+ * The number of properties that needed line ending normalization is returned 
in
+ * *NORMALIZED_LE_COUNT.
+ *
+ * No re-encoding is performed if ENCODING is NULL.
  */
 svn_error_t *
 svnsync_normalize_revprops(apr_hash_t *rev_props,
-   int *normalized_count,
+   int *normalized_le_count,
+   const char *encoding,
apr_pool_t *pool);
 
 
@@ -51,19 +56,25 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
  * the commit.  TO_URL is the URL of the root of the repository into
  * which the commit is being made.
  *
+ * If PROP_ENCODING is NULL, then property values are presumed to be encoded
+ * in UTF-8 and are not re-encoded. Otherwise, the property values are
+ * presumed to be encoded in PROP_ENCODING, and are normalized to UTF-8.
+ *
  * As the sync editor encounters property values, it might see the need to
- * normalize them (to LF line endings). Each carried out normalization adds 1
- * to the *NORMALIZED_NODE_PROPS_COUNTER (for notification).
+ * normalize them (re-encode and/or change to LF line endings). Each 
carried-out
+ * line ending normalization adds 1 to the *LE_NORMALIZED_NODE_PROPS_COUNTER
+ * (for notification).
  */
 svn_error_t *
 svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
 void *wrapped_edit_baton,
 svn_revnum_t base_revision,
 const char *to_url,
+const char *prop_encoding,
 svn_boolean_t quiet,
 const svn_delta_editor_t **editor,
 void **edit_baton,
-int *normalized_node_props_counter,
+int *le_normalized_node_props_counter,
 apr_pool_t *pool);
 
 
Index: subversion/svnsync/main.c
===
--- subversion/svnsync/main.c   (revision 1052903)
+++ subversion/svnsync/main.c   (working copy)
@@ -61,6 +61,7 @@ enum svnsync__opt {
   svnsync_opt_sync_password,
   svnsync_opt_config_dir,
   svnsync_opt_config_options,
+  svnsync_opt_source_encoding,
   svnsync_opt_disable_locking,
   svnsync_opt_version,
   svnsync_opt_trust_server_cert,
@@ -105,8 +106,9 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "the destination repository by any method other than 'svnsync'.\n"
  "In other words, the destination repository should be a read-only\n"
  "mirror of the source repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
-svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_allow_non_empty, svnsync_opt_disable_locking,
+svnsync_opt_steal_lock } },
 { "synchronize", synchronize_cmd, { "sync" },
   N_("usage: svnsync synchronize DEST_URL [SOURCE_URL]\n"
  "\n"
@@ -118,8 +120,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "source URL.  Specifying SOURCE_URL is recommended in particular\n"
  "if untrusted users/administrators may have write access to the\n"
  "DEST_URL repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_disable_locking, svnsync_opt_steal_lock } },
 { "copy-revprops", copy_revprops_cmd, { 0 },
   N_("usage:\n"
  "\n"
@@ -139,8 +141,8 @@ static const svn_opt_subcommand_desc2_t svnsync_cm
  "DEST_URL repository.\n"
  "\n"
  "Form 2 is deprecated syntax, equivalent to specifying 
\"-rREV[:REV2]\".\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', 'r', svnsync_opt_disable_locking,
-svnsync_opt_steal_lock } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_enc

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-10-23 Thread Daniel Trebbien
On Mon, Oct 18, 2010 at 3:05 AM, Gavin Beau Baumanis
 wrote:
> Hi Daniel,
>
> This is just a friendly poke.
> Are you happy to have me "annoy" you every so often - or would just like me 
> leave it entirely with you ?
>
>
> Gavin "Beau" Baumanis

Hi Gavin,

I don't mind at all. Regarding this svnsync patch, I am waiting for
"[PATCH] extend svn_subst_translate_string() to record whether
re-encoding and/or line ending translation were performed (v. 2)"
(http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020/)
to be committed because this patch depends upon that one. Not to
worry, I am still interested in making the previously-discussed
enhancements to svnsync.

Daniel


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-10-18 Thread Gavin Beau Baumanis
Hi Daniel,

This is just a friendly poke.
Are you happy to have me "annoy" you every so often - or would just like me 
leave it entirely with you ?


Gavin "Beau" Baumanis


On 29/09/2010, at 11:14 AM, Daniel Trebbien wrote:

> Hi Gavin,
> 
> There have been some things that I have needed to do recently, so I
> haven't been working on the patches as much as I would like.  But, I
> am still interested in finishing them.
> 
> My apologies for the delay...
> 
> Daniel
> 
> 
> On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
>  wrote:
>> Hi Daniel (Trebbien),
>> 
>> Just thought I would ask for an update, please.
>> I still have you on my "watch - list" - so there is no rush - just making 
>> sure for my own sanity that I haven't missed an email / thread about your 
>> patch submission.
>> 
>> 
>> Gavin "Beau" Baumanis


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Gavin Beau Baumanis
Daniel,

Absolutely no apology required.
Just about everyone here is a volunteer of the own spare time.
I was checking for my own purposes, in case I had missed an alternate message 
thread.


Gavin "Beau" Baumanis


On 29/09/2010, at 11:14 AM, Daniel Trebbien wrote:

> Hi Gavin,
> 
> There have been some things that I have needed to do recently, so I
> haven't been working on the patches as much as I would like.  But, I
> am still interested in finishing them.
> 
> My apologies for the delay...
> 
> Daniel
> 
> 
> On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
>  wrote:
>> Hi Daniel (Trebbien),
>> 
>> Just thought I would ask for an update, please.
>> I still have you on my "watch - list" - so there is no rush - just making 
>> sure for my own sanity that I haven't missed an email / thread about your 
>> patch submission.
>> 
>> 
>> Gavin "Beau" Baumanis


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Daniel Trebbien
Hi Gavin,

There have been some things that I have needed to do recently, so I
haven't been working on the patches as much as I would like.  But, I
am still interested in finishing them.

My apologies for the delay...

Daniel


On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
 wrote:
> Hi Daniel (Trebbien),
>
> Just thought I would ask for an update, please.
> I still have you on my "watch - list" - so there is no rush - just making 
> sure for my own sanity that I haven't missed an email / thread about your 
> patch submission.
>
>
> Gavin "Beau" Baumanis


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Gavin Beau Baumanis
Hi Daniel (Trebbien),

Just thought I would ask for an update, please.
I still have you on my "watch - list" - so there is no rush - just making sure 
for my own sanity that I haven't missed an email / thread about your patch 
submission.


Gavin "Beau" Baumanis


On 22/09/2010, at 2:56 AM, Daniel Shahaf wrote:

> Daniel Trebbien wrote on Tue, Sep 21, 2010 at 08:12:38 -0700:
>> On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf  
>> wrote:
>>> Need the word "to" after the semicolon for the English to be correct.
>> 
>> Fixed.  (When I say "fixed", I mean that I committed a fix in my own
>> fork of Subversion:  http://github.com/dtrebbien/subversion )
>> 
> 
> Fine, but please post patches and log messages to this list as usual.
> It's easier for me/us to process them that way.  Thanks.
> 
 +svn_error_t *
 +svn_subst_translate_cstring2(const char *src,
 + const char **dst,
 + const char *eol_str,
 + svn_boolean_t repair,
 + apr_hash_t *keywords,
 + svn_boolean_t expand,
 + apr_pool_t *pool)
 +{
 +  return translate_cstring2(dst, NULL, src, eol_str, repair, keywords, 
 expand,
 +pool);
 +}
>>> 
>>> So, this is revved because svn_subst_translate_string2() needs it (and
>>> svn_subst_translate_string() doesn't).
>> 
>> I am unsure of what you mean.
>> 
>> I took the original implementation of `svn_subst_translate_cstring2`
>> and placed it in the new static utility function `translate_cstring2`.
>> `translate_cstring2` accepts another parameter, and its definition
>> (the old definition on `svn_subst_translate_cstring2`) was
>> appropriately modified to handle this new parameter. This way, I don't
>> have to modify the public API.
>> 
> 
> Yes, yes.  You revved three svn_subst_* functions; one publicly and two
> privately.  So I was pointing out (to myself), on each of the latter
> two, where is the call site that needs the additional functionality.
> 
>>> I suggest you go ahead (respond to the review, submit an updated patch,
>>> etc) without waiting; the svnsync part will eventually get its share of
>>> reviews.  Also, you may want to consider splitting this patch into two
>>> parts (subst work separate from svnsync work).
>> 
>> I like the idea of splitting this into two patches.
> 
> Okay.  Please post an updated patch to this list when you have it ready :-)
> 
> Best,
> 
> Daniel
> 


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Shahaf
Daniel Trebbien wrote on Tue, Sep 21, 2010 at 08:12:38 -0700:
> On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf  
> wrote:
> > Need the word "to" after the semicolon for the English to be correct.
> 
> Fixed.  (When I say "fixed", I mean that I committed a fix in my own
> fork of Subversion:  http://github.com/dtrebbien/subversion )
> 

Fine, but please post patches and log messages to this list as usual.
It's easier for me/us to process them that way.  Thanks.

> >> +svn_error_t *
> >> +svn_subst_translate_cstring2(const char *src,
> >> +                             const char **dst,
> >> +                             const char *eol_str,
> >> +                             svn_boolean_t repair,
> >> +                             apr_hash_t *keywords,
> >> +                             svn_boolean_t expand,
> >> +                             apr_pool_t *pool)
> >> +{
> >> +  return translate_cstring2(dst, NULL, src, eol_str, repair, keywords, 
> >> expand,
> >> +                            pool);
> >> +}
> >
> > So, this is revved because svn_subst_translate_string2() needs it (and
> > svn_subst_translate_string() doesn't).
> 
> I am unsure of what you mean.
> 
> I took the original implementation of `svn_subst_translate_cstring2`
> and placed it in the new static utility function `translate_cstring2`.
> `translate_cstring2` accepts another parameter, and its definition
> (the old definition on `svn_subst_translate_cstring2`) was
> appropriately modified to handle this new parameter. This way, I don't
> have to modify the public API.
> 

Yes, yes.  You revved three svn_subst_* functions; one publicly and two
privately.  So I was pointing out (to myself), on each of the latter
two, where is the call site that needs the additional functionality.

> > I suggest you go ahead (respond to the review, submit an updated patch,
> > etc) without waiting; the svnsync part will eventually get its share of
> > reviews.  Also, you may want to consider splitting this patch into two
> > parts (subst work separate from svnsync work).
> 
> I like the idea of splitting this into two patches.

Okay.  Please post an updated patch to this list when you have it ready :-)

Best,

Daniel



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Trebbien
Daniel Trebbien  gmail.com> writes:
> Yes, I did copy-and-paste. I see your point, though, and I am going to
> correct this by adding another `static` function that both will call.

Never mind the talk of a new static function. `svn_subst_translate_string` 
should
of course call `svn_subst_translate_string2` :)



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Trebbien
Daniel Shahaf  daniel.shahaf.name> writes:
> 
> Gavin, thanks for the information, but I think Daniel Trebbien already
> knows this. 
> 
> Gavin Beau Baumanis wrote on Sun, Sep 19, 2010 at 08:52:05 +1000:
> > Hi Daniel,
> > 
> > You should always feel comfortable with "pimping" your own patch 
> > submissions.
> > What you have done is completely fine.
> > 
> > But just in case you don't know - the Subversion project does have a Patch
Manager.
> > The role of the Patch Manager is to ensure that patch submissions (mostly)
by people who don;t have commit
> rights to the svn repository don't get forgotten.
> > 
> > Normally, I wait about six or seven days prior to "pinging" the mailing list
about a patch that has not
> received any review.
> > 
> > It can take a little while to get a patch reviewed because of a few reasons
that I am sure you can appreciate.
> > * The reviewer needs to have a solid understanding of the subject matter.
Some people have appropriate
> knowledge of only some parts of the code base.
> > * People are busy doing their "real" job / family.
> > * People are busy working on their own space of the SVN project.
> > 
> > Let me re-iterate that there is absolutely nothing wrong with you reminding
us all about your own patch
> submission. 
> > 
> > The ultimate aim of this email is that any and all patch submissions are
ALWAYS gratefully appreciated by
> the SVN project / community.
> > Sometimes it just takes a little while to get a patch appropriately reviewed
/ committed.
> > 
> > 
> > Gavin "Beau" Baumanis

I read http://subversion.apache.org/docs/community-guide/general.html#patches ,
but I think that it is good to know the preferred number of days to wait before
reposting. Gavin, I appreciated your response.



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Trebbien
On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf  wrote:
>> * subversion/include/svn_subst.h
>>   Added documentation of the new public API function
>>   svn_subst_translate_string2().
>>
>
> Should use this syntax:
>
> * subversion/include/svn_subst.h
>  (svn_subst_translate_string2): New function.

Okay.

>> * subversion/libsvn_subr/subst.c
>>   (translation_baton): Added TRANSLATED_EOL field. It's where 
>> translate_chunk()
>>     records that it translated a line ending.
>>   (create_translation_baton): Added a new parameter TRANSLATED_EOL. 
>> Initialize
>>     the TRANSLATED_EOL field of the resulting translation_baton to this 
>> value.
>>   (translate_chunk): After translating a line ending, set the value that is
>>     pointed to by TRANSLATED_EOL (from the translation baton) to TRUE.
>
> Could just say "Record that an EOL translation has been done" or similar.
>
> In general, the log message looks good, but to my taste it is borders
> the 'too verbose a log message' line.  (I might have written the same
> without the last sentence of each bullet.)  Remember that the log
> message must never /repeat/ the code (or in-code comments); rather, it
> should describe them from a high level.

Okay. I will attempt to make the log message less verbose.

>> Index: subversion/include/svn_subst.h
>> ===
>> --- subversion/include/svn_subst.h    (revision 996730)
>> +++ subversion/include/svn_subst.h    (working copy)
>> @@ -598,6 +598,26 @@ svn_error_t *svn_subst_translate_string(svn_string
>>                                          const char *encoding,
>>                                          apr_pool_t *pool);
>>
>> +/** Translate the data in @a value (assumed to be in encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default language encoding.
>> + * If @p translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>
> You use @a, @p, and @code.  Existing code uses only @a.  Why the
> difference?

Oops. @p should be @a.

I believe that in Doxygen, @c WORD is short for @code WORD @endode.

>> + * character of @p value in the source charset was translated to UTF-8;
>
> Need the word "to" after the semicolon for the English to be correct.

Fixed.  (When I say "fixed", I mean that I committed a fix in my own
fork of Subversion:  http://github.com/dtrebbien/subversion )

>> + * @c FALSE otherwise. If @p translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF; �...@c FALSE otherwise. Return the 
>> translated data in @a *new_value,
>> + * allocated in @a pool.
>> + *
>
> Need to rewrap.

Fixed.

>> + * @since New in 1.6
>
> s/1.6/1.7/

Fixed.

>> + */
>> +svn_error_t *svn_subst_translate_string2(svn_string_t **new_value,
>
> I realize that you followed the precedent of svn_subst_translate_string(),
> but please:
>
> Return type on own line, function name on the next line.

Fixed.

>> +                                         svn_boolean_t *translated_to_utf8,
>> +                                         svn_boolean_t 
>> *translated_line_endings,
>> +                                         const svn_string_t *value,
>> +                                         const char *encoding,
>> +                                         apr_pool_t *pool);
>> +
>>  /** Translate the data in @a value from UTF8 and LF line-endings into
>
> Need to update svn_subst_translate_string()'s docstring and
> implementation and mark it deprecated:
> http://subversion.apache.org/docs/community-guide/community-guide.html#deprecation
> http://mid.gmane.org/20100814132023.089a22388...@eris.apache.org

Okay.

>>   * native locale and native line-endings, or to the output locale if
>>   * @a for_output is TRUE.  Return the translated data in @a
>
>> Index: subversion/libsvn_subr/subst.c
>> ===
>> --- subversion/libsvn_subr/subst.c    (revision 996730)
>> +++ subversion/libsvn_subr/subst.c    (working copy)
>> @@ -765,6 +765,7 @@ svn_subst_keywords_differ2(apr_hash_t *a,
>>  struct translation_baton
>>  {
>>    const char *eol_str;
>> +  svn_boolean_t *translated_eol;
>>    svn_boolean_t repair;
>>    apr_hash_t *keywords;
>>    svn_boolean_t expand;
>> @@ -805,6 +806,7 @@ struct translation_baton
>>   */
>>  static struct translation_baton *
>>  create_translation_baton(const char *eol_str,
>> +                         svn_boolean_t *translated_eol,
>>                           svn_boolean_t repair,
>>                           apr_hash_t *keywords,
>>                           svn_boolean_t expand,
>> @@ -818,6 +820,7 @@ create_translation_baton(const char *eol_str,
>>
>>    b->eol_str = eol_str;
>>    b->eol_str_len = eol_str ? strlen(eol_str) : 0;
>> +  b

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-19 Thread Daniel Shahaf
(to stsp and other interested parties: I reviewed only the libsvn_subr
part of the patch; see last paragraph for details)

Daniel Trebbien wrote on Mon, Sep 13, 2010 at 19:22:38 -0700:
> Attached are the new patch and log message.
> 
> I have already shared one test repository with this mailing list that
> is useful in testing that revprops are re-encoded properly. Since
> then, I have created two more:  test2 is useful in testing that
> directory properties (in this case `svn:ignore`) are re-encoded, and
> test3 is useful in testing that line endings are normalized to
> Unix-style in revprops (and that the "# normalized" message is still
> there). Archives of these new repositories have also been attached.
> Also, this shell script makes testing with these repositories *much*
> easier:  http://pastebin.com/gz1pmngc
> 
> 
> #! /usr/bin/env sh
> TEST=test3  # change this line to test with a different repository
> rm -Rf ${TEST}_mirror
> svnadmin create ${TEST}_mirror
> cat <<'EOF' > ${TEST}_mirror/hooks/pre-revprop-change
> #! /usr/bin/env sh
> USER="$3"
> 
> if [ "$USER" = "svnsync" ]; then exit 0; fi
> 
> echo "Only the svnsync user can change revprops" >&2
> exit 1
> EOF
> chmod +x ${TEST}_mirror/hooks/pre-revprop-change
> svnsync init \
> --username svnsync --source-encoding ISO-8859-1 \
> file://`pwd`/${TEST}_mirror file://`pwd`/${TEST}
> svnsync sync \
> --username svnsync --source-encoding ISO-8859-1 \
> file://`pwd`/${TEST}_mirror

> Subject: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 
> (v. 2)
> 
> [[[
> Add a command line option (--source-encoding) to the svnsync init, sync, and
> copy-revprops subcommands that allows the user to specify the character
> encoding of translatable properties from the source repository. This is needed
> to allow svnsync to sync some older Subversion repositories that have
> properties that were not encoded in UTF-8.
> 

Could you mention in here (at the paragraphs beginning the log message)
that you're adding a new API, too?  I think it should be mentioned since
that forms a significant part of the change.

> As discussed at:
>   http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
>   http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
> 
> * subversion/include/svn_subst.h
>   Added documentation of the new public API function
>   svn_subst_translate_string2().
> 

Should use this syntax:

* subversion/include/svn_subst.h
  (svn_subst_translate_string2): New function.

(and mention svn_subst_translate_string() too, per below review comments)

> * subversion/libsvn_subr/subst.c
>   (translation_baton): Added TRANSLATED_EOL field. It's where 
> translate_chunk()
> records that it translated a line ending.
>   (create_translation_baton): Added a new parameter TRANSLATED_EOL. Initialize
> the TRANSLATED_EOL field of the resulting translation_baton to this value.
>   (translate_chunk): After translating a line ending, set the value that is
> pointed to by TRANSLATED_EOL (from the translation baton) to TRUE.

Could just say "Record that an EOL translation has been done" or similar.

In general, the log message looks good, but to my taste it is borders
the 'too verbose a log message' line.  (I might have written the same
without the last sentence of each bullet.)  Remember that the log
message must never /repeat/ the code (or in-code comments); rather, it
should describe them from a high level.

>   (stream_translated): New static function. Its implementation is the old
> implementation of svn_subst_stream_translated(), but accepting another
> parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
> that it creates.
>   (svn_subst_stream_translated): Now a wrapper for stream_translated().
>   (translate_cstring2): New static function. Its implementation is the old
> implementation of svn_subst_translate_cstring2(), but modified to accept
> another parameter, TRANSLATED_EOL, that is passed to stream_translated().
>   (svn_subst_translate_cstring2): Now a wrapper for translate_cstring2().

So, you had to "internally" revv both svn_subst_stream_translated() and
svn_subst_translate_cstring2().  I wonder why...

(self-answered hereinbelow)

>   (svn_subst_translate_string2): New function. It does what
> svn_subst_translate_string() does, but also records whether it
> translated a line ending or performed re-encoding.
> 
> * subversion/svnsync/main.c
>   (svnsync__opt) Added svnsync_opt_source_encoding.
>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list of
> acceptable options for the in

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-19 Thread Stefan Sperling
On Sun, Sep 19, 2010 at 11:14:02AM +0100, Daniel Shahaf wrote:
> Thanks for the reminder.  I've been at Hursley this weekend (and haven't
> had much time for coding!), but it's on my list.

Same here (at Hursely and interested in the patch).
Feel free to prod me about it if you don't hear from anyone else in a
while. I'll try to make time to look at it.

Stefan


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-19 Thread Daniel Shahaf
Gavin, thanks for the information, but I think Daniel Trebbien already
knows this. :-)

Gavin Beau Baumanis wrote on Sun, Sep 19, 2010 at 08:52:05 +1000:
> Hi Daniel,
> 
> You should always feel comfortable with "pimping" your own patch submissions.
> What you have done is completely fine.
> 
> But just in case you don't know - the Subversion project does have a Patch 
> Manager.
> The role of the Patch Manager is to ensure that patch submissions (mostly) by 
> people who don;t have commit rights to the svn repository don't get forgotten.
> 
> Normally, I wait about six or seven days prior to "pinging" the mailing list 
> about a patch that has not received any review.
> 
> It can take a little while to get a patch reviewed because of a few reasons 
> that I am sure you can appreciate.
> * The reviewer needs to have a solid understanding of the subject matter. 
> Some people have appropriate knowledge of only some parts of the code base.
> * People are busy doing their "real" job / family.
> * People are busy working on their own space of the SVN project.
> 
> Let me re-iterate that there is absolutely nothing wrong with you reminding 
> us all about your own patch submission. 
> 
> The ultimate aim of this email is that any and all patch submissions are 
> ALWAYS gratefully appreciated by the SVN project / community.
> Sometimes it just takes a little while to get a patch appropriately reviewed 
> / committed.
> 
> 
> Gavin "Beau" Baumanis
> 
> 
> 
> On 19/09/2010, at 7:33 AM, Daniel Trebbien wrote:
> 
> > It has been a few days since I submitted my revised patch
> > (http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550),
> > and I just want to make sure that it hasn't been forgotten. I am waiting for
> > the changes to be committed before I work on related enhancements.
> > 
> 


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-19 Thread Daniel Shahaf
Thanks for the reminder.  I've been at Hursley this weekend (and haven't
had much time for coding!), but it's on my list.

Daniel Trebbien wrote on Sat, Sep 18, 2010 at 21:33:07 +:
> It has been a few days since I submitted my revised patch
> (http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550),
> and I just want to make sure that it hasn't been forgotten. I am waiting for
> the changes to be committed before I work on related enhancements.
> 


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-18 Thread Gavin Beau Baumanis
Hi Daniel,

You should always feel comfortable with "pimping" your own patch submissions.
What you have done is completely fine.

But just in case you don't know - the Subversion project does have a Patch 
Manager.
The role of the Patch Manager is to ensure that patch submissions (mostly) by 
people who don;t have commit rights to the svn repository don't get forgotten.

Normally, I wait about six or seven days prior to "pinging" the mailing list 
about a patch that has not received any review.

It can take a little while to get a patch reviewed because of a few reasons 
that I am sure you can appreciate.
* The reviewer needs to have a solid understanding of the subject matter. Some 
people have appropriate knowledge of only some parts of the code base.
* People are busy doing their "real" job / family.
* People are busy working on their own space of the SVN project.

Let me re-iterate that there is absolutely nothing wrong with you reminding us 
all about your own patch submission. 

The ultimate aim of this email is that any and all patch submissions are ALWAYS 
gratefully appreciated by the SVN project / community.
Sometimes it just takes a little while to get a patch appropriately reviewed / 
committed.


Gavin "Beau" Baumanis



On 19/09/2010, at 7:33 AM, Daniel Trebbien wrote:

> It has been a few days since I submitted my revised patch
> (http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550),
> and I just want to make sure that it hasn't been forgotten. I am waiting for
> the changes to be committed before I work on related enhancements.
> 



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-18 Thread Daniel Trebbien
It has been a few days since I submitted my revised patch
(http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550),
and I just want to make sure that it hasn't been forgotten. I am waiting for
the changes to be committed before I work on related enhancements.



Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-13 Thread Daniel Shahaf
Stefan Sperling wrote on Mon, Sep 13, 2010 at 12:16:44 +0200:
> Having a similar reporting feature in "svnadmin verify", and ways to
> fix these problems with "svndumpfilter" or maybe even "svnadmin recover"
> wouldn't hurt either.

The verify functionality is part of the FS layer; but the FS layer
doesn't treat svn:* props specially --- the requirement for UTF-8/LF is
only introduced at the libsvn_repos layer (e.g., validate_prop() in
fs-wrap.c is where it's enforced).  So there are some layering/design
issues here.

Daniel
(compare path_valid() in fs_loader.c, which enforces an FS-level
requirement)


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-13 Thread Stefan Sperling
On Sun, Sep 12, 2010 at 04:21:03PM -0700, Daniel Trebbien wrote:
> I removed the normalization counting code without much thought in my
> hastened efforts to produce a version of `svnsync` that I could use to
> mirror the GNU Nano repository. Currently, I am thinking that Stefan
> Sperling's idea of a `svn_subst_translate_string2` function is the way
> to go.

FWIW, I agree that knowing how many properties were tweaked by svnsync
isn't that interesting. It could really be debugging message.

However, telling users that the source repository has problems is a
good thing, but we should provide more information than "X of your
props have bad line-endings and/or the wrong encoding".

Would you be willing to work on a follow-up patch that improves svnsync's
reporting about problems in the source repository?
E.g. it could print per-property information as it processes revisions.
Something like:

  Copied properties for revision 42.
  Normalized line-endings to LF in:
   svn:ignore (/trunk, /trunk/src)
   svn:externals (/trunk/ext)
   svn:log
  Re-encoded to UTF-8 from ISO-8859-1:
   svn:log
   svn:author

Having a similar reporting feature in "svnadmin verify", and ways to
fix these problems with "svndumpfilter" or maybe even "svnadmin recover"
wouldn't hurt either.

Stefan


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-13 Thread Daniel Shahaf
Daniel Trebbien wrote on Sun, Sep 12, 2010 at 16:21:03 -0700:
> On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf  
> wrote:
> > Unrelatedly, you mentioned that in the repository you work on there are
> > soem properties in latin1 and some in utf8.  So one will need (until
> > they fix the properties on their side) to svnsync a few revisions with
> > translation enabled, then kill svnsync and restart with translation
> > disabled, then restart again with it enabled etc.  Which makes me think,
> > do we want a "sync up to N revisions" (or, "sync up to rN") switch?
> 
> It's like you are reading my mind :)  I figured that I would work on
> getting this change implemented and then work on such a feature.
> 

Okay.  I just took a look at the code, and it seems a pretty
straightforward patch.

> > Have you sent a new version of the patch yet?
> 
> Oh, not yet. I'm still working on it.

Okay :)


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-12 Thread Daniel Trebbien
On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf  wrote:
>> When I was working on my changes, I was looking for a "to UTF-8"
>> function that would return whether it actually re-encoded the input
>> string, but did not find one. The re-encoding function that I used,
>> `svn_subst_translate_string`, actually converts line endings to LF as
>> well as re-encodes from the given encoding to UTF-8, but it does not
>> inform the caller of whether it took either action. I guess that I
>> could write a utility function, kind of like a `strcmp`, but which
>> ignores any differences at line endings. Unfortunately, this adds
>> another scan through every property value that is encountered. Already
>> there is a noticeable decrease in the performance of the modified
>> `svnsync` as a result of calling `svn_subst_translate_string` on
>> basically every property value, and adding an additional scan through
>> each property value would decrease performance further.
>>
>
> Or you could insert the reencoding magic after (and separately from) the
> dos2unix magic, if that would make counting easier.  That said, what are
> you trying to count?  The number of properties where the reencoding
> wasn't a noop?

To re-encode and then normalize the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding;  `svn_subst_translate_string` does both simultaneously.

I removed the normalization counting code without much thought in my
hastened efforts to produce a version of `svnsync` that I could use to
mirror the GNU Nano repository. Currently, I am thinking that Stefan
Sperling's idea of a `svn_subst_translate_string2` function is the way
to go.

> Re performance, isn't svnsync bound by network speed?

Mostly yes. However, I have definitely noticed a decrease in
performance with my altered version (when using --source-encoding)
that cannot be explained by network speed. Granted, it's not that much
of a difference.

> Unrelatedly, you mentioned that in the repository you work on there are
> soem properties in latin1 and some in utf8.  So one will need (until
> they fix the properties on their side) to svnsync a few revisions with
> translation enabled, then kill svnsync and restart with translation
> disabled, then restart again with it enabled etc.  Which makes me think,
> do we want a "sync up to N revisions" (or, "sync up to rN") switch?

It's like you are reading my mind :)  I figured that I would work on
getting this change implemented and then work on such a feature.

> Have you sent a new version of the patch yet?

Oh, not yet. I'm still working on it.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-12 Thread Daniel Shahaf
Daniel Trebbien wrote on Fri, Sep 10, 2010 at 16:57:41 -0700:
> >> * subversion/svnsync/main.c
> >>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> >> of acceptable options for the init, sync, and copy-revprops
> >> subcommands.
> >
> > Missing indentation on the non-first lines.  Should be in one of these 
> > forms:
> 
> Gmail added in the line breaks, but they aren't in my log message text
> file. Should I wrap the log message to a certain number of characters?

Yes, 79 or 80.

> >>   (log_properties_normalized): Removed this obsolete function.
> >
> > In other words, you dropped the entire "count changes" feature, without
> > any discussion.  (It's good that the log message is clear about this
> > change, though.)
> >
> > Please don't do that.  Someone spent some time writing and debugging
> > this feature, you can't just come in and drop their code.  If you really
> > think this is redundant, then start a discussion and get consensus to
> > drop this functionality.
> >
> > And at any rate, your "recode log messages" work needn't depend on it.
> > It can (IMO: should) even introduce counters for how many properties it
> > recoded.
> 
> When I was working on my changes, I was looking for a "to UTF-8"
> function that would return whether it actually re-encoded the input
> string, but did not find one. The re-encoding function that I used,
> `svn_subst_translate_string`, actually converts line endings to LF as
> well as re-encodes from the given encoding to UTF-8, but it does not
> inform the caller of whether it took either action. I guess that I
> could write a utility function, kind of like a `strcmp`, but which
> ignores any differences at line endings. Unfortunately, this adds
> another scan through every property value that is encountered. Already
> there is a noticeable decrease in the performance of the modified
> `svnsync` as a result of calling `svn_subst_translate_string` on
> basically every property value, and adding an additional scan through
> each property value would decrease performance further.
> 

Or you could insert the reencoding magic after (and separately from) the
dos2unix magic, if that would make counting easier.  That said, what are
you trying to count?  The number of properties where the reencoding
wasn't a noop?

Re performance, isn't svnsync bound by network speed?

Unrelatedly, you mentioned that in the repository you work on there are
soem properties in latin1 and some in utf8.  So one will need (until
they fix the properties on their side) to svnsync a few revisions with
translation enabled, then kill svnsync and restart with translation
disabled, then restart again with it enabled etc.  Which makes me think,
do we want a "sync up to N revisions" (or, "sync up to rN") switch?

> I for one do not think that the final "NOTE: Normalized ## properties
> to LF line endings" messages are particularly helpful because they do
> not say *which* properties of specific revisions were normalized. As a
> possible compromise, I could modify a Perl script that I wrote for the
> purpose of identifying all, non-ASCII property values from the GNU
> Nano repository to list revisions and property values that *would*
> require normalization and/or re-encoding. Just a thought...
> 
> >> * subversion/svnsync/sync.c
> >>   Standardized terminology by replacing "normalize" with "translate".
> >
> > Just stick to the existing terminology please.
> 
> Okay. I'll change it back.
> 

Thanks.

> >> @@ -501,13 +507,11 @@
> >> +      SVN_ERR(translate_string(&value, &was_translated, 
> >> eb->prop_encoding, pool));
> >
> > Please wrap to 80 characters.
> 
> Fixed

Have you sent a new version of the patch yet?


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-12 Thread Karl Fogel
Daniel Shahaf  writes:
>Daniel Trebbien wrote on Fri, Sep 10, 2010 at 12:24:29 -0700:
>> After applying my patch and recompiling `svnsync`, QA can test with
>> the script at:  http://pastebin.com/g0n9uqYr
>
>QA?  That's not how open source works :-)
>
>Daniel
>(re QA remark, I couldn't find an appropriate section in Karl's book...)

  http://producingoss.com/en/funding-non-programming.html#subsidize-qa

:-)


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-11 Thread Stefan Sperling
On Fri, Sep 10, 2010 at 04:57:41PM -0700, Daniel Trebbien wrote:
> When I was working on my changes, I was looking for a "to UTF-8"
> function that would return whether it actually re-encoded the input
> string, but did not find one. The re-encoding function that I used,
> `svn_subst_translate_string`, actually converts line endings to LF as
> well as re-encodes from the given encoding to UTF-8, but it does not
> inform the caller of whether it took either action.

If you need that information, why not make svn_subst_translate_string()
return it? Note that because it's a public function, you'll have to create
a new revision of it: svn_subst_translate_string2(). The function as it is
now needs to stay around for backwards compatibility.

> > (As I said: if you think counters should be removed completely, make
> > your case in a separate thread.  That's orthogonal to the encoding work.)
> 
> Would dev@ or users@ be better?

dev@ would be better.

Thanks,
Stefan


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
>> * subversion/svnsync/main.c
>>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
>> of acceptable options for the init, sync, and copy-revprops
>> subcommands.
>
> Missing indentation on the non-first lines.  Should be in one of these forms:
>
> [[[
> * subversion/svnsync/main.c
>  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
>     of acceptable options for the init, sync, and copy-revprops subcommands.
> ]]]
>
> [[[
> * subversion/svnsync/main.c
>  (svnsync_cmd_table):
>     Added svnsync_opt_source_encoding to the list of acceptable options for
>     the init, sync, and copy-revprops subcommands.
> ]]]

Gmail added in the line breaks, but they aren't in my log message text
file. Should I wrap the log message to a certain number of characters?

>>   (log_properties_normalized): Removed this obsolete function.
>
> In other words, you dropped the entire "count changes" feature, without
> any discussion.  (It's good that the log message is clear about this
> change, though.)
>
> Please don't do that.  Someone spent some time writing and debugging
> this feature, you can't just come in and drop their code.  If you really
> think this is redundant, then start a discussion and get consensus to
> drop this functionality.
>
> And at any rate, your "recode log messages" work needn't depend on it.
> It can (IMO: should) even introduce counters for how many properties it
> recoded.

When I was working on my changes, I was looking for a "to UTF-8"
function that would return whether it actually re-encoded the input
string, but did not find one. The re-encoding function that I used,
`svn_subst_translate_string`, actually converts line endings to LF as
well as re-encodes from the given encoding to UTF-8, but it does not
inform the caller of whether it took either action. I guess that I
could write a utility function, kind of like a `strcmp`, but which
ignores any differences at line endings. Unfortunately, this adds
another scan through every property value that is encountered. Already
there is a noticeable decrease in the performance of the modified
`svnsync` as a result of calling `svn_subst_translate_string` on
basically every property value, and adding an additional scan through
each property value would decrease performance further.

I for one do not think that the final "NOTE: Normalized ## properties
to LF line endings" messages are particularly helpful because they do
not say *which* properties of specific revisions were normalized. As a
possible compromise, I could modify a Perl script that I wrote for the
purpose of identifying all, non-ASCII property values from the GNU
Nano repository to list revisions and property values that *would*
require normalization and/or re-encoding. Just a thought...

>> * subversion/svnsync/sync.c
>>   Standardized terminology by replacing "normalize" with "translate".
>
> Please don't do this.  It makes it harder to read the diff (it obscures
> the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
> and in my opinion "normalize" is the more accurate term for what you're
> doing now after the patch.  :-)
>
> Just stick to the existing terminology please.

Okay. I'll change it back.

>> @@ -785,7 +772,7 @@
>>  {
>>    const char *to_url, *from_url;
>>    svn_ra_session_t *to_session;
>> -  opt_baton_t *opt_baton = b;
>> +  opt_baton_t *opt_baton = (opt_baton_t*)b;
>
> Unnecessary change.  As a rule, patches shouldn't have them.

Reverted.

>> @@ -1625,9 +1589,9 @@
>>  /* SUBCOMMAND: help */
>>  static svn_error_t *
>> -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
>> +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
>>  {
>
> Why did you rename the formal parameter?  Isn't it another unnecessary
> change (as I explained above) that shouldn't be randomly included in
> a patch?

I renamed it because all of the other functions that take a void*
parameter for passing the opt_baton_t* call it `b`. It doesn't matter,
though. I'll change it back.

>> @@ -1982,6 +1951,8 @@
>>
>>    config = apr_hash_get(opt_baton.config, SVN_CONFIG_CATEGORY_CONFIG,
>>                          APR_HASH_KEY_STRING);
>> +
>> +  opt_baton.source_encoding = source_encoding;
>>
>
> It seems that most other options are parsed into opt_baton.foo directly,
> skipping a local variable.

Okay. I'll change that.

>> Index: subversion/svnsync/sync.c
>> ===
>> --- subversion/svnsync/sync.c (revision 995839)
>> +++ subversion/svnsync/sync.c (working copy)
>> @@ -45,29 +45,39 @@
>> -/* Normalize the line ending style of *STR, so that it contains only
>> - * LF (\n) line endings. After return, *STR may point at a new
>> - * svn_string_t* allocated from POOL.
>> +/* Translate the encoding and line ending style of *STR, so that it contains
>> + * only LF (\n) line endings and is encoded in UTF-8. After return, *STR may
>> + * point at a new svn_string_t* allocated from POOL if *WAS_TRA

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Shahaf
Daniel Trebbien wrote on Fri, Sep 10, 2010 at 12:24:29 -0700:
> The test repository (attached) was created with Subversion version
> 1.6.12 (r955767) on a Debian Squeeze AMD64 system.

Since you're working on this code, let me point out that 'svnadmin load'
works at a level below the UTF-8 validation (the latter occurs in
libsvn_repos).

> After applying my patch and recompiling `svnsync`, QA can test with
> the script at:  http://pastebin.com/g0n9uqYr

QA?  That's not how open source works :-)

Daniel
(re QA remark, I couldn't find an appropriate section in Karl's book...)


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Shahaf
Daniel Trebbien wrote on Fri, Sep 10, 2010 at 11:12:20 -0700:
> On Fri, Sep 10, 2010 at 9:16 AM, Daniel Shahaf  
> wrote:
> > The patch didn't make it to the list (only to my personal copy), could
> > you please re-send with the patch in "text/plain" MIME type?
> 
> Hmmm. I wonder if adding a .txt extension will work...

> [[[
> Add a command line option to the svnsync init, sync, and copy-revprops
> subcommands (--source-encoding) that allows the user to specify the
> character encoding of translatable properties from the source
> repository. This is needed to allow svnsync to import some older

s/import/sync/

> Subversion repositories that have properties that were not encoded in
> UTF-8.
> 
> As discussed at:
> http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
> 

Nice, I see in this link even the posts that were CCed only to d...@.  gmane++

> * subversion/svnsync/main.c
>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> of acceptable options for the init, sync, and copy-revprops
> subcommands.

Missing indentation on the non-first lines.  Should be in one of these forms:

[[[
* subversion/svnsync/main.c
  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
 of acceptable options for the init, sync, and copy-revprops subcommands.
]]]

[[[
* subversion/svnsync/main.c
  (svnsync_cmd_table):
 Added svnsync_opt_source_encoding to the list of acceptable options for
 the init, sync, and copy-revprops subcommands.
]]]


>   (log_properties_normalized): Removed this obsolete function.

In other words, you dropped the entire "count changes" feature, without
any discussion.  (It's good that the log message is clear about this
change, though.)

Please don't do that.  Someone spent some time writing and debugging
this feature, you can't just come in and drop their code.  If you really
think this is redundant, then start a discussion and get consensus to
drop this functionality.

And at any rate, your "recode log messages" work needn't depend on it.
It can (IMO: should) even introduce counters for how many properties it
recoded.

> * subversion/svnsync/sync.c
>   Standardized terminology by replacing "normalize" with "translate".

Please don't do this.  It makes it harder to read the diff (it obscures
the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
and in my opinion "normalize" is the more accurate term for what you're
doing now after the patch.  :-)

Just stick to the existing terminology please.

> * subversion/svnsync/sync.h

Overall, log message looks good, wrt amount of detail, etc.  (But
I haven't compared it point-by-point with the actual patch.)

> ]]]
> 

> +++ subversion/svnsync/main.c (working copy)
> @@ -104,8 +105,8 @@
>   "the destination repository by any method other than 'svnsync'.\n"
>   "In other words, the destination repository should be a read-only\n"
>   "mirror of the source repository.\n"),
> -  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
> -svnsync_opt_disable_locking } },
> +  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
> +svnsync_opt_allow_non_empty, svnsync_opt_disable_locking } },

Apparently our convention is to append new options to the end of the
array.  But that's not a major concern.

> @@ -200,6 +203,12 @@
> +{"source-encoding", svnsync_opt_source_encoding, 1,
> +   N_("convert translatable properties from encoding 
> ARG\n"
> +  "to UTF-8. If not specified, then properties are\n"
> +  "presumed to be encoded in UTF-8.")},

"Translatable properties" may be confusing to an end user, but I don't
have a better suggestion.

> @@ -227,6 +236,7 @@
>const char *sync_password;
>const char *config_dir;
>apr_hash_t *config;
> +  const char *source_encoding;
>svn_boolean_t disable_locking;
>svn_boolean_t quiet;
>svn_boolean_t allow_non_empty;
> @@ -352,6 +362,9 @@
>svn_boolean_t quiet;
>svn_boolean_t allow_non_empty;
>const char *to_url;
> +  
> +  /* initialize, synchronize, and copy-revprops only */
> +  const char *source_encoding;
>  
>/* initialize only */
>const char *from_url;

That's just my preference now, but could you use 'svn diff -x-p' so that
function/struct names show in the hunk headers?

@dev: how about making the patch submission guidelines recommend '-x-p'
be used regularly in patch submissions?

> @@ -785,7 +772,7 @@
>  {
>const char *to_url, *from_url;
>svn_ra_session_t *to_session;
> -  opt_baton_t *opt_baton = b;
> +  opt_baton_t *opt_baton = (opt_baton_t*)b;

Unnecessary change.  As a rule, patches shouldn't have them.

(the same cast is added again later in the diff)

> @@ -1625,9 +1589,9 @@
>  /* SUBCOMMAND: help */
>  static svn_error_t *
> -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
> +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
>  {

Why did you 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
Note:  I was using the GNU Nano repository (svn://svn.sv.gnu.org/nano)
to test my changes, but it's not ideal because the first commit where
the --source-encoding feature is needed is revision 619. Also, as of
now, 29 of the 4518 commits have non-ASCII log messages with 21
encoded in ISO-8859-1 and the rest encoded in UTF-8. Therefore, to
facilitate testing I created a "test" repository.

The test repository (attached) was created with Subversion version
1.6.12 (r955767) on a Debian Squeeze AMD64 system. The repository has
two commits, one made by "alíce" and the other made by "bob". The
first commit has an `svn:author` revprop that is ISO-8859-1-encoded
and the second has an `svn:log` revprop that is ISO-8859-1-encoded.
After applying my patch and recompiling `svnsync`, QA can test with
the script at:  http://pastebin.com/g0n9uqYr

#! /usr/bin/env sh
tar zxvf test.tar.gz
svn log file://`pwd`/test
svnadmin create test_mirror
cat <<'EOF' > test_mirror/hooks/pre-revprop-change
#! /usr/bin/env sh
USER="$3"

if [ "$USER" = "svnsync" ]; then exit 0; fi

echo "Only the svnsync user can change revprops" >&2
exit 1
EOF
chmod +x test_mirror/hooks/pre-revprop-change
svnsync init --username svnsync \
--source-encoding ISO-8859-1 \
file://`pwd`/test_mirror file://`pwd`/test
svnsync sync --username svnsync \
--source-encoding ISO-8859-1 \
file://`pwd`/test_mirror
svn log file://`pwd`/test_mirror


test.tar.gz
Description: GNU Zip compressed data


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
On Fri, Sep 10, 2010 at 9:16 AM, Daniel Shahaf  wrote:
> The patch didn't make it to the list (only to my personal copy), could
> you please re-send with the patch in "text/plain" MIME type?

Hmmm. I wonder if adding a .txt extension will work...
Index: subversion/svnsync/sync.h
===
--- subversion/svnsync/sync.h   (revision 995839)
+++ subversion/svnsync/sync.h   (working copy)
@@ -33,15 +33,15 @@
 #include "svn_delta.h"
 
 
-/* Normalize the line ending style of the values of properties in REV_PROPS
- * that "need translation" (according to svn_prop_needs_translation(),
- * currently all svn:* props) so that they contain only LF (\n) line endings.
- * The number of properties that needed normalization is returned in
- * *NORMALIZED_COUNT.
+/* Translate the encoding and line ending style of the values of properties
+ * in rev_props that "need translation" (according to
+ * svn_prop_needs_translation(), which is currently all svn:* props) so that
+ * they are encoded in UTF-8 and contain only LF (\n) line endings. No
+ * re-encoding is performed if encoding is NULL.
  */
 svn_error_t *
-svnsync_normalize_revprops(apr_hash_t *rev_props,
-   int *normalized_count,
+svnsync_translate_revprops(apr_hash_t *rev_props,
+   const char *encoding,
apr_pool_t *pool);
 
 
@@ -52,18 +52,20 @@
  * which the commit is being made.
  *
  * As the sync editor encounters property values, it might see the need to
- * normalize them (to LF line endings). Each carried out normalization adds 1
- * to the *NORMALIZED_NODE_PROPS_COUNTER (for notification).
+ * translate them (re-encode and/or change to LF line endings).
+ * If PROP_ENCODING is NULL, then property values are presumed to be encoded
+ * in UTF-8 and are not re-encoded. Otherwise, the property values are
+ * presumed to be encoded in PROP_ENCODING, and are translated to UTF-8.
  */
 svn_error_t *
 svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
 void *wrapped_edit_baton,
 svn_revnum_t base_revision,
 const char *to_url,
+const char *prop_encoding,
 svn_boolean_t quiet,
 const svn_delta_editor_t **editor,
 void **edit_baton,
-int *normalized_node_props_counter,
 apr_pool_t *pool);
 
 
Index: subversion/svnsync/main.c
===
--- subversion/svnsync/main.c   (revision 995839)
+++ subversion/svnsync/main.c   (working copy)
@@ -61,6 +61,7 @@
   svnsync_opt_sync_password,
   svnsync_opt_config_dir,
   svnsync_opt_config_options,
+  svnsync_opt_source_encoding,
   svnsync_opt_disable_locking,
   svnsync_opt_version,
   svnsync_opt_trust_server_cert,
@@ -104,8 +105,8 @@
  "the destination repository by any method other than 'svnsync'.\n"
  "In other words, the destination repository should be a read-only\n"
  "mirror of the source repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
-svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_allow_non_empty, svnsync_opt_disable_locking } },
 { "synchronize", synchronize_cmd, { "sync" },
   N_("usage: svnsync synchronize DEST_URL [SOURCE_URL]\n"
  "\n"
@@ -117,7 +118,8 @@
  "source URL.  Specifying SOURCE_URL is recommended in particular\n"
  "if untrusted users/administrators may have write access to the\n"
  "DEST_URL repository.\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_disable_locking } },
 { "copy-revprops", copy_revprops_cmd, { 0 },
   N_("usage:\n"
  "\n"
@@ -137,7 +139,8 @@
  "DEST_URL repository.\n"
  "\n"
  "Form 2 is deprecated syntax, equivalent to specifying 
\"-rREV[:REV2]\".\n"),
-  { SVNSYNC_OPTS_DEFAULT, 'q', 'r', svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q', 'r',
+svnsync_opt_disable_locking } },
 { "info", info_cmd, { 0 },
   N_("usage: svnsync info DEST_URL\n"
  "\n"
@@ -200,6 +203,12 @@
   "For example:\n"
   " "
   "servers:global:http-library=serf")},
+{"source-encoding", svnsync_opt_source_encoding, 1,
+   N_("convert translatable properties from encoding ARG\n"
+  " "
+  "to UTF-8. If not specified, then properties are\n"
+  " "
+ 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Philip Martin
Daniel Trebbien  writes:

> [[[
> Add a command line option to the svnsync init, sync, and copy-revprops
> subcommands (--source-encoding) that allows the user to specify the
> character encoding of translatable properties from the source
> repository. This is needed to allow svnsync to import some older
> Subversion repositories that have properties that were not encoded in
> UTF-8.

I don't see a patch.

-- 
Philip


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Shahaf
The patch didn't make it to the list (only to my personal copy), could
you please re-send with the patch in "text/plain" MIME type?

Daniel Trebbien wrote on Fri, Sep 10, 2010 at 09:10:08 -0700:
> [[[
> Add a command line option to the svnsync init, sync, and copy-revprops
> subcommands (--source-encoding) that allows the user to specify the
> character encoding of translatable properties from the source
> repository. This is needed to allow svnsync to import some older
> Subversion repositories that have properties that were not encoded in
> UTF-8.
> 
> As discussed at:
> http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
> 
> * subversion/svnsync/main.c
>   (svnsync__opt) Added svnsync_opt_source_encoding.
>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> of acceptable options for the init, sync, and copy-revprops
> subcommands.
>   (opt_baton_t, subcommand_baton_t): Added source_encoding field.
>   (log_properties_normalized): Removed this obsolete function.
>   (copy_revprops): Added a parameter to receive the
> subcommand_baton_t*. Removed the quiet parameter as this can be found
> from the subcommand baton. Removed the int* normalized_count
> parameter, as the counting of "normalized" properties is an obsolete
> function.
>   (make_subcommand_baton): Set the source_encoding field of the
> resulting subcommand_baton_t object to the value of source_encoding
> from opt_baton_t.
>   (do_initialize, do_synchronize, do_copy_revprops): Removed counting
> and reporting of "normalized" properties. Pass source_encoding through
> to svnsync_* functions.
>   (replay_baton_t): Removed obsolete fields normalized_rev_props_count
> and normalized_node_props_count
>   (main): Handled the case when the command line option is
> --source-encoding. Set the source_encoding field of the opt_baton_t*
> to either the ARG of --source-encoding or NULL.
> 
> * subversion/svnsync/sync.c
>   Standardized terminology by replacing "normalize" with "translate".
>   (translate_string): Added the encoding parameter. Handle the case
> when encoding is not NULL by calling svn_subst_translate_string().
>   (svnsync_translate_revprops): Added the encoding parameter. Removed
> counting of "normalized" properties.
>   (edit_baton_t): Added the prop_encoding field. Removed the obsolete
> int* normalized_node_props_counter field.
>   (svnsync_get_sync_editor): Added a prop_encoding parameter, the
> value of which is set in the prop_encoding field of the edit_baton_t*.
> 
> * subversion/svnsync/sync.h
>   Updated the declarations and documentation of the svnsync_* functions.
> ]]]



[PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
[[[
Add a command line option to the svnsync init, sync, and copy-revprops
subcommands (--source-encoding) that allows the user to specify the
character encoding of translatable properties from the source
repository. This is needed to allow svnsync to import some older
Subversion repositories that have properties that were not encoded in
UTF-8.

As discussed at:
http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020

* subversion/svnsync/main.c
  (svnsync__opt) Added svnsync_opt_source_encoding.
  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
of acceptable options for the init, sync, and copy-revprops
subcommands.
  (opt_baton_t, subcommand_baton_t): Added source_encoding field.
  (log_properties_normalized): Removed this obsolete function.
  (copy_revprops): Added a parameter to receive the
subcommand_baton_t*. Removed the quiet parameter as this can be found
from the subcommand baton. Removed the int* normalized_count
parameter, as the counting of "normalized" properties is an obsolete
function.
  (make_subcommand_baton): Set the source_encoding field of the
resulting subcommand_baton_t object to the value of source_encoding
from opt_baton_t.
  (do_initialize, do_synchronize, do_copy_revprops): Removed counting
and reporting of "normalized" properties. Pass source_encoding through
to svnsync_* functions.
  (replay_baton_t): Removed obsolete fields normalized_rev_props_count
and normalized_node_props_count
  (main): Handled the case when the command line option is
--source-encoding. Set the source_encoding field of the opt_baton_t*
to either the ARG of --source-encoding or NULL.

* subversion/svnsync/sync.c
  Standardized terminology by replacing "normalize" with "translate".
  (translate_string): Added the encoding parameter. Handle the case
when encoding is not NULL by calling svn_subst_translate_string().
  (svnsync_translate_revprops): Added the encoding parameter. Removed
counting of "normalized" properties.
  (edit_baton_t): Added the prop_encoding field. Removed the obsolete
int* normalized_node_props_counter field.
  (svnsync_get_sync_editor): Added a prop_encoding parameter, the
value of which is set in the prop_encoding field of the edit_baton_t*.

* subversion/svnsync/sync.h
  Updated the declarations and documentation of the svnsync_* functions.
]]]