Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

2010-11-24 Thread Daniel Trebbien
On Wed, Nov 24, 2010 at 3:43 AM, Gavin Beau Baumanis
gav...@thespidernet.com wrote:
 Hi Daniel (T),

 Since in your earlier post you mentioned that you didn't mind a friendly 
 reminder...

 I thought I would return this thread to the top of the list  - Just i case 
 you had missed Daniel (Shahaf) 's comments.


 Gavin Beau Baumanis


Hi Gavin,

I posted a new version of the patch on November 8
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/123657)
which addressed Daniel Shahaf's feedback from November 1.  Daniel S.
responded the next day with more feedback
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/123719).
 I then suggested a macro solution
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/123817)
and Julian commented that a macro solution is not preferred
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/123820).

Julian is right that I can feel why not, so now I am trying to think
of another approach to avoid the `if (translated_eol)` check from
being executed unnecessarily.  I feel stuck, though  :(


p.s. To avoid confusion, you can call me Danny if you would like.  I
like that name, and it comes in handy when working with other Daniels
:)


Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 3)

2010-11-13 Thread Daniel Trebbien
 The only difference between translate_newline() and translate_newline_alt()
 is the lines of code from here to the end of the function.  I think you
 could keep translate_newline() as is, and just move these checks elsewhere
 (to a new function or to the caller; I haven't thought about this in
 detail).

 In any case, please avoid the code duplication if possible.  (Function
 pointers are nice, but sometimes just a way to shoot yourself in the
 foot.)  If you're unsure what approach to take, do brainstorm on-list
 before sending a new iteration of the patch.  (That saves time to you
 and to us)

After thinking about this problem some more, I thought about using a
macro solution.  I define a macro that is able to output the
definitions of the two variants of translate_newline():
translate_newline() and translate_newline_alt().  This way, duplicate
code is avoided, but it is like having duplicate code where each copy
is slightly different.

Do you like this idea?  I have attached C code that I have tested.  In
this sample, the macro that outputs the definitions of
translate_newline() and translate_newline_alt() is called
DEFINE_TRANSLATE_NEWLINE_FN.
/* A boolean expression that evaluates to true if the string STR, having length
   STR_LEN, is one of the end-of-line strings LF, CR, or CRLF;  to false
   otherwise.  */
#define STRING_IS_EOL(str, str_len) (((str_len) == 2 \
  (str)[0] == '\r'   \
  (str)[1] == '\n') || \
 ((str_len) == 1 \
  ((str)[0] == '\n' || (str)[0] == '\r')))

/* A boolean expression that evaluates to true if the end-of-line string EOL1,
   having length EOL1_LEN, and the end-of-line string EOL2, having length
   EOL2_LEN, are different, assuming that EOL1 and EOL2 are both from the
   set {\n, \r, \r\n};  to false otherwise.
   
   Given that EOL1 and EOL2 are either \n, \r, or \r\n, then if
   EOL1_LEN is not the same as EOL2_LEN, then EOL1 and EOL2 are of course
   different. If EOL1_LEN and EOL2_LEN are both 2 then EOL1 and EOL2 are both
   \r\n. Otherwise, EOL1_LEN and EOL2_LEN are both 1. We need only check the
   one character for equality to determine whether EOL1 and  EOL2 are the same
   in that case. */
#define DIFFERENT_EOL_STRINGS(eol1, eol1_len, eol2, eol2_len) \
  (((eol1_len) != (eol2_len)) || (*(eol1) != *(eol2)))

static svn_error_t *
translate_newline_alt(const char *eol_str,
  apr_size_t eol_str_len,
  char *src_format,
  apr_size_t *src_format_len,
  const char *newline_buf,
  apr_size_t newline_len,
  svn_stream_t *dst,
  struct translation_baton *b);

#define DEFINE_TRANSLATE_NEWLINE_FN(fn, is_alt)\
static svn_error_t *   \
fn(const char *eol_str,\
   apr_size_t eol_str_len, \
   char *src_format,   \
   apr_size_t *src_format_len, \
   const char *newline_buf,\
   apr_size_t newline_len, \
   svn_stream_t *dst,  \
   struct translation_baton *b)\
{  \
  SVN_ERR_ASSERT(STRING_IS_EOL(eol_str, eol_str_len)); \
  SVN_ERR_ASSERT(STRING_IS_EOL(newline_buf, newline_len)); \
   \
  /* If we've seen a newline before, compare it with our cache to  \
 check for consistency, else cache it for future comparisons. */   \
  if (*src_format_len) \
{  \
  /* Comparing with cache.  If we are inconsistent and \
 we are NOT repairing the file, generate an error! */  \
  if ((! b-repair)  \
  ((*src_format_len != newline_len) || \
   (strncmp(src_format, newline_buf, newline_len   \
return svn_error_create(SVN_ERR_IO_INCONSISTENT_EOL, NULL, NULL);  \
}  \
  else 

Trouble building trunk: Undefined references to svn_repos_load_fs3() and svn_repos_get_fs_build_parser3()

2010-11-07 Thread Daniel Trebbien
I am having trouble building trunk. The make process always fails when
linking libsvn_repos-1:

cd subversion/libsvn_repos  /usr/share/apr-1.0/build/libtool
--tag=CC --silent --mode=link gcc  -Wno-system-headers
-Wold-style-definition -Wdeclaration-after-statement -Wpointer-arith
-Wwrite-strings -Wshadow -ansi -Wall -Wformat=2 -Wunused
-Waggregate-return -Wstrict-prototypes -Wmissing-prototypes
-Wmissing-declarations -Wno-multichar -Wredundant-decls
-Wnested-externs -Wunreachable-code -Winline -Wno-long-long -g
-pthread -D_LARGEFILE64_SOURCE -DNE_LFS
-Werror=implicit-function-declaration  -DSVN_DEBUG -DAP_DEBUG   -rpath
/usr/local/stow/dtrebbien-subversion-HEAD/lib -Wl,--no-undefined -o
libsvn_repos-1.la  authz.lo commit.lo delta.lo deprecated.lo dump.lo
fs-wrap.lo hooks.lo load.lo log.lo node_tree.lo notify.lo
obliterate.lo replay.lo reporter.lo repos.lo rev_hunt.lo
../../subversion/libsvn_fs/libsvn_fs-1.la
../../subversion/libsvn_delta/libsvn_delta-1.la
../../subversion/libsvn_subr/libsvn_subr-1.la -laprutil-1 -lapr-1
.libs/deprecated.o: In function `svn_repos_load_fs2':
/home/daniel/projects/subversion/subversion/libsvn_repos/deprecated.c:720:
undefined reference to `svn_repos_load_fs3'
.libs/deprecated.o: In function `svn_repos_get_fs_build_parser2':
/home/daniel/projects/subversion/subversion/libsvn_repos/deprecated.c:806:
undefined reference to `svn_repos_get_fs_build_parser3'
collect2: ld returned 1 exit status
make: *** [subversion/libsvn_repos/libsvn_repos-1.la] Error 1

Is anyone else experiencing this problem?


Re: Trouble building trunk: Undefined references to svn_repos_load_fs3() and svn_repos_get_fs_build_parser3()

2010-11-07 Thread Daniel Trebbien
On Sun, Nov 7, 2010 at 8:01 AM, Stefan Sperling s...@elego.de wrote:
 On Sun, Nov 07, 2010 at 07:56:51AM -0800, Daniel Trebbien wrote:
 I am having trouble building trunk. The make process always fails when
 linking libsvn_repos-1:

 Looks like the linker or libtool is picking up conflicting versions of
 the same library. Try deinstalling your system subversion packages, if any.
 Also try deleting anything within your trunk build installation prefix.
 Also run make clean to make sure everything is recompiled.

 Stefan

I tried your suggestion of removing the subversion and libsvn1
packages (I'm running Debian Sid), running `make clean`, and
recompiling, but I still have the same problem.


Re: Trouble building trunk: Undefined references to svn_repos_load_fs3() and svn_repos_get_fs_build_parser3()

2010-11-07 Thread Daniel Trebbien
Weird. It seems that `subversion/libsvn_repos/load-fs-vtable.c` was
the only C file in `subversion/libsvn_repos` that wasn't being
compiled. So, I modified the CC line that compiled
`subversion/libsvn_repos/log.c` to instead compile
`subversion/libsvn_repos/load-fs-vtable.c` and manually added
`load-fs-vtable.lo` to the link line that produces
`libsvn_repos-1.la`. That made the problem go away for now.


Re: Trouble building trunk: Undefined references to svn_repos_load_fs3() and svn_repos_get_fs_build_parser3()

2010-11-07 Thread Daniel Trebbien
On Sun, Nov 7, 2010 at 12:04 PM, Daniel Trebbien dtrebb...@gmail.com wrote:
 Weird. It seems that `subversion/libsvn_repos/load-fs-vtable.c` was
 the only C file in `subversion/libsvn_repos` that wasn't being
 compiled. So, I modified the CC line that compiled
 `subversion/libsvn_repos/log.c` to instead compile
 `subversion/libsvn_repos/load-fs-vtable.c` and manually added
 `load-fs-vtable.lo` to the link line that produces
 `libsvn_repos-1.la`. That made the problem go away for now.

Two other C files were also not built:
`subversion/svn/relocate-cmd.c` and
`subversion/tests/libsvn_wc/utils.c`. Except for these files and
`subversion/libsvn_repos/load-fs-vtable.c`, Subversion built just
fine.

Am I configuring Subversion incorrectly? I am using this config line:
./configure --enable-maintainer-mode
--prefix=/usr/local/stow/dtrebbien-subversion-HEAD --with-neon


Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 3)

2010-11-07 Thread Daniel Trebbien
Hi Daniel,

Thank you for your feedback.  I have attached a new patch and log
message that addresses the points that you have mentioned.

 If we can provide the information cheaply, I see no reason not to let
 the APIs provide it.  As some of these APIs are already revved for 1.7,
 we might as well add the new information to them now.

Maybe this should be done with other patches?  I am concerned that
this patch is getting long again :)

 And all of this is necessary just so svnsync can /report/ which of the
 two fixes it made, right?  If so, we might move on with teaching svnsync
 to fix newlines too, and add the Report back how many properties had
 their newlines changed functionality separately.  (easier to review,
 and I don't think 512 properties were re-newlined is /that/ useful to
 know for most people)

Yes.

Having svnsync do the newline normalization would be okay if there
were a function that only performed the re-encoding, which
unfortunately I do not see.

Someone (I forget who) suggested that the final summary message 512
properties were re-newlined be replaced with a revision-by-revision
listing of the exact properties that were changed in importing each
revision.  I like this idea, and was thinking about making patches to
implement it.

 [[[
 Adds a public API function, svn_subst_translate_string2(), an extension of
 svn_subst_translate_string(), that has two, additional output parameters for
 determining whether re-encoding and/or line ending translation were 
 performed.


 Grammar police: No comma after two.

Fixed.

 What's happening in this file is essentially Add the parameters to
 this public API, punch them through everything, and finally in
 translate_newline() make use of them.  Could you add something
 (a paragraph?) to the log message that makes that clear?

 Or, another option is two have two subst.c entries in the log message
 --- one for the 'punching through' and one for the interesting changes
 --- though we have less precedent for that.

 I'll leave it to you to find a way to make the overall picture clear :)

I added a couple of paragraphs.  One explains that most of the changes
are to send the TRANSLATED_EOL parameter all the way through to
translate_newline().  Another paragraph explains the idea behind my
solution for the efficiency concern that you had about the new
translate_newline() implementation.

 -/** Translate the data in @a value (assumed to be in encoded in charset
 +/** @deprecated Provided for backward compatibility with the 1.6 API. 
 Callers
 + * should use svn_subst_translate_string2().
 + *
 + * Translate the data in @a value (assumed to be encoded in charset
   * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
   * then assume that @a value is in the system-default language encoding.
   * Return the translated data in @a *new_value, allocated in @a pool.
   */

 Please make the old docstring only describe the differences from the new
 version of the function, and delete the rest of the docstring here.
 Follow the examples in the other header files.

Fixed.

 +SVN_DEPRECATED
  svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
                                          const svn_string_t *value,
                                          const char *encoding,
                                          apr_pool_t *pool);

 +/** Translate the data in @a value (assumed to be encoded in charset
 + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
 + * then assume that @a value is in the system-default character encoding.

 You changed 'language encoding' to 'character encoding'.

Changed back.  Note that the docstring has been reworded since version
2 of the patch.  I am now following the new wording and modified it as
appropriate.

 + * If @a translated_to_utf8 is not @c NULL, then
 + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
 + * character of @a value in the source charset was translated to UTF-8;  to
 + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
 + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
 + * line ending was changed to LF;  to @c FALSE otherwise.

 Use paragraphs please.

Made into a separate paragraph.

 + * Return the translated
 + * data in @a *new_value, allocated in @a pool.
 + *
 + * @since New in 1.7

 Trailing period missing.

Fixed.

 +svn_error_t *
 +svn_subst_translate_string2(svn_string_t **new_value,
 +                            svn_boolean_t *translated_to_utf8,
 +                            svn_boolean_t *translated_line_endings,
 +                            const svn_string_t *value,
 +                            const char *encoding,
 +                            apr_pool_t *pool);

 Can you switch this function to the two-pools paradigm
 (result_pool/scratch_pool) while here?  There are examples in
 libsvn_wc/wc_db.h (and elsewhere).

 It already uses scratch_pool internally.

I added 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-10-23 Thread Daniel Trebbien
On Mon, Oct 18, 2010 at 3:05 AM, Gavin Beau Baumanis
gav...@thespidernet.com wrote:
 Hi Daniel,

 This is just a friendly poke.
 Are you happy to have me annoy you every so often - or would just like me 
 leave it entirely with you ?


 Gavin Beau Baumanis

Hi Gavin,

I don't mind at all. Regarding this svnsync patch, I am waiting for
[PATCH] extend svn_subst_translate_string() to record whether
re-encoding and/or line ending translation were performed (v. 2)
(http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020/)
to be committed because this patch depends upon that one. Not to
worry, I am still interested in making the previously-discussed
enhancements to svnsync.

Daniel


[PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

2010-10-03 Thread Daniel Trebbien
On Fri, Oct 1, 2010 at 3:57 AM, Julian Foad julian.f...@wandisco.com wrote:
 Adds a public API function, svn_subst_translate_string2(), an extension of
 svn_subst_translate_string(), that has two, additional output parameters for
 determining whether re-encoding and/or line ending translation were 
 performed.

 If we're going to add this functionality to _translate_string(),
 shouldn't we also add it to the other variants - _detranslate_string,
 _translate_cstring2, _stream_translated, _copy_and_translate4?

 I see you're adding the infrastructure into the (new) bodies of
 _translate_cstring2 and _stream_translated, just not (yey) exposing it
 in their APIs.

 I'm not sure.  The set of 'translation' functions is already messy and
 it's not clear to me how useful this new functionality will be.

I originally began working on svn_subst_translate_string2() because I
could not find a combination of public API functions that would allow
me to determine whether the line endings changed when a string was
both re-encoded to UTF-8 and normalized to LF line endings. The most
applicable, svn_subst_translate_string(), performs both translations
without indicating whether it re-encoded the string or normalized a
line ending.

An alternative to extending svn_subst_translate_string() is to add a
public API function that does the re-encoding and another that
normalizes line endings. I think, however, that extending
svn_subst_translate_string() is better because though the current
implementation of svn_subst_translate_string() re-encodes, then
normalizes line endings, the single API function allows for the
possibility of a future improvement in efficiency as a result of
performing both translations simultaneously.



Attached are a revised patch and log message.
[[[
Adds a public API function, svn_subst_translate_string2(), an extension of
svn_subst_translate_string(), that has two, additional output parameters for
determining whether re-encoding and/or line ending translation were performed.


As discussed at:
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020


* subversion/include/svn_subst.h
  (svn_subst_translate_string2): New function.
  (svn_subst_translate_string): Deprecated in favor of
svn_subst_translate_string2().

* subversion/libsvn_subr/subst.c
  (translate_newline): Added the new parameter TRANSLATED_EOL, the value at
which is set to TRUE if the newline string in EOL_STR is different than
the newline string in NEWLINE_BUF.
  (translation_baton): Added the TRANSLATED_EOL field.
  (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
passed to the resulting translation_baton.
  (translate_chunk): Modified the three calls to translate_newline() to pass
the TRANSLATED_EOL pointer from the translation baton.
  (stream_translated): New static function. Its implementation is the old
implementation of svn_subst_stream_translated(), but accepting another
parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
that it creates.
  (svn_subst_stream_translated): Now a wrapper for stream_translated().
  (translate_cstring): New static function. Its implementation is the old
implementation of svn_subst_translate_cstring2(), but modified to accept
another parameter, TRANSLATED_EOL, that is passed to stream_translated().
  (svn_subst_translate_cstring2): Now a wrapper for translate_cstring().
  (svn_subst_translate_string2): New function. The task of recording whether
it translated a line ending is delegated to translate_cstring().
]]]Index: subversion/include/svn_subst.h
===
--- subversion/include/svn_subst.h  (revision 1004022)
+++ subversion/include/svn_subst.h  (working copy)
@@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea
 
 /* EOL conversion and character encodings */
 
-/** Translate the data in @a value (assumed to be in encoded in charset
+/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
+ * should use svn_subst_translate_string2().
+ * 
+ * Translate the data in @a value (assumed to be encoded in charset
  * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
  * then assume that @a value is in the system-default language encoding.
  * Return the translated data in @a *new_value, allocated in @a pool.
  */
+SVN_DEPRECATED
 svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
 const svn_string_t *value,
 const char *encoding,
 apr_pool_t *pool);
 
+/** Translate the data in @a value (assumed to be encoded in charset
+ * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
+ * then assume that @a value is in the system-default character 

Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed

2010-10-02 Thread Daniel Trebbien
On Fri, Oct 1, 2010 at 4:14 AM, Bert Huijben b...@qqmail.nl wrote:
  @@ -650,7 +651,13 @@ translate_newline(const char *eol_str,
         *src_format_len = newline_len;
       }
     /* Translate the newline */
  -  return translate_write(dst, eol_str, eol_str_len);
  +  svn_error_t *err = translate_write(dst, eol_str, eol_str_len);

 No declarations mixed in with statements - we stick to C'89 rules.  But
 I don't think there is any need to insert this new code *after* the
 write - it can just as well go before the write, leaving the 'return'
 how it was.

 The code can just use SVN_ERR() here, as you can't be sure the output is 
 available in error conditions anyway, so the extra check can be avoided on 
 errors.

  +  if (eol_translated) {
  +    if (newline_len != eol_str_len ||
  +        strncmp(newline_buf, eol_str, newline_len))
  +      *eol_translated = TRUE;
  +  }
  +  return err;

 And this can be a return SVN_NO_ERROR;

I am not sure what the extra check is.

Is this preferred?:

  /* Translate the newline */
  SVN_ERR(translate_write(dst, eol_str, eol_str_len));

  if (translated_eol) {
if (newline_len != eol_str_len ||
strncmp(newline_buf, eol_str, newline_len))
  *translated_eol = TRUE;
  }
  return SVN_NO_ERROR;


[PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed

2010-09-30 Thread Daniel Trebbien
Following Daniel Shahaf's suggestion of splitting the allow svnsync
to translate non-UTF-8 log messages to UTF-8 work into two patches, I
have revised my changes, prepared one of the two patches, and have
attached the one patch and suggested log message to this e-mail.
Adds a public API function, svn_subst_translate_string2(), an extension of
svn_subst_translate_string(), that has two, additional output parameters for
determining whether re-encoding and/or line ending translation were performed.


As discussed at:
  http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122518
  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550


* subversion/include/svn_subst.h
  (svn_subst_translate_string2): New function.
  (svn_subst_translate_string): Deprecated in favor of
svn_subst_translate_string2().

* subversion/libsvn_subr/subst.c
  (translate_newline): Added the new parameter EOL_TRANSLATED, the value at
which is set to TRUE if translate_newline() wrote out the line ending in
NEWLINE_BUF with a different line ending in EOL_STR.
  (translation_baton): Added the TRANSLATED_EOL field.
  (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
passed to the resulting translation_baton.
  (translate_chunk): Modified the three calls to translate_newline() to pass
the TRANSLATED_EOL pointer from the translation baton.
  (stream_translated): New static function. Its implementation is the old
implementation of svn_subst_stream_translated(), but accepting another
parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
that it creates.
  (svn_subst_stream_translated): Now a wrapper for stream_translated().
  (translate_cstring2): New static function. Its implementation is the old
implementation of svn_subst_translate_cstring2(), but modified to accept
another parameter, TRANSLATED_EOL, that is passed to stream_translated().
  (svn_subst_translate_cstring2): Now a wrapper for translate_cstring2().
  (svn_subst_translate_string2): New function. The task of recording whether
it translated a line ending is delegated to translate_cstring2().
Index: subversion/include/svn_subst.h
===
--- subversion/include/svn_subst.h  (revision 1003341)
+++ subversion/include/svn_subst.h  (working copy)
@@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea
 
 /* EOL conversion and character encodings */
 
-/** Translate the data in @a value (assumed to be in encoded in charset
+/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
+ * should use svn_subst_translate_string2().
+ * 
+ * Translate the data in @a value (assumed to be encoded in charset
  * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
  * then assume that @a value is in the system-default language encoding.
  * Return the translated data in @a *new_value, allocated in @a pool.
  */
+SVN_DEPRECATED
 svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
 const svn_string_t *value,
 const char *encoding,
 apr_pool_t *pool);
 
+/** Translate the data in @a value (assumed to be encoded in charset
+ * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
+ * then assume that @a value is in the system-default character encoding.
+ * If @a translated_to_utf8 is not @c NULL, then
+ * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
+ * character of @a value in the source charset was translated to UTF-8;  to
+ * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
+ * @code *translated_line_endings @endcode is set to @c TRUE if at least one
+ * line ending was changed to LF;  to @c FALSE otherwise. Return the translated
+ * data in @a *new_value, allocated in @a pool.
+ *
+ * @since New in 1.7
+ */
+svn_error_t *
+svn_subst_translate_string2(svn_string_t **new_value,
+svn_boolean_t *translated_to_utf8,
+svn_boolean_t *translated_line_endings,
+const svn_string_t *value,
+const char *encoding,
+apr_pool_t *pool);
+
 /** Translate the data in @a value from UTF8 and LF line-endings into
  * native locale and native line-endings, or to the output locale if
  * @a for_output is TRUE.  Return the translated data in @a
Index: subversion/libsvn_subr/subst.c
===
--- subversion/libsvn_subr/subst.c  (revision 1003341)
+++ subversion/libsvn_subr/subst.c  (working copy)
@@ -628,7 +628,8 @@ translate_newline(const char *eol_str,
   char *newline_buf,
   

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Daniel Trebbien
Hi Gavin,

There have been some things that I have needed to do recently, so I
haven't been working on the patches as much as I would like.  But, I
am still interested in finishing them.

My apologies for the delay...

Daniel


On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
gav...@thespidernet.com wrote:
 Hi Daniel (Trebbien),

 Just thought I would ask for an update, please.
 I still have you on my watch - list - so there is no rush - just making 
 sure for my own sanity that I haven't missed an email / thread about your 
 patch submission.


 Gavin Beau Baumanis


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Trebbien
On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 * subversion/include/svn_subst.h
   Added documentation of the new public API function
   svn_subst_translate_string2().


 Should use this syntax:

 * subversion/include/svn_subst.h
  (svn_subst_translate_string2): New function.

Okay.

 * subversion/libsvn_subr/subst.c
   (translation_baton): Added TRANSLATED_EOL field. It's where 
 translate_chunk()
     records that it translated a line ending.
   (create_translation_baton): Added a new parameter TRANSLATED_EOL. 
 Initialize
     the TRANSLATED_EOL field of the resulting translation_baton to this 
 value.
   (translate_chunk): After translating a line ending, set the value that is
     pointed to by TRANSLATED_EOL (from the translation baton) to TRUE.

 Could just say Record that an EOL translation has been done or similar.

 In general, the log message looks good, but to my taste it is borders
 the 'too verbose a log message' line.  (I might have written the same
 without the last sentence of each bullet.)  Remember that the log
 message must never /repeat/ the code (or in-code comments); rather, it
 should describe them from a high level.

Okay. I will attempt to make the log message less verbose.

 Index: subversion/include/svn_subst.h
 ===
 --- subversion/include/svn_subst.h    (revision 996730)
 +++ subversion/include/svn_subst.h    (working copy)
 @@ -598,6 +598,26 @@ svn_error_t *svn_subst_translate_string(svn_string
                                          const char *encoding,
                                          apr_pool_t *pool);

 +/** Translate the data in @a value (assumed to be in encoded in charset
 + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
 + * then assume that @a value is in the system-default language encoding.
 + * If @p translated_to_utf8 is not @c NULL, then
 + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one

 You use @a, @p, and @code.  Existing code uses only @a.  Why the
 difference?

Oops. @p should be @a.

I believe that in Doxygen, @c WORD is short for @code WORD @endode.

 + * character of @p value in the source charset was translated to UTF-8;

 Need the word to after the semicolon for the English to be correct.

Fixed.  (When I say fixed, I mean that I committed a fix in my own
fork of Subversion:  http://github.com/dtrebbien/subversion )

 + * @c FALSE otherwise. If @p translated_line_endings is not @c NULL, then
 + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
 + * line ending was changed to LF; �...@c FALSE otherwise. Return the 
 translated data in @a *new_value,
 + * allocated in @a pool.
 + *

 Need to rewrap.

Fixed.

 + * @since New in 1.6

 s/1.6/1.7/

Fixed.

 + */
 +svn_error_t *svn_subst_translate_string2(svn_string_t **new_value,

 I realize that you followed the precedent of svn_subst_translate_string(),
 but please:

 Return type on own line, function name on the next line.

Fixed.

 +                                         svn_boolean_t *translated_to_utf8,
 +                                         svn_boolean_t 
 *translated_line_endings,
 +                                         const svn_string_t *value,
 +                                         const char *encoding,
 +                                         apr_pool_t *pool);
 +
  /** Translate the data in @a value from UTF8 and LF line-endings into

 Need to update svn_subst_translate_string()'s docstring and
 implementation and mark it deprecated:
 http://subversion.apache.org/docs/community-guide/community-guide.html#deprecation
 http://mid.gmane.org/20100814132023.089a22388...@eris.apache.org

Okay.

   * native locale and native line-endings, or to the output locale if
   * @a for_output is TRUE.  Return the translated data in @a

 Index: subversion/libsvn_subr/subst.c
 ===
 --- subversion/libsvn_subr/subst.c    (revision 996730)
 +++ subversion/libsvn_subr/subst.c    (working copy)
 @@ -765,6 +765,7 @@ svn_subst_keywords_differ2(apr_hash_t *a,
  struct translation_baton
  {
    const char *eol_str;
 +  svn_boolean_t *translated_eol;
    svn_boolean_t repair;
    apr_hash_t *keywords;
    svn_boolean_t expand;
 @@ -805,6 +806,7 @@ struct translation_baton
   */
  static struct translation_baton *
  create_translation_baton(const char *eol_str,
 +                         svn_boolean_t *translated_eol,
                           svn_boolean_t repair,
                           apr_hash_t *keywords,
                           svn_boolean_t expand,
 @@ -818,6 +820,7 @@ create_translation_baton(const char *eol_str,

    b-eol_str = eol_str;
    b-eol_str_len = eol_str ? strlen(eol_str) : 0;
 +  b-translated_eol = translated_eol;
    b-repair = repair;
    b-keywords = keywords;
    b-expand = expand;
 @@ -884,6 +887,8 @@ translate_chunk(svn_stream_t 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-21 Thread Daniel Trebbien
Daniel Trebbien dtrebbien at gmail.com writes:
 Yes, I did copy-and-paste. I see your point, though, and I am going to
 correct this by adding another `static` function that both will call.

Never mind the talk of a new static function. `svn_subst_translate_string` 
should
of course call `svn_subst_translate_string2` :)



Re: Public API documentation

2010-09-15 Thread Daniel Trebbien
Hyrum K. Wright hyrum_wright at mail.utexas.edu writes:
 Thanks.  Though I'll probably wait a day or two, to let any dissenters
 make themselves known. :)

-1/2. There are parts of your idea that I like, such as linking to common
documentation, enumerating all of the errors that can be returned, and
specifying whether parameters are in, out, or in/out, but in general I do not
like the idea of changing the style of documentation to focus on documentation
of the parameters.

I am a new user of the API, so in some ways I think that my initial impression
is representative of others newbies' initial impression of the documentation.
And, my initial impression was that I liked how the all of the parameters were
explained in enough detail to be able to actually use the function. In the case
of `svn_client_checkout3`, for example, I like how the documentation currently
points out that the most important field of the ctx is `notify_func2`. The
proposed idea of simply linking to the documentation of `svn_client_ctx_t`
would not be very helpful to a beginner, I think, because there are *so many*
fields of the `svn_client_ctx_t` structure, and it is not immediately clear
which ones affect the behavior of `svn_client_checkout3`. Also, other details
were trimmed away in the mock-up (e.g.: the basic description went from the
detailed Checkout a working copy of `URL` at `revision`, looked up at
`peg_revision`, using `path` as the root directory of the newly checked out
working copy, and authenticating with the authentication baton cached in `ctx`
to Checkout a working copy from a repository;  the description of
`result_rev` is currently more detailed than in the proposed description).

The use of structures to pass in bundles of parameters is problematic for
the proposed style of documentation, because it is not obvious how the fields
of the bundles of parameters should be listed as parameters. Do
you document `ctx-notify_func2` as an [in] parameter? Also, Julian's example
of `svn_client_checkout2` is another problem with the proposed style.



In svn_subst_translate_string(), shouldn't REPAIR in the call to svn_subst_translate_cstring2() be TRUE?

2010-09-13 Thread Daniel Trebbien
Looking through the source code for `svn_subst_translate_string` in
`subversion/libsvn_subr/subst.c`, it seems odd to me that the `repair`
parameter to the call to `svn_subst_translate_cstring2` (for
translating the line endings) is `FALSE`. `FALSE` means that if the
line endings are inconsistent, then `svn_subst_translate_cstring2`
fails.

Given that the documentation for `svn_subst_translate_string` says
that `value` is translated to LF line endings, which implies to me
that all line endings are changed to Unix-style, shouldn't
`svn_subst_translate_cstring2` be called with `repair` set to `TRUE`?


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-12 Thread Daniel Trebbien
On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 When I was working on my changes, I was looking for a to UTF-8
 function that would return whether it actually re-encoded the input
 string, but did not find one. The re-encoding function that I used,
 `svn_subst_translate_string`, actually converts line endings to LF as
 well as re-encodes from the given encoding to UTF-8, but it does not
 inform the caller of whether it took either action. I guess that I
 could write a utility function, kind of like a `strcmp`, but which
 ignores any differences at line endings. Unfortunately, this adds
 another scan through every property value that is encountered. Already
 there is a noticeable decrease in the performance of the modified
 `svnsync` as a result of calling `svn_subst_translate_string` on
 basically every property value, and adding an additional scan through
 each property value would decrease performance further.


 Or you could insert the reencoding magic after (and separately from) the
 dos2unix magic, if that would make counting easier.  That said, what are
 you trying to count?  The number of properties where the reencoding
 wasn't a noop?

To re-encode and then normalize the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding;  `svn_subst_translate_string` does both simultaneously.

I removed the normalization counting code without much thought in my
hastened efforts to produce a version of `svnsync` that I could use to
mirror the GNU Nano repository. Currently, I am thinking that Stefan
Sperling's idea of a `svn_subst_translate_string2` function is the way
to go.

 Re performance, isn't svnsync bound by network speed?

Mostly yes. However, I have definitely noticed a decrease in
performance with my altered version (when using --source-encoding)
that cannot be explained by network speed. Granted, it's not that much
of a difference.

 Unrelatedly, you mentioned that in the repository you work on there are
 soem properties in latin1 and some in utf8.  So one will need (until
 they fix the properties on their side) to svnsync a few revisions with
 translation enabled, then kill svnsync and restart with translation
 disabled, then restart again with it enabled etc.  Which makes me think,
 do we want a sync up to N revisions (or, sync up to rN) switch?

It's like you are reading my mind :)  I figured that I would work on
getting this change implemented and then work on such a feature.

 Have you sent a new version of the patch yet?

Oh, not yet. I'm still working on it.


[PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
[[[
Add a command line option to the svnsync init, sync, and copy-revprops
subcommands (--source-encoding) that allows the user to specify the
character encoding of translatable properties from the source
repository. This is needed to allow svnsync to import some older
Subversion repositories that have properties that were not encoded in
UTF-8.

As discussed at:
http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020

* subversion/svnsync/main.c
  (svnsync__opt) Added svnsync_opt_source_encoding.
  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
of acceptable options for the init, sync, and copy-revprops
subcommands.
  (opt_baton_t, subcommand_baton_t): Added source_encoding field.
  (log_properties_normalized): Removed this obsolete function.
  (copy_revprops): Added a parameter to receive the
subcommand_baton_t*. Removed the quiet parameter as this can be found
from the subcommand baton. Removed the int* normalized_count
parameter, as the counting of normalized properties is an obsolete
function.
  (make_subcommand_baton): Set the source_encoding field of the
resulting subcommand_baton_t object to the value of source_encoding
from opt_baton_t.
  (do_initialize, do_synchronize, do_copy_revprops): Removed counting
and reporting of normalized properties. Pass source_encoding through
to svnsync_* functions.
  (replay_baton_t): Removed obsolete fields normalized_rev_props_count
and normalized_node_props_count
  (main): Handled the case when the command line option is
--source-encoding. Set the source_encoding field of the opt_baton_t*
to either the ARG of --source-encoding or NULL.

* subversion/svnsync/sync.c
  Standardized terminology by replacing normalize with translate.
  (translate_string): Added the encoding parameter. Handle the case
when encoding is not NULL by calling svn_subst_translate_string().
  (svnsync_translate_revprops): Added the encoding parameter. Removed
counting of normalized properties.
  (edit_baton_t): Added the prop_encoding field. Removed the obsolete
int* normalized_node_props_counter field.
  (svnsync_get_sync_editor): Added a prop_encoding parameter, the
value of which is set in the prop_encoding field of the edit_baton_t*.

* subversion/svnsync/sync.h
  Updated the declarations and documentation of the svnsync_* functions.
]]]


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
On Fri, Sep 10, 2010 at 9:16 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 The patch didn't make it to the list (only to my personal copy), could
 you please re-send with the patch in text/plain MIME type?

Hmmm. I wonder if adding a .txt extension will work...
Index: subversion/svnsync/sync.h
===
--- subversion/svnsync/sync.h   (revision 995839)
+++ subversion/svnsync/sync.h   (working copy)
@@ -33,15 +33,15 @@
 #include svn_delta.h
 
 
-/* Normalize the line ending style of the values of properties in REV_PROPS
- * that need translation (according to svn_prop_needs_translation(),
- * currently all svn:* props) so that they contain only LF (\n) line endings.
- * The number of properties that needed normalization is returned in
- * *NORMALIZED_COUNT.
+/* Translate the encoding and line ending style of the values of properties
+ * in rev_props that need translation (according to
+ * svn_prop_needs_translation(), which is currently all svn:* props) so that
+ * they are encoded in UTF-8 and contain only LF (\n) line endings. No
+ * re-encoding is performed if encoding is NULL.
  */
 svn_error_t *
-svnsync_normalize_revprops(apr_hash_t *rev_props,
-   int *normalized_count,
+svnsync_translate_revprops(apr_hash_t *rev_props,
+   const char *encoding,
apr_pool_t *pool);
 
 
@@ -52,18 +52,20 @@
  * which the commit is being made.
  *
  * As the sync editor encounters property values, it might see the need to
- * normalize them (to LF line endings). Each carried out normalization adds 1
- * to the *NORMALIZED_NODE_PROPS_COUNTER (for notification).
+ * translate them (re-encode and/or change to LF line endings).
+ * If PROP_ENCODING is NULL, then property values are presumed to be encoded
+ * in UTF-8 and are not re-encoded. Otherwise, the property values are
+ * presumed to be encoded in PROP_ENCODING, and are translated to UTF-8.
  */
 svn_error_t *
 svnsync_get_sync_editor(const svn_delta_editor_t *wrapped_editor,
 void *wrapped_edit_baton,
 svn_revnum_t base_revision,
 const char *to_url,
+const char *prop_encoding,
 svn_boolean_t quiet,
 const svn_delta_editor_t **editor,
 void **edit_baton,
-int *normalized_node_props_counter,
 apr_pool_t *pool);
 
 
Index: subversion/svnsync/main.c
===
--- subversion/svnsync/main.c   (revision 995839)
+++ subversion/svnsync/main.c   (working copy)
@@ -61,6 +61,7 @@
   svnsync_opt_sync_password,
   svnsync_opt_config_dir,
   svnsync_opt_config_options,
+  svnsync_opt_source_encoding,
   svnsync_opt_disable_locking,
   svnsync_opt_version,
   svnsync_opt_trust_server_cert,
@@ -104,8 +105,8 @@
  the destination repository by any method other than 'svnsync'.\n
  In other words, the destination repository should be a read-only\n
  mirror of the source repository.\n),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
-svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_allow_non_empty, svnsync_opt_disable_locking } },
 { synchronize, synchronize_cmd, { sync },
   N_(usage: svnsync synchronize DEST_URL [SOURCE_URL]\n
  \n
@@ -117,7 +118,8 @@
  source URL.  Specifying SOURCE_URL is recommended in particular\n
  if untrusted users/administrators may have write access to the\n
  DEST_URL repository.\n),
-  { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
+svnsync_opt_disable_locking } },
 { copy-revprops, copy_revprops_cmd, { 0 },
   N_(usage:\n
  \n
@@ -137,7 +139,8 @@
  DEST_URL repository.\n
  \n
  Form 2 is deprecated syntax, equivalent to specifying 
\-rREV[:REV2]\.\n),
-  { SVNSYNC_OPTS_DEFAULT, 'q', 'r', svnsync_opt_disable_locking } },
+  { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q', 'r',
+svnsync_opt_disable_locking } },
 { info, info_cmd, { 0 },
   N_(usage: svnsync info DEST_URL\n
  \n
@@ -200,6 +203,12 @@
   For example:\n

   servers:global:http-library=serf)},
+{source-encoding, svnsync_opt_source_encoding, 1,
+   N_(convert translatable properties from encoding ARG\n
+   
+  to UTF-8. If not specified, then properties are\n
+   
+  presumed to be 

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

2010-09-10 Thread Daniel Trebbien
 * subversion/svnsync/main.c
   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
 of acceptable options for the init, sync, and copy-revprops
 subcommands.

 Missing indentation on the non-first lines.  Should be in one of these forms:

 [[[
 * subversion/svnsync/main.c
  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
     of acceptable options for the init, sync, and copy-revprops subcommands.
 ]]]

 [[[
 * subversion/svnsync/main.c
  (svnsync_cmd_table):
     Added svnsync_opt_source_encoding to the list of acceptable options for
     the init, sync, and copy-revprops subcommands.
 ]]]

Gmail added in the line breaks, but they aren't in my log message text
file. Should I wrap the log message to a certain number of characters?

   (log_properties_normalized): Removed this obsolete function.

 In other words, you dropped the entire count changes feature, without
 any discussion.  (It's good that the log message is clear about this
 change, though.)

 Please don't do that.  Someone spent some time writing and debugging
 this feature, you can't just come in and drop their code.  If you really
 think this is redundant, then start a discussion and get consensus to
 drop this functionality.

 And at any rate, your recode log messages work needn't depend on it.
 It can (IMO: should) even introduce counters for how many properties it
 recoded.

When I was working on my changes, I was looking for a to UTF-8
function that would return whether it actually re-encoded the input
string, but did not find one. The re-encoding function that I used,
`svn_subst_translate_string`, actually converts line endings to LF as
well as re-encodes from the given encoding to UTF-8, but it does not
inform the caller of whether it took either action. I guess that I
could write a utility function, kind of like a `strcmp`, but which
ignores any differences at line endings. Unfortunately, this adds
another scan through every property value that is encountered. Already
there is a noticeable decrease in the performance of the modified
`svnsync` as a result of calling `svn_subst_translate_string` on
basically every property value, and adding an additional scan through
each property value would decrease performance further.

I for one do not think that the final NOTE: Normalized ## properties
to LF line endings messages are particularly helpful because they do
not say *which* properties of specific revisions were normalized. As a
possible compromise, I could modify a Perl script that I wrote for the
purpose of identifying all, non-ASCII property values from the GNU
Nano repository to list revisions and property values that *would*
require normalization and/or re-encoding. Just a thought...

 * subversion/svnsync/sync.c
   Standardized terminology by replacing normalize with translate.

 Please don't do this.  It makes it harder to read the diff (it obscures
 the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
 and in my opinion normalize is the more accurate term for what you're
 doing now after the patch.  :-)

 Just stick to the existing terminology please.

Okay. I'll change it back.

 @@ -785,7 +772,7 @@
  {
    const char *to_url, *from_url;
    svn_ra_session_t *to_session;
 -  opt_baton_t *opt_baton = b;
 +  opt_baton_t *opt_baton = (opt_baton_t*)b;

 Unnecessary change.  As a rule, patches shouldn't have them.

Reverted.

 @@ -1625,9 +1589,9 @@
  /* SUBCOMMAND: help */
  static svn_error_t *
 -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
 +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
  {

 Why did you rename the formal parameter?  Isn't it another unnecessary
 change (as I explained above) that shouldn't be randomly included in
 a patch?

I renamed it because all of the other functions that take a void*
parameter for passing the opt_baton_t* call it `b`. It doesn't matter,
though. I'll change it back.

 @@ -1982,6 +1951,8 @@

    config = apr_hash_get(opt_baton.config, SVN_CONFIG_CATEGORY_CONFIG,
                          APR_HASH_KEY_STRING);
 +
 +  opt_baton.source_encoding = source_encoding;


 It seems that most other options are parsed into opt_baton.foo directly,
 skipping a local variable.

Okay. I'll change that.

 Index: subversion/svnsync/sync.c
 ===
 --- subversion/svnsync/sync.c (revision 995839)
 +++ subversion/svnsync/sync.c (working copy)
 @@ -45,29 +45,39 @@
 -/* Normalize the line ending style of *STR, so that it contains only
 - * LF (\n) line endings. After return, *STR may point at a new
 - * svn_string_t* allocated from POOL.
 +/* Translate the encoding and line ending style of *STR, so that it contains
 + * only LF (\n) line endings and is encoded in UTF-8. After return, *STR may
 + * point at a new svn_string_t* allocated from POOL if *WAS_TRANSLATED. If
 + * ENCODING is not NULL, then *STR is presumed to be encoded in UTF-8.
   *
 - * *WAS_NORMALIZED is set to TRUE when 

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

2010-09-09 Thread Daniel Trebbien
On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 CC += dev@, and let me point you to our patch submission guidelines:
 http://subversion.apache.org/docs/community-guide/general.html#patches

 Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.

 (more below)

 Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
 On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien dtrebb...@gmail.com wrote:
  I think that a call to `svn_subst_translate_string`
  (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
  added in the `svnsync_normalize_revprops` function when `propname` is
  svn:log.

 After applying the following patch to `subversion/svnsync/sync.c`, I
 can confirm that the svnsync: Error while replaying commit error
 disappears, allowing the sync to complete, and that the problem is
 indeed a log message encoding issue:
 diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
 index 8f7be9e..c7a5e38 100644
 --- a/subversion/svnsync/sync.c
 +++ b/subversion/svnsync/sync.c
 @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
                /* And count this. */
                (*normalized_count)++;
              }
 +
 +          if (strcmp(propname, svn:log) == 0)
 +            {

 Should this use svn_prop_needs_translation()?

Though it does not appear so, the added lines are within the check for
`svn_prop_needs_translation`.

 +              svn_string_t *new_propval;
 +              SVN_ERR(svn_subst_translate_string(new_propval, propval, 
 ISO-8859-1, pool));
 +              apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, 
 propval = new_propval);

 Please, please, please, no assignment inside a function call.           
 ^

Fixed

 +              (*normalized_count)++;
 +            }
          }
      }
    return SVN_NO_ERROR;


 Here I hard-coded the encoding, but I think that the encoding to use
 should be retrieved from the Subversion config file. Now the questions
 are:

 1. What is the best way to pass the `log-encoding` option of the
 [miscellany] section to the `svnsync_normalize_revprops` function?


 What is 'log-encoding' normally used for?  Only for recoding the commit
 message supplied by an editor, right?  So I'm not sure we should use it
 here, perhaps a new command-line option will be better.  (svnsync
 $subcommand --source-encoding=%s)

I like the idea of adding a command line option, so I think that this
is the way to go.

Do other properties need to be re-encoded, potentially? I was only
thinking about `svn:log`, but perhaps other properties might need
re-encoding as well. If not, then I think that the command line option
should be more specific (e.g.: `--source-log-encoding=%s`).

 And either way, the default behaviour should be to translate nothing.
 (If you always translate from latin1, you break syncs for people who,
 correctly, used UTF-8 log messages the entire time.)

I agree. The default behavior should definitely be to not re-encode
the log messages.

 2. Should `svnsync` always store the log messages in UTF-8 even though
 the original repository log message encoding is different?


 You have no choice on the matter: the RA API instructs you to supply it
 a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
 already-UTF-8 message.)

 3. Should there be separate configuration options for the log message
 encoding of the source repository and the log message encoding of the
 destination repository?

 No, see (2).


Another question:  which line of code raises the svnsync: Error while
replaying commit error message? I would like to try to make this more
helpful.