Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

Daniel Trebbien Tue, 21 Sep 2010 08:13:08 -0700

On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>> * subversion/include/svn_subst.h
>>   Added documentation of the new public API function
>>   svn_subst_translate_string2().
>>
>
> Should use this syntax:
>
> * subversion/include/svn_subst.h
>  (svn_subst_translate_string2): New function.


Okay.

>> * subversion/libsvn_subr/subst.c
>>   (translation_baton): Added TRANSLATED_EOL field. It's where 
>> translate_chunk()
>>     records that it translated a line ending.
>>   (create_translation_baton): Added a new parameter TRANSLATED_EOL. 
>> Initialize
>>     the TRANSLATED_EOL field of the resulting translation_baton to this 
>> value.
>>   (translate_chunk): After translating a line ending, set the value that is
>>     pointed to by TRANSLATED_EOL (from the translation baton) to TRUE.
>
> Could just say "Record that an EOL translation has been done" or similar.
>
> In general, the log message looks good, but to my taste it is borders
> the 'too verbose a log message' line.  (I might have written the same
> without the last sentence of each bullet.)  Remember that the log
> message must never /repeat/ the code (or in-code comments); rather, it
> should describe them from a high level.

Okay. I will attempt to make the log message less verbose.

>> Index: subversion/include/svn_subst.h
>> ===================================================================
>> --- subversion/include/svn_subst.h    (revision 996730)
>> +++ subversion/include/svn_subst.h    (working copy)
>> @@ -598,6 +598,26 @@ svn_error_t *svn_subst_translate_string(svn_string
>>                                          const char *encoding,
>>                                          apr_pool_t *pool);
>>
>> +/** Translate the data in @a value (assumed to be in encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default language encoding.
>> + * If @p translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>
> You use @a, @p, and @code.  Existing code uses only @a.  Why the
> difference?

Oops. @p should be @a.

I believe that in Doxygen, @c WORD is short for @code WORD @endode.

>> + * character of @p value in the source charset was translated to UTF-8;
>
> Need the word "to" after the semicolon for the English to be correct.

Fixed.  (When I say "fixed", I mean that I committed a fix in my own
fork of Subversion:  http://github.com/dtrebbien/subversion )

>> + * @c FALSE otherwise. If @p translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF; �...@c FALSE otherwise. Return the 
>> translated data in @a *new_value,
>> + * allocated in @a pool.
>> + *
>
> Need to rewrap.

Fixed.

>> + * @since New in 1.6
>
> s/1.6/1.7/

Fixed.

>> + */
>> +svn_error_t *svn_subst_translate_string2(svn_string_t **new_value,
>
> I realize that you followed the precedent of svn_subst_translate_string(),
> but please:
>
> Return type on own line, function name on the next line.

Fixed.

>> +                                         svn_boolean_t *translated_to_utf8,
>> +                                         svn_boolean_t 
>> *translated_line_endings,
>> +                                         const svn_string_t *value,
>> +                                         const char *encoding,
>> +                                         apr_pool_t *pool);
>> +
>>  /** Translate the data in @a value from UTF8 and LF line-endings into
>
> Need to update svn_subst_translate_string()'s docstring and
> implementation and mark it deprecated:
> http://subversion.apache.org/docs/community-guide/community-guide.html#deprecation
> http://mid.gmane.org/20100814132023.089a22388...@eris.apache.org

Okay.

>>   * native locale and native line-endings, or to the output locale if
>>   * @a for_output is TRUE.  Return the translated data in @a
>
>> Index: subversion/libsvn_subr/subst.c
>> ===================================================================
>> --- subversion/libsvn_subr/subst.c    (revision 996730)
>> +++ subversion/libsvn_subr/subst.c    (working copy)
>> @@ -765,6 +765,7 @@ svn_subst_keywords_differ2(apr_hash_t *a,
>>  struct translation_baton
>>  {
>>    const char *eol_str;
>> +  svn_boolean_t *translated_eol;
>>    svn_boolean_t repair;
>>    apr_hash_t *keywords;
>>    svn_boolean_t expand;
>> @@ -805,6 +806,7 @@ struct translation_baton
>>   */
>>  static struct translation_baton *
>>  create_translation_baton(const char *eol_str,
>> +                         svn_boolean_t *translated_eol,
>>                           svn_boolean_t repair,
>>                           apr_hash_t *keywords,
>>                           svn_boolean_t expand,
>> @@ -818,6 +820,7 @@ create_translation_baton(const char *eol_str,
>>
>>    b->eol_str = eol_str;
>>    b->eol_str_len = eol_str ? strlen(eol_str) : 0;
>> +  b->translated_eol = translated_eol;
>>    b->repair = repair;
>>    b->keywords = keywords;
>>    b->expand = expand;
>> @@ -884,6 +887,8 @@ translate_chunk(svn_stream_t *dst,
>>                                          b->src_format,
>>                                          &b->src_format_len, b->newline_buf,
>>                                          b->newline_off, dst, b->repair));
>> +              if (b->translated_eol)
>> +                *b->translated_eol = TRUE;
>>
>
> I don't have the patience to dive into this function right now, but the
> patch is large enough that I'll review the rest and come back to this
> some other time.
>
> <appendum>
> There are several calls to translate_newline() in this function, but you
> added the '*translated_eol = TRUE' line only after one of them.  That
> triggers a red light for me...
> </appendum>
>
> (PS: "appendum"? anyone got a better suggestion?)
> (PPS: "addendum".)

Oh, I didn't notice. I will have to look into this issue.

>>    /* Jam the text into the destination stream (to translate it). */
>>    SVN_ERR(svn_stream_write(dst_stream, src, &len));
>> @@ -1362,6 +1386,19 @@ svn_subst_read_specialfile(svn_stream_t **stream,
>>    return SVN_NO_ERROR;
>>  }
>>
>> +svn_error_t *
>> +svn_subst_translate_cstring2(const char *src,
>> +                             const char **dst,
>> +                             const char *eol_str,
>> +                             svn_boolean_t repair,
>> +                             apr_hash_t *keywords,
>> +                             svn_boolean_t expand,
>> +                             apr_pool_t *pool)
>> +{
>> +  return translate_cstring2(dst, NULL, src, eol_str, repair, keywords, 
>> expand,
>> +                            pool);
>> +}
>
> So, this is revved because svn_subst_translate_string2() needs it (and
> svn_subst_translate_string() doesn't).

I am unsure of what you mean.

I took the original implementation of `svn_subst_translate_cstring2`
and placed it in the new static utility function `translate_cstring2`.
`translate_cstring2` accepts another parameter, and its definition
(the old definition on `svn_subst_translate_cstring2`) was
appropriately modified to handle this new parameter. This way, I don't
have to modify the public API.

>> +
>>  /* Given a special file at SRC, generate a textual representation of
>>     it in a normal file at DST.  Perform all allocations in POOL. */
>>  /* ### this should be folded into svn_subst_copy_and_translate3 */
>> @@ -1714,6 +1751,54 @@ svn_subst_translate_string(svn_string_t **new_valu
>>
>>
>>  svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> +                            svn_boolean_t *translated_to_utf8,
>> +                            svn_boolean_t *translated_line_endings,
>> +                            const svn_string_t *value,
>> +                            const char *encoding,
>> +                            apr_pool_t *pool)
>> +{
>
> Why is this adding entirely new lines of code (the diff shows only /^+/
> lines)?  Normally we make the new function by editing the old one
> in-place, and then re-create the old one as a wrapper (like
> translate_cstring2()).
>
> The fact you didn't do so makes me suspect you just copied the code of
> svn_subst_translate_string(), which is bad for several reasons:
>
> * It's code duplication.  If we ever make a bugfix, we'll have to
>  remember to make it to both versions of the function.
>
> * It makes it impossible for me to see /from the unidiff/ what's the
>  code difference between the current and being-added versions of the
>  function.

Yes, I did copy-and-paste. I see your point, though, and I am going to
correct this by adding another `static` function that both will call.

> Out of patience for one sitting, so not reviewing these now.  (I feel they're
> orthogonal/independent of the parts I'd already reviewed.)
>
> I suggest you go ahead (respond to the review, submit an updated patch,
> etc) without waiting; the svnsync part will eventually get its share of
> reviews.  Also, you may want to consider splitting this patch into two
> parts (subst work separate from svnsync work).

I like the idea of splitting this into two patches.

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

Reply via email to