Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

Gavin Beau Baumanis Wed, 24 Nov 2010 03:44:11 -0800

Hi Daniel (T),

Since in your earlier post you mentioned that you didn't mind a friendly 
reminder...


I thought I would return this thread to the top of the list  - Just i case you 
had missed Daniel (Shahaf) 's comments.


Gavin "Beau" Baumanis


On 02/11/2010, at 12:35 AM, Daniel Shahaf wrote:

> Gavin Beau Baumanis wrote on Mon, Oct 18, 2010 at 21:06:31 +1100:
>> Ping. This patch submission has received no comments.
>> 
> 
> Thanks, Gavin.  @Daniel, sorry for the delay.
> 
>> On 04/10/2010, at 5:55 AM, Daniel Trebbien wrote:
>> 
>>> On Fri, Oct 1, 2010 at 3:57 AM, Julian Foad <julian.f...@wandisco.com> 
>>> wrote:
>>>>> Adds a public API function, svn_subst_translate_string2(), an extension of
>>>>> svn_subst_translate_string(), that has two, additional output parameters 
>>>>> for
>>>>> determining whether re-encoding and/or line ending translation were 
>>>>> performed.
>>>> 
>>>> If we're going to add this functionality to _translate_string(),
>>>> shouldn't we also add it to the other variants - _detranslate_string,
>>>> _translate_cstring2, _stream_translated, _copy_and_translate4?
>>>> 
>>>> I see you're adding the infrastructure into the (new) bodies of
>>>> _translate_cstring2 and _stream_translated, just not (yey) exposing it
>>>> in their APIs.
>>>> 
> 
> If we can provide the information cheaply, I see no reason not to let
> the APIs provide it.  As some of these APIs are already revved for 1.7,
> we might as well add the new information to them now.
> 
>>>> I'm not sure.  The set of 'translation' functions is already messy and
>>>> it's not clear to me how useful this new functionality will be.
>>> 
>>> I originally began working on svn_subst_translate_string2() because I
>>> could not find a combination of public API functions that would allow
>>> me to determine whether the line endings changed when a string was
>>> both re-encoded to UTF-8 and normalized to LF line endings. The most
>>> applicable, svn_subst_translate_string(), performs both translations
>>> without indicating whether it re-encoded the string or normalized a
>>> line ending.
>>> 
> 
> And all of this is necessary just so svnsync can /report/ which of the
> two fixes it made, right?  If so, we might move on with teaching svnsync
> to fix newlines too, and add the "Report back how many properties had
> their newlines changed" functionality separately.  (easier to review,
> and I don't think "512 properties were re-newlined" is /that/ useful to
> know for most people)
> 
>>> An alternative to extending svn_subst_translate_string() is to add a
>>> public API function that does the re-encoding and another that
>>> normalizes line endings. I think, however, that extending
>>> svn_subst_translate_string() is better because though the current
>>> implementation of svn_subst_translate_string() re-encodes, then
>>> normalizes line endings, the single API function allows for the
>>> possibility of a future improvement in efficiency as a result of
>>> performing both translations simultaneously.
>>> 
>>> 
>>> 
>>> Attached are a revised patch and log message.
>>> <log message.txt><patch.txt>
> 
>> [[[
>> Adds a public API function, svn_subst_translate_string2(), an extension of
>> svn_subst_translate_string(), that has two, additional output parameters for
>> determining whether re-encoding and/or line ending translation were 
>> performed.
>> 
> 
> Grammar police: No comma after "two".
> 
>> 
>> As discussed at:
>>  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
>>  http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020
>> 
>> 
>> * subversion/include/svn_subst.h
>>  (svn_subst_translate_string2): New function.
>>  (svn_subst_translate_string): Deprecated in favor of
>>    svn_subst_translate_string2().
>> 
>> * subversion/libsvn_subr/subst.c
> 
> What's happening in this file is essentially "Add the parameters to
> <this> public API, punch them through everything, and finally in
> translate_newline() make use of them".  Could you add something
> (a paragraph?) to the log message that makes that clear?
> 
> Or, another option is two have two subst.c entries in the log message
> --- one for the 'punching through' and one for the interesting changes
> --- though we have less precedent for that.
> 
> I'll leave it to you to find a way to make the overall picture clear :)
> 
>>  (translate_newline): Added the new parameter TRANSLATED_EOL, the value at
>>    which is set to TRUE if the newline string in EOL_STR is different than
>>    the newline string in NEWLINE_BUF.
>>  (translation_baton): Added the TRANSLATED_EOL field.
>>  (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
>>    passed to the resulting translation_baton.
>>  (translate_chunk): Modified the three calls to translate_newline() to pass
>>    the TRANSLATED_EOL pointer from the translation baton.
>>  (stream_translated): New static function. Its implementation is the old
>>    implementation of svn_subst_stream_translated(), but accepting another
>>    parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
>>    that it creates.
>>  (svn_subst_stream_translated): Now a wrapper for stream_translated().
>>  (translate_cstring): New static function. Its implementation is the old
>>    implementation of svn_subst_translate_cstring2(), but modified to accept
>>    another parameter, TRANSLATED_EOL, that is passed to stream_translated().
>>  (svn_subst_translate_cstring2): Now a wrapper for translate_cstring().
>>  (svn_subst_translate_string2): New function. The task of recording whether
>>    it translated a line ending is delegated to translate_cstring().
>> ]]]
> 
>> Index: subversion/include/svn_subst.h
>> ===================================================================
>> --- subversion/include/svn_subst.h   (revision 1004022)
>> +++ subversion/include/svn_subst.h   (working copy)
>> @@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea
>> 
>> /* EOL conversion and character encodings */
>> 
>> -/** Translate the data in @a value (assumed to be in encoded in charset
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. 
>> Callers
>> + * should use svn_subst_translate_string2().
>> + * 
>> + * Translate the data in @a value (assumed to be encoded in charset
>>  * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>>  * then assume that @a value is in the system-default language encoding.
>>  * Return the translated data in @a *new_value, allocated in @a pool.
>>  */
> 
> Please make the old docstring only describe the differences from the new
> version of the function, and delete the rest of the docstring here.
> Follow the examples in the other header files.
> 
>> +SVN_DEPRECATED
>> svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>>                                         const svn_string_t *value,
>>                                         const char *encoding,
>>                                         apr_pool_t *pool);
>> 
>> +/** Translate the data in @a value (assumed to be encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default character encoding.
> 
> You changed 'language encoding' to 'character encoding'.
> 
>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>> + * character of @a value in the source charset was translated to UTF-8;  to
>> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF;  to @c FALSE otherwise.
> 
> Use paragraphs please.
> 
>> + * Return the translated
>> + * data in @a *new_value, allocated in @a pool.
>> + *
>> + * @since New in 1.7
> 
> Trailing period missing.
> 
> (Vim tip: ^X^L completes a line.)
> 
>> + */
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> +                            svn_boolean_t *translated_to_utf8,
>> +                            svn_boolean_t *translated_line_endings,
>> +                            const svn_string_t *value,
>> +                            const char *encoding,
>> +                            apr_pool_t *pool);
> 
> Can you switch this function to the two-pools paradigm
> (result_pool/scratch_pool) while here?  There are examples in
> libsvn_wc/wc_db.h (and elsewhere).
> 
> It already uses scratch_pool internally.
> 
>> +
>> /** Translate the data in @a value from UTF8 and LF line-endings into
>>  * native locale and native line-endings, or to the output locale if
>>  * @a for_output is TRUE.  Return the translated data in @a
>> Index: subversion/libsvn_subr/subst.c
>> ===================================================================
>> --- subversion/libsvn_subr/subst.c   (revision 1004022)
>> +++ subversion/libsvn_subr/subst.c   (working copy)
>> @@ -620,6 +620,10 @@ translate_keyword(char *buf,
>>    newline in the file, and copy it to {SRC_FORMAT, *SRC_FORMAT_LEN} to
>>    use for later consistency checks.
>> 
>> +   Sets *TRANSLATED_EOL to TRUE if TRANSLATED_EOL is not NULL and the 
>> newline
>> +   string that was written (EOL_STR) is not the same as the newline string
>> +   that was translated (NEWLINE_BUF).
>> +
>>    Note: all parameters are required even if REPAIR is TRUE.
>>    ### We could require that REPAIR must not change across a sequence of
>>        calls, and could then optimize by not using SRC_FORMAT at all if
>> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str,
>>                   const char *newline_buf,
>>                   apr_size_t newline_len,
>>                   svn_stream_t *dst,
>> -                  svn_boolean_t repair)
>> +                  svn_boolean_t repair,
>> +                  svn_boolean_t *translated_eol)
>> {
>>   /* If we've seen a newline before, compare it with our cache to
>>      check for consistency, else cache it for future comparisons. */
>> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str,
>>       strncpy(src_format, newline_buf, newline_len);
>>       *src_format_len = newline_len;
>>     }
>> -  /* Write the desired newline */
>> -  return translate_write(dst, eol_str, eol_str_len);
>> +
>> +  /* Translate the newline */
>> +  SVN_ERR(translate_write(dst, eol_str, eol_str_len));
>> +
>> +  if (translated_eol)
>> +    {
> 
> Right now, this if() will be done every time translate_newline() is
> called.  Could you rework the logic to check it fewer times?
> 
> e.g., to only check it once per translation_baton if possible, or to
> skip checking it if it had been set already, or if REPAIR is false and
> a newline had been seen already, etc.  (These are just ideas, you don't
> have to implement all of them!)
> 
> Stefan2 indicates that the subst() code is a rather expensive part of
> 'checkout' and 'export', so doing the check once-per-file (instead of
> once-per-line) will be nice.  (However, I realize that right now you
> only expose translated_eol for the svn_string_t API.)
> 
>> +      if (newline_len != eol_str_len ||
>> +          strncmp(newline_buf, eol_str, newline_len))
>> +        *translated_eol = TRUE;
>> +    }
>> +  return SVN_NO_ERROR;
>> }
>> 
>> 
>> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu
>>                            const char *encoding,
>>                            apr_pool_t *pool)
>> {
>> +  return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding,
>> +                                     pool);
>> +}
>> +
> 
> Please move the above wrapper to deprecated.c.
> 
>> +
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> +                            svn_boolean_t *translated_to_utf8,
>> +                            svn_boolean_t *translated_line_endings,
>> +                            const svn_string_t *value,
>> +                            const char *encoding,
>> +                            apr_pool_t *pool)
>> +{
>>   const char *val_utf8;
>>   const char *val_utf8_lf;
>>   apr_pool_t *scratch_pool = svn_pool_create(pool);
>> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu
>>       SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool));
>>     }
>> 
> 
> Not related to your patch, but the above if/else block calls the UTF-8
> translation routines on value->data.  Since this is specifically the
> translate_string() API (and there is a separate translate_cstring()
> API), I think either we should fix this (the whole reason svn_string_t
> exists is to allow embedded NULs) or at least document this limitation
> in the API docs.
> 
>> -  SVN_ERR(svn_subst_translate_cstring2(val_utf8,
>> -                                       &val_utf8_lf,
>> -                                       "\n",  /* translate to LF */
>> -                                       FALSE, /* no repair */
>> -                                       NULL,  /* no keywords */
>> -                                       FALSE, /* no expansion */
>> -                                       scratch_pool));
>> +  if (translated_to_utf8)
>> +    *translated_to_utf8 = (strcmp(value->data, val_utf8) != 0);
>> 
>> +  SVN_ERR(translate_cstring(&val_utf8_lf,
>> +                             translated_line_endings,
>> +                             val_utf8,
>> +                             "\n",  /* translate to LF */
>> +                             FALSE, /* no repair */
>> +                             NULL,  /* no keywords */
>> +                             FALSE, /* no expansion */
>> +                             scratch_pool));
>> +
>>   *new_value = svn_string_create(val_utf8_lf, pool);
>>   svn_pool_destroy(scratch_pool);
>>   return SVN_NO_ERROR;
> 
> Lots of plumbing, but looks good :-)
> 
> Thanks,
> 
> Daniel
>

Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

Reply via email to