Hi Daniel (T), Since in your earlier post you mentioned that you didn't mind a friendly reminder...
I thought I would return this thread to the top of the list - Just i case you had missed Daniel (Shahaf) 's comments. Gavin "Beau" Baumanis On 02/11/2010, at 12:35 AM, Daniel Shahaf wrote: > Gavin Beau Baumanis wrote on Mon, Oct 18, 2010 at 21:06:31 +1100: >> Ping. This patch submission has received no comments. >> > > Thanks, Gavin. @Daniel, sorry for the delay. > >> On 04/10/2010, at 5:55 AM, Daniel Trebbien wrote: >> >>> On Fri, Oct 1, 2010 at 3:57 AM, Julian Foad <julian.f...@wandisco.com> >>> wrote: >>>>> Adds a public API function, svn_subst_translate_string2(), an extension of >>>>> svn_subst_translate_string(), that has two, additional output parameters >>>>> for >>>>> determining whether re-encoding and/or line ending translation were >>>>> performed. >>>> >>>> If we're going to add this functionality to _translate_string(), >>>> shouldn't we also add it to the other variants - _detranslate_string, >>>> _translate_cstring2, _stream_translated, _copy_and_translate4? >>>> >>>> I see you're adding the infrastructure into the (new) bodies of >>>> _translate_cstring2 and _stream_translated, just not (yey) exposing it >>>> in their APIs. >>>> > > If we can provide the information cheaply, I see no reason not to let > the APIs provide it. As some of these APIs are already revved for 1.7, > we might as well add the new information to them now. > >>>> I'm not sure. The set of 'translation' functions is already messy and >>>> it's not clear to me how useful this new functionality will be. >>> >>> I originally began working on svn_subst_translate_string2() because I >>> could not find a combination of public API functions that would allow >>> me to determine whether the line endings changed when a string was >>> both re-encoded to UTF-8 and normalized to LF line endings. The most >>> applicable, svn_subst_translate_string(), performs both translations >>> without indicating whether it re-encoded the string or normalized a >>> line ending. >>> > > And all of this is necessary just so svnsync can /report/ which of the > two fixes it made, right? If so, we might move on with teaching svnsync > to fix newlines too, and add the "Report back how many properties had > their newlines changed" functionality separately. (easier to review, > and I don't think "512 properties were re-newlined" is /that/ useful to > know for most people) > >>> An alternative to extending svn_subst_translate_string() is to add a >>> public API function that does the re-encoding and another that >>> normalizes line endings. I think, however, that extending >>> svn_subst_translate_string() is better because though the current >>> implementation of svn_subst_translate_string() re-encodes, then >>> normalizes line endings, the single API function allows for the >>> possibility of a future improvement in efficiency as a result of >>> performing both translations simultaneously. >>> >>> >>> >>> Attached are a revised patch and log message. >>> <log message.txt><patch.txt> > >> [[[ >> Adds a public API function, svn_subst_translate_string2(), an extension of >> svn_subst_translate_string(), that has two, additional output parameters for >> determining whether re-encoding and/or line ending translation were >> performed. >> > > Grammar police: No comma after "two". > >> >> As discussed at: >> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550 >> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020 >> >> >> * subversion/include/svn_subst.h >> (svn_subst_translate_string2): New function. >> (svn_subst_translate_string): Deprecated in favor of >> svn_subst_translate_string2(). >> >> * subversion/libsvn_subr/subst.c > > What's happening in this file is essentially "Add the parameters to > <this> public API, punch them through everything, and finally in > translate_newline() make use of them". Could you add something > (a paragraph?) to the log message that makes that clear? > > Or, another option is two have two subst.c entries in the log message > --- one for the 'punching through' and one for the interesting changes > --- though we have less precedent for that. > > I'll leave it to you to find a way to make the overall picture clear :) > >> (translate_newline): Added the new parameter TRANSLATED_EOL, the value at >> which is set to TRUE if the newline string in EOL_STR is different than >> the newline string in NEWLINE_BUF. >> (translation_baton): Added the TRANSLATED_EOL field. >> (create_translation_baton): Added a new parameter TRANSLATED_EOL that is >> passed to the resulting translation_baton. >> (translate_chunk): Modified the three calls to translate_newline() to pass >> the TRANSLATED_EOL pointer from the translation baton. >> (stream_translated): New static function. Its implementation is the old >> implementation of svn_subst_stream_translated(), but accepting another >> parameter, TRANSLATED_EOL, that is passed to the in/out translation batons >> that it creates. >> (svn_subst_stream_translated): Now a wrapper for stream_translated(). >> (translate_cstring): New static function. Its implementation is the old >> implementation of svn_subst_translate_cstring2(), but modified to accept >> another parameter, TRANSLATED_EOL, that is passed to stream_translated(). >> (svn_subst_translate_cstring2): Now a wrapper for translate_cstring(). >> (svn_subst_translate_string2): New function. The task of recording whether >> it translated a line ending is delegated to translate_cstring(). >> ]]] > >> Index: subversion/include/svn_subst.h >> =================================================================== >> --- subversion/include/svn_subst.h (revision 1004022) >> +++ subversion/include/svn_subst.h (working copy) >> @@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea >> >> /* EOL conversion and character encodings */ >> >> -/** Translate the data in @a value (assumed to be in encoded in charset >> +/** @deprecated Provided for backward compatibility with the 1.6 API. >> Callers >> + * should use svn_subst_translate_string2(). >> + * >> + * Translate the data in @a value (assumed to be encoded in charset >> * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL, >> * then assume that @a value is in the system-default language encoding. >> * Return the translated data in @a *new_value, allocated in @a pool. >> */ > > Please make the old docstring only describe the differences from the new > version of the function, and delete the rest of the docstring here. > Follow the examples in the other header files. > >> +SVN_DEPRECATED >> svn_error_t *svn_subst_translate_string(svn_string_t **new_value, >> const svn_string_t *value, >> const char *encoding, >> apr_pool_t *pool); >> >> +/** Translate the data in @a value (assumed to be encoded in charset >> + * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL, >> + * then assume that @a value is in the system-default character encoding. > > You changed 'language encoding' to 'character encoding'. > >> + * If @a translated_to_utf8 is not @c NULL, then >> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one >> + * character of @a value in the source charset was translated to UTF-8; to >> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then >> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one >> + * line ending was changed to LF; to @c FALSE otherwise. > > Use paragraphs please. > >> + * Return the translated >> + * data in @a *new_value, allocated in @a pool. >> + * >> + * @since New in 1.7 > > Trailing period missing. > > (Vim tip: ^X^L completes a line.) > >> + */ >> +svn_error_t * >> +svn_subst_translate_string2(svn_string_t **new_value, >> + svn_boolean_t *translated_to_utf8, >> + svn_boolean_t *translated_line_endings, >> + const svn_string_t *value, >> + const char *encoding, >> + apr_pool_t *pool); > > Can you switch this function to the two-pools paradigm > (result_pool/scratch_pool) while here? There are examples in > libsvn_wc/wc_db.h (and elsewhere). > > It already uses scratch_pool internally. > >> + >> /** Translate the data in @a value from UTF8 and LF line-endings into >> * native locale and native line-endings, or to the output locale if >> * @a for_output is TRUE. Return the translated data in @a >> Index: subversion/libsvn_subr/subst.c >> =================================================================== >> --- subversion/libsvn_subr/subst.c (revision 1004022) >> +++ subversion/libsvn_subr/subst.c (working copy) >> @@ -620,6 +620,10 @@ translate_keyword(char *buf, >> newline in the file, and copy it to {SRC_FORMAT, *SRC_FORMAT_LEN} to >> use for later consistency checks. >> >> + Sets *TRANSLATED_EOL to TRUE if TRANSLATED_EOL is not NULL and the >> newline >> + string that was written (EOL_STR) is not the same as the newline string >> + that was translated (NEWLINE_BUF). >> + >> Note: all parameters are required even if REPAIR is TRUE. >> ### We could require that REPAIR must not change across a sequence of >> calls, and could then optimize by not using SRC_FORMAT at all if >> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str, >> const char *newline_buf, >> apr_size_t newline_len, >> svn_stream_t *dst, >> - svn_boolean_t repair) >> + svn_boolean_t repair, >> + svn_boolean_t *translated_eol) >> { >> /* If we've seen a newline before, compare it with our cache to >> check for consistency, else cache it for future comparisons. */ >> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str, >> strncpy(src_format, newline_buf, newline_len); >> *src_format_len = newline_len; >> } >> - /* Write the desired newline */ >> - return translate_write(dst, eol_str, eol_str_len); >> + >> + /* Translate the newline */ >> + SVN_ERR(translate_write(dst, eol_str, eol_str_len)); >> + >> + if (translated_eol) >> + { > > Right now, this if() will be done every time translate_newline() is > called. Could you rework the logic to check it fewer times? > > e.g., to only check it once per translation_baton if possible, or to > skip checking it if it had been set already, or if REPAIR is false and > a newline had been seen already, etc. (These are just ideas, you don't > have to implement all of them!) > > Stefan2 indicates that the subst() code is a rather expensive part of > 'checkout' and 'export', so doing the check once-per-file (instead of > once-per-line) will be nice. (However, I realize that right now you > only expose translated_eol for the svn_string_t API.) > >> + if (newline_len != eol_str_len || >> + strncmp(newline_buf, eol_str, newline_len)) >> + *translated_eol = TRUE; >> + } >> + return SVN_NO_ERROR; >> } >> >> >> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu >> const char *encoding, >> apr_pool_t *pool) >> { >> + return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding, >> + pool); >> +} >> + > > Please move the above wrapper to deprecated.c. > >> + >> +svn_error_t * >> +svn_subst_translate_string2(svn_string_t **new_value, >> + svn_boolean_t *translated_to_utf8, >> + svn_boolean_t *translated_line_endings, >> + const svn_string_t *value, >> + const char *encoding, >> + apr_pool_t *pool) >> +{ >> const char *val_utf8; >> const char *val_utf8_lf; >> apr_pool_t *scratch_pool = svn_pool_create(pool); >> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu >> SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool)); >> } >> > > Not related to your patch, but the above if/else block calls the UTF-8 > translation routines on value->data. Since this is specifically the > translate_string() API (and there is a separate translate_cstring() > API), I think either we should fix this (the whole reason svn_string_t > exists is to allow embedded NULs) or at least document this limitation > in the API docs. > >> - SVN_ERR(svn_subst_translate_cstring2(val_utf8, >> - &val_utf8_lf, >> - "\n", /* translate to LF */ >> - FALSE, /* no repair */ >> - NULL, /* no keywords */ >> - FALSE, /* no expansion */ >> - scratch_pool)); >> + if (translated_to_utf8) >> + *translated_to_utf8 = (strcmp(value->data, val_utf8) != 0); >> >> + SVN_ERR(translate_cstring(&val_utf8_lf, >> + translated_line_endings, >> + val_utf8, >> + "\n", /* translate to LF */ >> + FALSE, /* no repair */ >> + NULL, /* no keywords */ >> + FALSE, /* no expansion */ >> + scratch_pool)); >> + >> *new_value = svn_string_create(val_utf8_lf, pool); >> svn_pool_destroy(scratch_pool); >> return SVN_NO_ERROR; > > Lots of plumbing, but looks good :-) > > Thanks, > > Daniel >