On Thu, Oct 7, 2010 at 8:09 AM, Julian Foad <julian.f...@wandisco.com> wrote: > Hi Paul. > > It's good to see a fix. I think, however, you could do this more simply > and avoid some of the difficulties completely.
Julian, Thanks as always for the review, comments inline. > On Thu, 2010-09-30, pbu...@apache.org wrote: >> Author: pburba >> Date: Thu Sep 30 20:21:19 2010 >> New Revision: 1003238 >> >> URL: http://svn.apache.org/viewvc?rev=1003238&view=rev >> Log: >> Fix issue #3721 'redirection of svn propget output corrupted with large >> property values'. > > And this change also fixes an EOL-style inconsistency, which was > presumably happening consistently with property values of any size, > doesn't it? Yes. > I think this change also introduces a new, similar, EOL-style > inconsistency, in a different place; see last review comment at end of > email. You are right, there is a problem, though it might not be exactly what you think: If the property requires translation per svn_prop_needs_translation (i.e. it's an svn:* prop) we will already have run the property value through svn_subst_detranslate_string(), which gives us native line endings. So in these cases the only non-native newline will be the trailing newline after the propval: /* Add an extra newline to the value before indenting, so that * every line of output has the indentation whether the value * already ended in a newline or not. */ const char *newval = apr_psprintf(pool, "%s\n", propval->data); ^^ If we return to the pre-r1003238 behavior of using printf, then multiline svn:* prop values (e.g. svn:mergeinfo) are passed to printf with native newlines and then printf tries (on my Windows box at any rate) to convert those eols to native, resulting in redirected output like this: Properties on 'iota':<CR><LF> propname<CR><LF> subversion/branches/1.5.x:872364-874936<CR><CR><LF> subversion/branches/1.5.x-34184:874657-874741<CR><CR><LF> subversion/branches/1.5.x-34432:874744-874798<CR><CR><LF> . . So avoiding printf's eol conversion by writing the the out stream works better here. > The difficulty with EOLs is that the native printf() family of functions > typically converts to native EOLs, and the svn_stream_write() family > doesn't, even when writing to svn_stream_for_stdout(). Therefore we > have to be careful when mixing the two families. > >> This change makes all the writes to stdout, by svn pg, via the svn_stream_t * >> that we already use to print the 'Properties on' header. Note that this >> stream *is* attached to stdout, but avoiding the mix of stream writes and >> printfs solves issue #3721 on Windows. > > It looks like the root of the corruption is also the mixing of native > 'printf' functions with writes to a 'svn_stream_for_stdout()' stream. > Do you agree that that's the case, and that therefore I should write a > warning about this in the documentation of 'svn_stream_for_stdout()'? I agree that is what the symptoms point to. I couldn't definitely pinpoint the exact cause however. > Perhaps not now, but we should consider whether we can eliminate the > corruption by doing something to svn_stream_for_stdout() and/or the > native 'stdout' stream. It might be possible by turning off some > buffering that is happening, or something like that, or probably it > could be done by implementing the stream_for_stdout as a stream class > that forwards to the printf family of functions. > > What we probably should do now is avoid mixing printf with svn streams > in this 'svn propget' code. From what I can see, there is little or no > reason why svn_cl__propget() was using an svn stream for some of its > printing at all. It looks like it would be much simpler to remove that > stream (the variable called 'out') and just use printf-style functions > throughout, as is done for 'proplist' and most of the rest of the UI > (except for 'cat' and 'blame'). Note that we have svn_cmdline_printf() > which is a handy encoding-converting variant of the native 'printf' > family. > > Does that sound good? Assuming there was never a reason to mix printfs with stream writes in the first place, then yes, this would be a much simpler fix. As mentioned above though, we will need to deal with printf's conversion to native eols when the string it's printing already has them. Paul > If so, then all the comments below become moot. > (I wrote some of them before I figured all this out, and they may still > be relevant if I'm wrong about some of this.) > > >> * subversion/svn/cl.h >> >> (svn_cl__print_prop_hash): Support printing to a passed in svn_stream_t *. >> >> * subversion/svn/propget-cmd.c >> >> (print_properties): Make the header's line endings native. Print the >> property names and values to the same svn_stream_t * we already use >> print the headers to. >> >> * subversion/svn/proplist-cmd.c >> >> (proplist_receiver, svn_cl__proplist): Update calls to >> svn_cl__print_prop_hash(), keeping old behavior. >> >> * subversion/svn/props.c >> >> (svn_cl__print_prop_hash): Support printing to a passed in svn_stream_t *, >> otherwise fall-back to old behavior of using printf(). >> >> >> * subversion/tests/cmdline/prop_tests.py >> >> (propget_redirection): Remove comment re failure status. >> >> (test_list): Remove XFail. > > >> Modified: subversion/trunk/subversion/svn/cl.h >> URL: >> http://svn.apache.org/viewvc/subversion/trunk/subversion/svn/cl.h?rev=1003238&r1=1003237&r2=1003238&view=diff >> ============================================================================== >> --- subversion/trunk/subversion/svn/cl.h (original) >> +++ subversion/trunk/subversion/svn/cl.h Thu Sep 30 20:21:19 2010 >> @@ -417,15 +417,18 @@ svn_cl__print_status_xml(const char *pat >> apr_pool_t *pool); >> >> >> -/* Print a hash that maps property names (char *) to property values >> - (svn_string_t *). The names are assumed to be in UTF-8 format; >> +/* Print to stdout a hash that maps property names (char *) to property >> + values (svn_string_t *). The names are assumed to be in UTF-8 format; >> the values are either in UTF-8 (the special Subversion props) or >> plain binary values. >> >> + If OUT is not NULL, then write to it rather than stdout. > > This could do with pointing out that the two modes should not be mixed. > > This seems like unnecessary complexity, allowing the caller to choose > whether to write to the native stdout stream or write to a supplied > stream which in practice (at present) is always connected to stdout. > Just support one or the other and make both callers consistent. > >> + >> If NAMES_ONLY is true, print just names, else print names and >> values. */ >> svn_error_t * >> -svn_cl__print_prop_hash(apr_hash_t *prop_hash, >> +svn_cl__print_prop_hash(svn_stream_t *out, >> + apr_hash_t *prop_hash, >> svn_boolean_t names_only, >> apr_pool_t *pool); >> >> >> Modified: subversion/trunk/subversion/svn/propget-cmd.c >> URL: >> http://svn.apache.org/viewvc/subversion/trunk/subversion/svn/propget-cmd.c?rev=1003238&r1=1003237&r2=1003238&view=diff >> ============================================================================== >> --- subversion/trunk/subversion/svn/propget-cmd.c (original) >> +++ subversion/trunk/subversion/svn/propget-cmd.c Thu Sep 30 20:21:19 2010 >> @@ -140,6 +140,12 @@ print_properties(svn_stream_t *out, >> ? _("Properties on '%s':\n") >> : "%s - ", filename); >> SVN_ERR(svn_cmdline_cstring_from_utf8(&header, header, iterpool)); >> + SVN_ERR(svn_subst_translate_cstring2(header, &header, >> + APR_EOL_STR, /* 'native' >> eol */ >> + FALSE, /* no repair */ >> + NULL, /* no keywords */ >> + FALSE, /* no expansion */ >> + iterpool)); > > Strictly speaking, the newline conversion should come before the > translation from UTF8 to native encoding, in case a newline sequence is > encoded differently in the native encoding. (I guess we probably don't > support such encodings. If we do, there are other places where we do it > wrongly.) > > That's a lot of source code just to print a string in native style. > Four statements in total. Can't we write and use a dedicated function > or a dedicated stream class that does this? > > Or write > > ? _("Properties on '%s':" APR_EOL_STR) > > and omit the translate_cstring. > > >> SVN_ERR(stream_write(out, header, strlen(header))); >> } >> >> @@ -149,7 +155,7 @@ print_properties(svn_stream_t *out, >> apr_hash_t *hash = apr_hash_make(iterpool); >> >> apr_hash_set(hash, pname_utf8, APR_HASH_KEY_STRING, propval); >> - svn_cl__print_prop_hash(hash, FALSE, iterpool); >> + svn_cl__print_prop_hash(out, hash, FALSE, iterpool); >> } >> else >> { >> >> Modified: subversion/trunk/subversion/svn/proplist-cmd.c >> URL: >> http://svn.apache.org/viewvc/subversion/trunk/subversion/svn/proplist-cmd.c?rev=1003238&r1=1003237&r2=1003238&view=diff >> ============================================================================== >> --- subversion/trunk/subversion/svn/proplist-cmd.c (original) >> +++ subversion/trunk/subversion/svn/proplist-cmd.c Thu Sep 30 20:21:19 2010 >> @@ -98,7 +98,8 @@ proplist_receiver(void *baton, >> >> if (!opt_state->quiet) >> SVN_ERR(svn_cmdline_printf(pool, _("Properties on '%s':\n"), >> name_local)); >> - return svn_cl__print_prop_hash(prop_hash, (! opt_state->verbose), pool); >> + return svn_cl__print_prop_hash(NULL, prop_hash, (! opt_state->verbose), >> + pool); >> } >> >> >> @@ -159,7 +160,7 @@ svn_cl__proplist(apr_getopt_t *os, >> rev)); >> >> SVN_ERR(svn_cl__print_prop_hash >> - (proplist, (! opt_state->verbose), scratch_pool)); >> + (NULL, proplist, (! opt_state->verbose), scratch_pool)); >> } >> } >> else /* operate on normal, versioned properties (not revprops) */ >> >> Modified: subversion/trunk/subversion/svn/props.c >> URL: >> http://svn.apache.org/viewvc/subversion/trunk/subversion/svn/props.c?rev=1003238&r1=1003237&r2=1003238&view=diff >> ============================================================================== >> --- subversion/trunk/subversion/svn/props.c (original) >> +++ subversion/trunk/subversion/svn/props.c Thu Sep 30 20:21:19 2010 >> @@ -82,7 +82,8 @@ svn_cl__revprop_prepare(const svn_opt_re >> >> >> svn_error_t * >> -svn_cl__print_prop_hash(apr_hash_t *prop_hash, >> +svn_cl__print_prop_hash(svn_stream_t *out, >> + apr_hash_t *prop_hash, >> svn_boolean_t names_only, >> apr_pool_t *pool) >> { >> @@ -93,6 +94,7 @@ svn_cl__print_prop_hash(apr_hash_t *prop >> const char *pname = svn__apr_hash_index_key(hi); >> svn_string_t *propval = svn__apr_hash_index_val(hi); >> const char *pname_stdout; >> + apr_size_t len; >> >> if (svn_prop_needs_translation(pname)) >> SVN_ERR(svn_subst_detranslate_string(&propval, propval, >> @@ -100,18 +102,45 @@ svn_cl__print_prop_hash(apr_hash_t *prop >> >> SVN_ERR(svn_cmdline_cstring_from_utf8(&pname_stdout, pname, pool)); >> >> - /* ### We leave these printfs for now, since if propval wasn't >> translated >> - * above, we don't know anything about its encoding. In fact, it >> - * might be binary data... */ >> - printf(" %s\n", pname_stdout); > > So this 'printf' used to translate to native EOLs... > >> + if (out) >> + { >> + pname_stdout = apr_psprintf(pool, " %s\n", pname_stdout); >> + SVN_ERR(svn_subst_translate_cstring2(pname_stdout, &pname_stdout, >> + APR_EOL_STR, /* 'native' eol >> */ >> + FALSE, /* no repair */ >> + NULL, /* no keywords */ >> + FALSE, /* no expansion */ >> + pool)); >> + >> + len = strlen(pname_stdout); >> + SVN_ERR(svn_stream_write(out, pname_stdout, &len)); > > ... whereas svn_stream_write() doesn't, so you had to insert a manual > translation in order to preserve the functionality while switching to > stream functions. (Unlike in propget-cmd.c above, here we don't convert > from UTF8 because we don't know whether it is UTF8.) > > Same comments as above: maybe use SVN_EOL_STR or a dedicated printing > function, instead of the translation function. (Neither a file name nor > a prop name is allowed to contain a newline.) > >> + } >> + else >> + { >> + /* ### We leave these printfs for now, since if propval wasn't >> + translated above, we don't know anything about its encoding. >> + In fact, it might be binary data... */ >> + printf(" %s\n", pname_stdout); > > The comment that said "We leave these printfs for now ..." referred to > the fact that we aren't attempting to translate the character encoding > to the native one, and that comment should apply to both branches of > this new if/else. > > The comment applicable to this branch is "If OUT is null we explicitly > use the native printf family, not an svn_stream." > >> + } >> + >> if (!names_only) >> { >> /* Add an extra newline to the value before indenting, so that >> * every line of output has the indentation whether the value >> * already ended in a newline or not. */ >> const char *newval = apr_psprintf(pool, "%s\n", propval->data); >> - >> - printf("%s", svn_cl__indent_string(newval, " ", pool)); >> + const char *indented_newval = svn_cl__indent_string(newval, >> + " ", >> + pool); >> + if (out) >> + { >> + len = strlen(indented_newval); >> + SVN_ERR(svn_stream_write(out, indented_newval, &len)); > > Bug? You lost the conversion to native newlines that 'printf' provided, > and haven't replaced it. > > >> + } >> + else >> + { >> + printf("%s", indented_newval); >> + } >> } >> } > [...] > > > - Julian > > >