Re: The failing test in utf8-cmdline

Timofei Zhakov Mon, 18 May 2026 14:01:05 -0700

On Sat, May 16, 2026 at 3:06 PM Timofei Zhakov <[email protected]> wrote:
>
> On Fri, May 15, 2026 at 4:02 PM Jun Omae <[email protected]> wrote:
> >
> > On 2026/05/15 0:28, Timofei Zhakov wrote:
> > > There is a test called basic_tests.py:argv_with_best_fit_chars. It
> > > checks that svn rejects certain Unicode symbols, which were illegal
> > > input before the changes introduced in that branch.
> >
> > In the branch, the svn command receives its arguments as UTF-8 bytes, but
> > best-fit encoding conversion is applied to the output written to the pipe.
> >
> > [[[
> > diff --git a/subversion/tests/cmdline/basic_tests.py b/subversion/tests/cmdline/basic_tests.py
> > index 88f43bfae7..edb697b795 100755
> > --- a/subversion/tests/cmdline/basic_tests.py
> > +++ b/subversion/tests/cmdline/basic_tests.py
> > @@ -3357,20 +3357,22 @@ def argv_with_best_fit_chars(sbox):
> >        yield chr(c), mbcs
> >
> >    count = 0
> > -  # E721113: Conversion from UTF-16 failed: No mapping for the Unicode
> > -  # character exists in the target multi-byte code page.
> > -  expected_stderr = 'svn: E721113: '
> > +  # The argument is received as utf-8 bytes, but the output to the pipe
> > +  # is applied best-fit encoding conversion.
> >    for wc, mbcs in iter_bestfit_chars():
> >      count += 1
> >      logger.info('Code page %r - U+%04x -> 0x%s', codepage, ord(wc), mbcs.hex())
> >      if mbcs == b'"':
> > -      svntest.actions.run_and_verify_svn2(None, expected_stderr, 1, 'help',
> > +      expected_stderr = r'^"foo" "bar": unknown command'
> > +      svntest.actions.run_and_verify_svn2(None, expected_stderr, 0, 'help',
> >                                            'foo{0} {0}bar'.format(wc))
> >      elif mbcs == b'\\':
> > -      svntest.actions.run_and_verify_svn2(None, expected_stderr, 1, 'help',
> > +      expected_stderr = r'^"foo\\" \\"bar": unknown command'
> > +      svntest.actions.run_and_verify_svn2(None, expected_stderr, 0, 'help',
> >                                            'foo{0}" {0}"bar'.format(wc))
> >      elif mbcs == b' ':
> > -      svntest.actions.run_and_verify_svn2(None, expected_stderr, 1, 'help',
> > +      expected_stderr = r'^"foo bar": unknown command'
> > +      svntest.actions.run_and_verify_svn2(None, expected_stderr, 0, 'help',
> >                                            'foo{0}bar'.format(wc))
> >    if count == 0:
> >      raise svntest.Skip('No best fit characters in code page %r' % codepage)
> > ]]]
>
> I tested this patch and can confirm that it works. I don't know why,
> but as far as I remember I tried exactly the same thing and it
> didn't work for me.
>
> I remember I once heard that "everything looks like physics if you
> don't know magic". That's exactly the case. Sometimes we just need a
> pair of fresh eyes. :-)
>
> +1 for the changes
>
> > Recently, I've been trying 1.14.x with the UTF-8 code page using the
> > activeCodePage manifest [1]. It almost works fine (e.g. adding emoji
> > filenames and checking them out, ...); however, output to stderr is
> > garbled and not fixed yet.
> >
> > [1] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
>
> That sounds interesting. If, as you say, the output is converted to
> the local encoding, that introduces a lot of inconsistency, and yes,
> we get no emojis.
>
> Since the encoding is almost always UTF-8 on the majority of Unix
> systems, I think it makes a lot of sense to take the same approach
> on Windows.


I saw you committed the patch in r1934334. Thank you so much!

I guess the CI is back to green! I'll post an email on dev@ about
merging the branch into trunk.
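
For anyone following along, here is a rough sketch of why the new
expected_stderr patterns in the patch match. It is not the real test: the
BEST_FIT table is a small illustrative assumption (the test enumerates
the actual code page instead), and simulated_stderr() is a hypothetical
stand-in for running 'svn help ARG' and reading stderr from the pipe.
Only the three regular expressions are taken from the patch.

```python
import re

# Illustrative best-fit table. This is an assumption for the sketch; the
# real test derives its characters from the active Windows code page.
BEST_FIT = {
    '\uff02': '"',   # FULLWIDTH QUOTATION MARK
    '\uff3c': '\\',  # FULLWIDTH REVERSE SOLIDUS
    '\u2000': ' ',   # EN QUAD
}

def best_fit(text):
    """Replace each character that has a best-fit entry with its stand-in."""
    return ''.join(BEST_FIT.get(ch, ch) for ch in text)

def simulated_stderr(arg):
    """Hypothetical model of the pipe output for 'svn help ARG'.

    The argument itself is accepted as-is; only the text written to the
    pipe goes through the best-fit conversion.
    """
    return '"%s": unknown command.' % best_fit(arg)

# (argument template, best-fit source character, pattern from the patch)
CASES = [
    ('foo{0} {0}bar',   '\uff02', r'^"foo" "bar": unknown command'),
    ('foo{0}" {0}"bar', '\uff3c', r'^"foo\\" \\"bar": unknown command'),
    ('foo{0}bar',       '\u2000', r'^"foo bar": unknown command'),
]

# True for every case in which the simulated output matches the pattern.
results = [
    bool(re.search(pattern, simulated_stderr(template.format(wc))))
    for template, wc, pattern in CASES
]
```

If the model is right, every entry in results is True, which is the
behaviour the updated test now expects in place of E721113.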

-- 
Timofei Zhakov
