On Wed, May 20, 2026 at 9:20 AM Branko Čibej <[email protected]> wrote:

> On Wed, 20 May 2026, 14:40 Daniel Sahlberg, <[email protected]>
> wrote:
>
>> On Wed, 20 May 2026 at 10:55, <[email protected]> wrote:
>> >
>> > Author: rinrab
>> > Date: Wed May 20 08:55:33 2026
>> > New Revision: 1934426
>> >
>> > Log:
>> > Use UTF-8 alignment for the 'author' column in the 'svn blame' command.
>> >
>> > * subversion/svn/blame-cmd.c
>> >   (#include): Add svn_utf_private.h.
>> >   (print_line_info): Call svn_utf__cstring_utf8_align_right() to
>> >    prepare author.
>> >
>> > Modified:
>> >    subversion/trunk/subversion/svn/blame-cmd.c
>> >
>> > Modified: subversion/trunk/subversion/svn/blame-cmd.c
>> > ==============================================================================
>> > --- subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:30:24 2026        (r1934425)
>> > +++ subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:55:33 2026        (r1934426)
>> > @@ -24,6 +24,7 @@
>> >
>> >  /*** Includes. ***/
>> >
>> > +#include "private/svn_utf_private.h"
>> >  #include "svn_client.h"
>> >  #include "svn_error.h"
>> >  #include "svn_dirent_uri.h"
>> > @@ -150,8 +151,9 @@ print_line_info(svn_stream_t *out,
>> >            time_stdout = "                                           -";
>> >          }
>> >
>> > -      SVN_ERR(svn_stream_printf(out, pool, "%s %10s %s ", rev_str,
>> > -                                author ? author : "         -",
>> > +      SVN_ERR(svn_stream_printf(out, pool, "%s %s %s ", rev_str,
>> > +                                svn_utf__cstring_utf8_align_right(
>> > +                                    author ? author : "-", 10, pool),
>> >                                  time_stdout));
>>
>> After this change the output of svn blame is different from before if
>> there is a very long author name.
>>
>> I have tested with an svn build from about a month ago (the version in
>> $PATH) and with a brand-new build (in ./subversion/svn). I have prepared a
>> repo with a file where all lines but one are authored by "dsg" and the
>> remaining one by "averylongauthor" (15 characters, ASCII).
>>
>> This is my commit #2 by the long author:
>> [[[
>> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn proplist -v
>> --revprop -r2 ../wc/foo
>> Unversioned properties on revision 2:
>>   svn:author
>>     averylongauthor
>>   svn:date
>>     2026-05-20T11:52:35.534418Z
>>   svn:log
>>     Modify line 4
>> ]]]
>>
>> Blame before the change above:
>> [[[
>> dsg@devi-25-01:~/svn_trunk3$ svn blame ../wc/foo
>>      1        dsg 1
>>      1        dsg 2
>>      1        dsg 3
>>      2 averylonga Line 4
>>      1        dsg 5
>>      1        dsg 6
>>      1        dsg 7
>>      1        dsg 8
>>      1        dsg 9
>> ]]]
>> Author names are right adjusted but when overflowing, the first 10
>> characters are displayed.
>>
>> Blame after the change above:
>> [[[
>> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn blame ../wc/foo
>>      1        dsg 1
>>      1        dsg 2
>>      1        dsg 3
>>      2 longauthor Line 4
>>      1        dsg 5
>>      1        dsg 6
>>      1        dsg 7
>>      1        dsg 8
>>      1        dsg 9
>> ]]]
>> Author names are right adjusted but when overflowing, the last 10
>> characters are displayed.
>>
>> (I'm aware there are more instances of svn_stream_printf and I haven't
>> analysed exactly which one is involved here).
>>
>> I think we need to keep the precision in the formatting string and use
>> the _align_left version.
>>
>> Kind regards,
>> Daniel
>>
>
>
> Agreed, this is a very breaking/broken change. Changes that affect program
> output need to be discussed on list and tested. This comment caught my
> attention:
>


I've lost track a little bit of what this change was related to. What's the
motivation for changing the output format? (Not saying I agree or disagree,
just trying to get context.)
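
For context, here is a minimal sketch of the two overflow behaviours in Daniel's report, using plain snprintf() on ASCII input. The helpers fit_keep_head() and fit_keep_tail() are hypothetical names made up for this illustration; they are not the code in blame-cmd.c or svn_utf_private.h, and they count bytes, so they say nothing about non-ASCII author names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: right-align S in WIDTH columns and, on overflow,
   keep the FIRST WIDTH bytes.  The field width pads, the precision
   truncates.  ASCII only: bytes are treated as columns. */
void
fit_keep_head(char *dst, size_t dstsize, const char *s, int width)
{
  assert(width >= 0);
  snprintf(dst, dstsize, "%*.*s", width, width, s);
}

/* Hypothetical helper: right-align S in WIDTH columns and, on overflow,
   keep the LAST WIDTH bytes by skipping the leading excess. */
void
fit_keep_tail(char *dst, size_t dstsize, const char *s, int width)
{
  size_t len = strlen(s);

  assert(width >= 0);
  if (len > (size_t)width)
    s += len - (size_t)width;
  snprintf(dst, dstsize, "%*s", width, s);
}
```

With "averylongauthor" and a width of 10, fit_keep_head() yields "averylonga" and fit_keep_tail() yields "longauthor", which matches the before and after blame output quoted above. Short names such as "dsg" come out identically from both.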


> + * Please note, there might be a little artifact when there is a wider
> + * character, then the string won't be perfectly aligned.
>
>
> If true, it implies that svn_utf8_width() or whatever the function is
> called isn't returning correct results.
>
> I can't find the discussion about this now but I'd just note that
> calculating the width of a Unicode string by only looking at individual
> code points is not correct. Therefore, pruning away individual code points
> without context in order to get a shorter string is not correct, either.
> Some Unicode glyphs can use up to 5 code points.
>
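
To make the code point issue concrete, here is a small sketch that counts code points in a UTF-8 string and cuts a prefix at a code point boundary. Both function names are hypothetical and the input is assumed to be valid UTF-8. The point is only that a cut which is safe at the code point level can still split what the user sees as one character; this is not a claim about how svn_utf__cstring_utf8_align_right() works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in S (assumed valid UTF-8).
   Every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t
utf8_count_code_points(const char *s)
{
  size_t n = 0;

  assert(s != NULL);
  for (; *s; ++s)
    if (((unsigned char)*s & 0xC0) != 0x80)
      ++n;
  return n;
}

/* Hypothetical helper: byte length of the first MAX_CP code points of S.
   The cut never lands inside a code point, but it knows nothing about
   combining marks or joiners, so it may split a user-perceived character. */
size_t
utf8_prefix_bytes(const char *s, size_t max_cp)
{
  size_t i = 0, n = 0;

  assert(s != NULL);
  while (s[i])
    {
      if (((unsigned char)s[i] & 0xC0) != 0x80)
        {
          if (n == max_cp)
            break;
          ++n;
        }
      ++i;
    }
  return i;
}
```

For "e\xCC\x81" (an "e" followed by U+0301 COMBINING ACUTE ACCENT) this reports 2 code points in 3 bytes, and a one-code-point prefix is just the bare "e" with the accent dropped. A family emoji built from three people joined by two zero-width joiners is 5 code points in 18 bytes, yet is typically rendered as a single glyph.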


I also remember a discussion from several years back. It might be the same
one you're thinking of. AFK right now but I'll try to find it.

In fact, I'm also confused about the column width and truncation after 10
characters: I thought it starts with some column width and if a line is
encountered which has a longer user name that doesn't fit, then the column
width is increased for that line and all subsequent lines. (The rationale
was, it's ugly, but better to be accurate than pretty.) Has that changed
sometime in the last few years?
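
To illustrate what I mean by a growing column, here is a rough sketch under the assumption that the width only ever increases. The names next_column_width() and format_blame_line() are hypothetical, the width is measured with strlen() (so ASCII only), and I have not checked whether svn blame does or ever did anything like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the author column width to use from this
   line onward.  The width never shrinks; it grows to fit a longer name. */
int
next_column_width(int current_width, const char *author)
{
  size_t len = strlen(author);

  assert(current_width >= 0);
  return len > (size_t)current_width ? (int)len : current_width;
}

/* Hypothetical helper: format one blame line with a right-aligned author
   column of the given width.  Nothing is truncated. */
void
format_blame_line(char *dst, size_t dstsize, long rev,
                  const char *author, int width, const char *text)
{
  assert(width >= 0);
  snprintf(dst, dstsize, "%6ld %*s %s", rev, width, author, text);
}
```

Starting at width 10, "dsg" leaves the width at 10, "averylongauthor" pushes it to 15, and every later line, including those by "dsg", is then padded to 15.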


> -- Brane
>
> Whoever sold us Unicode as a fixed-width encoding was running a pyramid
> scheme. 😏
>


I have bigger complaints about it than just the pyramid scheme :-)
