On Mon, May 25, 2026 at 9:04 PM Branko Čibej <[email protected]> wrote:

> I took another look at how 'svn blame' aligns its output.
>
>
> static svn_error_t *
> print_line_info(svn_stream_t *out,
>                 svn_revnum_t revision,
>                 const char *author,
>                 const char *date,
>                 const char *path,
>                 svn_boolean_t verbose,
>                 int rev_maxlength,
>                 apr_pool_t *pool)
> {
>   const char *time_utf8;
>   const char *time_stdout;
>   const char *rev_str;
>
>   rev_str = SVN_IS_VALID_REVNUM(revision)
>     ? apr_psprintf(pool, "%*ld", rev_maxlength, revision)
>     : apr_psprintf(pool, "%*s", rev_maxlength, "-");
>
>   if (verbose)
>     {
>       if (date)
>         {
>           SVN_ERR(svn_cl__time_cstring_to_human_cstring(&time_utf8,
>                                                         date, pool));
>           SVN_ERR(svn_cmdline_cstring_from_utf8(&time_stdout, time_utf8,
>                                                 pool));
>
>
> Converts timestamp to locale encoding ...
>
>         }
>       else
>         {
>           /* ### This is a 44 characters long string. It assumes the current
>              format of svn_time_to_human_cstring and also 3 letter
>              abbreviations for the month and weekday names.  Else, the
>              line contents will be misaligned. */
>           time_stdout = "                                           -";
>         }
>
>       SVN_ERR(svn_stream_printf(out, pool, "%s %10s %s ", rev_str,
>                                 author ? author : "         -",
>                                 time_stdout));
>
>
> But author remains in UTF-8? The author name is extracted from properties,
> I don't recall if we enforce UTF-8 in svn:author. I know that we do in
> svn:log.
>
>       if (path)
>         SVN_ERR(svn_stream_printf(out, pool, "%-14s ", path));
>
>
> And so does the path? The blame-receiver's docstring says nothing about
> that.
>
>     }
>   else
>     {
>       return svn_stream_printf(out, pool, "%s %10.10s ", rev_str,
>                                author ? author : "         -");
>     }
>
>   return SVN_NO_ERROR;
> }
>
>
> I guess most of the time, locale encoding is UTF-8 or some other Unicode
> format that's lossless. Otherwise I can't imagine how this could work
> correctly, in general.
>
> What am I missing?
>


I think all APIs should assume UTF-8 strings (with certain exceptions,
such as svn_utf.h itself).

However, the question is what actually happens in practice. Since both
paths and properties are ultimately stored as binary blobs on disk, they
could technically contain anything. But I assume that if a path weren't
UTF-8/ASCII, FSFS wouldn't parse it properly, which would lead to a
corrupted repository.

Properties, on the other hand, could contain anything unless there is
specific enforcement, as you say we have for svn:log. But I'm pretty sure
it's safe to assume UTF-8 for them if they store text. If a value weren't
UTF-8, we'd have problems printing it to the console anyway, because
printing converts from UTF-8 to the locale encoding.
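
To make the "assume UTF-8 if it stores text" point a bit more concrete,
here is a rough sketch of the kind of well-formedness check one could run
on a property value such as svn:author before treating it as UTF-8. The
function name looks_like_utf8 is made up for illustration and is not part
of the Subversion API; the sketch only checks byte-level well-formedness
and says nothing about whether the text is what the author intended.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Subversion API: return true if the LEN bytes
   at DATA are well-formed UTF-8 (no overlong encodings, no surrogates,
   nothing above U+10FFFF). */
bool
looks_like_utf8(const char *data, size_t len)
{
  const unsigned char *p = (const unsigned char *)data;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = p[i];
      size_t need;        /* continuation bytes expected after the lead */
      unsigned long cp;   /* code point decoded so far */
      unsigned long min;  /* smallest code point valid for this length */
      size_t k;

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead */

      if (len - i <= need)
        return false;     /* sequence is truncated */

      for (k = 1; k <= need; k++)
        {
          if ((p[i + k] & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (p[i + k] & 0x3F);
        }

      /* Reject overlong forms, out-of-range values and surrogates. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }

  return true;
}
```

With this, an author name stored as UTF-8 ("\xC4\x8Cibej", 6 bytes) passes,
while the same name stored in ISO-8859-2 ("\xC8ibej", 5 bytes) is rejected,
because 0xC8 is a two-byte lead that is not followed by a continuation byte.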

-- 
Timofei Zhakov
