Re: extending the blame callback

Julian Foad Mon, 14 Jan 2019 04:36:30 -0800

Stefan, thanks for taking account of the feedback and updating the doc string 
in r1851197.


I took a look and thought to rewrite the part about encoding and line splitting 
like this:

 * Character Encoding and Line Splitting:
 *
 * It is up to the client to determine the character encoding. The @a line
 * content is delivered without any encoding conversion. The line splitting
 * is designed to work with ASCII-compatible encodings including UTF-8. Any
 * of the byte sequences LF ("\n"), CR ("\n"), CR LF ("\r\n") ends a line
 * and is not included in @a line. The @a line content can include all other
 * byte values including zero (ASCII NUL).

I dropped the reference to svn_subst_stream_translated() because it wasn't much 
use without saying what parameters it is given, and instead I was able to say 
exactly what happens overall.

Problem 1: Using this blame function on a 16-bit character encoding is still 
really ugly: the receiver cannot know which byte sequences were stripped out.

We should address this issue properly by passing a "line splitter" function in 
to svn_client_blame6().

Problem 2: Then I noticed that where this splitting algorithm is used, it is in 
the second pass over the file data, where we associate each "struct blame" item 
with a line of text.

It looks like a different algorithm may be used in the first pass, when 
calculating the differences and constructing the blame list. It diffs all 
repository revisions of the file using svn_diff_file_diff_2() which splits 
lines according to the "ignore_eol_style" option from 
svn_client_blame6(diff_options.ignore_eol_style). The effect of 
"ignore_eol_style" is not entirely clear to me; it is used in 
svn_diff__normalize_buffer(). In addition, if svn_client_blame6() ends up 
processing a local WC revision of the file, that is processed through 
svn_client__get_normalized_stream(normalize_eols=(svn:eol-style == native)) 
before being diffed.

This is horrible. Surely we should use a consistent "line splitter" everywhere.

I would expect this means it is possible for blame output to go wrong, with 
revision numbers assigned to the wrong lines. I have been unable to construct 
such a wrong output, so far, after trying for half an hour or so. Possibly 
there is no code path that produces such a wrong output. But surely this is the 
sort of issue that leads to problems down the road.

Can we re-write this properly?

-- 
- Julian

Re: extending the blame callback

Reply via email to