On Tue, May 31, 2016 at 3:45 AM, Eric Wong <[email protected]> wrote:
> Eric Sunshine <[email protected]> wrote:
>> On Mon, May 30, 2016 at 7:21 PM, Eric Wong <[email protected]> wrote:
>> > + if (pp->fmt == CMIT_FMT_MBOXRD &&
>> > + !regexec(mboxrd_from, line, 0, 0,
>> > 0))
>> > + strbuf_addch(sb, '>');
>>
>> At first glance, this seems dangerous since it's handing 'line' to
>> regexec() without paying attention to 'linelen'. For an arbitrary
>> regex, this could result in erroneous matches on subsequent "lines",
>> however, since this expression is anchored with '^', it's not a
>> problem. But, it is a bit subtle.
>
> Maybe having more context of the pp_remainder function and
> seeing the get_one_line call would've helped in the diff;
> I didn't think of this issue once I figured out where to
> place the change.
No, extra context wouldn't have helped. The problem is that
get_one_line() merely returns the length of the line, which might be
where the NUL-terminator is or it might be where the next newline is.
Therefore, you can't rely upon the "current line" being
NUL-terminated. So, in general, it's not "safe" to pass it to a
function expecting the "line" to end at a NUL-terminator.
> On the other hand, not being too familiar with git C APIs, I was
> more worried strbuf was not NUL-terminated for regexec, but it
> seems to be.
Yes, that's a guarantee, but it doesn't help in this case. Given
line="foo\nbar", get_one_line(line) will return 4, the length of
"foo\n", but regexec() won't know to stop looking until it hits the
NUL after the 'r'. An arbitrary regex, such as /bar/ will match beyond
what get_one_line() considers the end-of-line, which is why this code
looks scary (wrong) at first glance.
>> I wonder if hand-coding, rather than using a regex, could be an improvement:
>>
>> static int is_mboxrd_from(const char *s, size_t n)
>> {
>> size_t f = strlen("From ");
>> const char *t = s + n;
>>
>> while (s < t && *s == '>')
>> s++;
>> return t - s >= f && !memcmp(s, "From ", f);
>> }
>>
>> or something.
>
> Yikes. I mostly work in high-level languages and do my best to
> avoid string parsing in C; so that scares me. A lot.
The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
(I think) typical of how such a function would be coded in Git itself,
so it looks normal and easy to grok to me (but, of course, I'm
probably biased since I wrote it).
> I admit a regex isn't necessary, but I'm wondering if the above
> could be made less frightening to someone like me.
Perhaps, but it's difficult to say without knowing how it frightens you.
> Maybe extra test cases + valgrind can quell my fears :)
I can envision tests such as:
""
"F"
"From"
"From "
"From "
"From foobar"
and so on, if that's what you mean.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html