Am 19.06.2018 um 18:35 schrieb Jeff King:
> On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:
>> The notable case that it does _not_ cover is matching the following
>> line:
>>
>>    a ... b
>>
>> with the following expression
>>
>>    git grep --column -e b --or -e a
>>
>> This will produce the column for 'b' rather than the column for 'a',
>> since we short-circuit an --or when the left child finds a match, in
>> this case 'b'. So, we break the semantics for this case, at the benefit
>> of not having to do twice the work.
>>
>> In the future, I'd like to revisit this, since any performance gains
>> that we _do_ make in this area are moot when we rescan all lines in
>> show_line() with --color. A path forward, I imagine, would look like a
>> list of regmatch_t's, or a set of locations in the expression tree, such
>> that we could either enumerate the list or walk the tree in order to
>> colorize the line. But, I think for now that is #leftoverbits.
> 
> The key thing about this iteration is that it doesn't regress
> performance, because we always short-circuit where we used to. The other
> obvious route is to stop short-circuiting only when "--column" is in
> effect, which would have the same property (at the expense of a little
> extra code in match_expr_eval()).

The performance impact of the exhaustive search for --color scales with
the number of shown lines, while it would scale with the total number of
lines for --column.  Coloring the results of highly selective patterns
is relatively cheap, short-circuiting them still helps significantly.

Disabling that optimization for --column wouldn't be a regression since
it's a new option..  Picking a random result (based on the order of
evaluation) seems sloppy and is probably going to surprise users.

We could add an optimizer pass to reduce the number of regular
expressions in certain cases if that is really too slow.  E.g. this:

        $ git grep -e b -e a

... is equivalent to:

        $ git grep -e '\(b\)\|\(a\)'

In that example the optimizer should use a single kwset instead of a
regex, but you get the idea, namely to leave the short-circuiting to the
regex engine or kwset, which probably do a good job of it.

René

Reply via email to