sammccall added inline comments.
================ Comment at: clangd/FuzzyMatch.cpp:96 return None; return ScoreScale * std::min(PerfectBonus * PatN, std::max<int>(0, Best)); } ---------------- MaskRay wrote: > sammccall wrote: > > MaskRay wrote: > > > I also don't understand why it clamps the value to zero here. Negative > > > values are also meaningful to me. Given that perfectBonus is only 3 it is > > > very easy to get a negative value here. > > An important part of the contract of `match()` is that it returns a value > > in `[0,1]`. > > We rely on this range to combine this with other scoring signals - we > > multiply this by a quality signal in code completion. > > (Currently the quality signal is just derived from Sema, but the global > > index will provide the number of uses). > > > > It would be possible to use a different squash function here, but I found > > max(kFloor,x) worked well for the examples I looked at - anything <= some > > floor value was "not really a useful match at all", and most of the > > variance below the floor seemed to be noise to me. > > (Then I tuned the bonuses/penalties so the floor was at zero) > We could try other criteria in the future. I believe the current one can be > improved because negative scores may be returned but the scoring shouldn't > return 0 for all the cases. Sure, we can try other things, and to gather more data. (To be clear though - with the data I *did* look at, including the scores <0 did not add more information, only noise) ================ Comment at: clangd/FuzzyMatch.cpp:230 void FuzzyMatcher::buildGraph() { + Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss}; for (int W = 0; W < WordN; ++W) { ---------------- MaskRay wrote: > sammccall wrote: > > why this change? > > this has also been moved from the cheaper constructor to the more expensive > > per-match call. (also the diagonal assignment added in the next loop) > > > > Also, shouldn't [0][0][Match] be AwfulScore? > > > "The more expensive per-match call" is just two value assignments. > > I have removed the expensive table initialization in the constructor. > > [0][0][*] can be any value. > "The more expensive per-match call" is just two value assignments. Oops, sorry - by "more expensive" I mean "called thousands of times more often". > I have removed the expensive table initialization in the constructor. I don't want to be rude, but I asked why you changed this, and you didn't answer. Unless there's a strong reason, I'd prefer to revert this change, as I find this harder to reason about. (Roughly: in the old version of the code, any data that didn't need to change for the life of the object was initialized in the constructor. That way I didn't need to worry what was performance-critical and what wasn't - match() only did what was strictly necessary). > [0][0][*] can be any value. Can you please explain why? ================ Comment at: clangd/FuzzyMatch.cpp:325 + int W = PatN; + for (int I = PatN; ++I <= WordN; ) + if (Scores[PatN][I][Match].Score > Scores[PatN][W][Match].Score) ---------------- MaskRay wrote: > sammccall wrote: > > nit: I -> P, move increment to the increment expression of the for loop? > I -> P. > > > move increment to the increment expression of the for loop? > > Not sure about the coding standard here, but if you insist I'll have to > change it as you are the reviewer. If the loop variable was an iterator, `for > (It I = std::next(...); I != E; ++I)` would be uglier than `for (It I = ...; > ++I != E; )` Uglier is subjective, but side-effects in the condition of a for-loop is sufficiently unusual and surprising that I'd prefer to avoid it in both cases. ================ Comment at: clangd/FuzzyMatch.cpp:340 + A[WordN] = Miss; + for (int I = WordN, P = PatN; I > 0; I--) { + if (I == W) ---------------- MaskRay wrote: > sammccall wrote: > > sammccall wrote: > > > sammccall wrote: > > > > W is the right name in this file for a variable iterating over word > > > > indices, please don't change this. > > > > The new variable above could be EndW or so? > > > As far as I can see, this loop is setting `A[W+1:...] = Miss` and > > > populating `A[0...W]` with the exsting logic. > > > I think this would be clearer as two loops, currently there's a lot of > > > conditionals around Last that obscure what's actually happening. > > You've shifted P (and the old W, new I) by 1. This does reduce the number > > of +1 and -1 in this function, but it's inconsistent with how these are > > used elsewhere: P should be the index into Pat of the character that we're > > considering. > I don't understand the rationale not to use the shifted indices. The code > actually use `Scores[P][W][*]` to mean the optimal match of the first `P` > characters of the pattern with the first `W` characters of the word, not the > position of the character. > > On the other hand, C++ reverse iterators use the shifted one for `for (I = > rend(); I != rbegin(); ++I)`. The shifted one makes ending condition check > easier. > I don't understand the rationale not to use the shifted indices The rationale is entirely consistency with the surrounding code. The consistency helps avoid off-by-one errors when similar loops have different conventions. In this file, when looping over word or pattern dimensions, P and W respectively are used for loop variables, and can be interpreted as indices into Pat/Word. Here the interpretation would be "did we match or miss character Word[W]" ================ Comment at: clangd/FuzzyMatch.cpp:93 + for (int I = PatN; I <= WordN; I++) + Best = std::max(Best, Scores[PatN][I][Match].Score); if (isAwful(Best)) ---------------- MaskRay wrote: > sammccall wrote: > > MaskRay wrote: > > > sammccall wrote: > > > > this looks like a behavior change - why? > > > This is a behavior change. Instead of choosing between `Match/Miss` in > > > the last position, we enumerate the last matching position in `Word`. > > > > > > This saves `if (P < PatN - 1) {` check in the main loop at the cost of a > > > for loop here (use sites of ending values) > > Ah, I see - the case where we match only part of the word is handled up > > here now. > > (I think you mean this is not a behavior change? The result is the same > > AFAICS) > > > > That does make more sense, but it's pretty subtle. > > Can you add a comment like > > `// The pattern doesn't have to match the whole word (but the whole > > pattern must match).` > Added > ``` > // Find the optimal prefix of Word to match Pattern. > ``` > > I meant this is a behavior change but it makes the first row and the rest > rows of the score table more consistent. That comment really doesn't capture what's significant about this line - it's the *policy*, rather than the mechanism, that needs highlighting here. (Re: behavior change - I *think* there's no inputs for which we produce a different match result/score because of this patch, right?) ================ Comment at: clangd/FuzzyMatch.h:61 bool allowMatch(int P, int W) const; - int skipPenalty(int W, Action Last) const; - int matchBonus(int P, int W, Action Last) const; + int missScore(int W, Action Last) const; + int matchScore(int P, int W, Action Last) const; ---------------- MaskRay wrote: > MaskRay wrote: > > sammccall wrote: > > > FWIW, I don't think this is an improvement - I think the clarity of > > > purpose in names is more useful than having consistent signs in this case. > > Keep `matchBonus` but rename `skipPenalty` to `missPenalty` ? > Also note in the original scheme, the skip score does not need to be > negative. Because Scores[PatN][WordN][] is used and each path takes the same > number of positions (WordN). We can add an offset to all positional scores to > make all of them non-negative. In the new scheme it does make sense to make > them negative, as each path has now different number of positions. missPenalty SGTM. (I don't see any particular reason to avoid negative numbers here - it has a natural interpretation: a positive increment means the match is better than if that action wasn't taken, negative means it's worse, etc) Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D44720 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits