Re: [PATCH v4 5/8] Start adding the infrastructure for handling TypoExprs.

David Blaikie Wed, 22 Oct 2014 14:25:35 -0700

On Tue, Oct 21, 2014 at 1:37 PM, Kaelyn Takata <[email protected]> wrote:


>
>
> On Fri, Oct 17, 2014 at 9:52 AM, David Blaikie <[email protected]> wrote:
>
>>
>>
>> On Thu, Oct 9, 2014 at 10:12 AM, David Blaikie <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Thu, Oct 9, 2014 at 9:58 AM, Kaelyn Takata <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Oct 7, 2014 at 5:03 PM, David Blaikie <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 7, 2014 at 2:29 PM, Kaelyn Takata <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 7, 2014 at 11:43 AM, David Blaikie <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 6, 2014 at 3:46 PM, Kaelyn Takata <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Fri, Oct 3, 2014 at 11:45 AM, David Blaikie <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Given that these are parts of a patch series, some of which have
>>>>>>>>> already been signed off on/I haven't looked at, it's a bit hard to 
>>>>>>>>> review
>>>>>>>>> the overall design - but I believe you & Richard talked that over, so 
>>>>>>>>> I'm
>>>>>>>>> not too worried about that (just explaining why my review feedback's 
>>>>>>>>> going
>>>>>>>>> to be a bit narrow)
>>>>>>>>>
>>>>>>>>> +  if (NumTypos > 0 && !ExprEvalContexts.empty())
>>>>>>>>> +    ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>>> +  else
>>>>>>>>> +    assert(NumTypos == 0 && "There are outstanding typos after 
>>>>>>>>> popping the "
>>>>>>>>> +                            "last 
>>>>>>>>> ExpressionEvaluationContextRecord");
>>>>>>>>>
>>>>>>>>> Could this be simplified? I think it's equivalent to:
>>>>>>>>>
>>>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>>>     ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>>>   else
>>>>>>>>>     assert(NumTypos == 0 ... );
>>>>>>>>>
>>>>>>>>> or:
>>>>>>>>>
>>>>>>>>>   assert(ExprEvalContexts.empty() || NumTypos != 0 && ...)
>>>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>>>     ...
>>>>>>>>>
>>>>>>>>> Yes, the "NumTypos > 0" was unnecessary and has been removed.
>>>>>>>> (FYI, the second form you suggested doesn't quite work because the 
>>>>>>>> assert
>>>>>>>> would fail on the common case where ExprEvalContexts is not empty and 
>>>>>>>> there
>>>>>>>> are no typos.)
>>>>>>>>
>>>>>>>
>>>>>>> Hmm, right, maybe:
>>>>>>>
>>>>>>>   assert(ExprEvalContexts.empty() ^ NumTypos != 0 && ... )
>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>     ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>
>>>>>>> but perhaps that's just too confusing. I just find an assertion as
>>>>>>> the only thing inside a block of code to be a bit off... but it's 
>>>>>>> certainly
>>>>>>> not unprecedented in LLVM and is probably more legible - just my own 
>>>>>>> hangup
>>>>>>> to get over :)
>>>>>>>
>>>>>>>
>>>>>>>> There's an unnecessary semicolon at the end of the "while (true)" loop 
>>>>>>>> in TransformTypos::Transform
>>>>>>>>>
>>>>>>>>> Removed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'd roll this up a bit:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +      auto &State = SemaRef.getTypoExprState(TE);
>>>>>>>>> +      if (State.DiagHandler)
>>>>>>>>> +        (*State.DiagHandler)(State.Consumer->getCurrentCorrection());
>>>>>>>>>
>>>>>>>>> into:
>>>>>>>>>
>>>>>>>>>   if (auto *Handler = SemaRef.getTypoExprState(TE).DiagHandler)
>>>>>>>>>     (*DiagHandler)(State.Consumer->getCurrentCorrection());
>>>>>>>>>
>>>>>>>>> That rollup doesn't work because State would be unknown when
>>>>>>>> accessing State.Consumer in the argument to the DiagHandler.
>>>>>>>>
>>>>>>>
>>>>>>> Oh, right!
>>>>>>>
>>>>>>>> count in the condition, plus insert in the body of the if in 
>>>>>>>> TransformTypos::TransformTypoExpr would be more efficient as a single 
>>>>>>>> insert-and-check (by avoiding two separate lookups into the set):
>>>>>>>>>   if (TypoExprs.insert(E).second) // means this element previously 
>>>>>>>>> didn't exist in the set, and now does
>>>>>>>>>
>>>>>>>>> I've changed the count-then-insert to just an insert since
>>>>>>>> SmallPtrSetImpl::insert returns true when the item wasn't already in 
>>>>>>>> the
>>>>>>>> set.
>>>>>>>>
>>>>>>>>> This:
>>>>>>>>>
>>>>>>>>>   TransformCache.find(E) != TransformCache.end()
>>>>>>>>>
>>>>>>>>> might commonly be written as:
>>>>>>>>>
>>>>>>>>>   TransformCache.count(E)
>>>>>>>>>
>>>>>>>>> Optionally with "!= 0", but often count is used directly as a boolean 
>>>>>>>>> test, especially when it's a single set/map (not a multi set/map).
>>>>>>>>>
>>>>>>>>> Changed.
>>>>>>>>
>>>>>>>>> This comment didn't quite make sense to me:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +      // If corrections for the first TypoExpr have been exhausted, 
>>>>>>>>> retry them
>>>>>>>>> +      // against the next combination of substitutions for all of 
>>>>>>>>> the other
>>>>>>>>> +      // TypoExprs.
>>>>>>>>>
>>>>>>>>> When you say "retry them" which them do you mean? The corrections for 
>>>>>>>>> the first TypoExpr? But what does it mean for them to be exhausted 
>>>>>>>>> then, if they can still be considered?
>>>>>>>>>
>>>>>>>>> Maybe I need a bit of help with the higher level algorithm here - I 
>>>>>>>>> assume the idea is:
>>>>>>>>>
>>>>>>>>> Call transform.
>>>>>>>>> As it visits TypoExpr nodes, it tries their first typo candidate 
>>>>>>>>> available.
>>>>>>>>> If we get to the end and have invalid code, reset the state of all 
>>>>>>>>> the typo corrections except the first one (on its next query, it'll 
>>>>>>>>> give back its second alternative).
>>>>>>>>>
>>>>>>>>> Here is where the confusion is at. As the transform visits each
>>>>>>>> TypoExpr node it tries the first candidate available. If the first
>>>>>>>> candidates of all the TypoExprs don't work, it then tries the next (and
>>>>>>>> subsequent) candidates of only the first TypoExpr that was seen. When 
>>>>>>>> it
>>>>>>>> exhausts the candidates from the first TypoExpr, it then resets the 
>>>>>>>> stream
>>>>>>>> of candidates for the first TypoExpr and fetches the next candidate of 
>>>>>>>> the
>>>>>>>> TypoExpr, continuing in this way until all combinations of candidates 
>>>>>>>> of
>>>>>>>> all TypoExprs have been tried.
>>>>>>>>
>>>>>>>
>>>>>>> So I'm still a bit confused about which parts of the retry logic are
>>>>>>> in which parts of the code.
>>>>>>>
>>>>>>> OK, I think I've got it. Transform is called and we don't even know
>>>>>>> which TypoExprs are in the Expr. So we transform.
>>>>>>>
>>>>>>> As transform runs it comes across TypoExprs, and attempts to
>>>>>>> transform them. For each one, it tries candidates until it finds a good 
>>>>>>> one
>>>>>>> (based on the immediate context - what are the cases where
>>>>>>> BuildDeclarationNameExpr fails here? (when does NE.isInvalid fail?)).
>>>>>>>
>>>>>>
>>>>>> I think there is a bit of confusion on what a "retry" is. Within
>>>>>> TransformTypoExpr, for either the first TypoExpr ever encountered by this
>>>>>> TreeTransform or for a TypoExpr without a cached substitution, it tries 
>>>>>> to
>>>>>> build a valid subexpression from the TypoCorrection candidates (e.g. a
>>>>>> DeclRefExpr) and continues trying candidates until it finds one for 
>>>>>> which a
>>>>>> valid subexpression can be formed, which it caches and returns. If none 
>>>>>> of
>>>>>> the candidates worked, it will cache and return an invalid ExprResult. 
>>>>>> The
>>>>>> TreeTransform then takes the returned subexpression and tries to rebuild
>>>>>> the full expression (possibly trying to transform multiple TypoExprs 
>>>>>> along
>>>>>> with CallExprs, etc).
>>>>>>
>>>>>> I don't remember off the top of my head without looking at the body,
>>>>>> but I think BuildDeclarationNameExpr will fail in cases where the
>>>>>> correction is e.g. a type name, or a member reference name--basically,
>>>>>> BuildDeclarationNameExpr returns an ExprResult and the correct behavior 
>>>>>> in
>>>>>> principle is to check for an invalid result and try the next correction
>>>>>> candidate when one is seen--even if the Consumer never returns a 
>>>>>> candidate
>>>>>> for which BuildDeclarationNameExpr would fail.
>>>>>>
>>>>>>>
>>>>>>> So, in your example below, maybe the right answer is BEH, but AF is
>>>>>>> valid, with no correct candidate for the 3 choice.
>>>>>>>
>>>>>>
>>>>>> No, AF with no correction for the 3rd TypoExpr would result in the
>>>>>> entire Expr result being invalid within the main Transform method.
>>>>>>
>>>>>>
>>>>>>> Consumers start at "before the first" element, calling
>>>>>>> "getNextCorrection" on a new consumer gives you the first correction.
>>>>>>>
>>>>>>
>>>>>> That is correct. And the "before the first" element is also an empty
>>>>>> correction for use with getCurrentCorrection. ;)
>>>>>>
>>>>>>>
>>>>>>> On the first call to TransformExpr, the first call to
>>>>>>> TransformTypoExpr chooses A, records it, continues.
>>>>>>> Gets to the second TransformTypoExpr, tries D and E, succeeds at F.
>>>>>>>
>>>>>>
>>>>>> it would only try D then E then F within a single call to
>>>>>> TransformTypoExpr if BuildDeclarationNameExpr failed for D and E.
>>>>>>
>>>>>>> On the third TransformTypoExpr, it tries H and I, both fail, so it
>>>>>>> returns from the last line ("return TransformCache[E] = ExprError]).
>>>>>>>
>>>>>>
>>>>>> Same case here--that last line would only be reached if H and I both
>>>>>> failed within the call to BuildDeclarationNameExpr.
>>>>>>
>>>>>>>
>>>>>>> We get back to Transform with a failure. So we walk the TypoExprs
>>>>>>> (in CheckAndAdvanceTypoExprCorrectionStreams) we reset the 3rd consumer
>>>>>>> (because it was finished).
>>>>>>>
>>>>>>
>>>>>> No, CheckAndAdvanceTypoExprCorrectionStreams would do nothing if the
>>>>>> first TypoExpr is not finished. If the first TypoExpr is finished, then 
>>>>>> it
>>>>>> would reset it and clear the cached result for the second TypoExpr (so 
>>>>>> that
>>>>>> the next call to TransformTypoExpr on the second TypoExpr would try a new
>>>>>> candidate instead of using the cached result). Only if both the first and
>>>>>> second TypoExprs were *both* finished would the cached value of the third
>>>>>> TypoExpr be cleared (and the candidate streams for both the first and
>>>>>> second TypoExprs would be reset in addition to their cached results being
>>>>>> removed from the cache). The only way
>>>>>> CheckAndAdvanceTypoExprCorrectionStreams would see if the third TypoExpr
>>>>>> was finished would be if both the first and second TypoExprs were also
>>>>>> finished, and upon resetting it the tree transform would only be 
>>>>>> attempted
>>>>>> again if there were at least four known TypoExprs (i.e. the number of 
>>>>>> known
>>>>>> TypoExprs is greater than the number of TypoExprs that were finished and
>>>>>> had to be reset by CheckAndAdvanceTypoExprCorrectionStreams).
>>>>>>
>>>>>>>
>>>>>>> We transform again:
>>>>>>>
>>>>>>> First typo) We are at A so we call next and get B
>>>>>>>
>>>>>>
>>>>>> Yes. The first TypoExpr would always have it's next candidate tried
>>>>>> by a call to TransformTypoExpr.
>>>>>>
>>>>>>
>>>>>>> Second typo) From F we continue and try G, fail, then we're finished
>>>>>>> for the second typo.
>>>>>>>
>>>>>>
>>>>>> No, G would only be tried if the result from transforming F was
>>>>>> removed from the cache.
>>>>>>
>>>>>>
>>>>>>> Third typo) Perhaps we find some solution for this, or not, it
>>>>>>> doesn't matter.
>>>>>>>
>>>>>>> We get back to Transform with a failure (second typo), So we walk
>>>>>>> the TypoExprs again, resetting the second (and maybe the third).
>>>>>>>
>>>>>>
>>>>>> If there are only three TypoExprs, then resetting the third TypoExpr
>>>>>> would mean there are no valid combinations of corrections that yield a
>>>>>> valid expression (where the final result of the transform is neither an
>>>>>> invalid ExprResult nor caused any errors to be reported by the 
>>>>>> SFINAETrap).
>>>>>>
>>>>>>
>>>>>
>>>>>>> First typo) We are at B so we call next and get to C, let's say we
>>>>>>> fail here - C isn't even remotely valid.
>>>>>>> Now it really matters whether we find/fail on the second and third
>>>>>>> typo. If we fail on the second and third... we're going to give up. 
>>>>>>> We'll
>>>>>>> find that all the corrections are finished and we'll conclude that we've
>>>>>>> tried everything.
>>>>>>>
>>>>>>> Am I following this right?
>>>>>>>
>>>>>>
>>>>>> Not quite. I think you are forgetting about the fact that, for all
>>>>>> TypoExprs except the first, a cached result is used if it exists instead 
>>>>>> of
>>>>>> fetching a new candidate from the Consumer and trying to build a valid 
>>>>>> Expr
>>>>>> with it.
>>>>>>
>>>>>
>>>>> Ah, looking at the code again I see what you mean. The 'break' is the
>>>>> bit that I missed in the reset/advance loop.
>>>>>
>>>>> Which patches would I have to apply to ToT to be able to exercise this
>>>>> code? It might help for me to step through some of these scenarios in a
>>>>> debugger.
>>>>>
>>>>
>>>> To exercise this code, you'd need the full patch series (all 8) to
>>>> exercise this code because it's not until patch #8 that anything actually
>>>> generates TypoExprs in the AST. I can send you a combined patch rebased on
>>>> ToT off-list if you'd like (it's ~112KB uncompressed), and the new test
>>>> cases in test/SemaCXX/typo-correction-delayed.cpp explicitly test this
>>>> functionality.
>>>>
>>>
>>> That'd be great!
>>>
>>
>> With your total patch applied I've been able to reproduce a few typo
>> correction scenarios - thanks!
>>
>> But I can't seem to reach/exercise the loop in
>> TransformTypos::TransformTypoExpr - if I change the "if (!NE.isInvalid())"
>> condition at the end of the loop (the only way to repeat the body of the
>> loop) to "assert(!NE.isInvalid())" it doesn't fire on any of the test cases
>> (nor on my own modified versions). Do you have some tests that cover this?
>> Could you show/describe a simple situation where it occurs so I can play
>> around with that part of the logic?
>>
>
> I'm not sure it can be triggered with CorrectTypoDelayed only hooked up to
> LookupMemberExprInRecord (a static function called by
> Sema::BuildMemberReferenceExpr directly and indirectly via
> LookupMemberExpr), but in the second patch set which hooks
> CorrectTypoDelayed up to DiagnoseEmptyLookup, the assert is triggered by
> the tests SemaCXX/type-definition-in-specifier.cpp
> and SemaCXX/warn-thread-safety-parsing.cpp (in addition to the loop being
> the correct/principled behavior since there is no guarantee that the
> recovery callback will generate a valid ExprResult from a non-empty
> TypoCorrection). For this patch set, to trigger the assert a code snippet
> would have to correct a class member lookup and have a candidate that is a
> member function, constant, variable, or template in the current class or
> one of its base classes (per the RecordMemberExprValidatorCCC) yet which
> yields causes MemberExprTypoRecovery to return an ExprError when trying to
> build the member reference. Basically, it would require having the typo
> correction try a candidate that is an accessible class member that cannot
> be reference--something that I'm not sure is even possible in C++.
>

Hrm, OK - could you change it to an assertion (& remove the outer
loop/replace it with a conditional) until that later patch where it becomes
necessary, so it'll be easier to review it in the context of it being
executed/used code?

Without that wrinkle, I think I've followed the code enough to sign off on
it. Please commit.

[I do find the way the algorithm visits the corrections to be a little hard
to follow, in that its split between the TypoExprs cache, consumer state,
TransformTypoExpr, and CheckAndAdvanceTypoExprCorrectionStreams - in some
very abstract theory I would like having each sequence of typo candidates
(the Consumers represent this) expose an iterator pair, then a generic
utility to do combinations of a list of ranges and it would expose an
iterator over those combinations which you could just increment to get the
next one - it'd separate out the combination logic into a reusable
component that could be easily examined/tested, then composed with the typo
correction stuff to keep the logic there relatively simple/clearer.

I might try to work up an example of what I mean]




>
>> - David
>>
>> (micro review comment I saw along the way: I /think/ 
>> CheckAndAdvanceTypoExprCorrectionStreams
>> could be simplified as follows:
>>
>
> You're right, it can be simplified now that the loop is in its own
> function. Changed.
>
>>
>>   bool CheckAndAdvanceTypoExprCorrectionStreams() {
>>     for (auto TE : TypoExprs) {
>>       TransformCache.erase(TE);
>>       auto &State = SemaRef.getTypoExprState(TE);
>>       if (!State.Consumer->finished())
>>         return true;
>>       State.Consumer->resetCorrectionStream();
>>     }
>>     return false;
>>   }
>>
>>
>>
>>>
>>>
>>>>
>>>>> (I'm sort of wondering if this might be better expressed by some
>>>>> iterator scheme, but I'm not sure - there's no pretty next_combination
>>>>> algorithm (someone proposed one to boost at some point, it seems, but
>>>>> that's about as close as it's got to anything common/generic) anyway)
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> So for 3 TypoExprs... let's say 1 has the candidates (A, B, C), 2
>>>>>>>> has the candidates (D, E, F, G) and 3 has the candidates (H, I) the
>>>>>>>> iterations would test the combinations in order:
>>>>>>>>
>>>>>>>> A D H
>>>>>>>> B D H
>>>>>>>>
>>>>>>>
>>>>>>> I suppose a simpler form of my above example is: if, on the first
>>>>>>> pass, we get ADH, on the next pass the code as-is would result in BCI,
>>>>>>> wouldn't it? It'll "next" each of the corrections.
>>>>>>>
>>>>>>
>>>>>> (I think you meant BEI, and C is only a candidate of the first
>>>>>> TypoExpr, not the second one.) No it won't because of the caching. The
>>>>>> second TypoExpr is only effectively "next"ed once all of the candidates 
>>>>>> for
>>>>>> the first TypoExpr have been tried with the current candidate of the 
>>>>>> second
>>>>>> TypoExpr, and the third TypoExpr would only be "next"ed if all of the
>>>>>> candidates of both the first and second TypoExpr have been tried with the
>>>>>> current candidate of the third TypoExpr.
>>>>>>
>>>>>>>
>>>>>>> And I'm still not quite understanding why there's the  E !=
>>>>>>> TypoExprs[0] condition... maybe that's part of what I'm missing. It 
>>>>>>> looks
>>>>>>> like that code would cause, in a single Transform, multiple instances of
>>>>>>> the first typo to be computed with different/subsequent typo results...
>>>>>>> which seems confusing/strange. (though I don't fully understand 
>>>>>>> how/where
>>>>>>> multiple instances of the same TypoExpr will be visited in a single
>>>>>>> Transform)
>>>>>>>
>>>>>>
>>>>>> That condition ensures that the cached result is never used for the
>>>>>> first TypoExpr (instead of having an extra conditional for all of the
>>>>>> return statements below here to not cache the result for the first
>>>>>> TypoExpr).
>>>>>>
>>>>>>>
>>>>>>> (some nitpicks of the code - there's a duplicate lookup of
>>>>>>> TransformCache (count then [], or count then insert further down the
>>>>>>> algorithm - it should probably just be:
>>>>>>>
>>>>>>> ExprResult &CacheEntry = TransformCache[E];
>>>>>>> if (CacheEntry)
>>>>>>>   return CacheEntry;
>>>>>>>
>>>>>>
>>>>>> This wouldn't work because "TransformCache[E]" would add a cache
>>>>>> entry for E, and either the "if (CacheEntry)" would always be true, or an
>>>>>> ExprError() would be considered false and an ExprError cache entry would
>>>>>> always be ignored, both of which would be bad.
>>>>>>
>>>>>
>>>>> you're right that ExprResult isn't boolean testable - but it is
>>>>> possible to distinguish ExprError from a default-constructed ExprResult. A
>>>>> default constructed ExprResult is in the valid-but-unset state, so far as 
>>>>> I
>>>>> can tell, so it would look like:
>>>>>
>>>>> if (!CacheEntry.isUnset())
>>>>>   return CacheEntry
>>>>>
>>>>> (which is, admittedly, not the /most/ legible thing ever)
>>>>>
>>>>
>>>> I'm worried this won't work, or at least would be (potentially)
>>>> fragile, because .isUnset() is defined as "valid-but-null" and Sema sets
>>>> valid-but-null as an explicit state in the return values of various
>>>> functions--at least a few of which I saw while looking at users of the typo
>>>> correction code.
>>>>
>>>
>>> Ooh... I could totally see that happening (& thanks for the explanation).
>>>
>>>
>>>>
>>>>>
>>>>>> Of course, a new local variable could be created containing the
>>>>>> result of TransformCache.find(E) and then compared to 
>>>>>> TransformCache.end()
>>>>>> and returning as appropriate, but I find it to be more verbose and less
>>>>>> clear:
>>>>>>
>>>>>> auto CacheEntry = TransformCache.find(E);
>>>>>> if (!TypoExprs.insert(E) && CacheEntry != TransformCache.end())
>>>>>>   return CacheEntry->second;
>>>>>>
>>>>>> And neither of the subsequent return statements would be changed
>>>>>> because the cache entry would still need to be created...
>>>>>>
>>>>>
>>>>> If you were using the iterator version, they could be hinted
>>>>> insertions which might be better - but I agree, if you're just using the
>>>>> iterators it's probably not terribly helpful. But as shown above, you can
>>>>> actually just use the op[] at the start, check !isUnset, then initialize
>>>>> the elements. This is a fairly common idiom in map related code, though
>>>>> more often it looks like
>>>>>
>>>>
>>>> After a bit more consideration, I've taken your suggested approach of
>>>> using .isUnset() and added an assertion on trying to cache an "unset" value
>>>> ("assert(!NE.isUnset() && ...);").
>>>>
>>>
>>> Sounds like a fair tradeoff - if the assertion does fire we can add a
>>> test case for it & switch back to something like your original formulation.
>>>
>>> The other gotcha to be aware of is that since it's a reference into a
>>> DenseMap are invalidated by insertions - so if any of the work to
>>> initialize the TransformCache (the while nextCorrection loop, and probably
>>> most importantly the "BuildDeclarationNameExpr" call) can also modify the
>>> TransformCache (by transforming another typo expr) then the reference will
>>> be invalidated and the approach I'd suggested won't be applicable (you
>>> could still keep just one lookup in the cache hit case, but in the cache
>>> miss you'd have to do a second lookup after the work)
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> ...
>>>>>>>     return CacheEntry = NE;
>>>>>>> }
>>>>>>> return CacheEntry = ExprError();
>>>>>>>
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> C D H
>>>>>>>> A E H
>>>>>>>> B E H
>>>>>>>> C E H
>>>>>>>> A F H
>>>>>>>> B F H
>>>>>>>> C F H
>>>>>>>> A G H
>>>>>>>> B G H
>>>>>>>> C G H
>>>>>>>> A D I
>>>>>>>> B D I
>>>>>>>> C D I
>>>>>>>> A E I
>>>>>>>> B E I
>>>>>>>> C E I
>>>>>>>> A F I
>>>>>>>> B F I
>>>>>>>> C F I
>>>>>>>> A G I
>>>>>>>> B G I
>>>>>>>> C G I
>>>>>>>>
>>>>>>>>> If we get to the end, have invalid code, and the first typo has no 
>>>>>>>>> alternatives left to try - I guess we just want to print errors for 
>>>>>>>>> everything, without typo corrections?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> If all of the combinations of typo corrections failed to create an
>>>>>>>> acceptable Expr, then the current correction for all of TypoExprs 
>>>>>>>> would be
>>>>>>>> an empty TypoCorrection which would cause the corresponding diagnostic
>>>>>>>> handlers to print out errors without suggestions.
>>>>>>>>
>>>>>>>>> It might be helpful to split out some of these chunks (the state 
>>>>>>>>> resetting code and maybe the diagnostic printing at the end) into 
>>>>>>>>> separate functions to give them logical names & make the higher level 
>>>>>>>>> algorithm more clear?
>>>>>>>>>
>>>>>>>>> I've split the diagnostics emission and the state handling into
>>>>>>>> separate helpers, and expanded their description comments. I'm not 
>>>>>>>> keen on
>>>>>>>> the name for the state handler but it is the best name I could come up 
>>>>>>>> with
>>>>>>>> that was even remotely descriptive.
>>>>>>>>
>>>>>>>
>>>>>>> Roughly works, though I'm still quite confused, it seems.
>>>>>>> (alternative name for that function would be "reset" rather than 
>>>>>>> "advance"
>>>>>>> - but once I understand the algorithm better I might have better
>>>>>>> suggestions)
>>>>>>>
>>>>>>
>>>>>> The function removes the cached result for the first TypoExpr then
>>>>>> checks the TypoExpr's correction candidate stream; if it has reached the
>>>>>> end of the stream, it will reset the stream then perform the same check 
>>>>>> for
>>>>>> the second TypoExpr, continuing until it either has reset the correction
>>>>>> streams for all of the TypoExprs or has found a TypoExpr that hadn't
>>>>>> reached the end of its correction stream. And in the former case, it 
>>>>>> would
>>>>>> signal that all combinations of corrections had been retried and the
>>>>>> TreeTransform should stop searching for a valid combination of 
>>>>>> corrections.
>>>>>> And now that I think about it, the "E != TypoExprs[0]" is probably now a
>>>>>> superfluous check since calling CheckAndAdvanceTypoExprCorrectionStreams
>>>>>> would always remove the first TypoExpr from the result cache.
>>>>>>
>>>>>> The loop at the end ("for (auto E : TypoExprs) {") is over a hashing 
>>>>>> data structure, right? Is that going to cause unstable output of 
>>>>>> diagnostics? (eg: different machines, different memory layout, the same 
>>>>>> program might get the typos printed in a different order) or is there 
>>>>>> something else that will fix that?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you have to change the set to be a SetVector, or similar (to 
>>>>>>>>> stabilize output) - then you won't need the FirstTypoExpr anymore - 
>>>>>>>>> it'll just be the first one in the set, since you iterate in 
>>>>>>>>> insertion order at that point.
>>>>>>>>>
>>>>>>>>> Switched it from a SmallPtrSet to a SmallSetVector like it should
>>>>>>>> have been. Doing so also allowed me to simplify the state handling 
>>>>>>>> code by
>>>>>>>> being able to assume a deterministic order, and eliminating 
>>>>>>>> FirstTypoExpr.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "while (TypoCorrection TC = State.Consumer->getNextCorrection()) {" 
>>>>>>>>> doesn't seem to be a loop - it has an unconditional return at the end 
>>>>>>>>> of the body. Should it just be an 'if'?
>>>>>>>>>
>>>>>>>>> There was actually a missing check on whether the result of
>>>>>>>> Sema::BuildDeclarationNameExpr was valid.
>>>>>>>>
>>>>>>>
>>>>>>> Were there missing/added test cases to cover this situation?
>>>>>>>
>>>>>>
>>>>>> No because at this point in the patch series, there is nothing that
>>>>>> actually generates TypoExprs.
>>>>>>
>>>>>>> "+ (FullExpr.get()->isTypeDependent() ||
>>>>>>>>>
>>>>>>>>> +       FullExpr.get()->isValueDependent() ||
>>>>>>>>> +       FullExpr.get()->isInstantiationDependent())) {"
>>>>>>>>>
>>>>>>>>> What are those checks for? I don't know what condition we would be in 
>>>>>>>>> if we had typo corrections but it wasn't type, value, or 
>>>>>>>>> instantiation dependent. Perhaps a comment could explain that?
>>>>>>>>>
>>>>>>>>> The check is to avoid running the tree transform on Expr trees
>>>>>>>> that are known to not have a TypoExpr in them, regardless of whether 
>>>>>>>> the
>>>>>>>> current expression evaluation context still indicates there are 
>>>>>>>> uncorrected
>>>>>>>> typos. I've added a comment to that effect.
>>>>>>>>
>>>>>>>
>>>>>>> Hmm - I suppose my question might then be: When would we have
>>>>>>> NumTypos > 0 but be in a situation where the full expression cannot have
>>>>>>> any typos in it?
>>>>>>>
>>>>>>
>>>>>> I don't remember off the top of my head because it's been a few
>>>>>> months now since I wrote the original patches... but basically any
>>>>>> situation where more than one FullExpr shares an expression evaluation
>>>>>> context or when the current expression evaluation context had its 
>>>>>> NumTypos
>>>>>> increased by an expression evaluation context that was popped off the
>>>>>> stack. And TreeTransforms aren't cheap enough to blindly run on every 
>>>>>> Expr
>>>>>> when NumTypos is non-zero.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> To simplify Sema::clearDelayedTypo you could add 
>>>>>>>> MapVector::erase(const KeyType&) (given that std::map has such an 
>>>>>>>> operation, it seems like a reasonable one for MapVector to have).
>>>>>>>>>
>>>>>>>>> Done (simple patch attached).
>>>>>>>>
>>>>>>>
>>>>>>> Great - if you'd like to commit this separately, it should have a
>>>>>>> unit test. Otherwise it's probably simple enough to commit along with 
>>>>>>> your
>>>>>>> new usage if you prefer.
>>>>>>>
>>>>>>
>>>>>> Committed with unit test as r219240.
>>>>>>
>>>>>>>
>>>>>>> Sema::getTypoExprState looks like it could be a const function (it
>>>>>>>>> looks like it's never meant to be called when the element isn't 
>>>>>>>>> already in
>>>>>>>>> the map) - using 'lookup' instead of operator[] will be a const 
>>>>>>>>> operation
>>>>>>>>> on the map (& assert if the element doesn't exist in the map).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sadly I cannot do that because the operator[] cannot be used if the
>>>>>>>> method is made const, and .lookup() can't be used because it returns
>>>>>>>> by-value instead of by-reference and TypoExprState is not copyable:
>>>>>>>>
>>>>>>>> ../tools/clang/lib/Sema/SemaLookup.cpp:4600:10: warning: returning
>>>>>>>> reference to local temporary object [-Wreturn-stack-address]
>>>>>>>>   return DelayedTypos.lookup(TE);
>>>>>>>>
>>>>>>>> ../include/llvm/ADT/MapVector.h:88:30: error: call to
>>>>>>>> implicitly-deleted copy constructor of 'const 
>>>>>>>> clang::Sema::TypoExprState'
>>>>>>>>     return Pos == Map.end()? ValueT() : Vector[Pos->second].second;
>>>>>>>>
>>>>>>>
>>>>>>> Oh, weird - my mistake in assuming/misremembering what lookup
>>>>>>> actually does. I'd still make getTypoExprState const and use find+assert
>>>>>>> just to constrain the interface a bit more rather than leaving it 
>>>>>>> possible
>>>>>>> to accidentally create a new entry in the map when the API is only 
>>>>>>> intended
>>>>>>> to query existing entries.
>>>>>>>
>>>>>>
>>>>>> Done. The new changes I've made overall are:
>>>>>>
>>>>>> diff --git a/include/clang/Sema/Sema.h b/include/clang/Sema/Sema.h
>>>>>> index b30bf5e..f644fdd 100644
>>>>>> --- a/include/clang/Sema/Sema.h
>>>>>> +++ b/include/clang/Sema/Sema.h
>>>>>> @@ -2628,7 +2628,7 @@ private:
>>>>>>                               bool ErrorRecovery, bool
>>>>>> &IsUnqualifiedLookup);
>>>>>>
>>>>>>  public:
>>>>>> -  const TypoExprState &getTypoExprState(TypoExpr *TE);
>>>>>> +  const TypoExprState &getTypoExprState(TypoExpr *TE) const;
>>>>>>
>>>>>>    /// \brief Clears the state of the given TypoExpr.
>>>>>>    void clearDelayedTypo(TypoExpr *TE);
>>>>>> diff --git a/lib/Sema/SemaExprCXX.cpp b/lib/Sema/SemaExprCXX.cpp
>>>>>> index 73aa997..1f08276 100644
>>>>>> --- a/lib/Sema/SemaExprCXX.cpp
>>>>>> +++ b/lib/Sema/SemaExprCXX.cpp
>>>>>> @@ -6028,7 +6028,7 @@ public:
>>>>>>      // If the TypoExpr hasn't been seen before, record it.
>>>>>> Otherwise, return the
>>>>>>      // cached transformation result if there is one and the TypoExpr
>>>>>> isn't the
>>>>>>      // first one that was encountered.
>>>>>> -    if (!TypoExprs.insert(E) && E != TypoExprs[0] &&
>>>>>> TransformCache.count(E)) {
>>>>>> +    if (!TypoExprs.insert(E) && TransformCache.count(E)) {
>>>>>>        return TransformCache[E];
>>>>>>      }
>>>>>>
>>>>>> diff --git a/lib/Sema/SemaLookup.cpp b/lib/Sema/SemaLookup.cpp
>>>>>> index 8ddb5fd..bb6c76a 100644
>>>>>> --- a/lib/Sema/SemaLookup.cpp
>>>>>> +++ b/lib/Sema/SemaLookup.cpp
>>>>>> @@ -4601,8 +4601,11 @@ TypoExpr
>>>>>> *Sema::createDelayedTypo(std::unique_ptr<TypoCorrectionConsumer> TCC,
>>>>>>    return TE;
>>>>>>  }
>>>>>>
>>>>>> -const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE) {
>>>>>> -  return DelayedTypos[TE];
>>>>>> +const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE)
>>>>>> const {
>>>>>> +  auto Entry = DelayedTypos.find(TE);
>>>>>> +  assert(Entry != DelayedTypos.end() &&
>>>>>> +         "Failed to get the state for a TypoExpr!");
>>>>>> +  return Entry->second;
>>>>>>  }
>>>>>>
>>>>>>  void Sema::clearDelayedTypo(TypoExpr *TE) {
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Otherwise this all looks like rather neat stuff - though I'm not
>>>>>>>>> as able to review in detail some of the refactoring of existing typo
>>>>>>>>> correction stuff. It looks plausible.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I've attached the updated patch. Thanks for reviewing this, David!
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>> On Thu, Sep 25, 2014 at 2:00 PM, Kaelyn Takata <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> ping
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 26, 2014 at 11:05 AM, Kaelyn Takata <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +dblaikie
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 31, 2014 at 1:20 PM, Kaelyn Takata <[email protected]
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Part of the infrastructure is a map from a TypoExpr to the
>>>>>>>>>>>> Sema-specific
>>>>>>>>>>>> state needed to correct it, along with helpers to ease dealing
>>>>>>>>>>>> with the
>>>>>>>>>>>> state.
>>>>>>>>>>>>
>>>>>>>>>>>> The the typo count is propagated up the stack of
>>>>>>>>>>>> ExpressionEvaluationContextRecords when one is popped off of to
>>>>>>>>>>>> avoid accidentally dropping TypoExprs on the floor. For example,
>>>>>>>>>>>> the attempted correction of g() in
>>>>>>>>>>>> test/CXX/class/class.mem/p5-0x.cpp
>>>>>>>>>>>> happens with an ExpressionEvaluationContextRecord that is
>>>>>>>>>>>> popped off
>>>>>>>>>>>> the stack prior to ActOnFinishFullExpr being called and the tree
>>>>>>>>>>>> transform for TypoExprs being run.
>>>>>>>>>>>> ---
>>>>>>>>>>>>  include/clang/Sema/Sema.h           |  44 +++++
>>>>>>>>>>>>  include/clang/Sema/SemaInternal.h   |  15 +-
>>>>>>>>>>>>  include/clang/Sema/TypoCorrection.h |   2 +-
>>>>>>>>>>>>  lib/Sema/SemaExpr.cpp               |   7 +
>>>>>>>>>>>>  lib/Sema/SemaExprCXX.cpp            | 108 ++++++++++++
>>>>>>>>>>>>  lib/Sema/SemaLookup.cpp             | 316
>>>>>>>>>>>> ++++++++++++++++++++++++------------
>>>>>>>>>>>>  6 files changed, 384 insertions(+), 108 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH v4 5/8] Start adding the infrastructure for handling TypoExprs.

Reply via email to