On Tue, Oct 7, 2014 at 2:29 PM, Kaelyn Takata <[email protected]> wrote:
> > > On Tue, Oct 7, 2014 at 11:43 AM, David Blaikie <[email protected]> wrote: > >> >> >> On Mon, Oct 6, 2014 at 3:46 PM, Kaelyn Takata <[email protected]> wrote: >> >>> On Fri, Oct 3, 2014 at 11:45 AM, David Blaikie <[email protected]> >>> wrote: >>> >>>> Given that these are parts of a patch series, some of which have >>>> already been signed off on/I haven't looked at, it's a bit hard to review >>>> the overall design - but I believe you & Richard talked that over, so I'm >>>> not too worried about that (just explaining why my review feedback's going >>>> to be a bit narrow) >>>> >>>> + if (NumTypos > 0 && !ExprEvalContexts.empty()) >>>> + ExprEvalContexts.back().NumTypos += NumTypos; >>>> + else >>>> + assert(NumTypos == 0 && "There are outstanding typos after popping >>>> the " >>>> + "last ExpressionEvaluationContextRecord"); >>>> >>>> Could this be simplified? I think it's equivalent to: >>>> >>>> if (!ExprEvalContexts.empty()) >>>> ExprEvalContexts.back().NumTypos += NumTypos; >>>> else >>>> assert(NumTypos == 0 ... ); >>>> >>>> or: >>>> >>>> assert(ExprEvalContexts.empty() || NumTypos != 0 && ...) >>>> if (!ExprEvalContexts.empty()) >>>> ... >>>> >>>> Yes, the "NumTypos > 0" was unnecessary and has been removed. (FYI, the >>> second form you suggested doesn't quite work because the assert would fail >>> on the common case where ExprEvalContexts is not empty and there are no >>> typos.) >>> >> >> Hmm, right, maybe: >> >> assert(ExprEvalContexts.empty() ^ NumTypos != 0 && ... ) >> if (!ExprEvalContexts.empty()) >> ExprEvalContexts.back().NumTypos += NumTypos; >> >> but perhaps that's just too confusing. I just find an assertion as the >> only thing inside a block of code to be a bit off... but it's certainly not >> unprecedented in LLVM and is probably more legible - just my own hangup to >> get over :) >> >> >>> There's an unnecessary semicolon at the end of the "while (true)" loop in >>> TransformTypos::Transform >>>> >>>> Removed. >>> >>>> >>>> >>>> >>>> I'd roll this up a bit: >>>> >>>> >>>> + auto &State = SemaRef.getTypoExprState(TE); >>>> + if (State.DiagHandler) >>>> + (*State.DiagHandler)(State.Consumer->getCurrentCorrection()); >>>> >>>> into: >>>> >>>> if (auto *Handler = SemaRef.getTypoExprState(TE).DiagHandler) >>>> (*DiagHandler)(State.Consumer->getCurrentCorrection()); >>>> >>>> That rollup doesn't work because State would be unknown when accessing >>> State.Consumer in the argument to the DiagHandler. >>> >> >> Oh, right! >> >>> count in the condition, plus insert in the body of the if in >>> TransformTypos::TransformTypoExpr would be more efficient as a single >>> insert-and-check (by avoiding two separate lookups into the set): >>>> if (TypoExprs.insert(E).second) // means this element previously didn't >>>> exist in the set, and now does >>>> >>>> I've changed the count-then-insert to just an insert since >>> SmallPtrSetImpl::insert returns true when the item wasn't already in the >>> set. >>> >>>> This: >>>> >>>> TransformCache.find(E) != TransformCache.end() >>>> >>>> might commonly be written as: >>>> >>>> TransformCache.count(E) >>>> >>>> Optionally with "!= 0", but often count is used directly as a boolean >>>> test, especially when it's a single set/map (not a multi set/map). >>>> >>>> Changed. >>> >>>> This comment didn't quite make sense to me: >>>> >>>> >>>> + // If corrections for the first TypoExpr have been exhausted, retry >>>> them >>>> + // against the next combination of substitutions for all of the >>>> other >>>> + // TypoExprs. >>>> >>>> When you say "retry them" which them do you mean? The corrections for the >>>> first TypoExpr? But what does it mean for them to be exhausted then, if >>>> they can still be considered? >>>> >>>> Maybe I need a bit of help with the higher level algorithm here - I assume >>>> the idea is: >>>> >>>> Call transform. >>>> As it visits TypoExpr nodes, it tries their first typo candidate available. >>>> If we get to the end and have invalid code, reset the state of all the >>>> typo corrections except the first one (on its next query, it'll give back >>>> its second alternative). >>>> >>>> Here is where the confusion is at. As the transform visits each >>> TypoExpr node it tries the first candidate available. If the first >>> candidates of all the TypoExprs don't work, it then tries the next (and >>> subsequent) candidates of only the first TypoExpr that was seen. When it >>> exhausts the candidates from the first TypoExpr, it then resets the stream >>> of candidates for the first TypoExpr and fetches the next candidate of the >>> TypoExpr, continuing in this way until all combinations of candidates of >>> all TypoExprs have been tried. >>> >> >> So I'm still a bit confused about which parts of the retry logic are in >> which parts of the code. >> >> OK, I think I've got it. Transform is called and we don't even know which >> TypoExprs are in the Expr. So we transform. >> >> As transform runs it comes across TypoExprs, and attempts to transform >> them. For each one, it tries candidates until it finds a good one (based on >> the immediate context - what are the cases where BuildDeclarationNameExpr >> fails here? (when does NE.isInvalid fail?)). >> > > I think there is a bit of confusion on what a "retry" is. Within > TransformTypoExpr, for either the first TypoExpr ever encountered by this > TreeTransform or for a TypoExpr without a cached substitution, it tries to > build a valid subexpression from the TypoCorrection candidates (e.g. a > DeclRefExpr) and continues trying candidates until it finds one for which a > valid subexpression can be formed, which it caches and returns. If none of > the candidates worked, it will cache and return an invalid ExprResult. The > TreeTransform then takes the returned subexpression and tries to rebuild > the full expression (possibly trying to transform multiple TypoExprs along > with CallExprs, etc). > > I don't remember off the top of my head without looking at the body, but I > think BuildDeclarationNameExpr will fail in cases where the correction is > e.g. a type name, or a member reference name--basically, > BuildDeclarationNameExpr returns an ExprResult and the correct behavior in > principle is to check for an invalid result and try the next correction > candidate when one is seen--even if the Consumer never returns a candidate > for which BuildDeclarationNameExpr would fail. > >> >> So, in your example below, maybe the right answer is BEH, but AF is >> valid, with no correct candidate for the 3 choice. >> > > No, AF with no correction for the 3rd TypoExpr would result in the entire > Expr result being invalid within the main Transform method. > > >> Consumers start at "before the first" element, calling >> "getNextCorrection" on a new consumer gives you the first correction. >> > > That is correct. And the "before the first" element is also an empty > correction for use with getCurrentCorrection. ;) > >> >> On the first call to TransformExpr, the first call to TransformTypoExpr >> chooses A, records it, continues. >> Gets to the second TransformTypoExpr, tries D and E, succeeds at F. >> > > it would only try D then E then F within a single call to > TransformTypoExpr if BuildDeclarationNameExpr failed for D and E. > >> On the third TransformTypoExpr, it tries H and I, both fail, so it >> returns from the last line ("return TransformCache[E] = ExprError]). >> > > Same case here--that last line would only be reached if H and I both > failed within the call to BuildDeclarationNameExpr. > >> >> We get back to Transform with a failure. So we walk the TypoExprs (in >> CheckAndAdvanceTypoExprCorrectionStreams) we reset the 3rd consumer >> (because it was finished). >> > > No, CheckAndAdvanceTypoExprCorrectionStreams would do nothing if the first > TypoExpr is not finished. If the first TypoExpr is finished, then it would > reset it and clear the cached result for the second TypoExpr (so that the > next call to TransformTypoExpr on the second TypoExpr would try a new > candidate instead of using the cached result). Only if both the first and > second TypoExprs were *both* finished would the cached value of the third > TypoExpr be cleared (and the candidate streams for both the first and > second TypoExprs would be reset in addition to their cached results being > removed from the cache). The only way > CheckAndAdvanceTypoExprCorrectionStreams would see if the third TypoExpr > was finished would be if both the first and second TypoExprs were also > finished, and upon resetting it the tree transform would only be attempted > again if there were at least four known TypoExprs (i.e. the number of known > TypoExprs is greater than the number of TypoExprs that were finished and > had to be reset by CheckAndAdvanceTypoExprCorrectionStreams). > >> >> We transform again: >> >> First typo) We are at A so we call next and get B >> > > Yes. The first TypoExpr would always have it's next candidate tried by a > call to TransformTypoExpr. > > >> Second typo) From F we continue and try G, fail, then we're finished for >> the second typo. >> > > No, G would only be tried if the result from transforming F was removed > from the cache. > > >> Third typo) Perhaps we find some solution for this, or not, it doesn't >> matter. >> >> We get back to Transform with a failure (second typo), So we walk the >> TypoExprs again, resetting the second (and maybe the third). >> > > If there are only three TypoExprs, then resetting the third TypoExpr would > mean there are no valid combinations of corrections that yield a valid > expression (where the final result of the transform is neither an invalid > ExprResult nor caused any errors to be reported by the SFINAETrap). > >> First typo) We are at B so we call next and get to C, let's say we fail >> here - C isn't even remotely valid. >> Now it really matters whether we find/fail on the second and third typo. >> If we fail on the second and third... we're going to give up. We'll find >> that all the corrections are finished and we'll conclude that we've tried >> everything. >> >> Am I following this right? >> > > Not quite. I think you are forgetting about the fact that, for all > TypoExprs except the first, a cached result is used if it exists instead of > fetching a new candidate from the Consumer and trying to build a valid Expr > with it. > Ah, looking at the code again I see what you mean. The 'break' is the bit that I missed in the reset/advance loop. Which patches would I have to apply to ToT to be able to exercise this code? It might help for me to step through some of these scenarios in a debugger. (I'm sort of wondering if this might be better expressed by some iterator scheme, but I'm not sure - there's no pretty next_combination algorithm (someone proposed one to boost at some point, it seems, but that's about as close as it's got to anything common/generic) anyway) > >> >>> So for 3 TypoExprs... let's say 1 has the candidates (A, B, C), 2 has >>> the candidates (D, E, F, G) and 3 has the candidates (H, I) the iterations >>> would test the combinations in order: >>> >>> A D H >>> B D H >>> >> >> I suppose a simpler form of my above example is: if, on the first pass, >> we get ADH, on the next pass the code as-is would result in BCI, wouldn't >> it? It'll "next" each of the corrections. >> > > (I think you meant BEI, and C is only a candidate of the first TypoExpr, > not the second one.) No it won't because of the caching. The second > TypoExpr is only effectively "next"ed once all of the candidates for the > first TypoExpr have been tried with the current candidate of the second > TypoExpr, and the third TypoExpr would only be "next"ed if all of the > candidates of both the first and second TypoExpr have been tried with the > current candidate of the third TypoExpr. > >> >> And I'm still not quite understanding why there's the E != TypoExprs[0] >> condition... maybe that's part of what I'm missing. It looks like that code >> would cause, in a single Transform, multiple instances of the first typo to >> be computed with different/subsequent typo results... which seems >> confusing/strange. (though I don't fully understand how/where multiple >> instances of the same TypoExpr will be visited in a single Transform) >> > > That condition ensures that the cached result is never used for the first > TypoExpr (instead of having an extra conditional for all of the return > statements below here to not cache the result for the first TypoExpr). > >> >> (some nitpicks of the code - there's a duplicate lookup of TransformCache >> (count then [], or count then insert further down the algorithm - it should >> probably just be: >> >> ExprResult &CacheEntry = TransformCache[E]; >> if (CacheEntry) >> return CacheEntry; >> > > This wouldn't work because "TransformCache[E]" would add a cache entry for > E, and either the "if (CacheEntry)" would always be true, or an ExprError() > would be considered false and an ExprError cache entry would always be > ignored, both of which would be bad. > you're right that ExprResult isn't boolean testable - but it is possible to distinguish ExprError from a default-constructed ExprResult. A default constructed ExprResult is in the valid-but-unset state, so far as I can tell, so it would look like: if (!CacheEntry.isUnset()) return CacheEntry (which is, admittedly, not the /most/ legible thing ever) > Of course, a new local variable could be created containing the result of > TransformCache.find(E) and then compared to TransformCache.end() and > returning as appropriate, but I find it to be more verbose and less clear: > > auto CacheEntry = TransformCache.find(E); > if (!TypoExprs.insert(E) && CacheEntry != TransformCache.end()) > return CacheEntry->second; > > And neither of the subsequent return statements would be changed because > the cache entry would still need to be created... > If you were using the iterator version, they could be hinted insertions which might be better - but I agree, if you're just using the iterators it's probably not terribly helpful. But as shown above, you can actually just use the op[] at the start, check !isUnset, then initialize the elements. This is a fairly common idiom in map related code, though more often it looks like > > ... >> return CacheEntry = NE; >> } >> return CacheEntry = ExprError(); >> >> ) >> >> >> >>> C D H >>> A E H >>> B E H >>> C E H >>> A F H >>> B F H >>> C F H >>> A G H >>> B G H >>> C G H >>> A D I >>> B D I >>> C D I >>> A E I >>> B E I >>> C E I >>> A F I >>> B F I >>> C F I >>> A G I >>> B G I >>> C G I >>> >>>> If we get to the end, have invalid code, and the first typo has no >>>> alternatives left to try - I guess we just want to print errors for >>>> everything, without typo corrections? >>>> >>>> >>> If all of the combinations of typo corrections failed to create an >>> acceptable Expr, then the current correction for all of TypoExprs would be >>> an empty TypoCorrection which would cause the corresponding diagnostic >>> handlers to print out errors without suggestions. >>> >>>> It might be helpful to split out some of these chunks (the state resetting >>>> code and maybe the diagnostic printing at the end) into separate functions >>>> to give them logical names & make the higher level algorithm more clear? >>>> >>>> I've split the diagnostics emission and the state handling into >>> separate helpers, and expanded their description comments. I'm not keen on >>> the name for the state handler but it is the best name I could come up with >>> that was even remotely descriptive. >>> >> >> Roughly works, though I'm still quite confused, it seems. (alternative >> name for that function would be "reset" rather than "advance" - but once I >> understand the algorithm better I might have better suggestions) >> > > The function removes the cached result for the first TypoExpr then checks > the TypoExpr's correction candidate stream; if it has reached the end of > the stream, it will reset the stream then perform the same check for the > second TypoExpr, continuing until it either has reset the correction > streams for all of the TypoExprs or has found a TypoExpr that hadn't > reached the end of its correction stream. And in the former case, it would > signal that all combinations of corrections had been retried and the > TreeTransform should stop searching for a valid combination of corrections. > And now that I think about it, the "E != TypoExprs[0]" is probably now a > superfluous check since calling CheckAndAdvanceTypoExprCorrectionStreams > would always remove the first TypoExpr from the result cache. > > The loop at the end ("for (auto E : TypoExprs) {") is over a hashing data > structure, right? Is that going to cause unstable output of diagnostics? (eg: > different machines, different memory layout, the same program might get the > typos printed in a different order) or is there something else that will fix > that? >>>> >>>> >>>> If you have to change the set to be a SetVector, or similar (to stabilize >>>> output) - then you won't need the FirstTypoExpr anymore - it'll just be >>>> the first one in the set, since you iterate in insertion order at that >>>> point. >>>> >>>> Switched it from a SmallPtrSet to a SmallSetVector like it should have >>> been. Doing so also allowed me to simplify the state handling code by being >>> able to assume a deterministic order, and eliminating FirstTypoExpr. >>> >>>> >>>> >>>> "while (TypoCorrection TC = State.Consumer->getNextCorrection()) {" >>>> doesn't seem to be a loop - it has an unconditional return at the end of >>>> the body. Should it just be an 'if'? >>>> >>>> There was actually a missing check on whether the result of >>> Sema::BuildDeclarationNameExpr was valid. >>> >> >> Were there missing/added test cases to cover this situation? >> > > No because at this point in the patch series, there is nothing that > actually generates TypoExprs. > >> "+ (FullExpr.get()->isTypeDependent() || >>>> >>>> + FullExpr.get()->isValueDependent() || >>>> + FullExpr.get()->isInstantiationDependent())) {" >>>> >>>> What are those checks for? I don't know what condition we would be in if >>>> we had typo corrections but it wasn't type, value, or instantiation >>>> dependent. Perhaps a comment could explain that? >>>> >>>> The check is to avoid running the tree transform on Expr trees that are >>> known to not have a TypoExpr in them, regardless of whether the current >>> expression evaluation context still indicates there are uncorrected typos. >>> I've added a comment to that effect. >>> >> >> Hmm - I suppose my question might then be: When would we have NumTypos > >> 0 but be in a situation where the full expression cannot have any typos in >> it? >> > > I don't remember off the top of my head because it's been a few months now > since I wrote the original patches... but basically any situation where > more than one FullExpr shares an expression evaluation context or when the > current expression evaluation context had its NumTypos increased by an > expression evaluation context that was popped off the stack. And > TreeTransforms aren't cheap enough to blindly run on every Expr when > NumTypos is non-zero. > >> >> >>> To simplify Sema::clearDelayedTypo you could add MapVector::erase(const >>> KeyType&) (given that std::map has such an operation, it seems like a >>> reasonable one for MapVector to have). >>>> >>>> Done (simple patch attached). >>> >> >> Great - if you'd like to commit this separately, it should have a unit >> test. Otherwise it's probably simple enough to commit along with your new >> usage if you prefer. >> > > Committed with unit test as r219240. > >> >> Sema::getTypoExprState looks like it could be a const function (it looks >>>> like it's never meant to be called when the element isn't already in the >>>> map) - using 'lookup' instead of operator[] will be a const operation on >>>> the map (& assert if the element doesn't exist in the map). >>>> >>> >>> Sadly I cannot do that because the operator[] cannot be used if the >>> method is made const, and .lookup() can't be used because it returns >>> by-value instead of by-reference and TypoExprState is not copyable: >>> >>> ../tools/clang/lib/Sema/SemaLookup.cpp:4600:10: warning: returning >>> reference to local temporary object [-Wreturn-stack-address] >>> return DelayedTypos.lookup(TE); >>> >>> ../include/llvm/ADT/MapVector.h:88:30: error: call to implicitly-deleted >>> copy constructor of 'const clang::Sema::TypoExprState' >>> return Pos == Map.end()? ValueT() : Vector[Pos->second].second; >>> >> >> Oh, weird - my mistake in assuming/misremembering what lookup actually >> does. I'd still make getTypoExprState const and use find+assert just to >> constrain the interface a bit more rather than leaving it possible to >> accidentally create a new entry in the map when the API is only intended to >> query existing entries. >> > > Done. The new changes I've made overall are: > > diff --git a/include/clang/Sema/Sema.h b/include/clang/Sema/Sema.h > index b30bf5e..f644fdd 100644 > --- a/include/clang/Sema/Sema.h > +++ b/include/clang/Sema/Sema.h > @@ -2628,7 +2628,7 @@ private: > bool ErrorRecovery, bool > &IsUnqualifiedLookup); > > public: > - const TypoExprState &getTypoExprState(TypoExpr *TE); > + const TypoExprState &getTypoExprState(TypoExpr *TE) const; > > /// \brief Clears the state of the given TypoExpr. > void clearDelayedTypo(TypoExpr *TE); > diff --git a/lib/Sema/SemaExprCXX.cpp b/lib/Sema/SemaExprCXX.cpp > index 73aa997..1f08276 100644 > --- a/lib/Sema/SemaExprCXX.cpp > +++ b/lib/Sema/SemaExprCXX.cpp > @@ -6028,7 +6028,7 @@ public: > // If the TypoExpr hasn't been seen before, record it. Otherwise, > return the > // cached transformation result if there is one and the TypoExpr > isn't the > // first one that was encountered. > - if (!TypoExprs.insert(E) && E != TypoExprs[0] && > TransformCache.count(E)) { > + if (!TypoExprs.insert(E) && TransformCache.count(E)) { > return TransformCache[E]; > } > > diff --git a/lib/Sema/SemaLookup.cpp b/lib/Sema/SemaLookup.cpp > index 8ddb5fd..bb6c76a 100644 > --- a/lib/Sema/SemaLookup.cpp > +++ b/lib/Sema/SemaLookup.cpp > @@ -4601,8 +4601,11 @@ TypoExpr > *Sema::createDelayedTypo(std::unique_ptr<TypoCorrectionConsumer> TCC, > return TE; > } > > -const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE) { > - return DelayedTypos[TE]; > +const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE) const { > + auto Entry = DelayedTypos.find(TE); > + assert(Entry != DelayedTypos.end() && > + "Failed to get the state for a TypoExpr!"); > + return Entry->second; > } > > void Sema::clearDelayedTypo(TypoExpr *TE) { > > >> >>> >>> >>>> Otherwise this all looks like rather neat stuff - though I'm not as >>>> able to review in detail some of the refactoring of existing typo >>>> correction stuff. It looks plausible. >>>> >>> >>> I've attached the updated patch. Thanks for reviewing this, David! >>> >>> >>>> >>> >>>> >>> On Thu, Sep 25, 2014 at 2:00 PM, Kaelyn Takata <[email protected]> wrote: >>>> >>>>> ping >>>>> >>>>> On Tue, Aug 26, 2014 at 11:05 AM, Kaelyn Takata <[email protected]> >>>>> wrote: >>>>> >>>>>> >>>>>> +dblaikie >>>>>> >>>>>> >>>>>> On Thu, Jul 31, 2014 at 1:20 PM, Kaelyn Takata <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> Part of the infrastructure is a map from a TypoExpr to the >>>>>>> Sema-specific >>>>>>> state needed to correct it, along with helpers to ease dealing with >>>>>>> the >>>>>>> state. >>>>>>> >>>>>>> The the typo count is propagated up the stack of >>>>>>> ExpressionEvaluationContextRecords when one is popped off of to >>>>>>> avoid accidentally dropping TypoExprs on the floor. For example, >>>>>>> the attempted correction of g() in test/CXX/class/class.mem/p5-0x.cpp >>>>>>> happens with an ExpressionEvaluationContextRecord that is popped off >>>>>>> the stack prior to ActOnFinishFullExpr being called and the tree >>>>>>> transform for TypoExprs being run. >>>>>>> --- >>>>>>> include/clang/Sema/Sema.h | 44 +++++ >>>>>>> include/clang/Sema/SemaInternal.h | 15 +- >>>>>>> include/clang/Sema/TypoCorrection.h | 2 +- >>>>>>> lib/Sema/SemaExpr.cpp | 7 + >>>>>>> lib/Sema/SemaExprCXX.cpp | 108 ++++++++++++ >>>>>>> lib/Sema/SemaLookup.cpp | 316 >>>>>>> ++++++++++++++++++++++++------------ >>>>>>> 6 files changed, 384 insertions(+), 108 deletions(-) >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
