hokein accepted this revision. hokein added a comment. This revision is now accepted and ready to land.
Looks great, let's ship it! feel free to land it in any form you think it is suitable. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/GLR.h:150 +// OldHeads is the parse state at TokenIndex. +// This function consumes consumes zero or more tokens by advancing TokenIndex, +// and places any recovery states created in NewHeads. ---------------- nit: remove the duplicated `consumes`. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h:148 + RecoveryStrategy Strategy; // Heuristic choosing the tokens to match. + SymbolID Result; // The symbol that is produced. + }; ---------------- nit: mention the `Result` must be a nonterminal. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:131 + // Expected to be called by LR parsers. + llvm::ArrayRef<Action> getActions(StateID State, SymbolID Terminal) const; // Returns the state after we reduce a nonterminal. ---------------- nit: unrelated method? ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:248 + // A given state has [RecoveryOffset[S], RecoveryOffset[S+1]). + std::vector<uint32_t> RecoveryOffset; + std::vector<Recovery> Recoveries; ---------------- sammccall wrote: > hokein wrote: > > I see the motivation of the `OffsetTable` structure now, this would come as > > a follow-up to simplify the `ReduceOffset` and `RecoveryOffset`, right? > Yes. Though I'm on the fence about whether it's worth it with one case (it's > a bit awkward to generalize the building IIRC). A motivating bit is that it is tricky to implement a correct `get` method (we both made the same out-of-bound issue). I'm fine with the current form, we can revisit it afterwards. ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:178 + const ForestNode &Placeholder = + Params.Forest.createOpaque(Option->Symbol, Option->Position); + const GSS::Node *NewHead = Params.GSStack.addNode( ---------------- should we worry about the case where we create a duplicated forest node? e.g. we have two best options and both recover to the same nonterminal. ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:44 + +void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads, + unsigned &TokenIndex, const TokenStream &Tokens, ---------------- sammccall wrote: > hokein wrote: > > The `GLR.cpp` file is growing significantly, I think the recovery part is > > large enough to be lived in a separate file `GLRRecovery.cpp`, but the > > declaration can still be in the `GLR.h`. > This is interesting, recover/shift/reduce/parse are (vertically) > self-contained enough that it didn't seem like a big problem yet... > > If the concern is file length, maybe we'd thather start with `reduce`; if > it's relatedness, `GSS`? > > My line count is: > - recover: 156 > - shift: 47 > - reduce: 319 > - parse: 87 > - GSS: 66 Yeah, indeed these pieces can be split, my main concern is not file length -- I would prefer keeping shift/reduce/parse into a single file, as they form a complete GLR algorithm (splitting them would make it harder to read and follow). Recovery seems like a different thing, I would image this part of code will grow more in the future - the GLR algorithm has the core recovery mechanism framework with some fallback recovery strategies (eof, brackets) - we have different recovery strategies implemented (some of them might be cxx-specific, probably be part of pseudoCXX library); ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:68 + // one arbitrarily. + std::vector<const ForestNode *> Path; + }; ---------------- sammccall wrote: > hokein wrote: > > nit: maybe name it `Parses` or `PartialParses`. Path make me think this is > > a patch of GSS nodes. > Renamed to DiscardedParse, does that work for you? It is better than the original name. The `DiscardedParse` is a bit weird when we start to put it under the opaque node, in that sense, they are not discarded, IMO ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:83 + // expr := type ( . expr ) - (up) we should recover this outer expr + // expr := . type ( expr ) - (up+left) we should not recover this type! + // ---------------- sammccall wrote: > hokein wrote: > > I think you're right -- I thought the first GSS node with a recovery state > > we encounter during the Walkup state is the node we want. > > > > This example is a little confusing (it still matches our previous mental > > model), I didn't get it until we discussed offline. I think the following > > example is clearer > > > > ``` > > parsing the text `if (true) else ?` > > > > IfStmt := if (stmt) else . stmt - which we're currently parsing > > IfStmt := if (.stmt) else stmt - (left) the first recovery GSS node, > > should not recover from this > > IfStmt := . if (stmt) else stmt - (up), we should recover from this > > ``` > > > > I also think it is worth to add it into the test. > Fair enough, a couple of problems with that example though: > - dropping the "then" clause from the grammar of an if statement is > confusing (but adding it back in makes the example complicated) > - using a non-kernel item for the last fails to show the "up" edge clearly > (but there's not enough context to show a kernel item) > > Came up with another one that hopefully works. > > I didn't manage to write a reasonable test showing it - it basically can't > happen with bracket recovery. In order to demonstrate it we need `{` in the > "wrong" recovery construct to be paired with a `}` after the cursor... which > makes it look *very* much like a correct recovery. > > Added a FIXME to add a testcase once we can define new recovery strategies > instead. oops, your example makes more sense -- I didn't notice that I missed the if-body stmt. ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:88 + // For now, we have to take this into account when defining recovery rules. + // FIXME: find a more satisfying way to avoid such false recovery. + std::vector<const ForestNode *> Path; ---------------- sammccall wrote: > hokein wrote: > > I can't think of a better solution other than a search-based one (would > > like to think more about it). > > > > Currently, we find all recovery options by traversing the whole GSS, and do > > a post-filter (based on the Start, and End). I might do both in the DFS > > (which will save some cost of traversing), the DFS will give us the best > > recovery options, and then we build the GSS node, and forest node. But up > > to you. > I'm not sure what you mean by a search-based one, can you elaborate? > > > Currently, we find all recovery options by traversing the whole GSS, and do > > a post-filter (based on the Start, and End). I might do both in the DFS > > (which will save some cost of traversing) > > This replaces a relatively simple traversal of GSS nodes (which there aren't > that many of) with running the recovery strategy more times - this is > (usually) a more complicated algorithm running over tokens, which is > (usually) a larger data set. It seems likely to be a bad trade > performance-wise. > > In any case, performance isn't terribly critical here, and mixing the > discovery & evaluation of options seems harder to read & debug (we lose the > nice documented data structures we can debug, predictable -debug dumping of > not-taken recovery options, etc). > I'm not sure what you mean by a search-based one, can you elaborate? The search-based one refers to the current one -- basically we perform a brute-force search for all available recoveries and get the best one. > In any case, performance isn't terribly critical here, and mixing the > discovery & evaluation of options seems harder to read & debug (we lose the > nice documented data structures we can debug, predictable -debug dumping of > not-taken recovery options, etc). That's fair enough. ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:130 + + assert(NewHeads.empty()); // We may repeatedly populate and clear it. + llvm::Optional<Token::Range> RecoveryRange; ---------------- sammccall wrote: > hokein wrote: > > nit: might be clearer to move the assertion to the beginning of the > > function. > The purpose of the assertion is to make it obvious that NewHeads.clear() only > discards items added during the loop below. > An assertion at the top of the function would be less useful for this, both > because it's not on the screen when reading NewHeads.clear() and because it's > much less locally obvious that nothing has been pushed onto NewHeads at some > prior point in the function. Yeah, but I'd treat it as a contract of the API (the `NewHeads` argument passed to `glrRecover` must be empty). btw, the empty assertion is missing in the latest version. ================ Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:487 + +// FIXME: Add a test for the spurious recovery mentioned in glrRecovery() +// once we can extend the recovery strategies to do so. ---------------- nit: I'd probably move this to the comment mentioned in glrRecovery(), which is more discoverable. ================ Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:558 +TEST_F(GLRTest, RecoveryEndToEnd) { + // Simple example of brace-based recovery showing: ---------------- nit: not sure the intention having the `RecoveryEndToEnd` separated from the above recover-related tests, why not grouping them together? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128486/new/ https://reviews.llvm.org/D128486 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits