[PATCH] D128486: [pseudo] Add error-recovery framework & brace-based recovery

Haojian Wu via Phabricator via cfe-commits Tue, 05 Jul 2022 07:08:47 -0700

hokein accepted this revision.
hokein added a comment.
This revision is now accepted and ready to land.


Looks great, let's ship it! feel free to land it in any form you think it is 
suitable.



================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/GLR.h:150
+// OldHeads is the parse state at TokenIndex.
+// This function consumes consumes zero or more tokens by advancing TokenIndex,
+// and places any recovery states created in NewHeads.
----------------
nit: remove the duplicated `consumes`.


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h:148
+    RecoveryStrategy Strategy; // Heuristic choosing the tokens to match.
+    SymbolID Result;           // The symbol that is produced.
+  };
----------------
nit: mention the `Result` must be a nonterminal.


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:131
+  // Expected to be called by LR parsers.
+  llvm::ArrayRef<Action> getActions(StateID State, SymbolID Terminal) const;
   // Returns the state after we reduce a nonterminal.
----------------
nit: unrelated method?


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:248
+  // A given state has [RecoveryOffset[S], RecoveryOffset[S+1]).
+  std::vector<uint32_t> RecoveryOffset;
+  std::vector<Recovery> Recoveries;
----------------
sammccall wrote:
> hokein wrote:
> > I see the motivation of the `OffsetTable` structure now, this would come as 
> > a follow-up to simplify the `ReduceOffset` and `RecoveryOffset`, right?
> Yes. Though I'm on the fence about whether it's worth it with one case (it's 
> a bit awkward to generalize the building IIRC).
A motivating bit is that it is tricky to implement a correct `get` method (we 
both made the same out-of-bound issue). I'm fine with the current form, we can 
revisit it afterwards.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:178
+    const ForestNode &Placeholder =
+        Params.Forest.createOpaque(Option->Symbol, Option->Position);
+    const GSS::Node *NewHead = Params.GSStack.addNode(
----------------
should we worry about the case where we create a duplicated forest node? e.g. 
we have two best options and both recover to the same nonterminal.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:44
+
+void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
+                unsigned &TokenIndex, const TokenStream &Tokens,
----------------
sammccall wrote:
> hokein wrote:
> > The `GLR.cpp` file is growing significantly, I think the recovery part is 
> > large enough to be lived in a separate file `GLRRecovery.cpp`, but the 
> > declaration can still be in the `GLR.h`.
> This is interesting, recover/shift/reduce/parse are (vertically) 
> self-contained enough that it didn't seem like a big problem yet...
> 
> If the concern is file length, maybe we'd thather start with `reduce`; if 
> it's relatedness, `GSS`?
> 
> My line count is:
>  - recover: 156
>  - shift: 47
>  - reduce: 319
>  - parse: 87
>  - GSS: 66
Yeah, indeed these pieces can be split, my main concern is not file length -- I 
would prefer keeping shift/reduce/parse into a single file, as they form a 
complete GLR algorithm (splitting them would make it harder to read and follow).

Recovery seems like a different thing, I would image this part of code will 
grow more in the future
- the GLR algorithm has the core recovery mechanism framework with some 
fallback recovery strategies (eof, brackets)
- we have different recovery strategies implemented (some of them might be 
cxx-specific, probably be part of pseudoCXX library);




================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:68
+    // one arbitrarily.
+    std::vector<const ForestNode *> Path;
+  };
----------------
sammccall wrote:
> hokein wrote:
> > nit: maybe name it `Parses` or `PartialParses`. Path make me think this is 
> > a patch of GSS nodes.
> Renamed to DiscardedParse, does that work for you?
It is better than the original name. The `DiscardedParse` is a bit weird when 
we start to put it under the opaque node, in that sense, they are not 
discarded, IMO


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:83
+  //    expr := type ( . expr )   - (up) we should recover this outer expr
+  //    expr := . type ( expr )   - (up+left) we should not recover this type!
+  //
----------------
sammccall wrote:
> hokein wrote:
> > I think you're right -- I thought the first GSS node with a recovery state 
> > we encounter during the Walkup  state is the node we want. 
> > 
> > This example is a little confusing (it still matches our previous mental 
> > model), I didn't get it until we discussed offline. I think the following 
> > example is clearer 
> > 
> > ```
> > parsing the text `if (true) else ?`
> > 
> > IfStmt := if (stmt) else . stmt - which we're currently parsing
> > IfStmt := if (.stmt) else stmt  - (left) the first recovery GSS node, 
> > should not recover from this
> > IfStmt := . if (stmt) else stmt - (up), we should recover from this
> > ```
> > 
> > I also think it is worth to add it into the test.
> Fair enough, a couple of problems with that example though:
>  - dropping the "then" clause from the grammar of an if statement is 
> confusing (but adding it back in makes the example complicated)
>  - using a non-kernel item for the last fails to show the "up" edge clearly 
> (but there's not enough context to show a kernel item)
> 
> Came up with another one that hopefully works.
> 
> I didn't manage to write a reasonable test showing it - it basically can't 
> happen with bracket recovery. In order to demonstrate it we need `{` in the 
> "wrong" recovery construct to be paired with a `}` after the cursor... which 
> makes it look *very* much like a correct recovery.
> 
> Added a FIXME to add a testcase once we can define new recovery strategies 
> instead.
oops, your example makes more sense -- I didn't notice that I missed the 
if-body stmt.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:88
+  // For now, we have to take this into account when defining recovery rules.
+  // FIXME: find a more satisfying way to avoid such false recovery.
+  std::vector<const ForestNode *> Path;
----------------
sammccall wrote:
> hokein wrote:
> > I can't think of a better solution other than a search-based one (would 
> > like to think more about it).
> > 
> > Currently, we find all recovery options by traversing the whole GSS, and do 
> > a post-filter (based on the Start, and End). I might do both in the DFS 
> > (which will save some cost of traversing), the DFS will give us the best 
> > recovery options, and then we build the GSS node, and forest node.  But up 
> > to you.
> I'm not sure what you mean by a search-based one, can you elaborate?
> 
> > Currently, we find all recovery options by traversing the whole GSS, and do 
> > a post-filter (based on the Start, and End). I might do both in the DFS 
> > (which will save some cost of traversing)
> 
> This replaces a relatively simple traversal of GSS nodes (which there aren't 
> that many of) with running the recovery strategy more times - this is 
> (usually) a more complicated algorithm running over tokens, which is 
> (usually) a larger data set. It seems likely to be a bad trade 
> performance-wise.
> 
> In any case, performance isn't terribly critical here, and mixing the 
> discovery & evaluation of options  seems harder to read & debug (we lose the 
> nice documented data structures we can debug, predictable -debug dumping of 
> not-taken recovery options, etc).
> I'm not sure what you mean by a search-based one, can you elaborate?

The search-based one refers to the current one -- basically we perform a 
brute-force search for all available recoveries and get the best one. 

> In any case, performance isn't terribly critical here, and mixing the 
> discovery & evaluation of options seems harder to read & debug (we lose the 
> nice documented data structures we can debug, predictable -debug dumping of 
> not-taken recovery options, etc).

That's fair enough. 


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:130
+
+  assert(NewHeads.empty()); // We may repeatedly populate and clear it.
+  llvm::Optional<Token::Range> RecoveryRange;
----------------
sammccall wrote:
> hokein wrote:
> > nit: might be clearer to move the assertion to the beginning of the 
> > function.
> The purpose of the assertion is to make it obvious that NewHeads.clear() only 
> discards items added during the loop below.
> An assertion at the top of the function would be less useful for this, both 
> because it's not on the screen when reading NewHeads.clear() and because it's 
> much less locally obvious that nothing has been pushed onto NewHeads at some 
> prior point in the function.
Yeah, but I'd treat it as a contract of the API (the `NewHeads` argument passed 
to `glrRecover` must be empty).

btw, the empty assertion is missing in the latest version.


================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:487
+
+// FIXME: Add a test for the spurious recovery mentioned in glrRecovery()
+// once we can extend the recovery strategies to do so.
----------------
nit: I'd probably move this to the comment mentioned in glrRecovery(), which is 
more discoverable.


================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:558
 
+TEST_F(GLRTest, RecoveryEndToEnd) {
+  // Simple example of brace-based recovery showing:
----------------
nit: not sure the intention having the `RecoveryEndToEnd` separated from the 
above recover-related tests, why not grouping them together?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128486/new/

https://reviews.llvm.org/D128486

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D128486: [pseudo] Add error-recovery framework & brace-based recovery

Reply via email to