[PATCH] D105904: [clangd] Support `#pragma mark` in the outline

David Goldman via Phabricator via cfe-commits Wed, 04 Aug 2021 09:52:11 -0700

dgoldman marked an inline comment as done.
dgoldman added inline comments.


================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:535
+/// by range.
+std::vector<DocumentSymbol> mergePragmas(std::vector<DocumentSymbol> &Syms,
+                                         std::vector<PragmaMarkSymbol> &Pragmas,
----------------
dgoldman wrote:
> kadircet wrote:
> > dgoldman wrote:
> > > kadircet wrote:
> > > > dgoldman wrote:
> > > > > kadircet wrote:
> > > > > > dgoldman wrote:
> > > > > > > kadircet wrote:
> > > > > > > > dgoldman wrote:
> > > > > > > > > sammccall wrote:
> > > > > > > > > > FWIW the flow control/how we make progress seem hard to 
> > > > > > > > > > follow here to me.
> > > > > > > > > > 
> > > > > > > > > > In particular I think I'm struggling with the statefulness 
> > > > > > > > > > of "is there an open mark group".
> > > > > > > > > > 
> > > > > > > > > > Possible simplifications:
> > > > > > > > > >  - define a dummy root symbol, which seems clearer than the 
> > > > > > > > > > vector<symbols> + range
> > > > > > > > > >  - avoid reverse-sorting the list of pragma symbols, and 
> > > > > > > > > > just consume from the front of an ArrayRef instead
> > > > > > > > > >  - make the outer loop over pragmas, rather than symbols. 
> > > > > > > > > > It would first check if the pragma belongs directly here or 
> > > > > > > > > > not, and if so, loop over symbols to work out which should 
> > > > > > > > > > become children. This seems very likely to be efficient 
> > > > > > > > > > enough in practice (few pragmas, or most children are 
> > > > > > > > > > grouped into pragmas)
> > > > > > > > > > define a dummy root symbol, which seems clearer than the 
> > > > > > > > > > vector<symbols> + range
> > > > > > > > > 
> > > > > > > > > > I guess? Then we'd take in a `DocumentSymbol &` and an 
> > > > > > > > > > `ArrayRef<PragmaMarkSymbol> &` (or just by value and then 
> > > > > > > > > > return it as well). The rest would be the same though.
> > > > > > > > > 
> > > > > > > > > > In particular I think I'm struggling with the statefulness 
> > > > > > > > > > of "is there an open mark group".
> > > > > > > > > 
> > > > > > > > > We need to track the current open group if there is one in 
> > > > > > > > > order to move children to it.
> > > > > > > > > 
> > > > > > > > > > make the outer loop over pragmas, rather than symbols. It 
> > > > > > > > > > would first check if the pragma belongs directly here or 
> > > > > > > > > > not, and if so, loop over symbols to work out which should 
> > > > > > > > > > become children. This seems very likely to be efficient 
> > > > > > > > > > enough in practice (few pragmas, or most children are 
> > > > > > > > > > grouped into pragmas)
> > > > > > > > > 
> > > > > > > > > The important thing here is knowing where the pragma mark 
> > > > > > > > > ends - if it doesn't, it actually gets all of the children. 
> > > > > > > > > > So we'd have to peek at the next pragma mark, add all symbols 
> > > > > > > > > before it to us as children, and then potentially recurse to 
> > > > > > > > > nest it inside of a symbol. I'll try it out and see if it's 
> > > > > > > > > simpler.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > ```
> > > > > > > > while(Pragmas) {
> > > > > > > > // We'll figure out where the Pragmas.front() should go.
> > > > > > > > Pragma P = Pragmas.front();
> > > > > > > > DocumentSymbol *Cur = Root;
> > > > > > > > while(Cur->contains(P)) {
> > > > > > > >   auto *OldCur = Cur;
> > > > > > > >   for(auto *C : Cur->children) {
> > > > > > > > >      // We assume at most 1 child can contain the pragma (as
> > > > > > > > >      // pragmas are on a single line, and children have disjoint ranges)
> > > > > > > >      if (C->contains(P)) {
> > > > > > > >          Cur = C;
> > > > > > > >          break;
> > > > > > > >      }
> > > > > > > >   }
> > > > > > > >   // Cur is immediate parent of P
> > > > > > > >   if (OldCur == Cur) {
> > > > > > > > >     // Just insert P into children if it is not a group and we
> > > > > > > > >     // are done.
> > > > > > > > >     // Otherwise we need to figure out when current pragma is
> > > > > > > > >     // terminated:
> > > > > > > > >     // if next pragma is not contained in Cur, or is contained in
> > > > > > > > >     // one of the children, it is at the end of Cur; nest all the
> > > > > > > > >     // children that appear after P under the symbol node for P.
> > > > > > > > >     // Otherwise nest all the children that appear after P but
> > > > > > > > >     // before next pragma under the symbol node for P.
> > > > > > > > >     // Pop Pragmas and break
> > > > > > > >   }
> > > > > > > > }
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > Does that make sense, i hope i am not missing something 
> > > > > > > > obvious? Complexity-wise in the worst case we'll go all the way 
> > > > > > > > down to a leaf once per pragma, since there will only be a 
> > > > > > > > handful of pragmas most of the time it shouldn't be too bad.
> > > > > > > I've implemented your suggestion. I don't think it's simpler, but 
> > > > > > > LMK, maybe it can be improved.
> > > > > > oops, i was looking into an older revision and missed 
> > > > > > mergepragmas2, i think it looks quite similar to this one but we 
> > > > > > can probably get rid of the recursion as well and simplify a couple 
> > > > > > more cases
> > > > > This makes sense,  I think that works for the most part besides 
> > > > > dropping the recursion, specifically for
> > > > > 
> > > > > ```
> > > > > >       // Next pragma is contained in the Sym, it belongs there and doesn't
> > > > > >       // affect us at all.
> > > > > >       if (Sym.range.contains(NextPragma.DocSym.range)) {
> > > > > >         Sym.children = mergePragmas2(Sym.children, Pragmas, Sym.range);
> > > > >         continue;
> > > > >       }
> > > > > ```
> > > > > 
> > > > > I guess we could explicitly forbid 3+ layers of nesting and handle it 
> > > > > inline there? But I'm not sure it's worth the effort to rewrite all 
> > > > > of this - the recursion shouldn't be deep and we avoid needing to 
> > > > > shift vector elements over by recreating a new one.
> > > > Sorry I don't follow why we can't get rid of the recursion in this case.
> > > > 
> > > > Two loop solution I described above literally tries to find the 
> > > > document symbol node, such that the current pragma is contained in that 
> > > > node && current pragma isn't contained in any of that node's children. 
> > > > Afterwards it inserts the pragma into that node and starts traversing 
> > > > the tree from root again for the next pragma.
> > > > 
> > > > Again I don't follow where the `3+ layers of nesting` constraint came 
> > > > from. But I do feel like the iterative version is somewhat easier to 
> > > > reason about (especially keeping track of what's happening with 
> > > > `pragmas.front()` and the way it bails out via `parentrange` check). 
> > > > Shifting of the vector is definitely unfortunate but I think it 
> > > > shouldn't imply big performance hits in practice as we are only 
> > > > shifting the children of a single node.
> > > Yeah, I understand the first part,  I think specifically handling the 
> > > group case after you discover where it needs to be inserted is a bit more 
> > > complicated, something like the following:
> > > 
> > > 
> > > ```
> > > >         // Pragma is a group, so we need to figure out where it terminates:
> > > >         // - If the next Pragma is not contained in Cur, P owns all of its
> > > >         //   parent's children which occur after P.
> > > >         // - If the next pragma is contained in Cur but actually belongs to one
> > > >         //   of the parent's children, we temporarily skip over it and look at
> > > >         //   the next pragma to decide where we end.
> > > >         // - Otherwise nest all of its parent's children which occur after P but
> > > >         //   before the next pragma.
> > > ```
> > > 
> > > And yeah, shifting in the worst case is definitely worse (due to repeated 
> > > shifting), although it shouldn't be too common in practice (things like a 
> > > large @implementation block would probably have the most children).
> > > we temporarily skip over it and look at the next pragma to decide where 
> > > we end.
> > 
> > Ah, you are right, I was missing this part. As you suggested, we'll need to 
> > loop over all the remaining pragmas, which will make the complexity 
> > `((quadratic in terms of # of pragmas) * (depth of document symbol tree))`. 
> > I think it is still acceptable in practice. I don't think there'll be more 
> > than a dozen pragmas in a file. (Moreover, I think this problem exists with 
> > the current implementation you've provided, see my comment below.)
> > 
> > Are you fine with reshaping the implementation in that way then? Because I 
> > still feel like the code will end up looking a lot more straightforward, 
> > even if not the most optimal and we can try to optimize the rest if it 
> > shows up in the latency graphs.
> Implementing this now - there's one other edge case here, in that we don't 
> want to treat `#pragma mark Blah` as a group. I guess I can maintain state 
> similar to the first solution to solve that.
Wait, never mind, I don't think that's a problem since the pragmas should never 
overlap. I just have some other issue in my code that I need to figure out.
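
For anyone following along, here is a minimal sketch of the iterative (two-loop) 
approach discussed above. It is not the code in this patch: `Sym`, `Pragma`, 
`findParent` and this `mergePragmas` are simplified, hypothetical stand-ins, with 
ranges reduced to inclusive line numbers. It assumes the pragmas are sorted by 
line, that sibling ranges are disjoint, and that `IsGroup` marks the 
`#pragma mark - Name` form.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not clangd's real types. Ranges are inclusive lines.
struct Range {
  int Begin, End;
  bool contains(int Line) const { return Begin <= Line && Line <= End; }
};
struct Sym {
  std::string Name;
  Range R;
  std::vector<Sym> Children;
};
struct Pragma {
  std::string Name;
  int Line;
  bool IsGroup; // assumed: true for `#pragma mark - Name`
};

// Descend from Root to the deepest symbol whose range contains Line.
Sym &findParent(Sym &Root, int Line) {
  Sym *Cur = &Root;
  for (bool Descended = true; Descended;) {
    Descended = false;
    for (Sym &C : Cur->Children) {
      if (C.R.contains(Line)) { // at most one child can contain the line
        Cur = &C;
        Descended = true;
        break;
      }
    }
  }
  return *Cur;
}

// Insert each pragma under its immediate parent; a group pragma also adopts
// the siblings that follow it, up to the next pragma directly in that parent.
void mergePragmas(Sym &Root, const std::vector<Pragma> &Pragmas) {
  for (std::size_t I = 0; I < Pragmas.size(); ++I) {
    const Pragma &P = Pragmas[I];
    Sym &Parent = findParent(Root, P.Line);
    std::vector<Sym> &Kids = Parent.Children;

    // First sibling that starts after the pragma.
    auto First = Kids.begin();
    while (First != Kids.end() && First->R.Begin < P.Line)
      ++First;
    auto Last = First;

    if (P.IsGroup) {
      // By default the group runs to the end of Parent.
      int EndLine = Parent.R.End + 1;
      for (std::size_t J = I + 1; J < Pragmas.size(); ++J) {
        int L = Pragmas[J].Line;
        if (!Parent.R.contains(L))
          break; // next pragma is outside Parent
        bool InChild = false;
        for (const Sym &C : Kids)
          InChild = InChild || C.R.contains(L);
        if (!InChild) { // pragma sits directly in Parent: it ends the group
          EndLine = L;
          break;
        }
        // Otherwise it belongs to a child; skip it and keep looking.
      }
      while (Last != Kids.end() && Last->R.Begin < EndLine)
        ++Last;
    }

    Sym Node{P.Name, {P.Line, P.Line}, {}};
    if (First != Last) {
      // Widen the group so later pragmas nested in adopted children are found.
      Node.R.End = std::prev(Last)->R.End;
      Node.Children.assign(std::make_move_iterator(First),
                           std::make_move_iterator(Last));
      First = Kids.erase(First, Last);
    }
    Kids.insert(First, std::move(Node));
  }
}
```

The sketch restarts from the root for every pragma and rescans the remaining 
pragmas for each group, which matches the complexity estimate quoted above. 
Widening the group node's range is a choice made for this sketch only, so that 
the root-to-leaf walk still reaches symbols that were moved under a group.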


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105904/new/

https://reviews.llvm.org/D105904

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
