[PATCH] D88299: [clang-format] Add MacroUnexpander.

2023-08-05 Thread Owen Pan via Phabricator via cfe-commits
owenpan added a comment.
Herald added reviewers: rymiel, HazardyKnusperkeks, owenpan, MyDeveloperDay.

@klimek can you have a look at 
https://github.com/llvm/llvm-project/issues/64275?




Comment at: clang/lib/Format/MacroCallReconstructor.cpp:223
+  }
+  assert(!ActiveExpansions.empty());
+  if (ActiveExpansions.back().SpelledI != ActiveExpansions.back().SpelledE) {

This fails as reported in https://github.com/llvm/llvm-project/issues/64275.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-09-21 Thread Owen Pan via Phabricator via cfe-commits
owenpan added a comment.

@klimek np!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-09-21 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In D88299#3804910 , @owenpan wrote:

> @sstwcw thanks for pointing it out. See D134329 
> .

Thanks Owen, really appreciate this! And sorry for not getting to it yet myself 
:(


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-09-21 Thread Owen Pan via Phabricator via cfe-commits
owenpan added subscribers: MyDeveloperDay, owenpan.
owenpan added a comment.

@sstwcw thanks for pointing it out. See D134329 
.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-28 Thread sstwcw via Phabricator via cfe-commits
sstwcw added a comment.

It looks like some of the braces in the code should be removed for example 
those surrounding one-line for bodies.  Sorry if it is not my job to point this 
out, but MyDeveloperDay has not said anything.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-27 Thread Björn Schäpers via Phabricator via cfe-commits
HazardyKnusperkeks added a comment.

A bit surprising such a big change...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-18 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In D88299#3660779 , @nridge wrote:

> Thanks! (I was intrigued by Sam's "solves a whole class of clang-format's 
> biggest problems" comment :-))

The end-result hopefully will :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-18 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added a comment.

Thanks! (I was intrigued by Sam's "solves a whole class of clang-format's 
biggest problems" comment :-))


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-18 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In D88299#3660772 , @nridge wrote:

> Does this patch change the formatting behaviour of clang-format?
>
> If so, are there any test cases that show before/after formatting? The 
> MacroCallReconstructorTest unit test seems like it's testing an internal 
> interface.

No, this is prep-work for the real change, which I'm planning to send out soon.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-18 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added a comment.

Does this patch change the formatting behaviour of clang-format?

If so, are there any test cases that show before/after formatting? The 
MacroCallReconstructorTest unit test seems like it's testing an internal 
interface.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-12 Thread Manuel Klimek via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd6d0dc1f4537: [clang-format] Add MacroUnexpander. (authored 
by klimek).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

Files:
  clang/lib/Format/CMakeLists.txt
  clang/lib/Format/FormatToken.h
  clang/lib/Format/MacroCallReconstructor.cpp
  clang/lib/Format/Macros.h
  clang/lib/Format/UnwrappedLineParser.h
  clang/unittests/Format/CMakeLists.txt
  clang/unittests/Format/MacroCallReconstructorTest.cpp

Index: clang/unittests/Format/MacroCallReconstructorTest.cpp
===
--- /dev/null
+++ clang/unittests/Format/MacroCallReconstructorTest.cpp
@@ -0,0 +1,688 @@
+#include "../../lib/Format/Macros.h"
+#include "../../lib/Format/UnwrappedLineParser.h"
+#include "TestLexer.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include 
+#include 
+#include 
+
+namespace clang {
+namespace format {
+namespace {
+
+using UnexpandedMap =
+llvm::DenseMap>;
+
+// Keeps track of a sequence of macro expansions.
+//
+// The expanded tokens are accessible via getTokens(), while a map of macro call
+// identifier token to unexpanded token stream is accessible via
+// getUnexpanded().
+class Expansion {
+public:
+  Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}
+
+  // Appends the token stream obtained from expanding the macro Name given
+  // the provided arguments, to be later retrieved with getTokens().
+  // Returns the list of tokens making up the unexpanded macro call.
+  TokenList
+  expand(llvm::StringRef Name,
+ const SmallVector, 1> &Args) {
+auto *ID = Lex.id(Name);
+auto UnexpandedLine = std::make_unique();
+UnexpandedLine->Tokens.push_back(ID);
+if (!Args.empty()) {
+  UnexpandedLine->Tokens.push_back(Lex.id("("));
+  for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
+if (I != Args.begin())
+  UnexpandedLine->Tokens.push_back(Lex.id(","));
+UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
+  I->end());
+  }
+  UnexpandedLine->Tokens.push_back(Lex.id(")"));
+}
+Unexpanded[ID] = std::move(UnexpandedLine);
+
+auto Expanded = uneof(Macros.expand(ID, Args));
+Tokens.append(Expanded.begin(), Expanded.end());
+
+TokenList UnexpandedTokens;
+for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
+  UnexpandedTokens.push_back(Node.Tok);
+}
+return UnexpandedTokens;
+  }
+
+  TokenList expand(llvm::StringRef Name,
+   const std::vector &Args = {}) {
+return expand(Name, lexArgs(Args));
+  }
+
+  const UnexpandedMap &getUnexpanded() const { return Unexpanded; }
+
+  const TokenList &getTokens() const { return Tokens; }
+
+private:
+  llvm::SmallVector
+  lexArgs(const std::vector &Args) {
+llvm::SmallVector Result;
+for (const auto &Arg : Args) {
+  Result.push_back(uneof(Lex.lex(Arg)));
+}
+return Result;
+  }
+  llvm::DenseMap> Unexpanded;
+  llvm::SmallVector Tokens;
+  TestLexer &Lex;
+  MacroExpander &Macros;
+};
+
+struct Chunk {
+  Chunk(llvm::ArrayRef Tokens)
+  : Tokens(Tokens.begin(), Tokens.end()) {}
+  Chunk(llvm::ArrayRef Children)
+  : Children(Children.begin(), Children.end()) {}
+  llvm::SmallVector Tokens;
+  llvm::SmallVector Children;
+};
+
+bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
+  if (Left->getType() == Right->getType() &&
+  Left->TokenText == Right->TokenText)
+return true;
+  llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
+  return false;
+}
+
+// Allows to produce chunks of a token list by typing the code of equal tokens.
+//
+// Created from a list of tokens, users call "consume" to get the next chunk
+// of tokens, checking that they match the written code.
+struct Matcher {
+  Matcher(const TokenList &Tokens, TestLexer &Lex)
+  : Tokens(Tokens), It(this->Tokens.begin()), Lex(Lex) {}
+
+  Chunk consume(StringRef Tokens) {
+TokenList Result;
+for (const FormatToken *Token : uneof(Lex.lex(Tokens))) {
+  assert(tokenMatches(*It, Token));
+  Result.push_back(*It);
+  ++It;
+}
+return Chunk(Result);
+  }
+
+  TokenList Tokens;
+  TokenList::iterator It;
+  TestLexer &Lex;
+};
+
+UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
+  const UnexpandedMap &M2) {
+  UnexpandedMap Result;
+  for (const auto &KV : M1) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  for (const auto &KV : M2) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  return Result;
+}
+
+class MacroCallReconstructorTest : publi

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-11 Thread Manuel Klimek via Phabricator via cfe-commits
klimek updated this revision to Diff 443729.
klimek added a comment.

Address review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

Files:
  clang/lib/Format/CMakeLists.txt
  clang/lib/Format/FormatToken.h
  clang/lib/Format/MacroCallReconstructor.cpp
  clang/lib/Format/Macros.h
  clang/lib/Format/UnwrappedLineParser.h
  clang/unittests/Format/CMakeLists.txt
  clang/unittests/Format/MacroCallReconstructorTest.cpp

Index: clang/unittests/Format/MacroCallReconstructorTest.cpp
===
--- /dev/null
+++ clang/unittests/Format/MacroCallReconstructorTest.cpp
@@ -0,0 +1,688 @@
+#include "../../lib/Format/Macros.h"
+#include "../../lib/Format/UnwrappedLineParser.h"
+#include "TestLexer.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include 
+#include 
+#include 
+
+namespace clang {
+namespace format {
+namespace {
+
+using UnexpandedMap =
+llvm::DenseMap>;
+
+// Keeps track of a sequence of macro expansions.
+//
+// The expanded tokens are accessible via getTokens(), while a map of macro call
+// identifier token to unexpanded token stream is accessible via
+// getUnexpanded().
+class Expansion {
+public:
+  Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}
+
+  // Appends the token stream obtained from expanding the macro Name given
+  // the provided arguments, to be later retrieved with getTokens().
+  // Returns the list of tokens making up the unexpanded macro call.
+  TokenList
+  expand(llvm::StringRef Name,
+ const SmallVector, 1> &Args) {
+auto *ID = Lex.id(Name);
+auto UnexpandedLine = std::make_unique();
+UnexpandedLine->Tokens.push_back(ID);
+if (!Args.empty()) {
+  UnexpandedLine->Tokens.push_back(Lex.id("("));
+  for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
+if (I != Args.begin())
+  UnexpandedLine->Tokens.push_back(Lex.id(","));
+UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
+  I->end());
+  }
+  UnexpandedLine->Tokens.push_back(Lex.id(")"));
+}
+Unexpanded[ID] = std::move(UnexpandedLine);
+
+auto Expanded = uneof(Macros.expand(ID, Args));
+Tokens.append(Expanded.begin(), Expanded.end());
+
+TokenList UnexpandedTokens;
+for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
+  UnexpandedTokens.push_back(Node.Tok);
+}
+return UnexpandedTokens;
+  }
+
+  TokenList expand(llvm::StringRef Name,
+   const std::vector &Args = {}) {
+return expand(Name, lexArgs(Args));
+  }
+
+  const UnexpandedMap &getUnexpanded() const { return Unexpanded; }
+
+  const TokenList &getTokens() const { return Tokens; }
+
+private:
+  llvm::SmallVector
+  lexArgs(const std::vector &Args) {
+llvm::SmallVector Result;
+for (const auto &Arg : Args) {
+  Result.push_back(uneof(Lex.lex(Arg)));
+}
+return Result;
+  }
+  llvm::DenseMap> Unexpanded;
+  llvm::SmallVector Tokens;
+  TestLexer &Lex;
+  MacroExpander &Macros;
+};
+
+struct Chunk {
+  Chunk(llvm::ArrayRef Tokens)
+  : Tokens(Tokens.begin(), Tokens.end()) {}
+  Chunk(llvm::ArrayRef Children)
+  : Children(Children.begin(), Children.end()) {}
+  llvm::SmallVector Tokens;
+  llvm::SmallVector Children;
+};
+
+bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
+  if (Left->getType() == Right->getType() &&
+  Left->TokenText == Right->TokenText)
+return true;
+  llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
+  return false;
+}
+
+// Allows to produce chunks of a token list by typing the code of equal tokens.
+//
+// Created from a list of tokens, users call "consume" to get the next chunk
+// of tokens, checking that they match the written code.
+struct Matcher {
+  Matcher(const TokenList &Tokens, TestLexer &Lex)
+  : Tokens(Tokens), It(this->Tokens.begin()), Lex(Lex) {}
+
+  Chunk consume(StringRef Tokens) {
+TokenList Result;
+for (const FormatToken *Token : uneof(Lex.lex(Tokens))) {
+  assert(tokenMatches(*It, Token));
+  Result.push_back(*It);
+  ++It;
+}
+return Chunk(Result);
+  }
+
+  TokenList Tokens;
+  TokenList::iterator It;
+  TestLexer &Lex;
+};
+
+UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
+  const UnexpandedMap &M2) {
+  UnexpandedMap Result;
+  for (const auto &KV : M1) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  for (const auto &KV : M2) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  return Result;
+}
+
+class MacroCallReconstructorTest : public ::testing::Test {
+public:
+  MacroCallReconstructorTest() : Lex(Allocator, Buffers) {}
+
+  std::unique_ptr
+  createExpand

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-11 Thread Manuel Klimek via Phabricator via cfe-commits
klimek marked 7 inline comments as done.
klimek added inline comments.



Comment at: clang/lib/Format/Macros.h:201
+  /// Generally, this line tries to have the same structure as the expanded,
+  /// formatted unwrapped lines handed in via \c addLine(), with the exception
+  /// that for multiple top-level lines, each subsequent line will be the

sammccall wrote:
> you could give this a name like "tail form", and then refer to it in docs of 
> MacroCallReconstructor::Result, in MacroCallReconstructor.cpp:482, etc.
> 
> Up to you.
I'm somewhat unhappy with the term "tail form"; happy to do this in a 
subsequent change if we find a better name.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2022-07-11 Thread Sam McCall via Phabricator via cfe-commits
sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.
Herald added a project: All.

OK, I think we've reached the point where:

- the impact is very clear, this solves a whole class of clang-format's biggest 
problems
- the idea clearly works (there are good tests)
- the implementation is very well-documented: I can't really improve my 
understanding further by asking for things to be better explained
- I can't make clear suggestions to simplify - you've applied all my low-level 
suggestions and my high-level understanding is poor
- I still don't feel like I really understand how it works, but that's not 
really different than the other big pieces of clang-format

So I think all that's left to do here is ship it. It makes me uneasy that the 
core of clang-format is functionally magic (could anyone other than you and 
Daniel reproduce it after nuclear apocalypse?) but this doesn't really change 
that state.




Comment at: clang/lib/Format/MacroCallReconstructor.cpp:53
+  assert(State != Finalized);
+  LLVM_DEBUG(llvm::dbgs() << "Unex: new line...\n");
+  forEachToken(Line, [&](FormatToken *Token, FormatToken *Parent, bool First) {

if you want to keep these LLVM_DEBUGs, maybe this should be "MCR" or so instead 
of "unex"?



Comment at: clang/lib/Format/MacroCallReconstructor.cpp:62
+  finalize();
+  UnwrappedLine Final =
+  createUnwrappedLine(*Result.Tokens.front()->Children.front(), Level);

you might want an assertion that Result has one token with one child (it's 
pretty obvious in finalize() but less so here)



Comment at: clang/lib/Format/MacroCallReconstructor.cpp:98
+   ActiveExpansions.size() != Token->MacroCtx->ExpandedFrom.size())) {
+if (reconstructActiveCallUntil(Token))
+  First = true;

nit: I think this would be clearer by naming the result:
`if (bool PassedMacroComma = reconstruct...)`

(because it's not clear from the name what the function returns, and 
documenting it would help only a little)



Comment at: clang/lib/Format/MacroCallReconstructor.cpp:108
+appendToken(Token);
+  } else {
+// Otherwise add the reconstructed equivalent of the token.

(this is the else branch of a negated condition, consider swapping the branches 
to avoid double-negative)



Comment at: clang/lib/Format/MacroCallReconstructor.cpp:114
+
+// Adjusts the stack of active reconstructed liens so we're ready to push
+// tokens. The tokens to be pushed are children of ExpandedParent in the

liens -> lines

(unless there's a *really* weird metaphor going on here!)



Comment at: clang/lib/Format/MacroCallReconstructor.cpp:232
+} else if (!currentLine()->Tokens.empty()) {
+  // FIXME:assert(!currentLine()->Tokens.empty());
+  // Map all hidden tokens to the last visible token in the output.

this FIXME looks obsolete?



Comment at: clang/lib/Format/Macros.h:190
+  /// that needs to be processed to finish an macro call.
+  /// Only when \c finished() is true, \c getResult() can be called to retrieve
+  /// the resulting \c UnwrappedLine.

nit: getResult()->takeResult() in comment now



Comment at: clang/lib/Format/Macros.h:201
+  /// Generally, this line tries to have the same structure as the expanded,
+  /// formatted unwrapped lines handed in via \c addLine(), with the exception
+  /// that for multiple top-level lines, each subsequent line will be the

you could give this a name like "tail form", and then refer to it in docs of 
MacroCallReconstructor::Result, in MacroCallReconstructor.cpp:482, etc.

Up to you.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88299: [clang-format] Add MacroUnexpander.

2021-11-26 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

Noticed I should have waiting with the renaming of the file until the review is 
done :( Sorry for the extra confusion.




Comment at: clang/lib/Format/MacroUnexpander.cpp:77
+  // stream, we need to continue the unexpansion until we find the right token
+  // if it is part of the unexpanded token stream.
+  // Note that hidden tokens can be part of the unexpanded stream in nested

sammccall wrote:
> I can't work out what "it" refers to in this sentence.
> 
> (and "spelled" token stream?)
Changed to "given token" - it refers to the token.

It's reconstructed, not spelled, like the example below explains: we do have 
hidden tokens that we want to find when we're in the middle of a recursive 
expansion. I wouldn't call the hidden tokens here "spelled" - would you?



Comment at: clang/lib/Format/MacroUnexpander.cpp:82
+  // And the call: BRACED(BRACED(x))
+  // The outer macro call will be BRACED({a}), and the hidden tokens '{' and
+  // '}' can be found in the unexpanded macro stream of that level.

sammccall wrote:
> It would be helpful to complete the example by spelling out which token 
> you're adding, which the correct parent is, and which tokens you need to 
> "expand over" to make it available.
> 
> I *think* the answer to the first two is - when you're adding the `a` then 
> its proper parent is the inner `(` in `BRACED(BRACED(`... but I don't know 
> the last part.
Found out that I was missing a unit test. Added a unit test, and now explained 
the unit test here in the comment. PTAL.



Comment at: clang/lib/Format/MacroUnexpander.cpp:84
+  // '}' can be found in the unexpanded macro stream of that level.
+  if (!Unexpanded.empty() && Token->MacroCtx &&
+  (Token->MacroCtx->Role != MR_Hidden ||

sammccall wrote:
> is it possible that you may need to unexpand more than the innermost macro?
> 
> e.g. BRACED(ID(ID(BRACED(ID(ID(a)) expands to {{a}} and the parents of 
> `a` and inner `{` each come from a couple of macro-levels up.
> 
> (I don't totally understand the logic here, so the answer's probably "no"...)
A token on a higher level must always also be there on a lower level.
Calls in this example are:
BRACED({a})
ID({a})
ID({a})
BRACED(a)
ID(a)
ID(a)
When the next token comes in, we thus always find it in the higher level (open) 
macro call. When we reconstruct for that token, we then reconstruct multiple 
macro calls at once, if it is the first token in multiple macro calls.



Comment at: clang/lib/Format/MacroUnexpander.cpp:173
+  assert(Token->MacroCtx);
+  // A single token can be the only result of a macro call:
+  // Given: ID(x, y) ;

sammccall wrote:
> This raises another point: a macro can have an empty body.
> AFAICT this fundamentally isn't supported here, as we're driven by the 
> expanded token stream.
> I guess it makes sense to handle this elsewhere in clang-format (or even not 
> handle it) but it should be documented somewhere.
I think the right point to document and handle it is at the next layer, where 
we integrate all of this into clang-format.



Comment at: clang/lib/Format/MacroUnexpander.cpp:238
+  // generated by the call.
+  for (size_t I = Unexpanded.size(); I < Token->MacroCtx->ExpandedFrom.size();
+   ++I) {

sammccall wrote:
> nit: index arithmetic obscures what's going on a bit.
> You could write
> 
> ```
> ArrayRef StartedMacros = 
> makeArrayRef(Token->MacroCtx->ExpandedFrom).drop_front(Unexpanded.size());
> for (FormatToken *ID : llvm::reverse(StartedMacros)) {
> ...
> }
> ```
> but up to you
> 
> It's not obvious to me *why* we're iterating in reverse order BTW: i.e. why 
> the order of the `Unexpanded` stack is opposite the order of the 
> `ExpandedFrom` stack.
> So maybe just a comment to reinforce that, like (// ExpandedFrom is 
> outside-in order, we want inside-out)
Oooh, this is nice. s/drop_front/drop_back/. Added comment.



Comment at: clang/lib/Format/MacroUnexpander.cpp:282
+ (Unexpanded.size() >= Token->MacroCtx->EndOfExpansion));
+  if (!Token->MacroCtx)
+return;

sammccall wrote:
> nit: Token->MacroCtx is checked at the callsite, and startUnexpansion asserts 
> it as a precondition - do that here too for symmetry?
Thanks for spotting, this was from a time where the code looked differently 
(the assert above also didn't make sense any more in that form).



Comment at: clang/lib/Format/MacroUnexpander.cpp:310
+// Expand the closing parenthesis, if it exists, including an optional
+// trailing comment.
+for (auto T = Unexpanded.back().I; T != Unexpanded.back().End; ++T) {

sammccall wrote:
> I can't work out if this is supposed to say comma or comment :-)
> 
> If comma - is that a thing?
> If comment - why would a comment be considered part of the unexpansion of a 
> 

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2021-11-26 Thread Manuel Klimek via Phabricator via cfe-commits
klimek updated this revision to Diff 390057.
klimek marked 49 inline comments as done.
klimek added a comment.

Work in review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

Files:
  clang/lib/Format/CMakeLists.txt
  clang/lib/Format/FormatToken.h
  clang/lib/Format/MacroCallReconstructor.cpp
  clang/lib/Format/Macros.h
  clang/lib/Format/UnwrappedLineParser.h
  clang/unittests/Format/CMakeLists.txt
  clang/unittests/Format/MacroCallReconstructorTest.cpp

Index: clang/unittests/Format/MacroCallReconstructorTest.cpp
===
--- /dev/null
+++ clang/unittests/Format/MacroCallReconstructorTest.cpp
@@ -0,0 +1,688 @@
+#include "../../lib/Format/Macros.h"
+#include "../../lib/Format/UnwrappedLineParser.h"
+#include "TestLexer.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include 
+#include 
+#include 
+
+namespace clang {
+namespace format {
+namespace {
+
+using UnexpandedMap =
+llvm::DenseMap>;
+
+// Keeps track of a sequence of macro expansions.
+//
+// The expanded tokens are accessible via getTokens(), while a map of macro call
+// identifier token to unexpanded token stream is accessible via
+// getUnexpanded().
+class Expansion {
+public:
+  Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}
+
+  // Appends the token stream obtained from expanding the macro Name given
+  // the provided arguments, to be later retrieved with getTokens().
+  // Returns the list of tokens making up the unexpanded macro call.
+  TokenList
+  expand(llvm::StringRef Name,
+ const SmallVector, 1> &Args) {
+auto *ID = Lex.id(Name);
+auto UnexpandedLine = std::make_unique();
+UnexpandedLine->Tokens.push_back(ID);
+if (!Args.empty()) {
+  UnexpandedLine->Tokens.push_back(Lex.id("("));
+  for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
+if (I != Args.begin())
+  UnexpandedLine->Tokens.push_back(Lex.id(","));
+UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
+  I->end());
+  }
+  UnexpandedLine->Tokens.push_back(Lex.id(")"));
+}
+Unexpanded[ID] = std::move(UnexpandedLine);
+
+auto Expanded = uneof(Macros.expand(ID, Args));
+Tokens.append(Expanded.begin(), Expanded.end());
+
+TokenList UnexpandedTokens;
+for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
+  UnexpandedTokens.push_back(Node.Tok);
+}
+return UnexpandedTokens;
+  }
+
+  TokenList expand(llvm::StringRef Name,
+   const std::vector &Args = {}) {
+return expand(Name, lexArgs(Args));
+  }
+
+  const UnexpandedMap &getUnexpanded() const { return Unexpanded; }
+
+  const TokenList &getTokens() const { return Tokens; }
+
+private:
+  llvm::SmallVector
+  lexArgs(const std::vector &Args) {
+llvm::SmallVector Result;
+for (const auto &Arg : Args) {
+  Result.push_back(uneof(Lex.lex(Arg)));
+}
+return Result;
+  }
+  llvm::DenseMap> Unexpanded;
+  llvm::SmallVector Tokens;
+  TestLexer &Lex;
+  MacroExpander &Macros;
+};
+
+struct Chunk {
+  Chunk(llvm::ArrayRef Tokens)
+  : Tokens(Tokens.begin(), Tokens.end()) {}
+  Chunk(llvm::ArrayRef Children)
+  : Children(Children.begin(), Children.end()) {}
+  llvm::SmallVector Tokens;
+  llvm::SmallVector Children;
+};
+
+bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
+  if (Left->getType() == Right->getType() &&
+  Left->TokenText == Right->TokenText)
+return true;
+  llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
+  return false;
+}
+
+// Allows to produce chunks of a token list by typing the code of equal tokens.
+//
+// Created from a list of tokens, users call "consume" to get the next chunk
+// of tokens, checking that they match the written code.
+struct Matcher {
+  Matcher(const TokenList &Tokens, TestLexer &Lex)
+  : Tokens(Tokens), It(this->Tokens.begin()), Lex(Lex) {}
+
+  Chunk consume(StringRef Tokens) {
+TokenList Result;
+for (const FormatToken *Token : uneof(Lex.lex(Tokens))) {
+  assert(tokenMatches(*It, Token));
+  Result.push_back(*It);
+  ++It;
+}
+return Chunk(Result);
+  }
+
+  TokenList Tokens;
+  TokenList::iterator It;
+  TestLexer &Lex;
+};
+
+UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
+  const UnexpandedMap &M2) {
+  UnexpandedMap Result;
+  for (const auto &KV : M1) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  for (const auto &KV : M2) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  return Result;
+}
+
+class MacroCallReconstructorTest : public ::testing::Test {
+public:
+  MacroCallReconstructorTest() : Lex(Allocator, Buffer

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2021-02-03 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

Thanks a lot for all the time explaining, I get this at least at a high level 
now.
I have trouble following the ins and outs of the algorithm, but I think I 
understand most of it and tests cover the rest.

Regarding tests: I have trouble reasoning about whether all cases are tested. I 
wonder if we can easily throw this against (internal) coverage + mutation 
testing infrastructure? I'm not a massive believer in that stuff usually, but 
this seems like a good candidate.

Lots of comments around making this easier for confused people like me to 
understand, partly because I don't understand it well enough to suggest more 
substantive changes.
Throughout, I think it'd be nice to be explicit about:

- which structure tokens/lines are part of: spelled/expanded/unexpanded. 
(Including getting rid of input/output/incoming/original, which AFAICT are 
synonyms for these)
- which lines are complete, and which are partial (being added to as we go)

It took me a while to understand the control flow and to tell "who's driving" - 
I think it was hard for me to see that processNextUnexpanded was basically a 
leaf, and that 
unexpand/startUnexpansion/endUnexpansion/continueUnexpansionUntil were the 
layer above, and add() was the top. Maybe there's naming that would clarify 
this better, but I don't have great suggestions, and I'm not sure if this is a 
problem other people would have.
Maybe an example trace showing the internal method calls when parsing a small 
example might help... but again, not sure.




Comment at: clang/lib/Format/MacroUnexpander.cpp:52
+// the token, its parent and whether it is the first token in the line.
+void MacroUnexpander::forEachToken(
+const UnwrappedLine &Line,

this doesn't use any state, right?

it could be static or even private to the impl file.

replacing std::function with a template param allow this loop to be specialized 
for the one callsite - up to you, maybe it doesn't matter much, but it's not 
very invasive



Comment at: clang/lib/Format/MacroUnexpander.cpp:66
+
+// Unexand \p Token, given its parent \p OriginalParent in the incoming
+// unwrapped line. \p First specifies whether it is the first token in a given

I find using all of spelled/expanded/unexpanded, 
input/incoming/output/outgoing, and original, unclear.
(Particularly after OriginalParent is propagated around by that name without 
comment, and because "original" refers to the expanded structure, which is 
*not* the first one created)
Can we call this "ExpandedParent in the expanded unwrapped line"?



Comment at: clang/lib/Format/MacroUnexpander.cpp:66
+
+// Unexand \p Token, given its parent \p OriginalParent in the incoming
+// unwrapped line. \p First specifies whether it is the first token in a given

sammccall wrote:
> I find using all of spelled/expanded/unexpanded, 
> input/incoming/output/outgoing, and original, unclear.
> (Particularly after OriginalParent is propagated around by that name without 
> comment, and because "original" refers to the expanded structure, which is 
> *not* the first one created)
> Can we call this "ExpandedParent in the expanded unwrapped line"?
Unexand -> unexpand



Comment at: clang/lib/Format/MacroUnexpander.cpp:77
+  // stream, we need to continue the unexpansion until we find the right token
+  // if it is part of the unexpanded token stream.
+  // Note that hidden tokens can be part of the unexpanded stream in nested

I can't work out what "it" refers to in this sentence.

(and "spelled" token stream?)



Comment at: clang/lib/Format/MacroUnexpander.cpp:80
+  // macro calls.
+  // For example, given: BRACED(a) {a}
+  // And the call: BRACED(BRACED(x))

(in these examples throughout, #define BRACED(a) {a} would make it clearer that 
this is a macro definition and not code)



Comment at: clang/lib/Format/MacroUnexpander.cpp:82
+  // And the call: BRACED(BRACED(x))
+  // The outer macro call will be BRACED({a}), and the hidden tokens '{' and
+  // '}' can be found in the unexpanded macro stream of that level.

It would be helpful to complete the example by spelling out which token you're 
adding, which the correct parent is, and which tokens you need to "expand over" 
to make it available.

I *think* the answer to the first two is - when you're adding the `a` then its 
proper parent is the inner `(` in `BRACED(BRACED(`... but I don't know the last 
part.



Comment at: clang/lib/Format/MacroUnexpander.cpp:84
+  // '}' can be found in the unexpanded macro stream of that level.
+  if (!Unexpanded.empty() && Token->MacroCtx &&
+  (Token->MacroCtx->Role != MR_Hidden ||

is it possible that you may need to unexpand more than the innermost macro?

e.g. BRACED(ID(ID(BRACED(ID(ID(a)) expands to {{a}} and the parents o

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2020-10-28 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added inline comments.



Comment at: clang/lib/Format/FormatToken.h:449
 
+  /// When macro expansion introduces parents, those are marked as
+  /// \c MacroParent, so formatting knows their children need to be formatted.

sammccall wrote:
> I can't really understand from the comment when this is supposed to be set, 
> and there are no tests of it.
> 
> (The comment is vague: is a "parent" the inverse of FormatToken::Children 
> here? Is this scenario when the parents in question are new, or their 
> children are new, or both? What part of the code is "formatting", and why 
> would it otherwise skip the children?)
Rewrote comment.



Comment at: clang/lib/Format/Macros.h:134
 
+/// Matches formatted lines that were created by macro expansion to a format
+/// for the original macro call.

sammccall wrote:
> "matches formatted lines" probably describes the hard technical problem it 
> has to solve, but not so much what it does for the caller:  what the 
> transformation is between its inputs and its outputs.
> 
> Is it something like:
> 
> ```
> Rewrites expanded code (containing tokens expanded from macros) into spelled 
> code (containing tokens for the macro call itself). Token types are 
> preserved, so macro arguments in the output have semantically-correct types 
> from their expansion. This is the point of expansion/unexpansion: to allow 
> this information to be used in formatting.
> ```
> 
> [Is it just tokentypes? I guess it's also Role and MustBreakBefore and some 
> other stuff like that?]
It's not the token info, this we'd trivially have by using the original token 
sequence which is still lying around (we re-use the same tokens).

Reworded.



Comment at: clang/lib/Format/Macros.h:142
+/// When getting the formatted lines of the expansion via the \c addLine 
method:
+/// -> class A {
+/// -> public:

sammccall wrote:
> sammccall wrote:
> > I'm a bit confused by these arrows. It doesn't seem that they each point to 
> > an unwrappedline passed to addLine?
> This example didn't really help me understand the interface of this class, it 
> seems to be a special case:
>  - the input is a single block construct (rather than e.g. a whole program), 
> but it's not clear why that's the case.
>  - the output (unexpanded form) consists of exactly a macro call with no 
> leading/trailing tokens, which isn't true in general
> 
> If the idea is to provide as input the minimal range of lines that span the 
> macro, we should say so somewhere. But I would like to understand why we're 
> not simply processing the whole file.
Why not? That is the intention? Note that high-level we do not pass class 
definitions in one unwrapped line; if they ever go into a single line it's done 
in the end by a special merging pass (yes, that's a bit confusing).



Comment at: clang/lib/Format/Macros.h:142
+/// When getting the formatted lines of the expansion via the \c addLine 
method:
+/// -> class A {
+/// -> public:

klimek wrote:
> sammccall wrote:
> > sammccall wrote:
> > > I'm a bit confused by these arrows. It doesn't seem that they each point 
> > > to an unwrappedline passed to addLine?
> > This example didn't really help me understand the interface of this class, 
> > it seems to be a special case:
> >  - the input is a single block construct (rather than e.g. a whole 
> > program), but it's not clear why that's the case.
> >  - the output (unexpanded form) consists of exactly a macro call with no 
> > leading/trailing tokens, which isn't true in general
> > 
> > If the idea is to provide as input the minimal range of lines that span the 
> > macro, we should say so somewhere. But I would like to understand why we're 
> > not simply processing the whole file.
> Why not? That is the intention? Note that high-level we do not pass class 
> definitions in one unwrapped line; if they ever go into a single line it's 
> done in the end by a special merging pass (yes, that's a bit confusing).
Re: input is a single construct - that is not the case (see other comment)
Re: leading / trailing tokens: I didn't want to make it too complex in the 
example.



Comment at: clang/lib/Format/Macros.h:147
+///
+/// Creates the unwrapped lines containing the macro call tokens so that
+/// the macro call tokens fit the semantic structure of the expanded formatted

sammccall wrote:
> this says "creates the unwrapped lines" but getResult() returns only one.
> Does the plural here refer to the tree? Maybe just use singular or "the 
> unwrappedlinetree"?
Fixed comment.



Comment at: clang/lib/Format/Macros.h:154
+/// -> })
+class MacroUnexpander {
+public:

sammccall wrote:
> I get the symmetry between the expander/unexpander classes, but the name is 
> making it harder for me to follow the code.
>  - the extra compound+negation in the name make

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2020-10-28 Thread Manuel Klimek via Phabricator via cfe-commits
klimek updated this revision to Diff 301253.
klimek marked 3 inline comments as done.
klimek added a comment.

Adapted based on review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88299/new/

https://reviews.llvm.org/D88299

Files:
  clang/lib/Format/CMakeLists.txt
  clang/lib/Format/FormatToken.h
  clang/lib/Format/MacroUnexpander.cpp
  clang/lib/Format/Macros.h
  clang/unittests/Format/CMakeLists.txt
  clang/unittests/Format/MacroUnexpanderTest.cpp

Index: clang/unittests/Format/MacroUnexpanderTest.cpp
===
--- /dev/null
+++ clang/unittests/Format/MacroUnexpanderTest.cpp
@@ -0,0 +1,650 @@
+#include "../../lib/Format/Macros.h"
+#include "../../lib/Format/UnwrappedLineParser.h"
+#include "TestLexer.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include 
+#include 
+#include 
+
+namespace clang {
+namespace format {
+namespace {
+
+using UnexpandedMap = std::map>;
+
+// Keeps track of a sequence of macro expansions.
+//
+// The expanded tokens are accessible via getTokens(), while a map of macro call
+// identifier token to unexpanded token stream is accessible via
+// getUnexpanded().
+class Expansion {
+public:
+  Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}
+
+  // Appends the token stream obtained from expanding the macro Name given
+  // the provided arguments, to be later retrieved with getTokens().
+  // Returns the list of tokens making up the unexpanded macro call.
+  TokenList
+  expand(llvm::StringRef Name,
+ const SmallVector, 1> &Args) {
+auto *ID = Lex.id(Name);
+auto UnexpandedLine = std::make_unique();
+UnexpandedLine->Tokens.push_back(ID);
+if (!Args.empty()) {
+  UnexpandedLine->Tokens.push_back(Lex.id("("));
+  for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
+if (I != Args.begin())
+  UnexpandedLine->Tokens.push_back(Lex.id(","));
+UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
+  I->end());
+  }
+  UnexpandedLine->Tokens.push_back(Lex.id(")"));
+}
+Unexpanded[ID] = std::move(UnexpandedLine);
+
+auto Expanded = uneof(Macros.expand(ID, Args));
+Tokens.append(Expanded.begin(), Expanded.end());
+
+TokenList UnexpandedTokens;
+for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
+  UnexpandedTokens.push_back(Node.Tok);
+}
+return UnexpandedTokens;
+  }
+
+  TokenList expand(llvm::StringRef Name,
+   const std::vector &Args = {}) {
+return expand(Name, lexArgs(Args));
+  }
+
+  const UnexpandedMap &getUnexpanded() const { return Unexpanded; }
+
+  const TokenList &getTokens() const { return Tokens; }
+
+private:
+  llvm::SmallVector
+  lexArgs(const std::vector &Args) {
+llvm::SmallVector Result;
+for (const auto &Arg : Args) {
+  Result.push_back(uneof(Lex.lex(Arg)));
+}
+return Result;
+  }
+  std::map> Unexpanded;
+  llvm::SmallVector Tokens;
+  TestLexer &Lex;
+  MacroExpander &Macros;
+};
+
+struct Chunk {
+  Chunk(llvm::ArrayRef Tokens)
+  : Tokens(Tokens.begin(), Tokens.end()) {}
+  Chunk(llvm::ArrayRef Children)
+  : Children(Children.begin(), Children.end()) {}
+  llvm::SmallVector Tokens;
+  llvm::SmallVector Children;
+};
+
+bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
+  if (Left->getType() == Right->getType() &&
+  Left->TokenText == Right->TokenText)
+return true;
+  llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
+  return false;
+}
+
+// Allows to produce chunks of a token list by typing the code of equal tokens.
+//
+// Created from a list of tokens, users call "consume" to get the next chunk
+// of tokens, checking that they match the written code.
+struct Matcher {
+  Matcher(const TokenList &Tokens, TestLexer &Lex)
+  : Tokens(Tokens), It(this->Tokens.begin()), Lex(Lex) {}
+
+  Chunk consume(StringRef Tokens) {
+TokenList Result;
+for (const FormatToken *Token : uneof(Lex.lex(Tokens))) {
+  assert(tokenMatches(*It, Token));
+  Result.push_back(*It);
+  ++It;
+}
+return Chunk(Result);
+  }
+
+  TokenList Tokens;
+  TokenList::iterator It;
+  TestLexer &Lex;
+};
+
+UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
+  const UnexpandedMap &M2) {
+  UnexpandedMap Result;
+  for (const auto &KV : M1) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  for (const auto &KV : M2) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  return Result;
+}
+
+class MacroUnexpanderTest : public ::testing::Test {
+public:
+  MacroUnexpanderTest() : Lex(Allocator, Buffers) {}
+
+  std::unique_ptr
+  createExpander(const std::vector &MacroDefinitions) {
+ret

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2020-10-12 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

This is magnificently difficult :-)
I've sunk a fair few hours into understanding it now, and need to send some 
comments based on the interface+tests.
Some of these will be misguided or answered by the implementation, but I'm 
afraid I can't fit it all into my head (on a reasonable timeframe) until I 
understand it better.




Comment at: clang/lib/Format/FormatToken.h:449
 
+  /// When macro expansion introduces parents, those are marked as
+  /// \c MacroParent, so formatting knows their children need to be formatted.

I can't really understand from the comment when this is supposed to be set, and 
there are no tests of it.

(The comment is vague: is a "parent" the inverse of FormatToken::Children here? 
Is this scenario when the parents in question are new, or their children are 
new, or both? What part of the code is "formatting", and why would it otherwise 
skip the children?)



Comment at: clang/lib/Format/Macros.h:134
 
+/// Matches formatted lines that were created by macro expansion to a format
+/// for the original macro call.

"matches formatted lines" probably describes the hard technical problem it has 
to solve, but not so much what it does for the caller:  what the transformation 
is between its inputs and its outputs.

Is it something like:

```
Rewrites expanded code (containing tokens expanded from macros) into spelled 
code (containing tokens for the macro call itself). Token types are preserved, 
so macro arguments in the output have semantically-correct types from their 
expansion. This is the point of expansion/unexpansion: to allow this 
information to be used in formatting.
```

[Is it just tokentypes? I guess it's also Role and MustBreakBefore and some 
other stuff like that?]



Comment at: clang/lib/Format/Macros.h:142
+/// When getting the formatted lines of the expansion via the \c addLine 
method:
+/// -> class A {
+/// -> public:

I'm a bit confused by these arrows. It doesn't seem that they each point to an 
unwrappedline passed to addLine?



Comment at: clang/lib/Format/Macros.h:142
+/// When getting the formatted lines of the expansion via the \c addLine 
method:
+/// -> class A {
+/// -> public:

sammccall wrote:
> I'm a bit confused by these arrows. It doesn't seem that they each point to 
> an unwrappedline passed to addLine?
This example didn't really help me understand the interface of this class, it 
seems to be a special case:
 - the input is a single block construct (rather than e.g. a whole program), 
but it's not clear why that's the case.
 - the output (unexpanded form) consists of exactly a macro call with no 
leading/trailing tokens, which isn't true in general

If the idea is to provide as input the minimal range of lines that span the 
macro, we should say so somewhere. But I would like to understand why we're not 
simply processing the whole file.



Comment at: clang/lib/Format/Macros.h:147
+///
+/// Creates the unwrapped lines containing the macro call tokens so that
+/// the macro call tokens fit the semantic structure of the expanded formatted

this says "creates the unwrapped lines" but getResult() returns only one.
Does the plural here refer to the tree? Maybe just use singular or "the 
unwrappedlinetree"?



Comment at: clang/lib/Format/Macros.h:154
+/// -> })
+class MacroUnexpander {
+public:

I get the symmetry between the expander/unexpander classes, but the name is 
making it harder for me to follow the code.
 - the extra compound+negation in the name makes it hard/slow to understand, as 
I'm always thinking first about expansion
 - the fact that expansion/unexpansion are not actually opposites completely 
breaks my intuition. It also creates two different meanings of "unexpanded" 
that led me down the garden path a bit (e.g. in the test).

Would you consider `MacroCollapser`? It's artificial, but expand/collapse are 
well-known antonyms without being the same word.

(Incidentally, I just realized this is why I find "UnwrappedLine" so slippery - 
the ambiguity between "this isn't wrapped" and "this was unwrapped")



Comment at: clang/lib/Format/Macros.h:164
+  /// For the given \p Line, match all occurences of tokens expanded from a
+  /// macro and replace them with the original macro call in \c getResult().
+  void addLine(const UnwrappedLine &Line);

I find this hard to follow.
- again, the "match" part is an implementation detail that sounds interesting 
but isn't :-)
- "from a macro" sounds like one in particular, but is actually every macro
- the bit about "getResult" is a separate point that feels shoehorned in

What about:
"Replaces tokens that were expanded from macros with the original macro calls. 
The result is stored and available in getResult()"



Comment at: clang

[PATCH] D88299: [clang-format] Add MacroUnexpander.

2020-09-25 Thread Manuel Klimek via Phabricator via cfe-commits
klimek created this revision.
klimek added a reviewer: sammccall.
Herald added subscribers: cfe-commits, mgorny.
Herald added a project: clang.
klimek requested review of this revision.

MacroUnexpander applies the structural formatting of expanded lines into
UnwrappedLines to the corresponding unexpanded macro calls, resulting in
UnwrappedLines for the macro calls the user typed.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D88299

Files:
  clang/lib/Format/CMakeLists.txt
  clang/lib/Format/FormatToken.h
  clang/lib/Format/MacroUnexpander.cpp
  clang/lib/Format/Macros.h
  clang/unittests/Format/CMakeLists.txt
  clang/unittests/Format/MacroUnexpanderTest.cpp

Index: clang/unittests/Format/MacroUnexpanderTest.cpp
===
--- /dev/null
+++ clang/unittests/Format/MacroUnexpanderTest.cpp
@@ -0,0 +1,636 @@
+#include "../../lib/Format/Macros.h"
+#include "../../lib/Format/UnwrappedLineParser.h"
+#include "TestLexer.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include 
+#include 
+#include 
+
+namespace clang {
+namespace format {
+namespace {
+
+using UnexpandedMap = std::map>;
+
+class Expansion {
+public:
+  Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}
+
+  TokenList
+  expand(llvm::StringRef Name,
+ const SmallVector, 1> &Args) {
+auto *ID = Lex.id(Name);
+auto UnexpandedLine = std::make_unique();
+UnexpandedLine->Tokens.push_back(ID);
+if (!Args.empty()) {
+  UnexpandedLine->Tokens.push_back(Lex.id("("));
+  for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
+if (I != Args.begin())
+  UnexpandedLine->Tokens.push_back(Lex.id(","));
+UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
+  I->end());
+  }
+  UnexpandedLine->Tokens.push_back(Lex.id(")"));
+}
+Unexpanded[ID] = std::move(UnexpandedLine);
+
+auto Expanded = uneof(Macros.expand(ID, Args));
+Tokens.append(Expanded.begin(), Expanded.end());
+
+TokenList UnexpandedTokens;
+for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
+  UnexpandedTokens.push_back(Node.Tok);
+}
+return UnexpandedTokens;
+  }
+
+  TokenList expand(llvm::StringRef Name,
+   const std::vector &Args = {}) {
+return expand(Name, lexArgs(Args));
+  }
+
+  const UnexpandedMap &getUnexpanded() const { return Unexpanded; }
+
+  const TokenList &getTokens() const { return Tokens; }
+
+private:
+  llvm::SmallVector
+  lexArgs(const std::vector &Args) {
+llvm::SmallVector Result;
+for (const auto &Arg : Args) {
+  Result.push_back(uneof(Lex.lex(Arg)));
+}
+return Result;
+  }
+  std::map> Unexpanded;
+  llvm::SmallVector Tokens;
+  TestLexer &Lex;
+  MacroExpander &Macros;
+};
+
+struct Chunk {
+  Chunk(llvm::ArrayRef Tokens)
+  : Tokens(Tokens.begin(), Tokens.end()) {}
+  Chunk(llvm::ArrayRef Children)
+  : Children(Children.begin(), Children.end()) {}
+  llvm::SmallVector Tokens;
+  llvm::SmallVector Children;
+};
+
+bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
+  if (Left->getType() == Right->getType() &&
+  Left->TokenText == Right->TokenText)
+return true;
+  llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
+  return false;
+}
+
+struct Matcher {
+  Matcher(const TokenList &Tokens) : Tokens(Tokens), It(this->Tokens.begin()) {}
+
+  Chunk consume(const TokenList &Tokens) {
+TokenList Result;
+for (const FormatToken *Token : Tokens) {
+  assert(tokenMatches(*It, Token));
+  Result.push_back(*It);
+  ++It;
+}
+return Chunk(Result);
+  }
+
+  TokenList Tokens;
+  TokenList::iterator It;
+};
+
+UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
+  const UnexpandedMap &M2) {
+  UnexpandedMap Result;
+  for (const auto &KV : M1) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  for (const auto &KV : M2) {
+Result[KV.first] = std::make_unique(*KV.second);
+  }
+  return Result;
+}
+
+class MacroUnexpanderTest : public ::testing::Test {
+public:
+  std::unique_ptr
+  create(const std::vector &MacroDefinitions) {
+return std::make_unique(MacroDefinitions,
+   Lex.SourceMgr.get(), Lex.Style,
+   Lex.Allocator, Lex.IdentTable);
+  }
+
+  UnwrappedLine line(llvm::ArrayRef Tokens) {
+UnwrappedLine Result;
+for (FormatToken *Tok : Tokens) {
+  Result.Tokens.push_back(UnwrappedLineNode(Tok));
+}
+return Result;
+  }
+
+  UnwrappedLine line(llvm::StringRef Text) { return line({lex(Text)}); }
+
+  UnwrappedLine line(llvm::ArrayRef Chunks) {
+UnwrappedLine Result;
+for (const Chunk &Chunk : Chunks) {
+  Result.Tokens.