https://github.com/denzor200 created https://github.com/llvm/llvm-project/pull/173587
Implementation of this matcher was mostly generated by Cursor. This matcher(like [hasAdjacentSubstatements](https://github.com/llvm/llvm-project/pull/169965)) needed for me at least to implement https://github.com/llvm/llvm-project/issues/133110 and https://github.com/llvm/llvm-project/issues/38471 Also maybe we will use it in https://github.com/llvm/llvm-project/pull/158462 >From 2e0158102f4c21d2411d3d8b9c22cdb5aa0cc76c Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Tue, 23 Dec 2025 05:00:12 +0300 Subject: [PATCH 1/7] Initial implementation from Cursor --- clang/docs/LibASTMatchersReference.html | 31 +++ clang/docs/ReleaseNotes.rst | 1 + clang/include/clang/ASTMatchers/ASTMatchers.h | 20 ++ .../clang/ASTMatchers/ASTMatchersInternal.h | 32 +++ clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 78 ++++++ clang/lib/ASTMatchers/Dynamic/Registry.cpp | 1 + .../ASTMatchers/ASTMatchersTraversalTest.cpp | 254 ++++++++++++++++++ 7 files changed, 417 insertions(+) diff --git a/clang/docs/LibASTMatchersReference.html b/clang/docs/LibASTMatchersReference.html index e34ac30b8f5a4..cb65552751485 100644 --- a/clang/docs/LibASTMatchersReference.html +++ b/clang/docs/LibASTMatchersReference.html @@ -8211,6 +8211,22 @@ <h2 id="traversal-matchers">AST Traversal Matchers</h2> </pre></td></tr> +<tr><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html">CompoundStmt</a>></td><td class="name" onclick="toggle('forEachAdjacentSubstatements0')"><a name="forEachAdjacentSubstatements0Anchor">forEachAdjacentSubstatements</a></td><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>>, ..., Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>></td></tr> +<tr><td colspan="4" class="doc" id="forEachAdjacentSubstatements0"><pre>Matches compound statements where all sets of adjacent substatements +matching the provided sequence of matchers are found. Also matches +StmtExprs that have CompoundStmt as children. + +Given + { {}; 1+2; {}; 3+4; } +forEachAdjacentSubstatements(compoundStmt(), binaryOperator()) + matches '{ {}; 1+2; {}; 3+4; }' twice +with compoundStmt() + matching '{}' (first match) and '{}' (second match) +with binaryOperator() + matching '1+2' (first match) and '3+4' (second match) +</pre></td></tr> + + <tr><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1CoroutineBodyStmt.html">CoroutineBodyStmt</a>></td><td class="name" onclick="toggle('hasBody5')"><a name="hasBody5Anchor">hasBody</a></td><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>> InnerMatcher</td></tr> <tr><td colspan="4" class="doc" id="hasBody5"><pre>Matches a 'for', 'while', 'while' statement or a function or coroutine definition that has a given body. Note that in case of functions or @@ -9991,6 +10007,21 @@ <h2 id="traversal-matchers">AST Traversal Matchers</h2> </pre></td></tr> +<tr><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1StmtExpr.html">StmtExpr</a>></td><td class="name" onclick="toggle('forEachAdjacentSubstatements1')"><a name="forEachAdjacentSubstatements1Anchor">forEachAdjacentSubstatements</a></td><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>>, ..., Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>></td></tr> +<tr><td colspan="4" class="doc" id="forEachAdjacentSubstatements1"><pre>Matches StmtExprs with compound statements containing all sets of +adjacent substatements that match the provided sequence of matchers. + +Given + int x = ({ {}; 1+2; {}; 3+4; }); +forEachAdjacentSubstatements(compoundStmt(), binaryOperator()) + matches '({ {}; 1+2; {}; 3+4; })' twice +with compoundStmt() + matching '{}' (first match) and '{}' (second match) +with binaryOperator() + matching '1+2' (first match) and '3+4' (second match) +</pre></td></tr> + + <tr><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>></td><td class="name" onclick="toggle('alignOfExpr0')"><a name="alignOfExpr0Anchor">alignOfExpr</a></td><td>Matcher<<a href="https://clang.llvm.org/doxygen/classclang_1_1UnaryExprOrTypeTraitExpr.html">UnaryExprOrTypeTraitExpr</a>> InnerMatcher</td></tr> <tr><td colspan="4" class="doc" id="alignOfExpr0"><pre>Same as unaryExprOrTypeTraitExpr, but only matching alignof. diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index feaf92ad4415f..6c925412fc334 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -753,6 +753,7 @@ AST Matchers - Added ``hasExplicitParameters`` for ``LambdaExpr`` as an output attribute to AST JSON dumps. - Add ``arrayTypeLoc`` matcher for matching ``ArrayTypeLoc``. +- Add ``forEachAdjacentSubstatements``. clang-format ------------ diff --git a/clang/include/clang/ASTMatchers/ASTMatchers.h b/clang/include/clang/ASTMatchers/ASTMatchers.h index e3ec26207d2bc..9b5b3d0169623 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchers.h +++ b/clang/include/clang/ASTMatchers/ASTMatchers.h @@ -5911,6 +5911,26 @@ AST_POLYMORPHIC_MATCHER_P(hasAnySubstatement, Builder) != CS->body_end(); } + +/// Matches compound statements where one ore more sets of adjacent +/// substatements matching the provided sequence of matchers. Also matches +/// StmtExprs that have CompoundStmt as children. +/// +/// Given +/// \code +/// { {}; 1+2; {}; 3+4; } +/// \endcode +/// forEachAdjacentSubstatements(compoundStmt(), binaryOperator()) +/// matches '{ {}; 1+2; {}; 3+4; }' twice +/// with compoundStmt() +/// matching '{}' (first match) and '{}' (second match) +/// with binaryOperator() +/// matching '1+2' (first match) and '3+4' (second match) +extern const internal::VariadicFunction< + internal::ForEachAdjSubstatementsMatcherType, internal::Matcher<Stmt>, + internal::forEachAdjSubstatementsFunc> + forEachAdjacentSubstatements; + /// Checks that a compound statement contains a specific number of /// child statements. /// diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index c050fb7d797e3..8a6c287347202 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -47,6 +47,7 @@ #include "clang/AST/TemplateName.h" #include "clang/AST/Type.h" #include "clang/AST/TypeLoc.h" +#include "clang/ASTMatchers/ASTMatchersMacros.h" #include "clang/Basic/LLVM.h" #include "clang/Basic/OperatorKinds.h" #include "llvm/ADT/APFloat.h" @@ -2283,6 +2284,37 @@ using HasOpNameMatcher = HasOpNameMatcher hasAnyOperatorNameFunc(ArrayRef<const StringRef *> NameRefs); +/// Matches nodes of type T (CompoundStmt or StmtExpr) that contain sequences +/// of consecutive substatements matching the provided matchers in order. +/// +/// See \c forEachAdjacentSubstatements() in ASTMatchers.h for details. +template <typename T, typename ArgT = std::vector<Matcher<Stmt>>> +class ForEachAdjSubstatementsMatcher : public MatcherInterface<T> { + static_assert(std::is_same<T, CompoundStmt>::value || + std::is_same<T, StmtExpr>::value, + "Matcher only supports `CompoundStmt` and `StmtExpr`"); + static_assert(std::is_same<ArgT, std::vector<Matcher<Stmt>>>::value, + "Matcher ArgT must be std::vector<Matcher<Stmt>>"); + +public: + explicit ForEachAdjSubstatementsMatcher(std::vector<Matcher<Stmt>> Matchers) + : Matchers(std::move(Matchers)) {} + + bool matches(const T &Node, ASTMatchFinder *Finder, + BoundNodesTreeBuilder *Builder) const override; + +private: + std::vector<Matcher<Stmt>> Matchers; +}; + +using ForEachAdjSubstatementsMatcherType = + PolymorphicMatcher<ForEachAdjSubstatementsMatcher, + AST_POLYMORPHIC_SUPPORTED_TYPES(CompoundStmt, StmtExpr), + std::vector<Matcher<Stmt>>>; + +ForEachAdjSubstatementsMatcherType +forEachAdjSubstatementsFunc(ArrayRef<const Matcher<Stmt> *> MatcherRefs); + using HasOverloadOpNameMatcher = PolymorphicMatcher<HasOverloadedOperatorNameMatcher, void(TypeList<CXXOperatorCallExpr, FunctionDecl>), diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index a556ffca96903..54ce71a67579a 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -467,6 +467,80 @@ hasAnyOverloadedOperatorNameFunc(ArrayRef<const StringRef *> NameRefs) { return HasOverloadOpNameMatcher(vectorFromRefs(NameRefs)); } +static std::vector<Matcher<Stmt>> +vectorFromMatcherRefs(ArrayRef<const Matcher<Stmt> *> MatcherRefs) { + std::vector<Matcher<Stmt>> Matchers; + Matchers.reserve(MatcherRefs.size()); + for (auto *Matcher : MatcherRefs) + Matchers.push_back(*Matcher); + return Matchers; +} + +/// Creates a new BoundNodesTreeBuilder that extends the given builder. +/// This is a helper to simplify the pattern of creating a new builder, +/// adding the current builder's matches to it, and then using it for matching. +static BoundNodesTreeBuilder extendBuilder(BoundNodesTreeBuilder &&Builder) { + BoundNodesTreeBuilder Extended; + Extended.addMatch(std::move(Builder)); + return Extended; +} + +ForEachAdjSubstatementsMatcherType +forEachAdjSubstatementsFunc(ArrayRef<const Matcher<Stmt> *> MatcherRefs) { + return ForEachAdjSubstatementsMatcherType(vectorFromMatcherRefs(MatcherRefs)); +} + +template <typename T, typename ArgT> +bool ForEachAdjSubstatementsMatcher<T, ArgT>::matches( + const T &Node, ASTMatchFinder *Finder, + BoundNodesTreeBuilder *Builder) const { + const CompoundStmt *CS = CompoundStmtMatcher<T>::get(Node); + if (!CS) + return false; + + // Search for all sequences of adjacent substatements that match the matchers + const auto BodyBegin = CS->body_begin(); + const auto BodyEnd = CS->body_end(); + + if (Matchers.empty()) + return false; + + bool FoundAny = false; + + // Try each possible starting position + for (auto StartIt = BodyBegin; StartIt != BodyEnd; ++StartIt) { + // Check if there are enough statements remaining + if (std::distance(StartIt, BodyEnd) < + static_cast<ptrdiff_t>(Matchers.size())) + break; + + // Start with a fresh builder for this sequence + BoundNodesTreeBuilder SequenceBuilder; + + // Use enumerate to iterate over matchers and statements simultaneously + auto StmtRange = llvm::make_range(StartIt, StartIt + Matchers.size()); + for (auto [Idx, Matcher, StmtPtr] : llvm::enumerate(Matchers, StmtRange)) { + // Extend the builder before matching each statement + SequenceBuilder = extendBuilder(std::move(SequenceBuilder)); + if (!Matcher.matches(*StmtPtr, Finder, &SequenceBuilder)) + break; + + // If this is the last iteration and we matched, add the match + if (Idx == Matchers.size() - 1) { + Builder->addMatch(SequenceBuilder); + FoundAny = true; + } + } + } + + return FoundAny; +} + +template bool ForEachAdjSubstatementsMatcher<CompoundStmt>::matches( + const CompoundStmt &, ASTMatchFinder *, BoundNodesTreeBuilder *) const; +template bool ForEachAdjSubstatementsMatcher<StmtExpr>::matches( + const StmtExpr &, ASTMatchFinder *, BoundNodesTreeBuilder *) const; + HasNameMatcher::HasNameMatcher(std::vector<std::string> N) : UseUnqualifiedMatch( llvm::all_of(N, [](StringRef Name) { return !Name.contains("::"); })), @@ -1047,6 +1121,10 @@ const internal::VariadicFunction<internal::Matcher<NamedDecl>, StringRef, const internal::VariadicFunction<internal::HasOpNameMatcher, StringRef, internal::hasAnyOperatorNameFunc> hasAnyOperatorName = {}; +const internal::VariadicFunction<internal::ForEachAdjSubstatementsMatcherType, + internal::Matcher<Stmt>, + internal::forEachAdjSubstatementsFunc> + forEachAdjacentSubstatements = {}; const internal::VariadicFunction<internal::HasOverloadOpNameMatcher, StringRef, internal::hasAnyOverloadedOperatorNameFunc> hasAnyOverloadedOperatorName = {}; diff --git a/clang/lib/ASTMatchers/Dynamic/Registry.cpp b/clang/lib/ASTMatchers/Dynamic/Registry.cpp index d1b19659905ca..ea166edae1103 100644 --- a/clang/lib/ASTMatchers/Dynamic/Registry.cpp +++ b/clang/lib/ASTMatchers/Dynamic/Registry.cpp @@ -289,6 +289,7 @@ RegistryMaps::RegistryMaps() { REGISTER_MATCHER(hasAnyPlacementArg); REGISTER_MATCHER(hasAnySelector); REGISTER_MATCHER(hasAnySubstatement); + REGISTER_MATCHER(forEachAdjacentSubstatements); REGISTER_MATCHER(hasAnyTemplateArgument); REGISTER_MATCHER(hasAnyTemplateArgumentLoc); REGISTER_MATCHER(hasAnyUsingShadowDecl); diff --git a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp index c0a03deb5b543..ab0baade2d9e5 100644 --- a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp +++ b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp @@ -2399,6 +2399,260 @@ TEST(HasAnySubstatement, FindsSubstatementBetweenOthers) { compoundStmt(hasAnySubstatement(forStmt())))); } +TEST(ForEachAdjSubstatements, MatchesAdjacentSubstatements) { + // Basic case: compound statement followed by binary operator + EXPECT_TRUE( + matches("void f() { {} 1+2; }", compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesMultipleAdjacentPairs) { + // When multiple adjacent pairs exist, should match all of them + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; {} 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 2))); + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; {} 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo"))), + std::make_unique<VerifyIdIsBoundTo<BinaryOperator>>("bo", 2))); +} + +TEST(ForEachAdjSubstatements, DoesNotMatchNonAdjacentSubstatements) { + // Statements exist but not adjacent + EXPECT_TRUE(notMatches("void f() { {} 1; 1+2; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesInNestedCompoundStatements) { + // Should match in nested compound statements + EXPECT_TRUE(matches("void f() { if (true) { {} 1+2; } }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesAllAdjacentPairs) { + // When multiple adjacent pairs exist, should match all of them + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; int x; {} 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 2))); +} + +TEST(ForEachAdjSubstatements, DoesNotMatchEmptyCompound) { + // Empty compound statement has no adjacent pairs + EXPECT_TRUE( + notMatches("void f() { }", compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, DoesNotMatchSingleStatement) { + // Single statement has no adjacent pairs + EXPECT_TRUE( + notMatches("void f() { 1+2; }", compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesDifferentStatementTypes) { + // Test with different statement types + EXPECT_TRUE( + matches("void f() { for (;;); while (true); }", + compoundStmt(forEachAdjacentSubstatements(forStmt(), whileStmt())))); + + EXPECT_TRUE(matches( + "void f() { int x; return; }", + compoundStmt(forEachAdjacentSubstatements(declStmt(), returnStmt())))); +} + +TEST(ForEachAdjSubstatements, WorksWithStmtExpr) { + // Test that it works with StmtExpr (polymorphic support) + EXPECT_TRUE(matches( + "void f() { int x = ({ {} 1+2; }); }", + stmtExpr(forEachAdjacentSubstatements(compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, DoesNotMatchWrongOrder) { + // The order matters - binaryOperator must come after compoundStmt + EXPECT_TRUE(notMatches("void f() { 1+2; {} }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesWithStatementsBetween) { + // Should still match even if there are other statements before/after + EXPECT_TRUE(matches("void f() { int x; {} 1+2; int y; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesMultipleSequencesWithStatementsBetween) { + // Should match all sequences even with statements in between + constexpr llvm::StringRef Code = + "void f() { {} 1+2; int x; {} 3+4; int y; {} 5+6; }"; + const auto Matcher = compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo"))); + EXPECT_TRUE(matchAndVerifyResultTrue(Code, Matcher, + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 3))); + EXPECT_TRUE(matchAndVerifyResultTrue(Code, Matcher, + std::make_unique<VerifyIdIsBoundTo<BinaryOperator>>("bo", 3))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesThreeAdjacentSubstatements) { + // Test variadic version with 3 matchers + EXPECT_TRUE( + matches("void f() { {} 1+2; 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesMultipleSequences) { + // Test variadic version matches all sequences + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; 3+4; int x; {} 5+6; 7+8; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo1"), + binaryOperator().bind("bo2"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 2))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesFourAdjacentSubstatements) { + // Test variadic version with 4 matchers + EXPECT_TRUE(matches( + "void f() { int x; return; {} 1+2; }", + compoundStmt(forEachAdjacentSubstatements( + declStmt(), returnStmt(), compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesFiveAdjacentSubstatements) { + // Test variadic version with 5 matchers + EXPECT_TRUE(matches( + "void f() { for (;;); while (true); if (true) {} return; 1+2; }", + compoundStmt(forEachAdjacentSubstatements(forStmt(), whileStmt(), ifStmt(), + returnStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicDoesNotMatchNonAdjacentSequence) { + // Three matchers but statements are not all adjacent + EXPECT_TRUE( + notMatches("void f() { {} 1; 1+2; 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicDoesNotMatchPartialSequence) { + // First two match but third doesn't + EXPECT_TRUE( + notMatches("void f() { {} 1+2; return; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesInNestedCompound) { + // Test variadic version in nested compound statements + EXPECT_TRUE( + matches("void f() { if (true) { {} 1+2; 3+4; } }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesWithDifferentTypes) { + // Test variadic version with different statement types + EXPECT_TRUE(matches("void f() { for (;;); while (true); if (true) {} }", + compoundStmt(forEachAdjacentSubstatements( + forStmt(), whileStmt(), ifStmt())))); +} + +TEST(ForEachAdjSubstatements, VariadicDoesNotMatchWrongOrder) { + // Order matters in variadic version + EXPECT_TRUE( + notMatches("void f() { 1+2; {} 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesAllSequences) { + // When multiple sequences exist, should match all of them + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; 3+4; int x; {} 5+6; 7+8; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo1"), + binaryOperator().bind("bo2"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 2))); +} + +TEST(ForEachAdjSubstatements, VariadicWorksWithStmtExpr) { + // Test variadic version with StmtExpr + EXPECT_TRUE( + matches("void f() { int x = ({ {} 1+2; 3+4; }); }", + stmtExpr(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicRequiresMinimumStatements) { + // Need at least as many statements as matchers + EXPECT_TRUE( + notMatches("void f() { {} 1+2; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesWithStatementsBetween) { + // Should still match even if there are other statements before/after + EXPECT_TRUE( + matches("void f() { int x; {} 1+2; 3+4; int y; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesComplexSequence) { + // Test with a complex sequence of different statement types + EXPECT_TRUE(matches("void f() { int a; int b; return; {} 1+2; }", + compoundStmt(forEachAdjacentSubstatements( + declStmt(), declStmt(), returnStmt(), compoundStmt(), + binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicDoesNotMatchGapInSequence) { + // Sequence has a gap in the middle + EXPECT_TRUE( + notMatches("void f() { {} 1+2; int x; 3+4; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt(), binaryOperator(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, VariadicMatchesLongSequence) { + // Test with a longer sequence (6 statements) + EXPECT_TRUE(matches("void f() { int a; int b; int c; return; {} 1+2; }", + compoundStmt(forEachAdjacentSubstatements( + declStmt(), declStmt(), declStmt(), returnStmt(), + compoundStmt(), binaryOperator())))); +} + +TEST(ForEachAdjSubstatements, MatchesOverlappingSequences) { + // Test that overlapping sequences are both matched + // { {}; 1+2; {}; } has two sequences: {}->1+2 and 1+2->{} + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; {}; }", + compoundStmt(forEachAdjacentSubstatements( + compoundStmt().bind("cs"), binaryOperator().bind("bo"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 1))); + // The second sequence: binaryOperator followed by compoundStmt + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; {}; }", + compoundStmt(forEachAdjacentSubstatements( + binaryOperator().bind("bo"), compoundStmt().bind("cs"))), + std::make_unique<VerifyIdIsBoundTo<BinaryOperator>>("bo", 1))); + EXPECT_TRUE(matchAndVerifyResultTrue( + "void f() { {} 1+2; {}; }", + compoundStmt(forEachAdjacentSubstatements( + binaryOperator().bind("bo"), compoundStmt().bind("cs"))), + std::make_unique<VerifyIdIsBoundTo<CompoundStmt>>("cs", 1))); +} + TEST(Member, MatchesMemberAllocationFunction) { // Fails in C++11 mode EXPECT_TRUE(matchesConditionally( >From 05fd2989f51fb772ef982811a761df2c38a3302d Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Tue, 23 Dec 2025 14:14:45 +0300 Subject: [PATCH 2/7] review --- clang/include/clang/ASTMatchers/ASTMatchers.h | 2 +- .../include/clang/ASTMatchers/ASTMatchersInternal.h | 10 +++++----- clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 12 ++++++------ 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchers.h b/clang/include/clang/ASTMatchers/ASTMatchers.h index 9b5b3d0169623..5da86716587e5 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchers.h +++ b/clang/include/clang/ASTMatchers/ASTMatchers.h @@ -5927,7 +5927,7 @@ AST_POLYMORPHIC_MATCHER_P(hasAnySubstatement, /// with binaryOperator() /// matching '1+2' (first match) and '3+4' (second match) extern const internal::VariadicFunction< - internal::ForEachAdjSubstatementsMatcherType, internal::Matcher<Stmt>, + internal::ForEachAdjacentSubstatementsMatcherType, internal::Matcher<Stmt>, internal::forEachAdjSubstatementsFunc> forEachAdjacentSubstatements; diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index 8a6c287347202..5309d85014f58 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -2289,7 +2289,7 @@ HasOpNameMatcher hasAnyOperatorNameFunc(ArrayRef<const StringRef *> NameRefs); /// /// See \c forEachAdjacentSubstatements() in ASTMatchers.h for details. template <typename T, typename ArgT = std::vector<Matcher<Stmt>>> -class ForEachAdjSubstatementsMatcher : public MatcherInterface<T> { +class ForEachAdjacentSubstatementsMatcher : public MatcherInterface<T> { static_assert(std::is_same<T, CompoundStmt>::value || std::is_same<T, StmtExpr>::value, "Matcher only supports `CompoundStmt` and `StmtExpr`"); @@ -2297,7 +2297,7 @@ class ForEachAdjSubstatementsMatcher : public MatcherInterface<T> { "Matcher ArgT must be std::vector<Matcher<Stmt>>"); public: - explicit ForEachAdjSubstatementsMatcher(std::vector<Matcher<Stmt>> Matchers) + explicit ForEachAdjacentSubstatementsMatcher(std::vector<Matcher<Stmt>> Matchers) : Matchers(std::move(Matchers)) {} bool matches(const T &Node, ASTMatchFinder *Finder, @@ -2307,12 +2307,12 @@ class ForEachAdjSubstatementsMatcher : public MatcherInterface<T> { std::vector<Matcher<Stmt>> Matchers; }; -using ForEachAdjSubstatementsMatcherType = - PolymorphicMatcher<ForEachAdjSubstatementsMatcher, +using ForEachAdjacentSubstatementsMatcherType = + PolymorphicMatcher<ForEachAdjacentSubstatementsMatcher, AST_POLYMORPHIC_SUPPORTED_TYPES(CompoundStmt, StmtExpr), std::vector<Matcher<Stmt>>>; -ForEachAdjSubstatementsMatcherType +ForEachAdjacentSubstatementsMatcherType forEachAdjSubstatementsFunc(ArrayRef<const Matcher<Stmt> *> MatcherRefs); using HasOverloadOpNameMatcher = diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index 54ce71a67579a..6e2eb8ac96d76 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -485,13 +485,13 @@ static BoundNodesTreeBuilder extendBuilder(BoundNodesTreeBuilder &&Builder) { return Extended; } -ForEachAdjSubstatementsMatcherType +ForEachAdjacentSubstatementsMatcherType forEachAdjSubstatementsFunc(ArrayRef<const Matcher<Stmt> *> MatcherRefs) { - return ForEachAdjSubstatementsMatcherType(vectorFromMatcherRefs(MatcherRefs)); + return ForEachAdjacentSubstatementsMatcherType(vectorFromMatcherRefs(MatcherRefs)); } template <typename T, typename ArgT> -bool ForEachAdjSubstatementsMatcher<T, ArgT>::matches( +bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( const T &Node, ASTMatchFinder *Finder, BoundNodesTreeBuilder *Builder) const { const CompoundStmt *CS = CompoundStmtMatcher<T>::get(Node); @@ -536,9 +536,9 @@ bool ForEachAdjSubstatementsMatcher<T, ArgT>::matches( return FoundAny; } -template bool ForEachAdjSubstatementsMatcher<CompoundStmt>::matches( +template bool ForEachAdjacentSubstatementsMatcher<CompoundStmt>::matches( const CompoundStmt &, ASTMatchFinder *, BoundNodesTreeBuilder *) const; -template bool ForEachAdjSubstatementsMatcher<StmtExpr>::matches( +template bool ForEachAdjacentSubstatementsMatcher<StmtExpr>::matches( const StmtExpr &, ASTMatchFinder *, BoundNodesTreeBuilder *) const; HasNameMatcher::HasNameMatcher(std::vector<std::string> N) @@ -1121,7 +1121,7 @@ const internal::VariadicFunction<internal::Matcher<NamedDecl>, StringRef, const internal::VariadicFunction<internal::HasOpNameMatcher, StringRef, internal::hasAnyOperatorNameFunc> hasAnyOperatorName = {}; -const internal::VariadicFunction<internal::ForEachAdjSubstatementsMatcherType, +const internal::VariadicFunction<internal::ForEachAdjacentSubstatementsMatcherType, internal::Matcher<Stmt>, internal::forEachAdjSubstatementsFunc> forEachAdjacentSubstatements = {}; >From 5e805c03323cdcf584e4801eb3eac1697ad4d8ba Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Thu, 25 Dec 2025 23:26:55 +0300 Subject: [PATCH 3/7] Cursor implements memoization --- .../clang/ASTMatchers/ASTMatchersInternal.h | 29 +++++++ clang/lib/ASTMatchers/ASTMatchFinder.cpp | 53 ++++++++++-- clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 83 ++++++++++++------- 3 files changed, 124 insertions(+), 41 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index 5309d85014f58..4745668aac725 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -722,6 +722,16 @@ class ASTMatchFinder { AMM_ParentOnly }; + /// Type of match operation for memoization purposes. + enum MatchType { + /// Matching ancestors. + MT_Ancestors, + /// Matching descendants. + MT_Descendants, + /// Matching children. + MT_Child + }; + virtual ~ASTMatchFinder() = default; /// Returns true if the given C++ class is directly or indirectly derived @@ -795,6 +805,25 @@ class ASTMatchFinder { bool isTraversalIgnoringImplicitNodes() const; + /// Memoized match execution for custom matchers. + /// + /// This method provides memoization support for custom matcher operations + /// that don't use the standard matchesChildOf/matchesDescendantOf methods. + /// It checks the memoization cache before executing the provided callback, + /// and caches the result for future use. + /// + /// \param MatcherID The unique identifier for this matcher operation + /// \param Node The AST node to match against + /// \param Builder The bound nodes builder (will be updated with cached result) + /// \param MatchCallback The callback that performs the actual matching + /// \param Type The type of match (MT_Child, MT_Descendants, or MT_Ancestors) + /// \return true if the match succeeded, false otherwise + virtual bool memoizedMatch( + const DynTypedMatcher::MatcherIDType &MatcherID, + const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback, + MatchType Type) = 0; + protected: virtual bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, const DynTypedMatcher &Matcher, diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp index e8a0004c2e187..f82628c99af52 100644 --- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp +++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp @@ -46,12 +46,8 @@ typedef MatchFinder::MatchCallback MatchCallback; // optimize this on. static const unsigned MaxMemoizationEntries = 10000; -enum class MatchType { - Ancestors, - - Descendants, - Child, -}; +// MatchType is now defined in ASTMatchFinder, use it from there. +using MatchType = ASTMatchFinder::MatchType; // We use memoization to avoid running the same matcher on the same // AST node twice. This struct is the key for looking up match @@ -70,7 +66,7 @@ struct MatchKey { DynTypedNode Node; BoundNodesTreeBuilder BoundNodes; TraversalKind Traversal = TK_AsIs; - MatchType Type; + MatchType Type = MatchType::MT_Child; bool operator<(const MatchKey &Other) const { return std::tie(Traversal, Type, MatcherID, Node, BoundNodes) < @@ -609,7 +605,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, Key.BoundNodes = *Builder; Key.Traversal = Ctx.getParentMapContext().getTraversalKind(); // Memoize result even doing a single-level match, it might be expensive. - Key.Type = MaxDepth == 1 ? MatchType::Child : MatchType::Descendants; + Key.Type = MaxDepth == 1 ? MatchType::MT_Child : MatchType::MT_Descendants; MemoizationMap::iterator I = ResultCache.find(Key); if (I != ResultCache.end()) { *Builder = I->second.Nodes; @@ -700,6 +696,45 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, return matchesAnyAncestorOf(Node, Ctx, Matcher, Builder); } + // Implements ASTMatchFinder::memoizedMatch. + bool memoizedMatch(const DynTypedMatcher::MatcherIDType &MatcherID, + const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> + MatchCallback, + MatchType Type) override { + // Reset the cache if it's too large. + if (ResultCache.size() > MaxMemoizationEntries) + ResultCache.clear(); + + // For AST-nodes that don't have an identity, we can't memoize. + if (!Node.getMemoizationData() || !Builder->isComparable()) + return MatchCallback(Builder); + + MatchKey Key; + Key.MatcherID = MatcherID; + Key.Node = Node; + // Note that we key on the bindings *before* the match. + Key.BoundNodes = *Builder; + Key.Traversal = ActiveASTContext->getParentMapContext().getTraversalKind(); + Key.Type = Type; + + MemoizationMap::iterator I = ResultCache.find(Key); + if (I != ResultCache.end()) { + *Builder = I->second.Nodes; + return I->second.ResultOfMatch; + } + + MemoizedMatchResult Result; + Result.Nodes = *Builder; + Result.ResultOfMatch = MatchCallback(&Result.Nodes); + + MemoizedMatchResult &CachedResult = ResultCache[Key]; + CachedResult = std::move(Result); + + *Builder = CachedResult.Nodes; + return CachedResult.ResultOfMatch; + } + // Matches all registered matchers on the given node and calls the // result callback for every node that matches. void match(const DynTypedNode &Node) { @@ -1175,7 +1210,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, Keys.back().Node = Node; Keys.back().BoundNodes = *Builder; Keys.back().Traversal = Ctx.getParentMapContext().getTraversalKind(); - Keys.back().Type = MatchType::Ancestors; + Keys.back().Type = MatchType::MT_Ancestors; // Check the cache. MemoizationMap::iterator I = ResultCache.find(Keys.back()); diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index 6e2eb8ac96d76..9e7074c4284dc 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -498,42 +498,61 @@ bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( if (!CS) return false; - // Search for all sequences of adjacent substatements that match the matchers - const auto BodyBegin = CS->body_begin(); - const auto BodyEnd = CS->body_end(); - if (Matchers.empty()) return false; - bool FoundAny = false; - - // Try each possible starting position - for (auto StartIt = BodyBegin; StartIt != BodyEnd; ++StartIt) { - // Check if there are enough statements remaining - if (std::distance(StartIt, BodyEnd) < - static_cast<ptrdiff_t>(Matchers.size())) - break; - - // Start with a fresh builder for this sequence - BoundNodesTreeBuilder SequenceBuilder; - - // Use enumerate to iterate over matchers and statements simultaneously - auto StmtRange = llvm::make_range(StartIt, StartIt + Matchers.size()); - for (auto [Idx, Matcher, StmtPtr] : llvm::enumerate(Matchers, StmtRange)) { - // Extend the builder before matching each statement - SequenceBuilder = extendBuilder(std::move(SequenceBuilder)); - if (!Matcher.matches(*StmtPtr, Finder, &SequenceBuilder)) - break; - - // If this is the last iteration and we matched, add the match - if (Idx == Matchers.size() - 1) { - Builder->addMatch(SequenceBuilder); - FoundAny = true; - } - } - } + // Create a unique matcher ID for this matcher instance and sequence. + // We use the this pointer combined with the number of matchers to create + // a unique identifier. The actual matcher sequence is implicitly part of + // the matcher instance. + DynTypedMatcher::MatcherIDType MatcherID = + std::make_pair(ASTNodeKind::getFromNodeKind<T>(), + reinterpret_cast<uint64_t>(this)); + + DynTypedNode DynNode = DynTypedNode::create(Node); + + // Use memoization to avoid re-running the same matcher on the same node. + return Finder->memoizedMatch( + MatcherID, DynNode, Builder, + [this, CS, Finder](BoundNodesTreeBuilder *MemoBuilder) -> bool { + // Search for all sequences of adjacent substatements that match the + // matchers + const auto BodyBegin = CS->body_begin(); + const auto BodyEnd = CS->body_end(); + + bool FoundAny = false; + + // Try each possible starting position + for (auto StartIt = BodyBegin; StartIt != BodyEnd; ++StartIt) { + // Check if there are enough statements remaining + if (std::distance(StartIt, BodyEnd) < + static_cast<ptrdiff_t>(Matchers.size())) + break; + + // Start with a fresh builder for this sequence + BoundNodesTreeBuilder SequenceBuilder; + + // Use enumerate to iterate over matchers and statements simultaneously + auto StmtRange = + llvm::make_range(StartIt, StartIt + Matchers.size()); + for (auto [Idx, Matcher, StmtPtr] : + llvm::enumerate(Matchers, StmtRange)) { + // Extend the builder before matching each statement + SequenceBuilder = extendBuilder(std::move(SequenceBuilder)); + if (!Matcher.matches(*StmtPtr, Finder, &SequenceBuilder)) + break; + + // If this is the last iteration and we matched, add the match + if (Idx == Matchers.size() - 1) { + MemoBuilder->addMatch(SequenceBuilder); + FoundAny = true; + } + } + } - return FoundAny; + return FoundAny; + }, + ASTMatchFinder::MT_Child); } template bool ForEachAdjacentSubstatementsMatcher<CompoundStmt>::matches( >From c82f289898d5a19d75e2467db4ae8dfef07bc9a0 Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Thu, 25 Dec 2025 23:28:51 +0300 Subject: [PATCH 4/7] don't move MatchType --- .../clang/ASTMatchers/ASTMatchersInternal.h | 16 +++------------ clang/lib/ASTMatchers/ASTMatchFinder.cpp | 20 +++++++++++-------- clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 3 +-- 3 files changed, 16 insertions(+), 23 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index 4745668aac725..da51034312c38 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -722,16 +722,6 @@ class ASTMatchFinder { AMM_ParentOnly }; - /// Type of match operation for memoization purposes. - enum MatchType { - /// Matching ancestors. - MT_Ancestors, - /// Matching descendants. - MT_Descendants, - /// Matching children. - MT_Child - }; - virtual ~ASTMatchFinder() = default; /// Returns true if the given C++ class is directly or indirectly derived @@ -812,17 +802,17 @@ class ASTMatchFinder { /// It checks the memoization cache before executing the provided callback, /// and caches the result for future use. /// + /// The match type is assumed to be Child (matching children, not descendants). + /// /// \param MatcherID The unique identifier for this matcher operation /// \param Node The AST node to match against /// \param Builder The bound nodes builder (will be updated with cached result) /// \param MatchCallback The callback that performs the actual matching - /// \param Type The type of match (MT_Child, MT_Descendants, or MT_Ancestors) /// \return true if the match succeeded, false otherwise virtual bool memoizedMatch( const DynTypedMatcher::MatcherIDType &MatcherID, const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, - llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback, - MatchType Type) = 0; + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; protected: virtual bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp index f82628c99af52..9aaf820e75994 100644 --- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp +++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp @@ -46,8 +46,12 @@ typedef MatchFinder::MatchCallback MatchCallback; // optimize this on. static const unsigned MaxMemoizationEntries = 10000; -// MatchType is now defined in ASTMatchFinder, use it from there. -using MatchType = ASTMatchFinder::MatchType; +enum class MatchType { + Ancestors, + + Descendants, + Child, +}; // We use memoization to avoid running the same matcher on the same // AST node twice. This struct is the key for looking up match @@ -66,7 +70,7 @@ struct MatchKey { DynTypedNode Node; BoundNodesTreeBuilder BoundNodes; TraversalKind Traversal = TK_AsIs; - MatchType Type = MatchType::MT_Child; + MatchType Type; bool operator<(const MatchKey &Other) const { return std::tie(Traversal, Type, MatcherID, Node, BoundNodes) < @@ -605,7 +609,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, Key.BoundNodes = *Builder; Key.Traversal = Ctx.getParentMapContext().getTraversalKind(); // Memoize result even doing a single-level match, it might be expensive. - Key.Type = MaxDepth == 1 ? MatchType::MT_Child : MatchType::MT_Descendants; + Key.Type = MaxDepth == 1 ? MatchType::Child : MatchType::Descendants; MemoizationMap::iterator I = ResultCache.find(Key); if (I != ResultCache.end()) { *Builder = I->second.Nodes; @@ -700,8 +704,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, bool memoizedMatch(const DynTypedMatcher::MatcherIDType &MatcherID, const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, llvm::function_ref<bool(BoundNodesTreeBuilder *)> - MatchCallback, - MatchType Type) override { + MatchCallback) override { // Reset the cache if it's too large. if (ResultCache.size() > MaxMemoizationEntries) ResultCache.clear(); @@ -716,7 +719,8 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, // Note that we key on the bindings *before* the match. Key.BoundNodes = *Builder; Key.Traversal = ActiveASTContext->getParentMapContext().getTraversalKind(); - Key.Type = Type; + // Assume Child match type for custom matchers. + Key.Type = MatchType::Child; MemoizationMap::iterator I = ResultCache.find(Key); if (I != ResultCache.end()) { @@ -1210,7 +1214,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, Keys.back().Node = Node; Keys.back().BoundNodes = *Builder; Keys.back().Traversal = Ctx.getParentMapContext().getTraversalKind(); - Keys.back().Type = MatchType::MT_Ancestors; + Keys.back().Type = MatchType::Ancestors; // Check the cache. MemoizationMap::iterator I = ResultCache.find(Keys.back()); diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index 9e7074c4284dc..8d04fa29f43b6 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -551,8 +551,7 @@ bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( } return FoundAny; - }, - ASTMatchFinder::MT_Child); + }); } template bool ForEachAdjacentSubstatementsMatcher<CompoundStmt>::matches( >From c5333a9a9429ab99664490b128f11eb1711d261b Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Fri, 26 Dec 2025 00:23:29 +0300 Subject: [PATCH 5/7] pass the whole matcher, not only id --- .../clang/ASTMatchers/ASTMatchersInternal.h | 11 +++++---- clang/lib/ASTMatchers/ASTMatchFinder.cpp | 4 ++-- clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 23 ++++++++----------- 3 files changed, 18 insertions(+), 20 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index da51034312c38..032b2f3eaa46e 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -411,9 +411,10 @@ class DynTypedMatcher { public: /// Takes ownership of the provided implementation pointer. template <typename T> - DynTypedMatcher(MatcherInterface<T> *Implementation) + DynTypedMatcher(const MatcherInterface<T> *Implementation) : SupportedKind(ASTNodeKind::getFromNodeKind<T>()), - RestrictKind(SupportedKind), Implementation(Implementation) {} + RestrictKind(SupportedKind), + Implementation(Implementation) {} /// Construct from a variadic function. enum VariadicOperator { @@ -542,7 +543,7 @@ class DynTypedMatcher { private: DynTypedMatcher(ASTNodeKind SupportedKind, ASTNodeKind RestrictKind, - IntrusiveRefCntPtr<DynMatcherInterface> Implementation) + IntrusiveRefCntPtr<const DynMatcherInterface> Implementation) : SupportedKind(SupportedKind), RestrictKind(RestrictKind), Implementation(std::move(Implementation)) {} @@ -554,7 +555,7 @@ class DynTypedMatcher { /// It allows to perform implicit and dynamic cast of matchers without /// needing to change \c Implementation. ASTNodeKind RestrictKind; - IntrusiveRefCntPtr<DynMatcherInterface> Implementation; + IntrusiveRefCntPtr<const DynMatcherInterface> Implementation; }; /// Wrapper of a MatcherInterface<T> *that allows copying. @@ -810,7 +811,7 @@ class ASTMatchFinder { /// \param MatchCallback The callback that performs the actual matching /// \return true if the match succeeded, false otherwise virtual bool memoizedMatch( - const DynTypedMatcher::MatcherIDType &MatcherID, + const DynTypedMatcher &Matcher, const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp index 9aaf820e75994..553973e00e10d 100644 --- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp +++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp @@ -701,7 +701,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, } // Implements ASTMatchFinder::memoizedMatch. - bool memoizedMatch(const DynTypedMatcher::MatcherIDType &MatcherID, + bool memoizedMatch(const DynTypedMatcher &Matcher, const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) override { @@ -714,7 +714,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, return MatchCallback(Builder); MatchKey Key; - Key.MatcherID = MatcherID; + Key.MatcherID = Matcher.getID(); Key.Node = Node; // Note that we key on the bindings *before* the match. Key.BoundNodes = *Builder; diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index 8d04fa29f43b6..a8a389d5ccbf2 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -129,7 +129,7 @@ class VariadicMatcher : public DynMatcherInterface { class IdDynMatcher : public DynMatcherInterface { public: IdDynMatcher(StringRef ID, - IntrusiveRefCntPtr<DynMatcherInterface> InnerMatcher) + IntrusiveRefCntPtr<const DynMatcherInterface> InnerMatcher) : ID(ID), InnerMatcher(std::move(InnerMatcher)) {} bool dynMatches(const DynTypedNode &DynNode, ASTMatchFinder *Finder, @@ -145,7 +145,7 @@ class IdDynMatcher : public DynMatcherInterface { private: const std::string ID; - const IntrusiveRefCntPtr<DynMatcherInterface> InnerMatcher; + const IntrusiveRefCntPtr<const DynMatcherInterface> InnerMatcher; }; /// A matcher that always returns true. @@ -167,7 +167,7 @@ class DynTraversalMatcherImpl : public DynMatcherInterface { public: explicit DynTraversalMatcherImpl( clang::TraversalKind TK, - IntrusiveRefCntPtr<DynMatcherInterface> InnerMatcher) + IntrusiveRefCntPtr<const DynMatcherInterface> InnerMatcher) : TK(TK), InnerMatcher(std::move(InnerMatcher)) {} bool dynMatches(const DynTypedNode &DynNode, ASTMatchFinder *Finder, @@ -181,7 +181,7 @@ class DynTraversalMatcherImpl : public DynMatcherInterface { private: clang::TraversalKind TK; - IntrusiveRefCntPtr<DynMatcherInterface> InnerMatcher; + IntrusiveRefCntPtr<const DynMatcherInterface> InnerMatcher; }; } // namespace @@ -501,19 +501,16 @@ bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( if (Matchers.empty()) return false; - // Create a unique matcher ID for this matcher instance and sequence. - // We use the this pointer combined with the number of matchers to create - // a unique identifier. The actual matcher sequence is implicitly part of - // the matcher instance. - DynTypedMatcher::MatcherIDType MatcherID = - std::make_pair(ASTNodeKind::getFromNodeKind<T>(), - reinterpret_cast<uint64_t>(this)); - DynTypedNode DynNode = DynTypedNode::create(Node); + // Create a DynTypedMatcher wrapper for this matcher instance to use for + // memoization. The matcher sequence is implicitly part of the matcher + // instance, so using 'this' pointer provides a unique identifier. + DynTypedMatcher MatcherWrapper(this); + // Use memoization to avoid re-running the same matcher on the same node. return Finder->memoizedMatch( - MatcherID, DynNode, Builder, + MatcherWrapper, DynNode, Builder, [this, CS, Finder](BoundNodesTreeBuilder *MemoBuilder) -> bool { // Search for all sequences of adjacent substatements that match the // matchers >From fcf5facbb0b704d2d90e41ca7622238e7da7eaa5 Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Fri, 26 Dec 2025 01:30:07 +0300 Subject: [PATCH 6/7] ready to llm refactoring --- .../clang/ASTMatchers/ASTMatchersInternal.h | 35 +++++++++---------- clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 6 ++-- 2 files changed, 18 insertions(+), 23 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index 032b2f3eaa46e..524f4e67e35e2 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -788,6 +788,16 @@ class ASTMatchFinder { Matcher, Builder, MatchMode); } + template <typename T> + bool memoizedMatch( + const T &Node, const DynTypedMatcher &Matcher, + BoundNodesTreeBuilder *Builder, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) { + static_assert(std::is_base_of<Stmt, T>::value, + "type not allowed for recursive matching"); + return memoizedMatch(Matcher, DynTypedNode::create(Node), Builder, MatchCallback); + } + virtual ASTContext &getASTContext() const = 0; virtual bool IsMatchingInASTNodeNotSpelledInSource() const = 0; @@ -796,25 +806,6 @@ class ASTMatchFinder { bool isTraversalIgnoringImplicitNodes() const; - /// Memoized match execution for custom matchers. - /// - /// This method provides memoization support for custom matcher operations - /// that don't use the standard matchesChildOf/matchesDescendantOf methods. - /// It checks the memoization cache before executing the provided callback, - /// and caches the result for future use. - /// - /// The match type is assumed to be Child (matching children, not descendants). - /// - /// \param MatcherID The unique identifier for this matcher operation - /// \param Node The AST node to match against - /// \param Builder The bound nodes builder (will be updated with cached result) - /// \param MatchCallback The callback that performs the actual matching - /// \return true if the match succeeded, false otherwise - virtual bool memoizedMatch( - const DynTypedMatcher &Matcher, - const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, - llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; - protected: virtual bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, const DynTypedMatcher &Matcher, @@ -830,6 +821,12 @@ class ASTMatchFinder { const DynTypedMatcher &Matcher, BoundNodesTreeBuilder *Builder, AncestorMatchMode MatchMode) = 0; + + virtual bool memoizedMatch( + const DynTypedMatcher &Matcher, + const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; + private: friend struct ASTChildrenNotSpelledInSourceScope; virtual bool isMatchingChildrenNotSpelledInSource() const = 0; diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index a8a389d5ccbf2..76c20c4d03a9e 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -501,16 +501,14 @@ bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( if (Matchers.empty()) return false; - DynTypedNode DynNode = DynTypedNode::create(Node); - // Create a DynTypedMatcher wrapper for this matcher instance to use for // memoization. The matcher sequence is implicitly part of the matcher // instance, so using 'this' pointer provides a unique identifier. DynTypedMatcher MatcherWrapper(this); // Use memoization to avoid re-running the same matcher on the same node. - return Finder->memoizedMatch( - MatcherWrapper, DynNode, Builder, + return Finder->memoizedMatch(static_cast<const Stmt&>(*CS), + MatcherWrapper, Builder, [this, CS, Finder](BoundNodesTreeBuilder *MemoBuilder) -> bool { // Search for all sequences of adjacent substatements that match the // matchers >From 76394ee06e78088e27f0cec68e70dd2f4504de8b Mon Sep 17 00:00:00 2001 From: denzor200 <[email protected]> Date: Fri, 26 Dec 2025 04:04:54 +0300 Subject: [PATCH 7/7] llm refactoring --- .../clang/ASTMatchers/ASTMatchersInternal.h | 39 +++++---- clang/lib/ASTMatchers/ASTMatchFinder.cpp | 80 +++++++------------ clang/lib/ASTMatchers/ASTMatchersInternal.cpp | 4 +- 3 files changed, 55 insertions(+), 68 deletions(-) diff --git a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h index 524f4e67e35e2..f16c2e32cb6c2 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchersInternal.h +++ b/clang/include/clang/ASTMatchers/ASTMatchersInternal.h @@ -745,7 +745,8 @@ class ASTMatchFinder { template <typename T> bool matchesChildOf(const T &Node, const DynTypedMatcher &Matcher, - BoundNodesTreeBuilder *Builder, BindKind Bind) { + BoundNodesTreeBuilder *Builder, BindKind Bind, + llvm::function_ref<bool(BoundNodesTreeBuilder*)> MatchCallback) { static_assert(std::is_base_of<Decl, T>::value || std::is_base_of<Stmt, T>::value || std::is_base_of<NestedNameSpecifier, T>::value || @@ -754,6 +755,21 @@ class ASTMatchFinder { std::is_base_of<QualType, T>::value || std::is_base_of<Attr, T>::value, "unsupported type for recursive matching"); + return matchesChildOf(DynTypedNode::create(Node), getASTContext(), Matcher, + Builder, Bind, MatchCallback); + } + + template <typename T> + bool matchesChildOf(const T &Node, const DynTypedMatcher &Matcher, + BoundNodesTreeBuilder *Builder, BindKind Bind) { + static_assert(std::is_base_of<Decl, T>::value || + std::is_base_of<Stmt, T>::value || + std::is_base_of<NestedNameSpecifier, T>::value || + std::is_base_of<NestedNameSpecifierLoc, T>::value || + std::is_base_of<TypeLoc, T>::value || + std::is_base_of<QualType, T>::value || + std::is_base_of<Attr, T>::value, + "unsupported type for recursive matching"); return matchesChildOf(DynTypedNode::create(Node), getASTContext(), Matcher, Builder, Bind); } @@ -788,16 +804,6 @@ class ASTMatchFinder { Matcher, Builder, MatchMode); } - template <typename T> - bool memoizedMatch( - const T &Node, const DynTypedMatcher &Matcher, - BoundNodesTreeBuilder *Builder, - llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) { - static_assert(std::is_base_of<Stmt, T>::value, - "type not allowed for recursive matching"); - return memoizedMatch(Matcher, DynTypedNode::create(Node), Builder, MatchCallback); - } - virtual ASTContext &getASTContext() const = 0; virtual bool IsMatchingInASTNodeNotSpelledInSource() const = 0; @@ -807,6 +813,12 @@ class ASTMatchFinder { bool isTraversalIgnoringImplicitNodes() const; protected: + virtual bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, + const DynTypedMatcher &Matcher, + BoundNodesTreeBuilder *Builder, + BindKind Bind, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; + virtual bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, const DynTypedMatcher &Matcher, BoundNodesTreeBuilder *Builder, @@ -822,11 +834,6 @@ class ASTMatchFinder { BoundNodesTreeBuilder *Builder, AncestorMatchMode MatchMode) = 0; - virtual bool memoizedMatch( - const DynTypedMatcher &Matcher, - const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, - llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) = 0; - private: friend struct ASTChildrenNotSpelledInSourceScope; virtual bool isMatchingChildrenNotSpelledInSource() const = 0; diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp index 553973e00e10d..7739d45377416 100644 --- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp +++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp @@ -593,14 +593,15 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, return RecursiveASTVisitor<MatchASTVisitor>::dataTraverseNode(S, Queue); } - // Matches children or descendants of 'Node' with 'BaseMatcher'. - bool memoizedMatchesRecursively(const DynTypedNode &Node, ASTContext &Ctx, - const DynTypedMatcher &Matcher, - BoundNodesTreeBuilder *Builder, int MaxDepth, - BindKind Bind) { + // Matches children or descendants of 'Node' with a custom callback. + bool memoizedMatchesRecursively( + const DynTypedNode &Node, ASTContext &Ctx, + const DynTypedMatcher &Matcher, BoundNodesTreeBuilder *Builder, + int MaxDepth, BindKind Bind, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> MatchCallback) { // For AST-nodes that don't have an identity, we can't memoize. if (!Node.getMemoizationData() || !Builder->isComparable()) - return matchesRecursively(Node, Matcher, Builder, MaxDepth, Bind); + return MatchCallback(Builder); MatchKey Key; Key.MatcherID = Matcher.getID(); @@ -618,8 +619,7 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, MemoizedMatchResult Result; Result.Nodes = *Builder; - Result.ResultOfMatch = - matchesRecursively(Node, Matcher, &Result.Nodes, MaxDepth, Bind); + Result.ResultOfMatch = MatchCallback(&Result.Nodes); MemoizedMatchResult &CachedResult = ResultCache[Key]; CachedResult = std::move(Result); @@ -668,13 +668,27 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, bool Directly) override; public: - // Implements ASTMatchFinder::matchesChildOf. + // Implements ASTMatchFinder::matchesChildOf (with callback). bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, const DynTypedMatcher &Matcher, - BoundNodesTreeBuilder *Builder, BindKind Bind) override { + BoundNodesTreeBuilder *Builder, BindKind Bind, + llvm::function_ref<bool(BoundNodesTreeBuilder *)> + MatchCallback) override { if (ResultCache.size() > MaxMemoizationEntries) ResultCache.clear(); - return memoizedMatchesRecursively(Node, Ctx, Matcher, Builder, 1, Bind); + return memoizedMatchesRecursively(Node, Ctx, Matcher, Builder, 1, Bind, + MatchCallback); + } + + // Implements ASTMatchFinder::matchesChildOf (without callback). + bool matchesChildOf(const DynTypedNode &Node, ASTContext &Ctx, + const DynTypedMatcher &Matcher, + BoundNodesTreeBuilder *Builder, BindKind Bind) override { + return matchesChildOf(Node, Ctx, Matcher, Builder, Bind, + [this, &Node, &Matcher, Bind]( + BoundNodesTreeBuilder* Nodes) -> bool { + return matchesRecursively(Node, Matcher, Nodes, 1, Bind); + }); } // Implements ASTMatchFinder::matchesDescendantOf. bool matchesDescendantOf(const DynTypedNode &Node, ASTContext &Ctx, @@ -684,7 +698,11 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, if (ResultCache.size() > MaxMemoizationEntries) ResultCache.clear(); return memoizedMatchesRecursively(Node, Ctx, Matcher, Builder, INT_MAX, - Bind); + Bind, + [this, &Node, &Matcher, Bind]( + BoundNodesTreeBuilder* Nodes) -> bool { + return matchesRecursively(Node, Matcher, Nodes, INT_MAX, Bind); + }); } // Implements ASTMatchFinder::matchesAncestorOf. bool matchesAncestorOf(const DynTypedNode &Node, ASTContext &Ctx, @@ -700,44 +718,6 @@ class MatchASTVisitor : public RecursiveASTVisitor<MatchASTVisitor>, return matchesAnyAncestorOf(Node, Ctx, Matcher, Builder); } - // Implements ASTMatchFinder::memoizedMatch. - bool memoizedMatch(const DynTypedMatcher &Matcher, - const DynTypedNode &Node, BoundNodesTreeBuilder *Builder, - llvm::function_ref<bool(BoundNodesTreeBuilder *)> - MatchCallback) override { - // Reset the cache if it's too large. - if (ResultCache.size() > MaxMemoizationEntries) - ResultCache.clear(); - - // For AST-nodes that don't have an identity, we can't memoize. - if (!Node.getMemoizationData() || !Builder->isComparable()) - return MatchCallback(Builder); - - MatchKey Key; - Key.MatcherID = Matcher.getID(); - Key.Node = Node; - // Note that we key on the bindings *before* the match. - Key.BoundNodes = *Builder; - Key.Traversal = ActiveASTContext->getParentMapContext().getTraversalKind(); - // Assume Child match type for custom matchers. - Key.Type = MatchType::Child; - - MemoizationMap::iterator I = ResultCache.find(Key); - if (I != ResultCache.end()) { - *Builder = I->second.Nodes; - return I->second.ResultOfMatch; - } - - MemoizedMatchResult Result; - Result.Nodes = *Builder; - Result.ResultOfMatch = MatchCallback(&Result.Nodes); - - MemoizedMatchResult &CachedResult = ResultCache[Key]; - CachedResult = std::move(Result); - - *Builder = CachedResult.Nodes; - return CachedResult.ResultOfMatch; - } // Matches all registered matchers on the given node and calls the // result callback for every node that matches. diff --git a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp index 76c20c4d03a9e..022d24738bc57 100644 --- a/clang/lib/ASTMatchers/ASTMatchersInternal.cpp +++ b/clang/lib/ASTMatchers/ASTMatchersInternal.cpp @@ -507,8 +507,8 @@ bool ForEachAdjacentSubstatementsMatcher<T, ArgT>::matches( DynTypedMatcher MatcherWrapper(this); // Use memoization to avoid re-running the same matcher on the same node. - return Finder->memoizedMatch(static_cast<const Stmt&>(*CS), - MatcherWrapper, Builder, + return Finder->matchesChildOf(static_cast<const Stmt&>(*CS), + MatcherWrapper, Builder, ASTMatchFinder::BK_All, [this, CS, Finder](BoundNodesTreeBuilder *MemoBuilder) -> bool { // Search for all sequences of adjacent substatements that match the // matchers _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
