[clang] af6acd7 - [Clang][Comments] Support for parsing headers in Doxygen \par commands (#91100)

via cfe-commits Thu, 20 Jun 2024 09:15:00 -0700

Author: hdoc
Date: 2024-06-20T12:14:51-04:00
New Revision: af6acd7442646fde56de919964bd52d7bb7922b2


URL: 
https://github.com/llvm/llvm-project/commit/af6acd7442646fde56de919964bd52d7bb7922b2
DIFF: 
https://github.com/llvm/llvm-project/commit/af6acd7442646fde56de919964bd52d7bb7922b2.diff

LOG: [Clang][Comments] Support for parsing headers in Doxygen \par commands (#91100)

### Background

Doxygen's `\par` command
([link](https://www.doxygen.nl/manual/commands.html#cmdpar)) has an
optional argument, which denotes the header of the paragraph started by
a given `\par` command.

In short, the paragraph command can be used with a heading, or without
one. The code block below shows both forms and how the current version
of LLVM/Clang parses this code:
```
$ cat test.cpp
/// \par User defined paragraph:
/// Contents of the paragraph.
///
/// \par
/// New paragraph under the same heading.
///
/// \par
/// A second paragraph.
class A {};

$ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp
`-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition
  |-FullComment 0x1530fea38 <line:2:4, line:9:23>
  | |-ParagraphComment 0x1530fe7e0 <line:2:4>
  | | `-TextComment 0x1530fe7b8 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par"
  | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30>
  | |   |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:"
  | |   `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph."
  | |-ParagraphComment 0x1530fe8c0 <line:5:4>
  | | `-TextComment 0x1530fe898 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par"
  | | `-ParagraphComment 0x1530fe930 <col:4, col:41>
  | |   `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading."
  | |-ParagraphComment 0x1530fe978 <line:8:4>
  | | `-TextComment 0x1530fe950 <col:4> Text=" "
  | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par"
  |   `-ParagraphComment 0x1530fe9e8 <col:4, col:23>
  |     `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph."
  `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A
```

As we can see above, the optional paragraph heading (`"User defined
paragraph"`) is not an argument of the `\par` `BlockCommandComment`, but
instead a child `TextComment`.

For documentation generators like [hdoc](https://hdoc.io/), it would be
ideal if we could parse Doxygen documentation comments with these
semantics in mind. Currently that's not possible.

### Change

This change parses `\par` commands the way Doxygen does, making an
optional header available as an argument if it is present.
In addition:

- AST unit tests cover the cases where an argument is present, is
absent, is followed by additional spacing, etc.
- TableGen is updated with an `IsParCommand` bit to support this
functionality
- `lit` tests are updated where needed
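
The rule described above (text on the same line as `\par` becomes the
heading, and a `\par` with nothing after it has no heading) can be
illustrated with a small standalone sketch. This is a simplified model
and not the code added to Clang: `ParSplit` and `splitParCommand` are
hypothetical names, the input is assumed to be the text following the
command with the `///` markers already stripped, and only `\n` line
endings are handled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch; not part of Clang.
struct ParSplit {
  std::optional<std::string> Heading; // text after \par on the same line
  std::string Body;                   // everything on the following lines
};

// Hypothetical helper. AfterCommand is the text that follows "\par" or
// "@par", with comment markers already removed (an assumption of this
// sketch, not how Clang's retokenizer receives its input).
ParSplit splitParCommand(std::string_view AfterCommand) {
  ParSplit Result;

  // The first line ends at the first '\n', or at the end of the input.
  std::size_t NewlinePos = AfterCommand.find('\n');
  std::string_view FirstLine = AfterCommand.substr(0, NewlinePos);
  if (NewlinePos != std::string_view::npos)
    Result.Body = std::string(AfterCommand.substr(NewlinePos + 1));

  // Skip leading blanks. A first line made only of blanks means the
  // command has no heading and simply starts a new paragraph.
  std::size_t Start = FirstLine.find_first_not_of(" \t");
  if (Start != std::string_view::npos)
    Result.Heading = std::string(FirstLine.substr(Start));

  return Result;
}
```

For example, `splitParCommand(" User defined paragraph:\nContents")`
yields a heading of `User defined paragraph:` and a body of `Contents`,
while `splitParCommand(" \nNew paragraph")` yields no heading at all.
Trailing whitespace on the heading line is kept in this sketch.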

Added: 
    

Modified: 
    clang/docs/ReleaseNotes.rst
    clang/include/clang/AST/CommentCommandTraits.h
    clang/include/clang/AST/CommentCommands.td
    clang/include/clang/AST/CommentParser.h
    clang/lib/AST/CommentParser.cpp
    clang/test/Index/comment-misc-tags.m
    clang/unittests/AST/CommentParser.cpp
    clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp

Removed: 
    


################################################################################
diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index d0e5e67651364..36e23981cc5df 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -150,6 +150,15 @@ here. Generic improvements to Clang as a whole or to its underlying
 infrastructure are described first, followed by language-specific
 sections with improvements to Clang's support for those languages.
 
+- The ``\par`` documentation comment command now supports an optional
+  argument, which denotes the header of the paragraph started by
+  an instance of the ``\par`` command comment. The implementation
+  of the argument handling matches its semantics
+  `in Doxygen <https://www.doxygen.nl/manual/commands.html#cmdpar>`.
+  Namely, any text on the same line as the ``\par`` command will become
+  a header for the paragaph, and if there is no text then the command
+  will start a new paragraph.
+
 C++ Language Changes
 --------------------
 - C++17 support is now completed, with the enablement of the

diff  --git a/clang/include/clang/AST/CommentCommandTraits.h b/clang/include/clang/AST/CommentCommandTraits.h
index 0c3254d84eb00..78c484fff3aed 100644
--- a/clang/include/clang/AST/CommentCommandTraits.h
+++ b/clang/include/clang/AST/CommentCommandTraits.h
@@ -88,6 +88,10 @@ struct CommandInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsHeaderfileCommand : 1;
 
+  /// True if this is a \\par command.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsParCommand : 1;
+
   /// True if we don't want to warn about this command being passed an empty
   /// paragraph.  Meaningful only for block commands.
   LLVM_PREFERRED_TYPE(bool)

diff  --git a/clang/include/clang/AST/CommentCommands.td b/clang/include/clang/AST/CommentCommands.td
index 06b2fa9b5531c..a410cd4039bee 100644
--- a/clang/include/clang/AST/CommentCommands.td
+++ b/clang/include/clang/AST/CommentCommands.td
@@ -18,6 +18,7 @@ class Command<string name> {
   bit IsThrowsCommand = 0;
   bit IsDeprecatedCommand = 0;
   bit IsHeaderfileCommand = 0;
+  bit IsParCommand = 0;
 
   bit IsEmptyParagraphAllowed = 0;
 
@@ -156,7 +157,7 @@ def Date       : BlockCommand<"date">;
 def Invariant  : BlockCommand<"invariant">;
 def Li         : BlockCommand<"li">;
 def Note       : BlockCommand<"note">;
-def Par        : BlockCommand<"par">;
+def Par        : BlockCommand<"par"> { let IsParCommand = 1; let NumArgs = 1; }
 def Post       : BlockCommand<"post">;
 def Pre        : BlockCommand<"pre">;
 def Remark     : BlockCommand<"remark">;

diff  --git a/clang/include/clang/AST/CommentParser.h b/clang/include/clang/AST/CommentParser.h
index a2d0e30835e2c..289f0b2c066b9 100644
--- a/clang/include/clang/AST/CommentParser.h
+++ b/clang/include/clang/AST/CommentParser.h
@@ -105,6 +105,9 @@ class Parser {
   ArrayRef<Comment::Argument>
   parseThrowCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
 
+  ArrayRef<Comment::Argument>
+  parseParCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
+
   BlockCommandComment *parseBlockCommand();
   InlineCommandComment *parseInlineCommand();
 
@@ -123,4 +126,3 @@ class Parser {
 } // end namespace clang
 
 #endif
-

diff  --git a/clang/lib/AST/CommentParser.cpp b/clang/lib/AST/CommentParser.cpp
index 5baf81a509fb6..d5e5bb27ceba3 100644
--- a/clang/lib/AST/CommentParser.cpp
+++ b/clang/lib/AST/CommentParser.cpp
@@ -222,6 +222,63 @@ class TextTokenRetokenizer {
     return true;
   }
 
+  // Check if this line starts with @par or \par
+  bool startsWithParCommand() {
+    unsigned Offset = 1;
+
+    // Skip all whitespace characters at the beginning.
+    // This needs to backtrack because Pos has already advanced past the
+    // actual \par or @par command by the time this function is called.
+    while (isWhitespace(*(Pos.BufferPtr - Offset)))
+      Offset++;
+
+    // Once we've reached the whitespace, backtrack and check if the previous
+    // four characters are \par or @par.
+    llvm::StringRef LineStart(Pos.BufferPtr - Offset - 3, 4);
+    return LineStart.starts_with("\\par") || LineStart.starts_with("@par");
+  }
+
+  /// Extract a par command argument-header.
+  bool lexParHeading(Token &Tok) {
+    if (isEnd())
+      return false;
+
+    Position SavedPos = Pos;
+
+    consumeWhitespace();
+    SmallString<32> WordText;
+    const char *WordBegin = Pos.BufferPtr;
+    SourceLocation Loc = getSourceLocation();
+
+    if (!startsWithParCommand())
+      return false;
+
+    // Read until the end of this token, which is effectively the end of the
+    // line. This gets us the content of the par header, if there is one.
+    while (!isEnd()) {
+      WordText.push_back(peek());
+      if (Pos.BufferPtr + 1 == Pos.BufferEnd) {
+        consumeChar();
+        break;
+      }
+      consumeChar();
+    }
+
+    unsigned Length = WordText.size();
+    if (Length == 0) {
+      Pos = SavedPos;
+      return false;
+    }
+
+    char *TextPtr = Allocator.Allocate<char>(Length + 1);
+
+    memcpy(TextPtr, WordText.c_str(), Length + 1);
+    StringRef Text = StringRef(TextPtr, Length);
+
+    formTokenWithChars(Tok, Loc, WordBegin, Length, Text);
+    return true;
+  }
+
   /// Extract a word -- sequence of non-whitespace characters.
   bool lexWord(Token &Tok) {
     if (isEnd())
@@ -394,6 +451,24 @@ Parser::parseThrowCommandArgs(TextTokenRetokenizer &Retokenizer,
   return llvm::ArrayRef(Args, ParsedArgs);
 }
 
+ArrayRef<Comment::Argument>
+Parser::parseParCommandArgs(TextTokenRetokenizer &Retokenizer,
+                            unsigned NumArgs) {
+  assert(NumArgs > 0);
+  auto *Args = new (Allocator.Allocate<Comment::Argument>(NumArgs))
+      Comment::Argument[NumArgs];
+  unsigned ParsedArgs = 0;
+  Token Arg;
+
+  while (ParsedArgs < NumArgs && Retokenizer.lexParHeading(Arg)) {
+    Args[ParsedArgs] = Comment::Argument{
+        SourceRange(Arg.getLocation(), Arg.getEndLocation()), Arg.getText()};
+    ParsedArgs++;
+  }
+
+  return llvm::ArrayRef(Args, ParsedArgs);
+}
+
 BlockCommandComment *Parser::parseBlockCommand() {
   assert(Tok.is(tok::backslash_command) || Tok.is(tok::at_command));
 
@@ -449,6 +524,9 @@ BlockCommandComment *Parser::parseBlockCommand() {
     else if (Info->IsThrowsCommand)
       S.actOnBlockCommandArgs(
           BC, parseThrowCommandArgs(Retokenizer, Info->NumArgs));
+    else if (Info->IsParCommand)
+      S.actOnBlockCommandArgs(BC,
+                              parseParCommandArgs(Retokenizer, Info->NumArgs));
     else
       S.actOnBlockCommandArgs(BC, parseCommandArgs(Retokenizer, Info->NumArgs));
 

diff  --git a/clang/test/Index/comment-misc-tags.m b/clang/test/Index/comment-misc-tags.m
index 47ee9d9aa392a..6d018dbfcf193 100644
--- a/clang/test/Index/comment-misc-tags.m
+++ b/clang/test/Index/comment-misc-tags.m
@@ -91,18 +91,16 @@ @interface IOCommandGate
   
 struct Test {int filler;};
 
-// CHECK:       (CXComment_BlockCommand CommandName=[par]
+// CHECK:       (CXComment_BlockCommand CommandName=[par] Arg[0]=User defined paragraph:
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:        (CXComment_Text Text=[ User defined paragraph:] HasTrailingNewline)
 // CHECK-NEXT:        (CXComment_Text Text=[ Contents of the paragraph.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:        (CXComment_Text Text=[ New paragraph under the same heading.])))
+// CHECK-NEXT:        (CXComment_Text Text=[New paragraph under the same heading.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[note]
 // CHECK-NEXT:     (CXComment_Paragraph
 // CHECK-NEXT:        (CXComment_Text Text=[ This note consists of two paragraphs.] HasTrailingNewline)
 // CHECK-NEXT:        (CXComment_Text Text=[ This is the first paragraph.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:     (CXComment_Text Text=[ And this is the second paragraph.])))
-
+// CHECK-NEXT:     (CXComment_Text Text=[And this is the second paragraph.])))

diff  --git a/clang/unittests/AST/CommentParser.cpp b/clang/unittests/AST/CommentParser.cpp
index 1c57c899f9074..e0df182d430c3 100644
--- a/clang/unittests/AST/CommentParser.cpp
+++ b/clang/unittests/AST/CommentParser.cpp
@@ -1639,6 +1639,143 @@ TEST_F(CommentParserTest, ThrowsCommandHasArg9) {
   }
 }
 
+TEST_F(CommentParserTest, ParCommandHasArg1) {
+  const char *Sources[] = {
+      "/// @par Paragraph header:",     "/// @par Paragraph header:\n",
+      "/// @par Paragraph header:\r\n", "/// @par Paragraph header:\n\r",
+      "/** @par Paragraph header:*/",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg2) {
+  const char *Sources[] = {
+      "/// @par Paragraph header: ",     "/// @par Paragraph header: \n",
+      "/// @par Paragraph header: \r\n", "/// @par Paragraph header: \n\r",
+      "/** @par Paragraph header: */",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header: ");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg3) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg4) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 2));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body1");
+      ASSERT_TRUE(GetChildAt(PC, 1, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body2");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg5) {
+  const char *Sources[] = {
+      ("/// @par \n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \r\n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \n\r"
+       "/// Paragraphs with no text before newline have no heading"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 0);
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() ==
+                  "Paragraphs with no text before newline have no heading");
+    }
+  }
+}
 
 } // unnamed namespace
 

diff  --git a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
index a113b02e19995..aee7d38786a51 100644
--- a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
+++ b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
@@ -32,8 +32,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
     Record &Tag = *Tags[i];
     OS << "  { "
        << "\"" << Tag.getValueAsString("Name") << "\", "
-       << "\"" << Tag.getValueAsString("EndCommandName") << "\", "
-       << i << ", "
+       << "\"" << Tag.getValueAsString("EndCommandName") << "\", " << i << ", "
        << Tag.getValueAsInt("NumArgs") << ", "
        << Tag.getValueAsBit("IsInlineCommand") << ", "
        << Tag.getValueAsBit("IsBlockCommand") << ", "
@@ -44,6 +43,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
        << Tag.getValueAsBit("IsThrowsCommand") << ", "
        << Tag.getValueAsBit("IsDeprecatedCommand") << ", "
        << Tag.getValueAsBit("IsHeaderfileCommand") << ", "
+       << Tag.getValueAsBit("IsParCommand") << ", "
        << Tag.getValueAsBit("IsEmptyParagraphAllowed") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockCommand") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockEndCommand") << ", "
@@ -52,8 +52,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
        << Tag.getValueAsBit("IsFunctionDeclarationCommand") << ", "
        << Tag.getValueAsBit("IsRecordLikeDetailCommand") << ", "
        << Tag.getValueAsBit("IsRecordLikeDeclarationCommand") << ", "
-       << /* IsUnknownCommand = */ "0"
-       << " }";
+       << /* IsUnknownCommand = */ "0" << " }";
     if (i + 1 != e)
       OS << ",";
     OS << "\n";


        
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
