[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-30 Thread via cfe-commits

https://github.com/mydeveloperday commented:

LGTM

https://github.com/llvm/llvm-project/pull/78996
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-30 Thread Hirofumi Nakamura via cfe-commits

hnakamura5 wrote:

@HazardyKnusperkeks 
Thank you for checking and accepting!

@mydeveloperday 
You will be able to see these points in the subsequent PRs.

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-30 Thread Hirofumi Nakamura via cfe-commits

https://github.com/hnakamura5 closed 
https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-29 Thread Björn Schäpers via cfe-commits

https://github.com/HazardyKnusperkeks approved this pull request.

I didn't say anything because I was also waiting on @mydeveloperday.

I see no problem with the current approach and think token annotator tests are 
enough.

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-29 Thread Hirofumi Nakamura via cfe-commits


@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;

hnakamura5 wrote:

@mydeveloperday 
Thank you for reviewing this pull request.
It has now been one week since this PR was opened, and I would like to continue 
before I forget the details.
Would you mind approving it or adding further suggestions?
If you do not intend to do either, may I request another reviewer?

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-23 Thread Hirofumi Nakamura via cfe-commits


@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;

hnakamura5 wrote:

https://llvm.org/docs/TableGen/ProgRef.html#values-and-expressions

As far as I can tell from the manual, TableGen does not have `+` as an infix 
binary operator.
And as noted in the warning in the manual, `-` is lexed as the integer's prefix 
rather than as an infix operator for ranges and slices.
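
To make the lexing point concrete, here is a minimal sketch of a toy lexer that 
follows the ProgRef rule that a decimal integer may carry a leading `+` or `-`. 
It is not TableGen's actual lexer; the function name `lexTokens` and the 
string-based token representation are hypothetical and chosen only for 
illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy lexer (hypothetical, not LLVM's TableGen lexer). An optional '+' or '-'
// immediately followed by digits is consumed as a single integer token.
std::vector<std::string> lexTokens(const std::string &input) {
  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  auto isIdent = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
  };
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < input.size()) {
    if (std::isspace(static_cast<unsigned char>(input[i]))) {
      ++i; // skip whitespace between tokens
      continue;
    }
    const std::size_t start = i;
    const bool hasSign = input[i] == '+' || input[i] == '-';
    const std::size_t firstDigit = hasSign ? i + 1 : i;
    std::size_t end = firstDigit;
    while (end < input.size() && isDigit(input[end]))
      ++end;
    if (end > firstDigit) {
      i = end; // [sign] digits: one integer token
    } else if (isIdent(input[i])) {
      while (i < input.size() && isIdent(input[i]))
        ++i; // identifier
    } else {
      ++i; // any other single character, including a lone '+' or '-'
    }
    tokens.push_back(input.substr(start, i - start));
  }
  return tokens;
}
```

In this sketch `1-5` yields the two tokens `1` and `-5`, while `Offset + 1` 
keeps `+` as a separate token because no digit directly follows it.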

> could we build a better set of FormatTableGen unit tests to ensure we don't 
> cause any regressions?

I agree, and there is in fact a comprehensive set of unit tests for TableGen's 
syntax here. (Even this may be missing real examples of TableGen usage in 
target definitions, MLIR and so on.)
https://github.com/llvm/llvm-project/pull/76059/files#diff-2ce45a84684fe19d813e79bab2f732809f3544d38f344e3d2cfe23aa9216a1c8

The current pull request was split out of that PR. I'm wondering when to add 
those tests, because for now it only recognizes tokens and cannot yet format 
many parts of them.


https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread via cfe-commits

https://github.com/mydeveloperday edited 
https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread via cfe-commits


@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
+    if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;

mydeveloperday wrote:

Ditto

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread via cfe-commits


@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;

mydeveloperday wrote:

I'm not a TableGen expert, but what does this do to `[Offset + 1]`-style code 
in a .td file? Could we build a better set of FormatTableGen unit tests to 
ensure we don't cause any regressions?

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread via cfe-commits

https://github.com/mydeveloperday commented:

I don't use TableGen myself, so I can't really say whether this is the correct 
thing to do. I feel it needs a TableGenTest.cpp example of real TableGen code 
for every example you give here. Then we should try to cover the other use 
cases that we might be breaking.

I have a slight concern about merging `+` and a number because of `X + 1`-type 
logic that might exist, but like I said, I don't know enough about TableGen 
syntax.

https://github.com/llvm/llvm-project/pull/78996


[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-format

Author: Hirofumi Nakamura (hnakamura5)


Changes

Adds support for tokens that have forms like unary operators.
- bang operators: `!name`
- cond operator: `!cond`
- numeric literals: `+1`, `-1`

The cond operator is one of the bang operators, but it is distinguished because 
it has a very specific syntax.
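
As a rough illustration of the three token forms listed above, the following is 
a minimal sketch of the merge step on a simplified token list. The `Kind` enum, 
the `Token` struct and `tryMergeLastTwo` are hypothetical stand-ins, not 
clang-format's `FormatToken` or `tryMergeTokens`, and the sketch assumes the 
two tokens are already known to be adjacent.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified token model; not clang-format's FormatToken.
enum class Kind {
  Exclaim, Plus, Minus, Identifier, Number, BangOperator, CondOperator
};

struct Token {
  Kind kind;
  std::string text;
};

// Merges the last two tokens if together they form a single TableGen token:
//   '!' + identifier  -> bang operator ("!cond" gets its own kind)
//   '+'/'-' + number  -> signed numeric literal
// Returns true if a merge happened.
bool tryMergeLastTwo(std::vector<Token> &tokens) {
  if (tokens.size() < 2)
    return false;
  Token &first = tokens[tokens.size() - 2];
  const Token &second = tokens.back();

  Kind merged;
  if (first.kind == Kind::Exclaim && second.kind == Kind::Identifier) {
    merged = (first.text + second.text == "!cond") ? Kind::CondOperator
                                                   : Kind::BangOperator;
  } else if ((first.kind == Kind::Plus || first.kind == Kind::Minus) &&
             second.kind == Kind::Number) {
    merged = Kind::Number;
  } else {
    return false;
  }

  first.text += second.text; // e.g. "!" + "cond" -> "!cond"
  first.kind = merged;
  tokens.pop_back();         // the second token is absorbed into the first
  return true;
}
```

The separate kind for `!cond` mirrors the distinction made in the description; 
every other `!name` falls into the generic bang-operator kind.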

---
Full diff: https://github.com/llvm/llvm-project/pull/78996.diff


3 Files Affected:

- (modified) clang/lib/Format/FormatToken.h (+2) 
- (modified) clang/lib/Format/FormatTokenLexer.cpp (+38-7) 
- (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+20-4) 


```diff
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index dede89f2600150f..bace91b5f99b4df 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -148,6 +148,8 @@ namespace format {
   TYPE(StructLBrace)                                                           \
   TYPE(StructRBrace)                                                           \
   TYPE(StructuredBindingLSquare)                                               \
+  TYPE(TableGenBangOperator)                                                   \
+  TYPE(TableGenCondOperator)                                                   \
   TYPE(TableGenMultiLineString)                                                \
   TYPE(TemplateCloser)                                                         \
   TYPE(TemplateOpener)                                                         \
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 52a55ea23b5f2f7..d7de09ef0e12ab6 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
+    if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
   }
 }
 
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 3dbf504c35ed55e..cb93930e0fc3bc8 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2210,16 +2210,24 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
   EXPECT_TRUE(Tokens[0]->IsMultiline);
   EXPECT_EQ(Tokens[0]->LastLineColumnWidth, sizeof("   the string. }]") - 1);
 
+  // Numeric literals.
+  Tokens = Annotate("1234");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("-1");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("+1234");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("0b0110");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("0x1abC");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+
   // Ident
```

[clang] [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (PR #78996)

2024-01-22 Thread Hirofumi Nakamura via cfe-commits

https://github.com/hnakamura5 edited 
https://github.com/llvm/llvm-project/pull/78996