[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-17 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

Superseded by #78092.

Thanks for putting together this patch, @robozati. While we didn't go with the 
fix approach taken in this patch, it did prod me to look at the issue in more 
detail :laughing: 

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-17 Thread Nathan Ridge via cfe-commits

https://github.com/HighCommander4 closed 
https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-14 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

> What do you think about this as an alternative: if `spelledForExpanded` 
> receives as input an expanded token range `[firstExpanded, lastExpanded]` 
> where `lastExpanded` is the `eof` token, return the spelled tokens 
> corresponding to `[firstExpanded, lastExpanded - 1]` instead? (In the case 
> where the `eof` token is the only one in the range, we could return nullopt.)
> 
> This would have the advantage of gracefully handling AST nodes like this one 
> whose end location is the `eof`, and return all the spelled tokens actually 
> making up the node.

I wrote this up at https://github.com/llvm/llvm-project/pull/78092

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-08 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

> I am slightly suspicious of source locations pointing at `eof` in the AST in 
> the first place, even in invalid code. I wonder if we would be better off 
> just having an invalid source location there instead of pointing at `eof`.

I think in this case it's deliberate. If you look at the compiler error for the 
missing brace:

```c++
test.cpp:1:13: error: expected '}'
1 | int main() {
  | ^
test.cpp:1:12: note: to match this '{'
1 | int main() {
  |^
```

you can see that the diagnostic location points to one character past the 
opening brace, which is the `eof` token. I'm not sure how else the diagnostic 
location could end up there.

> Asserting that `tok::eof` should not be passed to `spelledForExpandedToken` 
> and returning `nullopt` in `spelledForExpanded` looks like a reasonable 
> workaround until we fix this. (The semantics looks reasonable as we don't 
> really have a spelled token we can map `eof` back to).

What do you think about this as an alternative: if `spelledForExpanded` 
receives as input an expanded token range `[firstExpanded, lastExpanded]` where 
`lastExpanded` is the `eof` token, return the spelled tokens corresponding to 
`[firstExpanded, lastExpanded - 1]` instead? (In the case where the `eof` token 
is the only one in the range, we could return nullopt.)

This would have the advantage of gracefully handling AST nodes like this one 
whose end location is the `eof`, and return all the spelled tokens actually 
making up the node.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-08 Thread Ilya Biryukov via cfe-commits

ilya-biryukov wrote:

I second everything that @HighCommander4 said. It would be really useful to add 
an assertion that we never get out-of-bounds here, but this indicates a logic 
error somewhere in the code that should be fixed.

I am slightly suspicious of source locations pointing at `eof` in the AST in 
the first place, even in invalid code. I wonder if we would be better off just 
having an invalid source location there instead of pointing at `eof`.

Asserting that `tok::eof` should not be passed to `spelledForExpandedToken` and 
returning `nullopt` in `spelledForExpanded` looks like a reasonable workaround 
until we fix this. (The semantics looks reasonable as we don't really have a 
spelled token we can map `eof` back to).

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-08 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

Anyways, in a case like this, this preferable behaviour would be for 
`spelledForExpanded()` to return the spelled tokens covering `int main() {` 
(not including the `eof` token but including everything else), **not** nullopt.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2024-01-08 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

I poked at this a bit more. A few more notes:

 * Since the test case currently in the patch does not trigger the crash 
(without the fix), and it was not obvious to me what sort of call to 
`spelledForExpanded()` to formulate to trigger the crash, I instead formulated 
a test case in `DumpASTTests.cpp` that calls `dumpAST()` on the 
`TranslationUnitDecl` of a file containing the code `int main() {`. This does 
trigger the crash.
 * The crash occurs when dumping the `FunctionDecl` node for `main`. When 
dumping a node, `dumpAST()` 
[calls](https://searchfox.org/llvm/rev/fd1c156e5d6584684ce58c0536dca96cedcc41f0/clang-tools-extra/clangd/DumpAST.cpp#93)
 `Tokens.spelledForExpanded(Tokens.expandedTokens(SR))` where `SR` is the 
node's source range.
 * For this `FunctionDecl` node, the end location of its source range points to 
a `tok::eof` token.
* This comes from the end location of the function's `CompoundStmt` body, 
which is what the parser assigns in 
[Parser::ParseCompoundStatementBody()](https://searchfox.org/llvm/rev/fd1c156e5d6584684ce58c0536dca96cedcc41f0/clang/lib/Parse/ParseStmt.cpp#1292)
 in this situation.
 * The range of expanded tokens returned by `Tok.expandedTokens(SR)` thus 
includes as its last element the `tok::eof` token.
 * The 
[comment](https://searchfox.org/llvm/rev/fd1c156e5d6584684ce58c0536dca96cedcc41f0/clang/include/clang/Tooling/Syntax/Tokens.h#162)
 above `TokenBuffer` says "the expanded token stream has a `tok::eof` token at 
the end, the spelled tokens never store a 'eof' token".

I haven't dug into the implementation of `spelledForExpanded()` yet, but from 
the above it's looking to me like `spelledForExpanded()` is not expecting an 
input expanded token range that includes the `eof` token, but we are passing it 
one (and such a token range can legitimately arise from the source range of the 
AST node in cases of invalid code like this).

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-12-03 Thread Nathan Ridge via cfe-commits

HighCommander4 wrote:

I spent a bit more time looking at this. I'd say I'm still quite far from 
having a complete understanding of this code, but two things occur to me:

 1. `TokenBuffer` operates at the level of the preprocessor. 
"spelledForExpanded" is basically an operation which takes as input a range of 
tokens in the preprocessed token stream ("expanded tokens"), and maps them back 
to a range of tokens in the un-preprocessed token stream ("spelled tokens").
* The preprocessor has no notion of matching or balancing braces. Braces 
only become relevant later, during parsing.
* So, I don't understand why an input token stream that contains unbalanced 
braces would pose a problem for the "spelledForExpanded" operation.
* I'm hesitant to approve a patch that adds an early exit to 
"spelledForExpanded" to prevent an out-of-bounds array access, without 
understanding in more detail **how** the out-of-bounds array access arises / 
how it's connected to the unbalanced braces. It's possible that the 
out-of-bounds access is indicative of a deeper logic error which we'd be 
covering up.
 2. Since we've observed the problem arise during the `dumpAST` operation 
(frame 10 of the stack trace 
[here](https://github.com/clangd/clangd/issues/1559#issue-1646293852)), it 
would be good to (also) add a test that exercises the `dumpAST` operation as a 
whole.
* We have such tests in 
[DumpASTTests.cpp](https://searchfox.org/llvm/source/clang-tools-extra/clangd/unittests/DumpASTTests.cpp).
* I think it would be preferable if the AST dump for an input with 
unbalanced braces, such as `int main() {`, was **not** empty. (I suspect the 
current patch may make it empty.) `clang -Xclang -ast-dump` manages to print 
AST nodes in this case, I think so should we.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-11-28 Thread via cfe-commits

robozati wrote:

@HighCommander4, thank you for reviewing this PR!

Unfortunately, I’m currently in my finals and will only have time to look at 
this next week. So, I’ll update this later.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-11-27 Thread Nathan Ridge via cfe-commits


@@ -585,6 +585,17 @@ TEST_F(TokenCollectorTest, DelayedParsing) {
   EXPECT_THAT(collectAndDump(Code), StartsWith(ExpectedTokens));
 }
 
+TEST_F(TokenCollectorTest, UnclosedToken) {

HighCommander4 wrote:

Indeed, it looks like `TokenBuffer::spelledForExpanded()` (which is the only 
call site of `spelledForExpandedToken()`)  is not called by anything currently 
in this test.

To exercise `spelledForExpanded()`, we will need to make an explicit call to 
it, the way a number of other tests in this file do.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-11-26 Thread Nathan Ridge via cfe-commits


@@ -585,6 +585,17 @@ TEST_F(TokenCollectorTest, DelayedParsing) {
   EXPECT_THAT(collectAndDump(Code), StartsWith(ExpectedTokens));
 }
 
+TEST_F(TokenCollectorTest, UnclosedToken) {

HighCommander4 wrote:

I started looking at this to try and make some progress towards getting this 
landed. However, when running this test locally, I'm not seeing either of the 
branches added to `spelledForExpandedToken()` being taken.

Am I missing something, or is this test not exercising the intended codepath?

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

robozati wrote:

Rewrote the test and now it passes.

Sorry for pinging you @HighCommander4, but I don't know who to ping and you 
participated in that issue.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

https://github.com/robozati ready_for_review 
https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

https://github.com/robozati updated 
https://github.com/llvm/llvm-project/pull/69849

>From 239302d66dea74333e27288acbceffdf3f2d3b6b Mon Sep 17 00:00:00 2001
From: Gabriel Pezati 
Date: Sat, 21 Oct 2023 19:28:57 -0300
Subject: [PATCH] Fix UB on unclosed parentheses, brackets or braces in Tokens

---
 clang/lib/Tooling/Syntax/Tokens.cpp   | 25 ++-
 clang/unittests/Tooling/Syntax/TokensTest.cpp | 11 
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/clang/lib/Tooling/Syntax/Tokens.cpp 
b/clang/lib/Tooling/Syntax/Tokens.cpp
index 2f28b9cf286a638..97f794e09cb5590 100644
--- a/clang/lib/Tooling/Syntax/Tokens.cpp
+++ b/clang/lib/Tooling/Syntax/Tokens.cpp
@@ -286,9 +286,13 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
   });
   // Our token could only be produced by the previous mapping.
   if (It == File.Mappings.begin()) {
-// No previous mapping, no need to modify offsets.
-return {[ExpandedIndex - File.BeginExpanded],
-/*Mapping=*/nullptr};
+const auto offset = ExpandedIndex - File.BeginExpanded;
+// Bounds check so parsing an unclosed parenthesis, brackets or braces do
+// not result in UB.
+if (offset >= File.SpelledTokens.size()) {
+  return {nullptr, nullptr};
+}
+return {[offset], /*Mapping=*/nullptr};
   }
   --It; // 'It' now points to last mapping that started before our token.
 
@@ -298,9 +302,12 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
 
   // Not part of the mapping, use the index from previous mapping to compute 
the
   // corresponding spelled token.
-  return {
-  [It->EndSpelled + (ExpandedIndex - It->EndExpanded)],
-  /*Mapping=*/nullptr};
+  const auto offset = It->EndSpelled + (ExpandedIndex - It->EndExpanded);
+  // This index can also result in UB when parsing unclosed tokens.
+  if (offset >= File.SpelledTokens.size()) {
+return {nullptr, nullptr};
+  }
+  return {[offset], /*Mapping=*/nullptr};
 }
 
 const TokenBuffer::Mapping *
@@ -410,6 +417,12 @@ 
TokenBuffer::spelledForExpanded(llvm::ArrayRef Expanded) const {
   auto [FirstSpelled, FirstMapping] = spelledForExpandedToken(First);
   auto [LastSpelled, LastMapping] = spelledForExpandedToken(Last);
 
+  // If there was an unclosed token, FirstSpelled or LastSpelled is null and 
the
+  // operation shouldn't continue.
+  if (FirstSpelled == nullptr || LastSpelled == nullptr) {
+return std::nullopt;
+  };
+
   FileID FID = SourceMgr->getFileID(FirstSpelled->location());
   // FIXME: Handle multi-file changes by trying to map onto a common root.
   if (FID != SourceMgr->getFileID(LastSpelled->location()))
diff --git a/clang/unittests/Tooling/Syntax/TokensTest.cpp 
b/clang/unittests/Tooling/Syntax/TokensTest.cpp
index 0c08318a637c0b6..e1b053ce45b841b 100644
--- a/clang/unittests/Tooling/Syntax/TokensTest.cpp
+++ b/clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -585,6 +585,17 @@ TEST_F(TokenCollectorTest, DelayedParsing) {
   EXPECT_THAT(collectAndDump(Code), StartsWith(ExpectedTokens));
 }
 
+TEST_F(TokenCollectorTest, UnclosedToken) {
+  llvm::StringLiteral Code = R"cpp(
+int main() {
+// this should not result in a segfault or UB.
+  )cpp";
+  std::string ExpectedTokens =
+  "expanded tokens:\n  int main ( ) {\nfile './input.cpp'\n  spelled "
+  "tokens:\nint main ( ) {\n  no mappings.\n";
+  EXPECT_THAT(collectAndDump(Code), StartsWith(ExpectedTokens));
+}
+
 TEST_F(TokenCollectorTest, MultiFile) {
   addFile("./foo.h", R"cpp(
 #define ADD(X, Y) X+Y

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

https://github.com/robozati converted_to_draft 
https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

robozati wrote:

Tests are failing because of the `findExpanded` function, but it didn’t result 
in UB. Will fix the test later.

https://github.com/llvm/llvm-project/pull/69849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

https://github.com/robozati updated 
https://github.com/llvm/llvm-project/pull/69849

>From e57657cf296e1e93fa17f021bb8ce591071b2176 Mon Sep 17 00:00:00 2001
From: robozati <139823421+roboz...@users.noreply.github.com>
Date: Sat, 21 Oct 2023 12:04:14 -0300
Subject: [PATCH 1/2] Fix UB on unclosed parentheses, brackets or braces in
 Tokens

---
 clang/lib/Tooling/Syntax/Tokens.cpp   | 23 ++-
 clang/unittests/Tooling/Syntax/TokensTest.cpp |  6 +
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/clang/lib/Tooling/Syntax/Tokens.cpp 
b/clang/lib/Tooling/Syntax/Tokens.cpp
index 2f28b9cf286a638..e4c24aa1ef3fbf0 100644
--- a/clang/lib/Tooling/Syntax/Tokens.cpp
+++ b/clang/lib/Tooling/Syntax/Tokens.cpp
@@ -286,9 +286,13 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
   });
   // Our token could only be produced by the previous mapping.
   if (It == File.Mappings.begin()) {
-// No previous mapping, no need to modify offsets.
-return {[ExpandedIndex - File.BeginExpanded],
-/*Mapping=*/nullptr};
+const auto offset = ExpandedIndex - File.BeginExpanded;
+// Bounds check so parsing an unclosed parenthesis, brackets or braces do
+// not result in UB.
+if (offset >= File.SpelledTokens.size()) {
+  return {nullptr, nullptr};
+}
+return {[offset], /*Mapping=*/nullptr};
   }
   --It; // 'It' now points to last mapping that started before our token.
 
@@ -298,9 +302,12 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
 
   // Not part of the mapping, use the index from previous mapping to compute 
the
   // corresponding spelled token.
-  return {
-  [It->EndSpelled + (ExpandedIndex - It->EndExpanded)],
-  /*Mapping=*/nullptr};
+  const auto offset = It->EndSpelled + (ExpandedIndex - It->EndExpanded);
+  // This index can also result in UB when parsing unclosed tokens.
+  if (offset >= File.SpelledTokens.size()) {
+return {nullptr, nullptr};
+  }
+  return {[offset], /*Mapping=*/nullptr};
 }
 
 const TokenBuffer::Mapping *
@@ -410,6 +417,10 @@ 
TokenBuffer::spelledForExpanded(llvm::ArrayRef Expanded) const {
   auto [FirstSpelled, FirstMapping] = spelledForExpandedToken(First);
   auto [LastSpelled, LastMapping] = spelledForExpandedToken(Last);
 
+  if (FirstSpelled == nullptr || LastSpelled == nullptr) {
+return std::nullopt;
+  };
+
   FileID FID = SourceMgr->getFileID(FirstSpelled->location());
   // FIXME: Handle multi-file changes by trying to map onto a common root.
   if (FID != SourceMgr->getFileID(LastSpelled->location()))
diff --git a/clang/unittests/Tooling/Syntax/TokensTest.cpp 
b/clang/unittests/Tooling/Syntax/TokensTest.cpp
index 0c08318a637c0b6..44f0c57b3bad875 100644
--- a/clang/unittests/Tooling/Syntax/TokensTest.cpp
+++ b/clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -814,6 +814,12 @@ TEST_F(TokenBufferTest, SpelledByExpanded) {
   EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("good")),
   ValueIs(SameRange(findSpelled("good";
   EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("prev good")), 
std::nullopt);
+
+  // Unclosed parentheses, brackets or braces do not result in a UB
+  recordTokens(R"cpp(
+foo(x
+  )cpp");
+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("x")), std::nullopt);
 }
 
 TEST_F(TokenBufferTest, ExpandedTokensForRange) {

>From 474e43fdf6682f83e4a7839eabe2f4c2f5fc1b92 Mon Sep 17 00:00:00 2001
From: robozati <139823421+roboz...@users.noreply.github.com>
Date: Sat, 21 Oct 2023 12:58:43 -0300
Subject: [PATCH 2/2] Fix failing test

---
 clang/unittests/Tooling/Syntax/TokensTest.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/unittests/Tooling/Syntax/TokensTest.cpp 
b/clang/unittests/Tooling/Syntax/TokensTest.cpp
index 44f0c57b3bad875..2b7b1474228c4d0 100644
--- a/clang/unittests/Tooling/Syntax/TokensTest.cpp
+++ b/clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -819,7 +819,7 @@ TEST_F(TokenBufferTest, SpelledByExpanded) {
   recordTokens(R"cpp(
 foo(x
   )cpp");
-  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("x")), std::nullopt);
+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("foo(x)")), std::nullopt);
 }
 
 TEST_F(TokenBufferTest, ExpandedTokensForRange) {

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Bounds checking on unclosed parentheses, brackets or braces in Expanded Tokens (PR #69849)

2023-10-21 Thread via cfe-commits

https://github.com/robozati created 
https://github.com/llvm/llvm-project/pull/69849

When getting the AST from clangd, if there is an unclosed token, a segfault or 
an UB can happen when accessing `File.SpelledTokens` because the supposed 
closing token is not present, so its index is greater than the vec's size.

This fixes the issue by returning a `nullptr` on the function that caused the 
UB and then checking if the spelled token is null, returning `nullopt` for the 
whole expanded token.

Fixes https://github.com/clangd/clangd/issues/1559.

>From e57657cf296e1e93fa17f021bb8ce591071b2176 Mon Sep 17 00:00:00 2001
From: robozati <139823421+roboz...@users.noreply.github.com>
Date: Sat, 21 Oct 2023 12:04:14 -0300
Subject: [PATCH] Fix UB on unclosed parentheses, brackets or braces in Tokens

---
 clang/lib/Tooling/Syntax/Tokens.cpp   | 23 ++-
 clang/unittests/Tooling/Syntax/TokensTest.cpp |  6 +
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/clang/lib/Tooling/Syntax/Tokens.cpp 
b/clang/lib/Tooling/Syntax/Tokens.cpp
index 2f28b9cf286a638..e4c24aa1ef3fbf0 100644
--- a/clang/lib/Tooling/Syntax/Tokens.cpp
+++ b/clang/lib/Tooling/Syntax/Tokens.cpp
@@ -286,9 +286,13 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
   });
   // Our token could only be produced by the previous mapping.
   if (It == File.Mappings.begin()) {
-// No previous mapping, no need to modify offsets.
-return {[ExpandedIndex - File.BeginExpanded],
-/*Mapping=*/nullptr};
+const auto offset = ExpandedIndex - File.BeginExpanded;
+// Bounds check so parsing an unclosed parenthesis, brackets or braces do
+// not result in UB.
+if (offset >= File.SpelledTokens.size()) {
+  return {nullptr, nullptr};
+}
+return {[offset], /*Mapping=*/nullptr};
   }
   --It; // 'It' now points to last mapping that started before our token.
 
@@ -298,9 +302,12 @@ TokenBuffer::spelledForExpandedToken(const syntax::Token 
*Expanded) const {
 
   // Not part of the mapping, use the index from previous mapping to compute 
the
   // corresponding spelled token.
-  return {
-  [It->EndSpelled + (ExpandedIndex - It->EndExpanded)],
-  /*Mapping=*/nullptr};
+  const auto offset = It->EndSpelled + (ExpandedIndex - It->EndExpanded);
+  // This index can also result in UB when parsing unclosed tokens.
+  if (offset >= File.SpelledTokens.size()) {
+return {nullptr, nullptr};
+  }
+  return {[offset], /*Mapping=*/nullptr};
 }
 
 const TokenBuffer::Mapping *
@@ -410,6 +417,10 @@ 
TokenBuffer::spelledForExpanded(llvm::ArrayRef Expanded) const {
   auto [FirstSpelled, FirstMapping] = spelledForExpandedToken(First);
   auto [LastSpelled, LastMapping] = spelledForExpandedToken(Last);
 
+  if (FirstSpelled == nullptr || LastSpelled == nullptr) {
+return std::nullopt;
+  };
+
   FileID FID = SourceMgr->getFileID(FirstSpelled->location());
   // FIXME: Handle multi-file changes by trying to map onto a common root.
   if (FID != SourceMgr->getFileID(LastSpelled->location()))
diff --git a/clang/unittests/Tooling/Syntax/TokensTest.cpp 
b/clang/unittests/Tooling/Syntax/TokensTest.cpp
index 0c08318a637c0b6..44f0c57b3bad875 100644
--- a/clang/unittests/Tooling/Syntax/TokensTest.cpp
+++ b/clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -814,6 +814,12 @@ TEST_F(TokenBufferTest, SpelledByExpanded) {
   EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("good")),
   ValueIs(SameRange(findSpelled("good";
   EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("prev good")), 
std::nullopt);
+
+  // Unclosed parentheses, brackets or braces do not result in a UB
+  recordTokens(R"cpp(
+foo(x
+  )cpp");
+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("x")), std::nullopt);
 }
 
 TEST_F(TokenBufferTest, ExpandedTokensForRange) {

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits