[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-29 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX closed this revision.
ArcsinX added a comment.

Don't know why this didn't close automatically
Commit: https://reviews.llvm.org/rGd8ba6b4ab3eceb6bbcdf4371d4ffaab9d1a5cebe


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-29 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX updated this revision to Diff 294924.
ArcsinX added a comment.

Fix possible UB at bitwise shift.
WordGain => MaxDistance.
Fix LineMin and LineMax values.
Fix comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

Files:
  clang-tools-extra/clangd/XRefs.cpp
  clang-tools-extra/clangd/unittests/XRefsTests.cpp


Index: clang-tools-extra/clangd/unittests/XRefsTests.cpp
===
--- clang-tools-extra/clangd/unittests/XRefsTests.cpp
+++ clang-tools-extra/clangd/unittests/XRefsTests.cpp
@@ -1428,6 +1428,11 @@
 
 
   // h^i
+
+
+
+
+  int x = hi;
 )cpp",
   R"cpp(
   // prefer nearest occurrence even if several matched tokens
Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -562,19 +562,34 @@
   auto Cost = [&](SourceLocation Loc) -> unsigned {
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
-if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
-if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
-return 0;
+return Line >= WordLine ? Line - WordLine : 2 * (WordLine - Line);
   };
   const syntax::Token *BestTok = nullptr;
-  // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = -1;
+  // Search bounds are based on word length:
+  // - forward: 2^N lines
+  // - backward: 2^(N-1) lines.
+  unsigned MaxDistance =
+  1U << std::min(Word.Text.size(),
+   std::numeric_limits::digits - 1);
+  // Line number for SM.translateLineCol() should be one-based, also
+  // SM.translateLineCol() can handle line number greater than
+  // number of lines in the file.
+  // - LineMin = max(1, WordLine + 1 - 2^(N-1))
+  // - LineMax = WordLine + 1 + 2^N
+  unsigned LineMin =
+  WordLine + 1 <= MaxDistance / 2 ? 1 : WordLine + 1 - MaxDistance / 2;
+  unsigned LineMax = WordLine + 1 + MaxDistance;
+  SourceLocation LocMin = SM.translateLineCol(File, LineMin, 1);
+  assert(LocMin.isValid());
+  SourceLocation LocMax = SM.translateLineCol(File, LineMax, 1);
+  assert(LocMax.isValid());
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
 // No point guessing the same location we started with.


Index: clang-tools-extra/clangd/unittests/XRefsTests.cpp
===
--- clang-tools-extra/clangd/unittests/XRefsTests.cpp
+++ clang-tools-extra/clangd/unittests/XRefsTests.cpp
@@ -1428,6 +1428,11 @@
 
 
   // h^i
+
+
+
+
+  int x = hi;
 )cpp",
   R"cpp(
   // prefer nearest occurrence even if several matched tokens
Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -562,19 +562,34 @@
   auto Cost = [&](SourceLocation Loc) -> unsigned {
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
-if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
-if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
-return 0;
+return Line >= WordLine ? Line - WordLine : 2 * (WordLine - Line);
   };
   const syntax::Token *BestTok = nullptr;
-  // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = -1;
+  // Search bounds are based on word length:
+  // - forward: 2^N lines
+  // - backward: 2^(N-1) lines.
+  unsigned MaxDistance =
+  1U << std::min(Word.Text.size(),
+   std::numeric_limits::digits - 1);
+  // Line number for SM.translateLineCol() should be one-based, also
+  // SM.translateLineCol() can handle line number greater than
+  // number of lines in the file.
+  // - LineMin = max(1, WordLine + 1 - 2^(N-1))
+  // - LineMax = WordLine + 1 + 2^N
+  unsigned LineMin =
+  WordLine + 1 <= MaxDistance / 2 ? 1 : WordLine + 1 - MaxDistance / 2;
+  unsigned LineMax = WordLine + 1 + MaxDistance;
+  SourceLocation LocMin = SM.translateLineCol(File, LineMin, 1);
+  assert(LocMin.isValid());
+  SourceLocation LocMax = SM.translateLineCol(File, LineMax, 1);
+  assert(LocMax.isValid());
 
   // Updates BestTok and BestCos

[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-29 Thread Sam McCall via Phabricator via cfe-commits
sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.



Comment at: clang-tools-extra/clangd/XRefs.cpp:572
+  // - backward: 2^(N-1) lines.
+  unsigned WordGain = 1U << Word.Text.size();
+  // Line number for SM.translateLineCol() should be one-based, also

std::min(Word.Text.size(), numeric_limits::digits() - 1) to avoid UB 
:-(



Comment at: clang-tools-extra/clangd/XRefs.cpp:572
+  // - backward: 2^(N-1) lines.
+  unsigned WordGain = 1U << Word.Text.size();
+  // Line number for SM.translateLineCol() should be one-based, also

sammccall wrote:
> std::min(Word.Text.size(), numeric_limits::digits() - 1) to avoid 
> UB :-(
name WordGain is unclear to me: MaxDistance?



Comment at: clang-tools-extra/clangd/XRefs.cpp:574
+  // Line number for SM.translateLineCol() should be one-based, also
+  // SM.translateLineCol() cares about line number greater than
+  // number of lines in the file.

cares about -> can handle?



Comment at: clang-tools-extra/clangd/XRefs.cpp:578
+  // - LineMax = WordLine + 1 + 2^(N-1)
+  unsigned LineMin = WordLine + 1 <= WordGain ? 1 : WordLine + 1 - WordGain;
+  unsigned LineMax = WordLine + 1 + WordGain / 2;

I think this is backwards: min should divide wordgain by two, max should not?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-28 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX added inline comments.



Comment at: clang-tools-extra/clangd/XRefs.cpp:589
 // but not things like disabled preprocessor sections.
 if (!(tokenSpelledAt(Tok.location(), TB) || TB.expansionStartingAt(&Tok)))
   return false;

here



Comment at: clang-tools-extra/clangd/XRefs.cpp:580
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)

ArcsinX wrote:
> sammccall wrote:
> > ArcsinX wrote:
> > > sammccall wrote:
> > > > why can't this happen anymore?
> > > As I understand, we call `findNearbyIdentifier()` only when the word is 
> > > not an identifier. And `Tok` is always identifier here.
> > > 
> > > Btw, even if `findNearbyIdentifier()` could be called for an identifier, 
> > > I could not get why it's bad to return current position here.
> > > 
> > > Am I wrong?
> > > As I understand, we call findNearbyIdentifier() only when the word is not 
> > > an identifier
> > 
> > I'm not sure where you're getting that.
> > If you mean the bail-out inside the function itself:
> > 
> > ```
> >   // Don't use heuristics if this is a real identifier.
> >   // Unlikely identifiers are OK if they were used as identifiers nearby.
> >   if (Word.ExpandedToken)
> > return nullptr;
> > ```
> > 
> > then "real identifier" here means it's an identifier after preprocessing.
> > We're still getting here if it's e.g. part of a macro body, or code that's 
> > `#ifdef`'d out.
> > The spelled token in those cases is an identifier, there's just no 
> > corresponding expanded token.
> > 
> > (I'm surprised we don't have a test covering this though!)
> > 
> > > Btw, even if findNearbyIdentifier() could be called for an identifier, I 
> > > could not get why it's bad to return current position here.
> > 
> > The point of this function is that *this* occurrence of the word can't be 
> > resolved, so let's try another one nearby.
> > If we just return the same occurrence, then we're certainly not going to be 
> > able to resolve it.
> Thanks for your clarification. I will revert this change (and will try to add 
> a test as well).
I failed to create a test for this.
I can create a test for identifiers in a macro body  or under `#ifdef`, but 
such a test passes without ` if (Tok.location() == Word.Location)` because of 
this check:
```
if (!(tokenSpelledAt(Tok.location(), TB) || TB.expansionStartingAt(&Tok)))
  return false;
```



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-28 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX updated this revision to Diff 294616.
ArcsinX added a comment.

Use SourceManager::translateLineCol(), code simplifications.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

Files:
  clang-tools-extra/clangd/XRefs.cpp
  clang-tools-extra/clangd/unittests/XRefsTests.cpp


Index: clang-tools-extra/clangd/unittests/XRefsTests.cpp
===
--- clang-tools-extra/clangd/unittests/XRefsTests.cpp
+++ clang-tools-extra/clangd/unittests/XRefsTests.cpp
@@ -1428,6 +1428,9 @@
 
 
   // h^i
+
+
+  int x = hi;
 )cpp",
   R"cpp(
   // prefer nearest occurrence even if several matched tokens
Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -562,19 +562,31 @@
   auto Cost = [&](SourceLocation Loc) -> unsigned {
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
-if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
-if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
-return 0;
+return Line >= WordLine ? Line - WordLine : 2 * (WordLine - Line);
   };
   const syntax::Token *BestTok = nullptr;
-  // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = -1;
+  // Search bounds are based on word length:
+  // - forward: 2^N lines
+  // - backward: 2^(N-1) lines.
+  unsigned WordGain = 1U << Word.Text.size();
+  // Line number for SM.translateLineCol() should be one-based, also
+  // SM.translateLineCol() cares about line number greater than
+  // number of lines in the file.
+  // - LineMin = max(1, WordLine + 1 - 2^N)
+  // - LineMax = WordLine + 1 + 2^(N-1)
+  unsigned LineMin = WordLine + 1 <= WordGain ? 1 : WordLine + 1 - WordGain;
+  unsigned LineMax = WordLine + 1 + WordGain / 2;
+  SourceLocation LocMin = SM.translateLineCol(File, LineMin, 1);
+  assert(LocMin.isValid());
+  SourceLocation LocMax = SM.translateLineCol(File, LineMax, 1);
+  assert(LocMax.isValid());
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
 // No point guessing the same location we started with.


Index: clang-tools-extra/clangd/unittests/XRefsTests.cpp
===
--- clang-tools-extra/clangd/unittests/XRefsTests.cpp
+++ clang-tools-extra/clangd/unittests/XRefsTests.cpp
@@ -1428,6 +1428,9 @@
 
 
   // h^i
+
+
+  int x = hi;
 )cpp",
   R"cpp(
   // prefer nearest occurrence even if several matched tokens
Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -562,19 +562,31 @@
   auto Cost = [&](SourceLocation Loc) -> unsigned {
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
-if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
-if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
-return 0;
+return Line >= WordLine ? Line - WordLine : 2 * (WordLine - Line);
   };
   const syntax::Token *BestTok = nullptr;
-  // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = -1;
+  // Search bounds are based on word length:
+  // - forward: 2^N lines
+  // - backward: 2^(N-1) lines.
+  unsigned WordGain = 1U << Word.Text.size();
+  // Line number for SM.translateLineCol() should be one-based, also
+  // SM.translateLineCol() cares about line number greater than
+  // number of lines in the file.
+  // - LineMin = max(1, WordLine + 1 - 2^N)
+  // - LineMax = WordLine + 1 + 2^(N-1)
+  unsigned LineMin = WordLine + 1 <= WordGain ? 1 : WordLine + 1 - WordGain;
+  unsigned LineMax = WordLine + 1 + WordGain / 2;
+  SourceLocation LocMin = SM.translateLineCol(File, LineMin, 1);
+  assert(LocMin.isValid());
+  SourceLocation LocMax = SM.translateLineCol(File, LineMax, 1);
+  assert(LocMax.isValid());
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, brea

[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-24 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX added inline comments.



Comment at: clang-tools-extra/clangd/XRefs.cpp:580
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)

sammccall wrote:
> ArcsinX wrote:
> > sammccall wrote:
> > > why can't this happen anymore?
> > As I understand, we call `findNearbyIdentifier()` only when the word is not 
> > an identifier. And `Tok` is always identifier here.
> > 
> > Btw, even if `findNearbyIdentifier()` could be called for an identifier, I 
> > could not get why it's bad to return current position here.
> > 
> > Am I wrong?
> > As I understand, we call findNearbyIdentifier() only when the word is not 
> > an identifier
> 
> I'm not sure where you're getting that.
> If you mean the bail-out inside the function itself:
> 
> ```
>   // Don't use heuristics if this is a real identifier.
>   // Unlikely identifiers are OK if they were used as identifiers nearby.
>   if (Word.ExpandedToken)
> return nullptr;
> ```
> 
> then "real identifier" here means it's an identifier after preprocessing.
> We're still getting here if it's e.g. part of a macro body, or code that's 
> `#ifdef`'d out.
> The spelled token in those cases is an identifier, there's just no 
> corresponding expanded token.
> 
> (I'm surprised we don't have a test covering this though!)
> 
> > Btw, even if findNearbyIdentifier() could be called for an identifier, I 
> > could not get why it's bad to return current position here.
> 
> The point of this function is that *this* occurrence of the word can't be 
> resolved, so let's try another one nearby.
> If we just return the same occurrence, then we're certainly not going to be 
> able to resolve it.
Thanks for your clarification. I will revert this change (and will try to add a 
test as well).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-24 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

In D87891#2291838 , @ArcsinX wrote:

> Thanks for your reply, I will rethink this patch.

FWIW I think if we drop most of the math in favor of using SourceManager's line 
translation, this doesn't add much complexity and is probably a bit more direct 
and efficient.




Comment at: clang-tools-extra/clangd/XRefs.cpp:580
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)

ArcsinX wrote:
> sammccall wrote:
> > why can't this happen anymore?
> As I understand, we call `findNearbyIdentifier()` only when the word is not 
> an identifier. And `Tok` is always identifier here.
> 
> Btw, even if `findNearbyIdentifier()` could be called for an identifier, I 
> could not get why it's bad to return current position here.
> 
> Am I wrong?
> As I understand, we call findNearbyIdentifier() only when the word is not an 
> identifier

I'm not sure where you're getting that.
If you mean the bail-out inside the function itself:

```
  // Don't use heuristics if this is a real identifier.
  // Unlikely identifiers are OK if they were used as identifiers nearby.
  if (Word.ExpandedToken)
return nullptr;
```

then "real identifier" here means it's an identifier after preprocessing.
We're still getting here if it's e.g. part of a macro body, or code that's 
`#ifdef`'d out.
The spelled token in those cases is an identifier, there's just no 
corresponding expanded token.

(I'm surprised we don't have a test covering this though!)

> Btw, even if findNearbyIdentifier() could be called for an identifier, I 
> could not get why it's bad to return current position here.

The point of this function is that *this* occurrence of the word can't be 
resolved, so let's try another one nearby.
If we just return the same occurrence, then we're certainly not going to be 
able to resolve it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-24 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX added a comment.

In D87891#2291760 , @kadircet wrote:

> The biggest problem I see is, this is not changing the fact that we are still 
> traversing the whole file:
>
> - You do one traversal over the whole file to find `FileLines`
> - Then possibly two more to find `OffsetMin` and `OffsetMax`

Seems you are right, but we do not compare strings during traversal to find 
`FileLines`.

> You can get rid of the first one by just using `getLocForEndOfFile` if 
> `positionToOffset` fails.
> For the latter, you can use `SourceManager::translateLineCol` instead, it 
> uses a cache and cheaper than the linear scan performed by `positionToOffset` 
> in case of multiple calls. The cache is already populated once we issue the 
> first `Cost` call, through `SM.getSpellingLineNumber`
>
> I was reluctant overall as the wins aren't clear. We are still going to 
> traverse the whole file at least once, and readability is hindered but I 
> suppose with the above mentioned two trics, runtime shouldn't be regressed.
> Also it is nice to get rid of 2 log calls for each token. Again all of these 
> feels like some micro optimizations that aren't justified.

Thanks for your reply, I will rethink this patch.




Comment at: clang-tools-extra/clangd/XRefs.cpp:568
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;

sammccall wrote:
> this has changed the relative ordering, if we're dropping the log then +1 
> should now become multiplication by two.
> (Or was this intended?)
Yes, you are right, my fault here.

But should we penalize backwards so hard?



Comment at: clang-tools-extra/clangd/XRefs.cpp:580
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)

sammccall wrote:
> why can't this happen anymore?
As I understand, we call `findNearbyIdentifier()` only when the word is not an 
identifier. And `Tok` is always identifier here.

Btw, even if `findNearbyIdentifier()` could be called for an identifier, I 
could not get why it's bad to return current position here.

Am I wrong?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-23 Thread Sam McCall via Phabricator via cfe-commits
sammccall added inline comments.



Comment at: clang-tools-extra/clangd/XRefs.cpp:565
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
+  return 1 + Line - WordLine;

Since costs are only compared, we can simplify this:

return (Line >= WordLine) ? (Line - WordLine) : 2 * (WordLine - Line)



Comment at: clang-tools-extra/clangd/XRefs.cpp:568
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;

this has changed the relative ordering, if we're dropping the log then +1 
should now become multiplication by two.
(Or was this intended?)



Comment at: clang-tools-extra/clangd/XRefs.cpp:573
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = 1 << std::min(Word.Text.size(), sizeof(unsigned) * 8 - 
1);
+  auto Code = SM.getBuffer(File)->getBuffer();

The initial value of BestCost was chosen so that no-result would be worse than 
any cost in the eligible range, but better than any cost outside it.
If you're going to truncate the search region, you can just initialize to -1 
(i.e. max).

(FWIW, I guess `sizeof(unsigned) * 8 - 1` -> 
`std::numeric_limits::digits - 1` would be more obvious)




Comment at: clang-tools-extra/clangd/XRefs.cpp:575
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  auto TruncUnsigned = [](unsigned U) {

I think simplest is to use SourceMgr's line table.

```
int MinLine = (signed)Line - Word.Text.size()/2 , MaxLine = Line + 
Word.Text.size();
SourceLocation LocMin = SM.translateLineCol(File, std::max(1, MinLine), 1);
SourceLocation LocMax = SM.translateLineCol(File, MaxLine); // past-end ok
```



Comment at: clang-tools-extra/clangd/XRefs.cpp:580
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)

why can't this happen anymore?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-23 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added a comment.

Hey! Sorry for the late reply, this has been open in my tabs since day 1 just 
didn't get a chance to take a look at it.

The biggest problem I see is, this is not changing the fact that we are still 
traversing the whole file:

- You do one traversal over the whole file to find `FileLines`
- Then possibly two more to find `OffsetMin` and `OffsetMax`

You can get rid of the first one by just using `getLocForEndOfFile` if 
`positionToOffset` fails.
For the latter, you can use `SourceManager::translateLineCol` instead, it uses 
a cache and cheaper than the linear scan performed by `positionToOffset` in 
case of multiple calls. The cache is already populated once we issue the first 
`Cost` call, through `SM.getSpellingLineNumber`

I was reluctant overall as the wins aren't clear. We are still going to 
traverse the whole file at least once, and readability is hindered but I 
suppose with the above mentioned two trics, runtime shouldn't be regressed.
Also it is nice to get rid of 2 log calls for each token. Again all of these 
feels like some micro optimizations that aren't justified.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-23 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX added a comment.

I feel like I'm doing something totally wrong here :)
Could someone give me an advice?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-23 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX updated this revision to Diff 293665.
ArcsinX added a comment.

std::pow => bitwise shift.  
Take care about integers overflow.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

Files:
  clang-tools-extra/clangd/XRefs.cpp


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,49 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = 1 << std::min(Word.Text.size(), sizeof(unsigned) * 8 - 
1);
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  auto TruncUnsigned = [](unsigned U) {
+return std::min(U, std::numeric_limits::max());
+  };
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  unsigned LineMin =
+  TruncUnsigned(WordLine < BestCost ? 0 : WordLine - BestCost);
+  auto OffsetMin = positionToOffset(Code, {static_cast(LineMin), 0},
+/*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  unsigned LineMax = TruncUnsigned(
+  std::min(WordLine > std::numeric_limits::max() - BestCost - 1
+   ? std::numeric_limits::max()
+   : WordLine + BestCost + 1,
+   FileLines));
+  auto OffsetMax = positionToOffset(Code, {static_cast(LineMax), 0},
+/*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMax);
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)
-  return false;
+
 // We've done cheap checks, compute cost so we can break the caller's loop.
 unsigned TokCost = Cost(Tok.location());
 if (TokCost >= BestCost)


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,49 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = 1 << std::min(Word.Text.size(), sizeof(unsigned) * 8 - 1);
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  auto TruncUnsigned = [](unsigned U) {
+return std::min(U, std::numeric_limits::max());
+  };
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  unsigned LineMin =
+  TruncUnsigned(WordLine < BestCost ? 0 : WordLine - BestCost);
+  auto OffsetMin = positionToOffset(Code, {static_cast(LineMin), 0},
+/*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  unsigned LineMax = TruncUnsigned(
+  std::min(WordLine > std::numeric_limits::max() - BestCost - 1
+   ? std::numeric_limits::max()
+   : WordLine + BestCost + 1,
+   FileLines));
+  auto OffsetMax = positionToOffset(Code, {static_cast(LineMax), 0},
+/*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*Offset

[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-18 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX updated this revision to Diff 292739.
ArcsinX added a comment.

Fix format


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87891/new/

https://reviews.llvm.org/D87891

Files:
  clang-tools-extra/clangd/XRefs.cpp


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,46 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = std::pow(2, Word.Text.size());
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  auto OffsetMin = positionToOffset(
+  Code,
+  {static_cast(WordLine < BestCost ? 0 : WordLine - BestCost), 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  auto OffsetMax =
+  positionToOffset(Code,
+   {static_cast(WordLine + BestCost + 1 < FileLines
+ ? WordLine + BestCost + 1
+ : FileLines),
+0},
+   /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMax);
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)
-  return false;
+
 // We've done cheap checks, compute cost so we can break the caller's loop.
 unsigned TokCost = Cost(Tok.location());
 if (TokCost >= BestCost)


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,46 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = std::pow(2, Word.Text.size());
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  auto OffsetMin = positionToOffset(
+  Code,
+  {static_cast(WordLine < BestCost ? 0 : WordLine - BestCost), 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  auto OffsetMax =
+  positionToOffset(Code,
+   {static_cast(WordLine + BestCost + 1 < FileLines
+ ? WordLine + BestCost + 1
+ : FileLines),
+0},
+   /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMax);
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.

[PATCH] D87891: [clangd] findNearbyIdentifier(): guaranteed to give up after 2^N lines

2020-09-18 Thread Aleksandr Platonov via Phabricator via cfe-commits
ArcsinX created this revision.
Herald added subscribers: cfe-commits, usaxena95, arphaman, jkorous.
Herald added a project: clang.
ArcsinX requested review of this revision.
Herald added subscribers: MaskRay, ilya-biryukov.

As @kadircet mentions in D84912#2184144 
, `findNearbyIdentifier()` traverses 
the whole file if there is no identifier for the word.
This patch ensures give up after 2^N lines in any case.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D87891

Files:
  clang-tools-extra/clangd/XRefs.cpp


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,45 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = std::pow(2, Word.Text.size());
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  auto OffsetMin = positionToOffset(
+  Code,
+  {static_cast(WordLine < BestCost ? 0 : WordLine - BestCost), 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  auto OffsetMax = positionToOffset(
+  Code, {static_cast(WordLine + BestCost + 1 < FileLines
+  ? WordLine + BestCost + 1
+  : FileLines),
+ 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMax);
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word.Text))
   return false;
-// No point guessing the same location we started with.
-if (Tok.location() == Word.Location)
-  return false;
+
 // We've done cheap checks, compute cost so we can break the caller's loop.
 unsigned TokCost = Cost(Tok.location());
 if (TokCost >= BestCost)


Index: clang-tools-extra/clangd/XRefs.cpp
===
--- clang-tools-extra/clangd/XRefs.cpp
+++ clang-tools-extra/clangd/XRefs.cpp
@@ -563,23 +563,45 @@
 assert(SM.getFileID(Loc) == File && "spelled token in wrong file?");
 unsigned Line = SM.getSpellingLineNumber(Loc);
 if (Line > WordLine)
-  return 1 + llvm::Log2_64(Line - WordLine);
+  return 1 + Line - WordLine;
 if (Line < WordLine)
-  return 2 + llvm::Log2_64(WordLine - Line);
+  return 2 + WordLine - Line;
 return 0;
   };
   const syntax::Token *BestTok = nullptr;
   // Search bounds are based on word length: 2^N lines forward.
-  unsigned BestCost = Word.Text.size() + 1;
+  unsigned BestCost = std::pow(2, Word.Text.size());
+  auto Code = SM.getBuffer(File)->getBuffer();
+  unsigned FileLines = std::count(Code.begin(), Code.end(), '\n');
+  // Min position:
+  // - Line: max(0, WordLine - 2^N)
+  // - Column: 0
+  auto OffsetMin = positionToOffset(
+  Code,
+  {static_cast(WordLine < BestCost ? 0 : WordLine - BestCost), 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMin);
+  auto LocMin = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMin);
+  // Max position:
+  // - Line: min(, WordLine + 2^N + 1)
+  // - Column: 0
+  auto OffsetMax = positionToOffset(
+  Code, {static_cast(WordLine + BestCost + 1 < FileLines
+  ? WordLine + BestCost + 1
+  : FileLines),
+ 0},
+  /*AllowColumnBeyondLineLength=*/true);
+  assert(OffsetMax);
+  auto LocMax = SM.getLocForStartOfFile(File).getLocWithOffset(*OffsetMax);
 
   // Updates BestTok and BestCost if Tok is a good candidate.
   // May return true if the cost is too high for this token.
   auto Consider = [&](const syntax::Token &Tok) {
+if (Tok.location() < LocMin || Tok.location() > LocMax)
+  return true; // we are too far from the word, break the outer loop.
 if (!(Tok.kind() ==