[PATCH] D93873: [clangd] Cache preambles of closed files

2021-01-12 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added a comment.

In D93873#2490314 , @sammccall wrote:

> 1. a *persistent* cache so closing+reopening clangd loses less state. (This 
> is complicated because only the PCH is easily serializable, the rest of the 
> PreambleData struct isn't)
> 2. building caches of preambles while background-indexing (this would be 
> great for modules but is probably way too big for whole preambles)
> 3. reusing the "wrong" preamble initially when you open a new file, to give 
> some basic functionality (using existing preamble patching logic, just in a 
> more aggressive scenario)
> 4. having the disk-based storage unlink the file preemptively, to eliminate 
> any chance of leaking the *.pch

It feels like a mixture of 1 and 3 is going to provide the most value for 
decreasing time until semantic features (but I might be a little biased :D, 
also we might hit a nice sweet spot with pseudoparsing too).
I don't think having a cache for previously built preambles will ever be 
enough. As Sam pointed out, scaling is one of the biggest problems, as I don't 
think it would be feasible to have tens of preambles lying around on the disk, 
especially when they are costly the built (as it implies increased size).
Surely it optimizes the case of users working on a small set of files but 
frequently closes and re-opens them. But that's just one use case, it is also 
quite common to open tens of library headers while investigating an issue, or 
trying to understand details of some code through chains of go-to-definitions.
Users won't have any preambles for a while on those files and even after 
building the preamble they'll just be sitting in the cache probably only to be 
evicted.

So I think having a cache of preambles while optimizing for reusability (by 
keeping a small set of preambles that cover different set of files, as we can't 
use a preamble for a source file if it covers the source file in question) and 
then patching those to be applicable for current file at hand sounds like a 
better compromise. Surely it won't be as effective for frequent close/re-open 
use case, but I think the costs of such a cache isn't justified if it is only 
applicable to a single workflow.

As for mixing idea-1 into the equation, all of these will require clangd to do 
the work from scratch per instance, if we can have some sort of persistent 
on-disk cache, we can both share the work (and associated storage costs) across 
clangd instances and ensure clangd is also responsive even at startup without 
requiring user to build a bunch of preambles with every new clangd instance 
first.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93873: [clangd] Cache preambles of closed files

2021-01-11 Thread Quentin Chateau via Phabricator via cfe-commits
qchateau added a comment.

Hey, don't worry about the delay, I won't have as much time on my hands anymore 
anyway.

- we can definitely make this opt-in
- MB instead of # is an idea, it's probably closer to what the user want to 
configure - if he does - but it would also probably give a worse default value. 
Your call !
- I mean why not, but this diff was more about bringing the feature in than 
making it perfect. The heuristic to decide how to evict preambles can be made 
very complicated if we want to optimize it
- we can do as you suggest, it related to the previous point
- Ah ! What would you recommend ? No forget it, I'll just make it opt-it for now

I've also tried to keep the ASTWorkers alive in a different branch. The result 
is very different: ASTWorkers always use RAM so we can't keep as many, but 
keeping them alive makes is even better (15s with no cache, 2s with preamble 
cache, virtually 0s with ASTWorker cache). I'd say both features are nice but 
they give different results for a different cost.
Also it would be great to stop the ASTWorker threads when they are just in 
cache, but the class needs a rework to be able to stop/restart it


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93873: [clangd] Cache preambles of closed files

2021-01-11 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a reviewer: kadircet.
sammccall added a comment.

Thanks! This is a really good idea, but not without its risks :-) Sorry for not 
getting to this for a while!

My only *really* high-level design question is: I wonder what the tradeoffs are 
in keeping just the preamble vs the whole ASTWorker. I'd *expect* this approach 
gets almost all the benefit without complicating the basic assumptions around 
ASTWorker, so it's probably the right way to go. Haven't thought about it much 
though...

---

I've got a bit of cognitive load because there's a few preamble-related ideas 
we've had, and I'm trying to work out how they interact. Let's start by writing 
them down...

1. a *persistent* cache so closing+reopening clangd loses less state. (This is 
complicated because only the PCH is easily serializable, the rest of the 
PreambleData struct isn't)
2. building caches of preambles while background-indexing (this would be great 
for modules but is probably way too big for whole preambles)
3. reusing the "wrong" preamble initially when you open a new file, to give 
some basic functionality (using existing preamble patching logic, just in a 
more aggressive scenario)
4. having the disk-based storage unlink the file preemptively, to eliminate any 
chance of leaking the *.pch

1&2 are both big projects, so I'm not really worried about the overlap or at 
least not going to try to predict how it'd be resolved.

3 is actually made *easier* by this patch I think: having a central repository 
of preambles makes it more natural to grab some "other" preamble.

4 I think doesn't really interact at all: the PreambleData object would own a 
file descriptor instead of a file, but it's all much the same.

So I think in summary there's not really any conflict to resolve with these 
ideas. cc @kadircet though who's done more thinking about #1 and #3 than I have.

---

I think we need to be fairly careful about policies around this cache. 
Preambles are large (we frequently see several hundred MB). Some workflows 
involve opening many files at a time. Some workflows end up running multiple 
copies of clangd on the same files. Some configurations keep them in memory 
rather than on disk. So a too-large cache could waste quite a lot of resources.

So, some pointy questions:

- landing so close to the 12 release, should we conservatively default this to 
0, and require opt-in?
- is MB or #preambles a better limit to the cache size?
- should we take size into account when deciding what to evict? (my guess is 
no, cost scales with size, and value scales with size * probability of reuse, 
so we should purely optimize for probability of reuse)
- can we do better than LRU? The cache is accessed so infrequently and misses 
are so horrendously expensive that we could certainly affort to e.g. track 
usage history of every file ever opened, if it would help performance and not 
add too much complexity.
- not a question, but I can say for sure that 1000 with no size limit isn't a 
safe default for disk :-(




Comment at: clang-tools-extra/clangd/TUScheduler.cpp:193
+  /// Get the preamble associated with a \p Key, removing
+  /// it from the cache
+  std::shared_ptr take(llvm::StringRef Key) {

we might instead consider keeping "active" preambles in the cache, and simply 
considering their cost to be 0/ineligible for eviction if the 
shared_ptr::usage_count > 1. 
This allows this cache to be a "registry" so we can try using a preamble from a 
different TU as mentioned above.

(but this could also be done later, or could be done with a separate table of 
weak_ptrs. No change needed for this patch, just thinking out loud)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93873: [clangd] Cache preambles of closed files

2021-01-09 Thread Quentin Chateau via Phabricator via cfe-commits
qchateau added a comment.

New year's ping ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93873: [clangd] Cache preambles of closed files

2020-12-30 Thread Quentin Chateau via Phabricator via cfe-commits
qchateau updated this revision to Diff 314152.
qchateau added a comment.

- Fix keep preamble command line option
- Fix tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

Files:
  clang-tools-extra/clangd/ClangdServer.cpp
  clang-tools-extra/clangd/ClangdServer.h
  clang-tools-extra/clangd/TUScheduler.cpp
  clang-tools-extra/clangd/TUScheduler.h
  clang-tools-extra/clangd/test/memory_tree.test
  clang-tools-extra/clangd/tool/ClangdMain.cpp
  clang-tools-extra/clangd/unittests/ClangdTests.cpp

Index: clang-tools-extra/clangd/unittests/ClangdTests.cpp
===
--- clang-tools-extra/clangd/unittests/ClangdTests.cpp
+++ clang-tools-extra/clangd/unittests/ClangdTests.cpp
@@ -1256,7 +1256,7 @@
   MemoryTree MT(&Alloc);
   Server.profile(MT);
   ASSERT_TRUE(MT.children().count("tuscheduler"));
-  EXPECT_TRUE(MT.child("tuscheduler").children().count(FooCpp));
+  EXPECT_TRUE(MT.child("tuscheduler").child("files").children().count(FooCpp));
 }
 } // namespace
 } // namespace clangd
Index: clang-tools-extra/clangd/tool/ClangdMain.cpp
===
--- clang-tools-extra/clangd/tool/ClangdMain.cpp
+++ clang-tools-extra/clangd/tool/ClangdMain.cpp
@@ -347,6 +347,16 @@
 init(getDefaultAsyncThreadsCount()),
 };
 
+// Magic value to know when the user does not specify a value
+constexpr size_t DefaultKeepPreambleValue = std::numeric_limits::max();
+constexpr size_t DefaultKeepPreambleMemory = 10;
+constexpr size_t DefaultKeepPreambleDisk = 1000;
+opt KeepPreambles{
+"keep-preambles", cat(Misc),
+desc("Number of preambles of closed files that clangd will keep in cache.\n"
+ "Note that preambles may be stored in memory or in disk."),
+init(DefaultKeepPreambleValue)};
+
 opt IndexFile{
 "index-file",
 cat(Misc),
@@ -821,6 +831,13 @@
 Opts.StaticIndex = PAI.get();
   }
   Opts.AsyncThreadsCount = WorkerThreadsCount;
+
+  if (KeepPreambles == DefaultKeepPreambleValue) // User did not specify a value
+Opts.KeepPreambles = Opts.StorePreamblesInMemory ? DefaultKeepPreambleMemory
+ : DefaultKeepPreambleDisk;
+  else
+Opts.KeepPreambles = KeepPreambles;
+
   Opts.BuildRecoveryAST = RecoveryAST;
   Opts.PreserveRecoveryASTType = RecoveryASTType;
   Opts.FoldingRanges = FoldingRanges;
Index: clang-tools-extra/clangd/test/memory_tree.test
===
--- clang-tools-extra/clangd/test/memory_tree.test
+++ clang-tools-extra/clangd/test/memory_tree.test
@@ -57,20 +57,32 @@
 # CHECK-NEXT: }
 # CHECK-NEXT:   },
 # CHECK-NEXT:   "tuscheduler": {
-# CHECK-NEXT: "{{.*}}main.cpp": {
+# CHECK-NEXT: "_self": {{[0-9]+}},
+# CHECK-NEXT: "_total": {{[0-9]+}},
+# CHECK-NEXT: "cache": {
 # CHECK-NEXT:   "_self": {{[0-9]+}},
 # CHECK-NEXT:   "_total": {{[0-9]+}},
-# CHECK-NEXT:   "ast": {
-# CHECK-NEXT: "_self": {{[0-9]+}},
-# CHECK-NEXT: "_total": {{[0-9]+}}
-# CHECK-NEXT:   },
-# CHECK-NEXT:   "preamble": {
+# CHECK-NEXT:   "containers": {
 # CHECK-NEXT: "_self": {{[0-9]+}},
 # CHECK-NEXT: "_total": {{[0-9]+}}
 # CHECK-NEXT:   }
 # CHECK-NEXT: },
-# CHECK-NEXT: "_self": {{[0-9]+}},
-# CHECK-NEXT: "_total": {{[0-9]+}}
+# CHECK-NEXT: "files": {
+# CHECK-NEXT:   "{{.*}}main.cpp": {
+# CHECK-NEXT: "_self": {{[0-9]+}},
+# CHECK-NEXT: "_total": {{[0-9]+}},
+# CHECK-NEXT: "ast": {
+# CHECK-NEXT:   "_self": {{[0-9]+}},
+# CHECK-NEXT:   "_total": {{[0-9]+}}
+# CHECK-NEXT: },
+# CHECK-NEXT: "preamble": {
+# CHECK-NEXT:   "_self": {{[0-9]+}},
+# CHECK-NEXT:   "_total": {{[0-9]+}}
+# CHECK-NEXT: }
+# CHECK-NEXT:   },
+# CHECK-NEXT:   "_self": {{[0-9]+}},
+# CHECK-NEXT:   "_total": {{[0-9]+}}
+# CHECK-NEXT: }
 # CHECK-NEXT:   }
 # CHECK-NEXT: }
 # CHECK-NEXT:   }
Index: clang-tools-extra/clangd/TUScheduler.h
===
--- clang-tools-extra/clangd/TUScheduler.h
+++ clang-tools-extra/clangd/TUScheduler.h
@@ -197,6 +197,10 @@
 /// No-op if AsyncThreadsCount is 0.
 bool AsyncPreambleBuilds = true;
 
+// The number of preambles that will be retained even after the file is
+// closed
+size_t KeepPreambles = 0;
+
 /// Used to create a context that wraps each single operation.
 /// Typically to inject per-file configuration.
 /// If the path is empty, context sholud be "generic".
@@ -305,6 +309,9 @@
   /// an LRU cache.
   class ASTCache;
 
+  /// Responsible for retaining preambles.
+

[PATCH] D93873: [clangd] Cache preambles of closed files

2020-12-28 Thread Quentin Chateau via Phabricator via cfe-commits
qchateau created this revision.
qchateau added a reviewer: sammccall.
Herald added subscribers: usaxena95, kadircet, arphaman, javed.absar.
qchateau requested review of this revision.
Herald added subscribers: cfe-commits, MaskRay, ilya-biryukov.
Herald added a project: clang.

When a file is closed, push its preamble to a LRU cache
When a file is opened, try to get the preamble from the LRU cache.

By default store 10 preambles if they are stored on
memory and 1000 if they are stored on disk. That value
can be modified with --keep-preambles=N


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D93873

Files:
  clang-tools-extra/clangd/ClangdServer.cpp
  clang-tools-extra/clangd/ClangdServer.h
  clang-tools-extra/clangd/TUScheduler.cpp
  clang-tools-extra/clangd/TUScheduler.h
  clang-tools-extra/clangd/tool/ClangdMain.cpp
  clang-tools-extra/clangd/unittests/ClangdTests.cpp

Index: clang-tools-extra/clangd/unittests/ClangdTests.cpp
===
--- clang-tools-extra/clangd/unittests/ClangdTests.cpp
+++ clang-tools-extra/clangd/unittests/ClangdTests.cpp
@@ -1256,7 +1256,7 @@
   MemoryTree MT(&Alloc);
   Server.profile(MT);
   ASSERT_TRUE(MT.children().count("tuscheduler"));
-  EXPECT_TRUE(MT.child("tuscheduler").children().count(FooCpp));
+  EXPECT_TRUE(MT.child("tuscheduler").child("files").children().count(FooCpp));
 }
 } // namespace
 } // namespace clangd
Index: clang-tools-extra/clangd/tool/ClangdMain.cpp
===
--- clang-tools-extra/clangd/tool/ClangdMain.cpp
+++ clang-tools-extra/clangd/tool/ClangdMain.cpp
@@ -347,6 +347,15 @@
 init(getDefaultAsyncThreadsCount()),
 };
 
+constexpr size_t DefaultKeepPreambleMemory = 10;
+constexpr size_t DefaultKeepPreambleDisk = 1000;
+opt> KeepPreambles{
+"keep-preambles",
+cat(Misc),
+desc("Number of preambles of closed files that clangd will keep in cache.\n"
+ "Note that preambles may be stored in memory or in disk.")
+};
+
 opt IndexFile{
 "index-file",
 cat(Misc),
@@ -821,6 +830,10 @@
 Opts.StaticIndex = PAI.get();
   }
   Opts.AsyncThreadsCount = WorkerThreadsCount;
+  Opts.KeepPreambles = KeepPreambles.getValueOr(
+Opts.StorePreamblesInMemory ? DefaultKeepPreambleMemory :
+DefaultKeepPreambleDisk
+  );
   Opts.BuildRecoveryAST = RecoveryAST;
   Opts.PreserveRecoveryASTType = RecoveryASTType;
   Opts.FoldingRanges = FoldingRanges;
Index: clang-tools-extra/clangd/TUScheduler.h
===
--- clang-tools-extra/clangd/TUScheduler.h
+++ clang-tools-extra/clangd/TUScheduler.h
@@ -197,6 +197,10 @@
 /// No-op if AsyncThreadsCount is 0.
 bool AsyncPreambleBuilds = true;
 
+// The number of preambles that will be retained even after the file is
+// closed
+size_t KeepPreambles = 0;
+
 /// Used to create a context that wraps each single operation.
 /// Typically to inject per-file configuration.
 /// If the path is empty, context sholud be "generic".
@@ -305,6 +309,9 @@
   /// an LRU cache.
   class ASTCache;
 
+  /// Responsible for retaining preambles.
+  class PreambleCache;
+
   // The file being built/processed in the current thread. This is a hack in
   // order to get the file name into the index implementations. Do not depend on
   // this inside clangd.
@@ -321,6 +328,7 @@
   std::unique_ptr Callbacks; // not nullptr
   Semaphore Barrier;
   llvm::StringMap> Files;
+  std::unique_ptr CachedPreambles;
   std::unique_ptr IdleASTs;
   // None when running tasks synchronously and non-None when running tasks
   // asynchronously.
Index: clang-tools-extra/clangd/TUScheduler.cpp
===
--- clang-tools-extra/clangd/TUScheduler.cpp
+++ clang-tools-extra/clangd/TUScheduler.cpp
@@ -179,6 +179,91 @@
   std::vector LRU; /* GUARDED_BY(Mut) */
 };
 
+/// LRU cache with amortized O(1) put and take
+/// Preambles can be stored on disk so we may want to store a high
+/// number of entries
+class TUScheduler::PreambleCache {
+public:
+  PreambleCache(size_t MaxSize, bool StorePreamblesInMemory)
+  : MaxSize(MaxSize), StorePreamblesInMemory(StorePreamblesInMemory) {
+vlog("TUScheduler will cache {0} preambles", MaxSize);
+  }
+
+  /// Get the preamble associated with a \p Key, removing
+  /// it from the cache
+  std::shared_ptr take(llvm::StringRef Key) {
+auto It = Data.find(Key);
+if (It == Data.end())
+  return nullptr;
+auto Result = std::move(It->second);
+
+// Remove the key from all internal data structures
+auto KeyToLRUIt = KeyToLRU.find(Key);
+assert(KeyToLRUIt != KeyToLRU.end() && "Key is missing");
+auto LRUIt = KeyToLRUIt->second;
+Data.erase(It);
+KeyToLRU.erase(KeyToLRUIt);
+LRU.erase(LRUIt);
+
+return Result;
+  }
+
+  /// Add a \p Preamble associated with a \p Key, the preamble must
+  ///