https://github.com/likewhatevs created 
https://github.com/llvm/llvm-project/pull/180402

Hi, awesome project, thank you!

I dabble with kernel stuff and no LSP works well there.

This PR aims to improve on that by adding better support for a (arguably too) 
common pattern in kernel stuff, the unity build pattern.

The tl;dr of this build pattern is "include one c file in another c file 
because reasons (make builds faster, i think)". It makes clangd less effective 
because it leads to there being no entries in the compilation database for 
included c files.

This PR works around that by using using already-present heuristics to 
propagate parent files's include's such that things resolve in c files included 
in c files correctly.

I tried to do this previously in the script where we generate a compilation 
database from build files (here: 
https://lore.kernel.org/all/[email protected]/), 
and while that did make folks's ide's happy, that ultimately ended up not 
working (here: https://lkml.org/lkml/2025/12/17/1294) because it broke some 
automation somewhere, so I am trying to make a similar-but-more-correct version 
of that patch here.

I tested this PR two ways, via a script which counted errors while building 
files in a subdir of the linux kernel and (ofc) in vim.

### The script test showed a 70% reduction in errors for files following the 
unity build pattern (0's excluded from results):

#### system installed clangd:
```
❯ python ~/clangd_errors.py /usr/bin/clangd --dir kernel
# kernel: 35 unity build targets
   58  kernel/locking/rwbase_rt.c
    9  kernel/sched/core_sched.c
    4  kernel/sched/deadline.c
   11  kernel/sched/debug.c
 1319  kernel/sched/ext.c
  444  kernel/sched/ext_idle.c
    1  kernel/sched/idle.c
    1  kernel/sched/loadavg.c
  389  kernel/sched/psi.c
    1  kernel/sched/rt.c
  169  kernel/trace/trace_selftest.c

files  35
 2406  total errors
 2480  total diagnostics (errors + warnings)
```

#### pr clangd:
```
❯ python ~/clangd_errors.py ~/repos/llvm-project/build/bin/clangd --dir kernel
# kernel: 35 unity build targets
   58  kernel/locking/rwbase_rt.c
    9  kernel/sched/core_sched.c
   11  kernel/sched/debug.c
    1  kernel/sched/loadavg.c
  389  kernel/sched/psi.c
  169  kernel/trace/trace_selftest.c

files  35
  637  total errors
  732  total diagnostics (errors + warnings)
```

### The vim test (the best test, my happy developer with go-to-definition, 
greyed-out-disabled-code and tab-auto-complete test) showed the following:

#### system installed clangd:
<img width="2880" height="1800" alt="Screenshot From 2026-02-08 04-50-48" 
src="https://github.com/user-attachments/assets/d203ac3d-5c91-428a-bcb3-721dc6fa9cef";
 />

#### pr clangd:
<img width="2880" height="1800" alt="Screenshot From 2026-02-08 04-06-22" 
src="https://github.com/user-attachments/assets/798b6625-e18b-407f-a2fb-15e7de8bc25e";
 />

Please LMK what you think, thanks!

>From 808630c426710ed1c945c0f8ed7a4dd4e2c69218 Mon Sep 17 00:00:00 2001
From: Pat Somaru <[email protected]>
Date: Sat, 7 Feb 2026 21:27:09 -0500
Subject: [PATCH] [clangd] Inject preceding includes for unity build targets

When a .c file is #include'd by another .c file (unity builds), inject
the preceding includes from the proxy as -include flags so the target
gets the declarations it depends on.

After the existing proxy compile command resolution (which handles both
warm-cache via HeaderIncluderCache and cold-start via interpolated CDB
commands), check the Heuristic field set by transferCompileCommand /
InterpolatingCompilationDatabase. If the command was inferred from a
non-header file for a non-header target, preprocess the proxy file to
collect resolved #include paths preceding the target and inject them.

The preprocessing follows the same pattern as scanPreamble() in
Preamble.cpp but uses real VFS instead of an empty one, so that
command-line includes (e.g. -include kconfig.h) are processed and
to avoid re-scanning on every update.

Signed-off-by: Pat Somaru <[email protected]>
---
 clang-tools-extra/clangd/TUScheduler.cpp      | 131 +++++++++++++++++
 .../clangd/unittests/TUSchedulerTests.cpp     | 139 ++++++++++++++++++
 2 files changed, 270 insertions(+)

diff --git a/clang-tools-extra/clangd/TUScheduler.cpp 
b/clang-tools-extra/clangd/TUScheduler.cpp
index 0661ecb58008e..20864e8e125f4 100644
--- a/clang-tools-extra/clangd/TUScheduler.cpp
+++ b/clang-tools-extra/clangd/TUScheduler.cpp
@@ -54,6 +54,7 @@
 #include "GlobalCompilationDatabase.h"
 #include "ParsedAST.h"
 #include "Preamble.h"
+#include "SourceCode.h"
 #include "clang-include-cleaner/Record.h"
 #include "support/Cancellation.h"
 #include "support/Context.h"
@@ -65,6 +66,7 @@
 #include "support/Trace.h"
 #include "clang/Basic/Stack.h"
 #include "clang/Frontend/CompilerInvocation.h"
+#include "clang/Frontend/FrontendActions.h"
 #include "clang/Tooling/CompilationDatabase.h"
 #include "llvm/ADT/FunctionExtras.h"
 #include "llvm/ADT/STLExtras.h"
@@ -349,6 +351,80 @@ bool isReliable(const tooling::CompileCommand &Cmd) {
   return Cmd.Heuristic.empty();
 }
 
+/// Run the preprocessor over a proxy file to collect resolved paths of
+/// #include directives that precede TargetFile. Uses real VFS for correct
+/// #ifdef evaluation and include path resolution.
+///
+/// This follows the same pattern as scanPreamble() in Preamble.cpp but uses
+/// real VFS instead of an empty one, so that command-line includes (e.g.
+/// -include kconfig.h) are processed and #ifdef guards evaluate correctly.
+std::vector<std::string>
+scanPrecedingIncludes(const ThreadsafeFS &TFS,
+                      const tooling::CompileCommand &ProxyCmd,
+                      PathRef TargetFile) {
+  // Resolve the proxy file's absolute path and read its contents.
+  llvm::SmallString<256> ProxyFile(ProxyCmd.Filename);
+  if (!llvm::sys::path::is_absolute(ProxyFile)) {
+    ProxyFile = ProxyCmd.Directory;
+    llvm::sys::path::append(ProxyFile, ProxyCmd.Filename);
+  }
+  llvm::sys::path::remove_dots(ProxyFile, /*remove_dot_dot=*/true);
+
+  auto FS = TFS.view(std::nullopt);
+
+  // Canonicalize TargetFile to match Inc.Resolved (which uses canonical
+  // paths from getCanonicalPath). This ensures symlinks don't cause
+  // mismatches.
+  llvm::SmallString<256> CanonTarget;
+  if (FS->getRealPath(TargetFile, CanonTarget))
+    CanonTarget = TargetFile; // Fallback if canonicalization fails.
+  llvm::sys::path::remove_dots(CanonTarget, /*remove_dot_dot=*/true);
+
+  auto Buf = FS->getBufferForFile(ProxyFile);
+  if (!Buf)
+    return {};
+
+  // Build a compiler invocation from the proxy's compile command.
+  ParseInputs PI;
+  PI.Contents = (*Buf)->getBuffer().str();
+  PI.TFS = &TFS;
+  PI.CompileCommand = ProxyCmd;
+  IgnoringDiagConsumer IgnoreDiags;
+  auto CI = buildCompilerInvocation(PI, IgnoreDiags);
+  if (!CI)
+    return {};
+  CI->getDiagnosticOpts().IgnoreWarnings = true;
+
+  auto ContentsBuffer = llvm::MemoryBuffer::getMemBuffer(PI.Contents);
+  auto Clang = prepareCompilerInstance(
+      std::move(CI), /*Preamble=*/nullptr, std::move(ContentsBuffer),
+      TFS.view(std::nullopt), IgnoreDiags);
+  if (!Clang || Clang->getFrontendOpts().Inputs.empty())
+    return {};
+
+  // Run preprocessor and collect main-file includes via IncludeStructure.
+  PreprocessOnlyAction Action;
+  if (!Action.BeginSourceFile(*Clang, Clang->getFrontendOpts().Inputs[0]))
+    return {};
+  IncludeStructure Includes;
+  Includes.collect(*Clang);
+  if (llvm::Error Err = Action.Execute()) {
+    llvm::consumeError(std::move(Err));
+    return {};
+  }
+  Action.EndSourceFile();
+
+  // Return resolved paths of includes that precede TargetFile.
+  std::vector<std::string> Result;
+  for (const auto &Inc : Includes.MainFileIncludes) {
+    if (Inc.Resolved == CanonTarget)
+      return Result;
+    if (!Inc.Resolved.empty())
+      Result.push_back(Inc.Resolved);
+  }
+  return {}; // TargetFile not found among the proxy's includes.
+}
+
 /// Threadsafe manager for updating a TUStatus and emitting it after each
 /// update.
 class SynchronizedTUStatus {
@@ -765,6 +841,12 @@ class ASTWorker {
   std::atomic<unsigned> ASTBuildCount = {0};
   std::atomic<unsigned> PreambleBuildCount = {0};
 
+  /// Cached result of scanPrecedingIncludes for unity build support.
+  /// Only accessed from the worker thread, no locking needed.
+  std::string CachedProxyPath;
+  llvm::sys::TimePoint<> CachedProxyMTime;
+  std::vector<std::string> CachedPrecedingIncludes;
+
   SynchronizedTUStatus Status;
   PreambleThread PreamblePeer;
 };
@@ -885,6 +967,55 @@ void ASTWorker::update(ParseInputs Inputs, WantDiagnostics 
WantDiags,
         }
       }
     }
+    // Unity builds: when a non-header .c file has a command inferred from
+    // another non-header .c file (e.g. ext.c inferred from build_policy.c),
+    // preprocess the proxy to find preceding #include directives and inject
+    // them as -include flags. This works for both warm-cache (proxy from
+    // HeaderIncluderCache, command via transferCompileCommand) and cold-start
+    // (interpolated command from CDB), since both set the Heuristic field.
+    if (Cmd && !isHeaderFile(FileName)) {
+      llvm::StringRef Heuristic = Cmd->Heuristic;
+      if (Heuristic.consume_front("inferred from ") &&
+          !isHeaderFile(Heuristic)) {
+        // Resolve proxy path (Heuristic may be a relative filename).
+        llvm::SmallString<256> ProxyPath(Heuristic);
+        if (!llvm::sys::path::is_absolute(ProxyPath)) {
+          ProxyPath = Cmd->Directory;
+          llvm::sys::path::append(ProxyPath, Heuristic);
+        }
+        llvm::sys::path::remove_dots(ProxyPath, /*remove_dot_dot=*/true);
+        // Use cached result if the proxy hasn't changed, otherwise
+        // fetch the proxy's compile command and preprocess it.
+        auto ProxyStat = Inputs.TFS->view(std::nullopt)
+                             ->status(ProxyPath);
+        auto ProxyMTime = ProxyStat
+                              ? ProxyStat->getLastModificationTime()
+                              : llvm::sys::TimePoint<>();
+        std::vector<std::string> *Preceding = nullptr;
+        if (CachedProxyPath == ProxyPath &&
+            CachedProxyMTime == ProxyMTime) {
+          Preceding = &CachedPrecedingIncludes;
+        } else if (auto ProxyCmd =
+                       CDB.getCompileCommand(ProxyPath)) {
+          if (isReliable(*ProxyCmd)) {
+            CachedProxyPath = std::string(ProxyPath);
+            CachedProxyMTime = ProxyMTime;
+            CachedPrecedingIncludes = scanPrecedingIncludes(
+                *Inputs.TFS, *ProxyCmd, FileName);
+            Preceding = &CachedPrecedingIncludes;
+          }
+        }
+        if (Preceding && !Preceding->empty()) {
+          std::vector<std::string> Flags;
+          for (const auto &Inc : *Preceding) {
+            Flags.push_back("-include");
+            Flags.push_back(Inc);
+          }
+          auto It = llvm::find(Cmd->CommandLine, "--");
+          Cmd->CommandLine.insert(It, Flags.begin(), Flags.end());
+        }
+      }
+    }
     if (Cmd)
       Inputs.CompileCommand = std::move(*Cmd);
     else
diff --git a/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp 
b/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
index c6862b5eba6fa..b5cab2efab611 100644
--- a/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
+++ b/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
@@ -1438,6 +1438,145 @@ TEST_F(TUSchedulerTests, IncluderCache) {
       << "association invalidated and then claimed by main3";
 }
 
+// When a non-header file (e.g. child.c) is #included by another non-header
+// file (parent.c), the proxy mechanism should inject -include flags for
+// includes that precede child.c in parent.c. Header files should not get
+// preceding includes injected even when they have a proxy.
+TEST_F(TUSchedulerTests, IncluderCacheNonHeaderProxy) {
+  static std::string Parent = testPath("parent.c"),
+                     Child = testPath("child.c"),
+                     Header = testPath("header.h"),
+                     NoCmd = testPath("no_cmd.h");
+  struct NonHeaderCDB : public GlobalCompilationDatabase {
+    std::optional<tooling::CompileCommand>
+    getCompileCommand(PathRef File) const override {
+      if (File == Child || File == NoCmd)
+        return std::nullopt;
+      auto Basic = getFallbackCommand(File);
+      Basic.Heuristic.clear();
+      if (File == Parent)
+        Basic.CommandLine.push_back("-DPARENT");
+      return Basic;
+    }
+  } CDB;
+  TUScheduler S(CDB, optsForTest());
+  auto GetFlags = [&](PathRef File) {
+    S.update(File, getInputs(File, ";"), WantDiagnostics::Yes);
+    EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60)));
+    Notification CmdDone;
+    tooling::CompileCommand Cmd;
+    S.runWithPreamble(
+        "GetFlags", File, TUScheduler::StaleOrAbsent,
+        [&](llvm::Expected<InputsAndPreamble> Inputs) {
+          ASSERT_FALSE(!Inputs) << Inputs.takeError();
+          Cmd = std::move(Inputs->Command);
+          CmdDone.notify();
+        });
+    CmdDone.wait();
+    EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60)));
+    return Cmd.CommandLine;
+  };
+
+  FS.Files[Header] = "";
+  FS.Files[Child] = ";";
+  FS.Files[NoCmd] = ";";
+
+  // Parent includes header.h, then no_cmd.h, then child.c.
+  // no_cmd.h has no CDB entry, so it gets parent.c as proxy.
+  const char *ParentContents = R"cpp(
+    #include "header.h"
+    #include "no_cmd.h"
+    #include "child.c"
+  )cpp";
+  FS.Files[Parent] = ParentContents;
+  S.update(Parent, getInputs(Parent, ParentContents),
+           WantDiagnostics::Yes);
+  EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60)));
+
+  // child.c is a non-header included by a non-header: should get
+  // -include flags for preceding includes.
+  auto ChildFlags = GetFlags(Child);
+  EXPECT_THAT(ChildFlags, Contains("-DPARENT"))
+      << "child.c should use parent.c's compile command";
+  EXPECT_THAT(ChildFlags, Contains("-include"))
+      << "child.c should have preceding includes injected";
+
+  // no_cmd.h is a header included by a non-header: should get proxy
+  // flags but NOT preceding -include injection.
+  auto HdrFlags = GetFlags(NoCmd);
+  EXPECT_THAT(HdrFlags, Contains("-DPARENT"))
+      << "no_cmd.h should use parent.c's compile command";
+  EXPECT_THAT(HdrFlags, Not(Contains("-include")))
+      << "header files should not get preceding includes";
+}
+
+// Cold-start: when a non-header file is opened directly without the proxy
+// (parent) ever being opened, the proxy cache is empty. The cold-start path
+// should preprocess the proxy file (respecting #ifdef guards) and inject
+// preceding includes.
+TEST_F(TUSchedulerTests, IncluderCacheNonHeaderProxyColdStart) {
+  // Cold-start: child.c is opened directly without parent.c being opened
+  // first. The CDB returns an interpolated command for child.c inferred from
+  // parent.c. Preprocessing parent.c should find preceding includes and
+  // inject them as -include flags.
+  static std::string Parent = testPath("parent.c"),
+                     Child = testPath("child.c"),
+                     Header = testPath("header.h"),
+                     Sibling = testPath("sibling.c");
+  FS.Files[Header] = "";
+  FS.Files[Sibling] = ";";
+  FS.Files[Child] = ";";
+  FS.Files[Parent] = R"cpp(
+    #include "header.h"
+    #include "sibling.c"
+    #include "child.c"
+  )cpp";
+
+  struct InterpolatingCDB : public GlobalCompilationDatabase {
+    std::optional<tooling::CompileCommand>
+    getCompileCommand(PathRef File) const override {
+      if (File == Parent) {
+        auto Basic = getFallbackCommand(File);
+        Basic.Heuristic.clear();
+        Basic.CommandLine.push_back("-DPARENT");
+        return Basic;
+      }
+      if (File == Child) {
+        // Simulate interpolated command inferred from parent.c
+        auto Basic = getFallbackCommand(File);
+        Basic.Heuristic = "inferred from " + Parent;
+        Basic.CommandLine.push_back("-DPARENT");
+        return Basic;
+      }
+      return std::nullopt;
+    }
+  } CDB;
+
+  TUScheduler S(CDB, optsForTest());
+  S.update(Child, getInputs(Child, ";"), WantDiagnostics::Yes);
+  EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60)));
+
+  Notification CmdDone;
+  tooling::CompileCommand Cmd;
+  S.runWithPreamble(
+      "GetFlags", Child, TUScheduler::StaleOrAbsent,
+      [&](llvm::Expected<InputsAndPreamble> Inputs) {
+        ASSERT_FALSE(!Inputs) << Inputs.takeError();
+        Cmd = std::move(Inputs->Command);
+        CmdDone.notify();
+      });
+  CmdDone.wait();
+  EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60)));
+
+  EXPECT_THAT(Cmd.CommandLine, Contains("-DPARENT"))
+      << "child.c should use the interpolated compile command";
+  EXPECT_THAT(Cmd.CommandLine, Contains("-include"))
+      << "child.c should have preceding includes from cold-start proxy scan";
+  // header.h and sibling.c should precede child.c.
+  EXPECT_THAT(Cmd.CommandLine, Contains(Header));
+  EXPECT_THAT(Cmd.CommandLine, Contains(Sibling));
+}
+
 TEST_F(TUSchedulerTests, PreservesLastActiveFile) {
   for (bool Sync : {false, true}) {
     auto Opts = optsForTest();

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to