[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-06 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 closed 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

@sam-mccall Do you have any additional feedback? You might want to check how 
this PR and #66966 impact performance for you.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> Thanks for iterating! I find the current implementation much clearer.

Thanks for your patience!

> The only thing I might quibble about is the "child" vs. "parent" terminology 
> you changed: I think it's fairly ambiguous either way, because the node is 
> the "child" from the perspective of a top-down include hierarchy, but it's 
> the "parent" from the perspective of the bottom-up search. You could maybe 
> change it to IncludedFile or something, but I don't feel very strongly about 
> it. Child is no worse than parent so if you prefer child I don't think you 
> need to change it.

I understand. My problem with using "parent" here is that we're using "parent" 
just a couple lines above to describe the opposite relationship:

```
// - one loc is a parent of the other (we consider the parent as "first")
```

So I believe framing all the relationships in terms of the top-down 
include/expansion hierarchy makes more sense than mixing them up with the 
bottom-up tree walk.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits

https://github.com/benlangmuir approved this pull request.

Thanks for iterating! I find the current implementation much clearer.

The only thing I might quibble about is the "child" vs. "parent" terminology 
you changed: I think it's fairly ambiguous either way, because the node is the 
"child" from the perspective of a top-down include hierarchy, but it's the 
"parent" from the perspective of the bottom-up search. You could maybe change 
it to IncludedFile or something, but I don't feel very strongly about it. Child 
is no worse than parent so if you prefer child I don't think you need to change 
it.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 61e8961cde95e9e8ce8cea3efd6aa52273f430e9 Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/6] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..85f49e21b2e2ec1 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  // ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits


@@ -1986,13 +1984,15 @@ bool SourceManager::isInTheSameTranslationUnitImpl(
   if (isLoadedFileID(LOffs.first) != isLoadedFileID(ROffs.first))
 return false;
 
-  // If both are loaded from different AST files.
   if (isLoadedFileID(LOffs.first) && isLoadedFileID(ROffs.first)) {
-auto FindTableSegment = [this](FileID FID) {
-  return llvm::upper_bound(LoadedSLocEntryTableSegments, -FID.ID - 2);
+auto FindSLocEntryAlloc = [this](FileID FID) {
+  // FileIDs are negative, we store the beginning of each allocation (the
+  // lowest FileID), later allocations have lower FileIDs.
+  return llvm::upper_bound(LoadedSLocEntryAllocBegin, FID, std::greater{});

jansvoboda11 wrote:

But yes, `upper_bound` ended up working because I was incorrectly doing `- 2` 
instead of `- 1` on the base `FileID`.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits


@@ -1986,13 +1984,15 @@ bool SourceManager::isInTheSameTranslationUnitImpl(
   if (isLoadedFileID(LOffs.first) != isLoadedFileID(ROffs.first))
 return false;
 
-  // If both are loaded from different AST files.
   if (isLoadedFileID(LOffs.first) && isLoadedFileID(ROffs.first)) {
-auto FindTableSegment = [this](FileID FID) {
-  return llvm::upper_bound(LoadedSLocEntryTableSegments, -FID.ID - 2);
+auto FindSLocEntryAlloc = [this](FileID FID) {
+  // FileIDs are negative, we store the beginning of each allocation (the
+  // lowest FileID), later allocations have lower FileIDs.
+  return llvm::upper_bound(LoadedSLocEntryAllocBegin, FID, std::greater{});

jansvoboda11 wrote:

No, that's just me not being able to visualize binary search over anything else 
than positive increasing integers 

You're right, this should be `lower_bound`. `upper_bound` would report the 
incorrect allocation for the first `FileID`. Fixed in the latest commit and 
added a unit test.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 edited 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits


@@ -458,14 +458,12 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-
-  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;
-  LoadedSLocEntryTableSegments.push_back(NewTableSize);
-  LoadedSLocEntryTable.resize(NewTableSize);
-  SLocEntryLoaded.resize(NewTableSize);
-
+  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
+  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
   CurrentLoadedOffset -= TotalSize;
-  return std::make_pair(-NewTableSize - 1, CurrentLoadedOffset);
+  int ID = LoadedSLocEntryTable.size();
+  LoadedSLocEntryAllocBegin.push_back(FileID::get(-ID - 2));

jansvoboda11 wrote:

*Sigh*, I don't think my previous comment is right.

The `- 1` here is just to make space for the sentinel `FileID(-1)`.

The `- 1` in `ASTReader::TranslateFileID()` is to account for the fact that the 
module had it's own invalid `FileID(0)` which isn't necessary to 
serialize/deserialize, since it won't show up in valid AST or PP state. 
However, we serialize `FileID` values without accounting for the fact that 
`FileID(0)` gets dropped. `ASTReader` doesn't allocate space in `SourceManager` 
for `FileID(0)` and so we need to decrease all deserialized `FileID` values by 
one to fit into the allocation.

The "we rely on `ASTWriter` serializing the wrong thing" I got confused with 
the index for `SLocEntry` that should be preloaded, which used to be 
`SLocEntryOffsets.size()` **after** pushing the entry offset into the vector, 
so the `ASTReader` always needed to do do `- 1` when computing the index to 
jump to: `int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;`. 
That's actually removed by this patch, though.

So in short, `AllocateLoadedSLocEntries()` returns the correct base ID and 
doing `FileID(ID - 2)` is wrong. Fixed in the latest commit.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits


@@ -458,14 +458,12 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-
-  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;
-  LoadedSLocEntryTableSegments.push_back(NewTableSize);
-  LoadedSLocEntryTable.resize(NewTableSize);
-  SLocEntryLoaded.resize(NewTableSize);
-
+  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
+  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
   CurrentLoadedOffset -= TotalSize;
-  return std::make_pair(-NewTableSize - 1, CurrentLoadedOffset);
+  int ID = LoadedSLocEntryTable.size();
+  LoadedSLocEntryAllocBegin.push_back(FileID::get(-ID - 2));

benlangmuir wrote:

藍 okay, maybe drop a FIXME on the return statement for good measure, but SGTM.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits

https://github.com/benlangmuir edited 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits


@@ -458,14 +458,12 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-
-  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;
-  LoadedSLocEntryTableSegments.push_back(NewTableSize);
-  LoadedSLocEntryTable.resize(NewTableSize);
-  SLocEntryLoaded.resize(NewTableSize);
-
+  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
+  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
   CurrentLoadedOffset -= TotalSize;
-  return std::make_pair(-NewTableSize - 1, CurrentLoadedOffset);
+  int ID = LoadedSLocEntryTable.size();
+  LoadedSLocEntryAllocBegin.push_back(FileID::get(-ID - 2));

jansvoboda11 wrote:

Yeah, I think the ID `AllocateLoadedSLocEntries()` returns is incorrect. It's 
stored in `ModuleFile::SLocEntryBaseID` and used in 
`ASTReader::TranslateFileID()` where we actually do `- 1` again (to get the 
correct `ID - 2`). I do want to fix this, but there are places in `ASTWriter` 
that rely on this being wrong, so it's a bit more involved to make it into this 
PR.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits


@@ -458,14 +458,12 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-
-  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;
-  LoadedSLocEntryTableSegments.push_back(NewTableSize);
-  LoadedSLocEntryTable.resize(NewTableSize);
-  SLocEntryLoaded.resize(NewTableSize);
-
+  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
+  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
   CurrentLoadedOffset -= TotalSize;
-  return std::make_pair(-NewTableSize - 1, CurrentLoadedOffset);
+  int ID = LoadedSLocEntryTable.size();
+  LoadedSLocEntryAllocBegin.push_back(FileID::get(-ID - 2));

benlangmuir wrote:

To clarify (from `LoadedSLocEntryAllocBegin`):
> we keep the first FileID

but `AllocateLoadedSLocEntries` says
> The lowest ID ... of the entries will be returned.

So it's odd that one is -ID-2 and the other is -ID-1. Based on the comments I 
would have expected they'd either be the same or one would be offset from the 
other by the number of entries allocated.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits


@@ -1986,13 +1984,15 @@ bool SourceManager::isInTheSameTranslationUnitImpl(
   if (isLoadedFileID(LOffs.first) != isLoadedFileID(ROffs.first))
 return false;
 
-  // If both are loaded from different AST files.
   if (isLoadedFileID(LOffs.first) && isLoadedFileID(ROffs.first)) {
-auto FindTableSegment = [this](FileID FID) {
-  return llvm::upper_bound(LoadedSLocEntryTableSegments, -FID.ID - 2);
+auto FindSLocEntryAlloc = [this](FileID FID) {
+  // FileIDs are negative, we store the beginning of each allocation (the
+  // lowest FileID), later allocations have lower FileIDs.
+  return llvm::upper_bound(LoadedSLocEntryAllocBegin, FID, std::greater{});

benlangmuir wrote:

I would have expected lower_bound; is this related to -2 vs -1?

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 61e8961cde95e9e8ce8cea3efd6aa52273f430e9 Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/5] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..85f49e21b2e2ec1 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  // ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 61e8961cde95e9e8ce8cea3efd6aa52273f430e9 Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/4] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..85f49e21b2e2ec1 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  // ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-04 Thread Ben Langmuir via cfe-commits


@@ -701,6 +701,10 @@ class SourceManager : public RefCountedBase 
{
   /// use (-ID - 2).
   SmallVector LoadedSLocEntryTable;
 
+  /// For each allocation in LoadedSLocEntryTable, we keep the new size. This
+  /// can be used to determine whether two FileIDs come from the same AST file.
+  SmallVector LoadedSLocEntryTableSegments;

benlangmuir wrote:

> I guess I could store `FileID(ModuleFile::SLocEntryBaseID - 1)` instead 
> (which is the module's `FileID(1)` translated into the importer address space 
> for loaded entries) and do `lower_bound` over that:
> 
> ```c++
> SmallVector LoadedSLocEntryTableSegments{
>   FileID(-10),  // represents FileIDs from -10 to -2
>   FileID(-50),  // represents FileIDs from -50 to -11
>   FileID(-90)}; // represents FileIDs from -90 to -51
> 
> llvm::lower_bound(LoadedSLocEntryTableSegments, FID, [](FileID Element, 
> FileID Value) {
>   return Element.ID > Value.ID;
> });
> ```
> 
> The lambda is still confusing... Any better ideas?

This is roughly what I had in mind, though I would just do `std::greater` for 
that comparison function (FileID has comparison operators) and leave a comment 
that loaded FileID is negative.  I find this much clearer than searching for 
`-ID - 2`.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 edited 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 edited 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits


@@ -701,6 +701,10 @@ class SourceManager : public RefCountedBase 
{
   /// use (-ID - 2).
   SmallVector LoadedSLocEntryTable;
 
+  /// For each allocation in LoadedSLocEntryTable, we keep the new size. This
+  /// can be used to determine whether two FileIDs come from the same AST file.
+  SmallVector LoadedSLocEntryTableSegments;

jansvoboda11 wrote:

Or this?

```c++
SmallVector LoadedSLocEntryTableSegments{
  FileID(-1),   // represents FileIDs from -10 to -2
  FileID(-10),  // represents FileIDs from -50 to -11
  FileID(-50)}; // represents FileIDs from -90 to -51

llvm::upper_bound(llvm::reverse(LoadedSLocEntryTableSegments), FID);
```

The stored `FileIDs` have pretty weird values, though...

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits


@@ -458,11 +458,14 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
-  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
+
+  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;

jansvoboda11 wrote:

I think `size_t` fits better, I'll use that in the next revision.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits


@@ -701,6 +701,10 @@ class SourceManager : public RefCountedBase 
{
   /// use (-ID - 2).
   SmallVector LoadedSLocEntryTable;
 
+  /// For each allocation in LoadedSLocEntryTable, we keep the new size. This
+  /// can be used to determine whether two FileIDs come from the same AST file.
+  SmallVector LoadedSLocEntryTableSegments;

jansvoboda11 wrote:

> IIUC what you're ultimately checking is whether two FileIDs were allocated in 
> the same call to AllocateLoadedSLocEntries, and relying on ASTReader to make 
> a single call to that function for each AST file.

That's right.

> Is there a reason you're storing the size (ie translating to 0-based) instead 
> of storing the actual boundary FileID and doing the search based on that?

I just found it easy to reason about, since it's an `upper_bound` on increasing 
positive values.

I guess I could store `FileID(ModuleFile::SLocEntryBaseID - 1)` instead (which 
is the module's `FileID(1)` translated into the importer address space for 
loaded entries) and do `lower_bound` over that:

```c++
SmallVector LoadedSLocEntryTableSegments{
  FileID(-10),  // represents FileIDs from -10 to -2
  FileID(-50),  // represents FileIDs from -50 to -11
  FileID(-90)}; // represents FileIDs from -90 to -51

llvm::lower_bound(LoadedSLocEntryTableSegments, FID, [](FileID Element, FileID 
Value) {
  return Element.ID > Value.ID;
});
```

The lambda is still confusing... Any better ideas?

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Ben Langmuir via cfe-commits


@@ -2082,6 +2082,11 @@ std::pair 
SourceManager::isInTheSameTranslationUnit(
   if (LOffs.first == ROffs.first)
 return std::make_pair(true, LOffs.second < ROffs.second);
 
+  // Built-ins are not considered part of any TU.

benlangmuir wrote:

Yes, this comment is dead and I didn't notice it was still there when I 
submitted.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits


@@ -2082,6 +2082,11 @@ std::pair 
SourceManager::isInTheSameTranslationUnit(
   if (LOffs.first == ROffs.first)
 return std::make_pair(true, LOffs.second < ROffs.second);
 
+  // Built-ins are not considered part of any TU.

jansvoboda11 wrote:

This comment appeared in your latest review, but refers to outdated diff. I 
assume that's not intentional?

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Ben Langmuir via cfe-commits


@@ -458,11 +458,14 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
-  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
+
+  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;

benlangmuir wrote:

You're storing `size_t` here, but the value is calculated as `unsigned`.  Not 
sure which is correct

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Ben Langmuir via cfe-commits


@@ -458,11 +458,14 @@ SourceManager::AllocateLoadedSLocEntries(unsigned 
NumSLocEntries,
   CurrentLoadedOffset - TotalSize < NextLocalOffset) {
 return std::make_pair(0, 0);
   }
-  LoadedSLocEntryTable.resize(LoadedSLocEntryTable.size() + NumSLocEntries);
-  SLocEntryLoaded.resize(LoadedSLocEntryTable.size());
+
+  unsigned NewTableSize = LoadedSLocEntryTable.size() + NumSLocEntries;
+  LoadedSLocEntryTableSegments.push_back(NewTableSize);
+  LoadedSLocEntryTable.resize(NewTableSize);
+  SLocEntryLoaded.resize(NewTableSize);
+
   CurrentLoadedOffset -= TotalSize;
-  int ID = LoadedSLocEntryTable.size();

benlangmuir wrote:

Nit: I find it clearer if we keep the `ID` variable just to clarify the role 
here since you then return a non-trivial expression using the value.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Ben Langmuir via cfe-commits


@@ -701,6 +701,10 @@ class SourceManager : public RefCountedBase 
{
   /// use (-ID - 2).
   SmallVector LoadedSLocEntryTable;
 
+  /// For each allocation in LoadedSLocEntryTable, we keep the new size. This
+  /// can be used to determine whether two FileIDs come from the same AST file.
+  SmallVector LoadedSLocEntryTableSegments;

benlangmuir wrote:

I feel like this could use more explanation. IIUC what you're ultimately 
checking is whether two FileIDs were allocated in the same call to 
`AllocateLoadedSLocEntries`, and relying on ASTReader to make a single call to 
that function for each AST file.  Is there a reason you're storing the size (ie 
translating to 0-based) instead of storing the actual boundary FileID and doing 
the search based on that? I feel like searching for -ID - 2 is pretty 
non-obvious.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Ben Langmuir via cfe-commits


@@ -2082,6 +2082,11 @@ std::pair 
SourceManager::isInTheSameTranslationUnit(
   if (LOffs.first == ROffs.first)
 return std::make_pair(true, LOffs.second < ROffs.second);
 
+  // Built-ins are not considered part of any TU.

benlangmuir wrote:

Does this have to come before `getInBeforeInTUCache`? I'm wondering if it is 
possible and profitable to sink this off the hot path.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> > That said, isBeforeInTranslationUnit is slow already, and I don't think 
> > it's hot in clang, moreso in tools like clang-tidy whose performance is 
> > less-critical. So let's make it right, and then optimize it again if 
> > problems arise.
> 
> It may not be a hot function in Clang itself, but it is quite hot for some 
> downstream Clang forks. While at Coverity, I had to avoid direct use of it 
> because it was far too slow. Anything that speeds it up would be appreciated.

That's good to know, thanks! I think in it's current version, this patch 
shouldn't make that function slower, but I'll verify this after I get a green 
light on the semantic changes.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-03 Thread Tom Honermann via cfe-commits

tahonermann wrote:

> That said, isBeforeInTranslationUnit is slow already, and I don't think it's 
> hot in clang, moreso in tools like clang-tidy whose performance is 
> less-critical. So let's make it right, and then optimize it again if problems 
> arise.

It may not be a hot function in Clang itself, but it is quite hot for some 
downstream Clang forks. While at Coverity, I had to avoid direct use of it 
because it was far too slow. Anything that speeds it up would be appreciated.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-02 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 61e8961cde95e9e8ce8cea3efd6aa52273f430e9 Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/3] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..85f49e21b2e2ec1 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  // ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-02 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

I don't understand why the tie-breaking code calls the previously visited 
`FileID` a parent. We're walking **up** the include/expansion tree, so I think 
it should be called child instead. LMK if I'm misunderstanding.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-02 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

Turns out some clients are calling `isBeforeInTranslationUnit()` before 
checking if both `SourceLocations` are indeed in the same TU. I left behind a 
FIXME to call `llvm_unreachable()`, but for now, I just compare the `FileIDs` 
to keep things working.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-02 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

Since I moved up the special-casing of built-ins buffer up in my last commit, 
it was happening before we got chance to walk up from `FileID(3)` to 
`FileID(2)`, meaning it wasn't kicking in. I think we need to sink the 
special-casing into `isInTheSameTranslationUnit()` and apply it to whatever 
`FileID` we end up with after not finding the common ancestor without assuming 
they are in different TUs entirely.

Also, it seems that the code previously assumed that two `SourceLocations` are 
in the same TU iff they have the same common ancestor. There are at least two 
counter-examples to this:
* The built-ins buffer doesn't have any ancestor, but still is part of the TU.
* With implicit modules (or lazy loaded explicit modules), all top-level 
SLocEntries loaded from PCM files have their include location adjusted to the 
import statement, so they became part of the importer SLocEntry tree and are 
all considered to be part of the importing TU.

I started tracking the SLocEntries allocated for each AST file and use that to 
cut off the include chain walk in order to stay in the same TU we started in. 
This basically make sure we don't skip the special case for loaded built-ins 
buffers.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-10-02 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

I got around investigating 
"SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp" and found out it uses 
this pattern:
```
%clang_cc1 ... -include %s %s
```

This means `FileID(1)` is the main input file `%s` and `FileID(2)` is the 
built-ins buffer that includes `%s` as `FileID(3)`.

Since I moved up the special-casing of built-ins buffer up in my last commit, 
it was happening **before** we got chance to walk up from `FileID(3)` to 
`FileID(2)`, meaning it wasn't kicking in. I think we need to sink the 
special-casing into `isInTheSameTranslationUnit()` and apply it to whatever 
`FileID` we end up with after not finding the common ancestor.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-29 Thread Sam McCall via cfe-commits

sam-mccall wrote:

Ripping out this feature sounds great, as does fixing the non-translation of 
builtin buffer locations.

My only concern is that this will make isBeforeInTranslationUnit more 
expensive. A loc being in one of these special "" is the rare case, 
and the code recognizing them doesn't look fast (which didn't previously 
matter, as we only reached it rarely).

That said, isBeforeInTranslationUnit is slow already, and I don't think it's 
hot in clang, moreso in tools like clang-tidy whose performance is 
less-critical. So let's make it right, and then optimize it again if problems 
arise.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-28 Thread Ben Langmuir via cfe-commits

benlangmuir wrote:

Thanks for laying it out. I also find this easier to understand in your latest 
changes because the comments in `isBeforeInTranslationUnit` lay out the 
intended ordering without relying on `isInTheSameTranslationUnit` returning 
`false` for builtin buffers, which is the part that seemed suspect.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-28 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 1e36c68229a98ffbdb61ea652077bc3356fc177f Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/4] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-28 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> > I changed SourceManager::isInTheSameTranslationUnit() to check for that 
> > case explicitly instead of relying on the fact/bug that built-ins buffers 
> > happened to be assigned an untranslated include SourceLocation.
> 
> What is the translated location you get here? It's hard for me to tell if the 
> comment in `isBeforeInTranslationUnit` is describing the desired behaviour or 
> just a consequence of how it happened to work at the time.

When a module is built, its `SourceLocation` offsets go from 0 to N. When it's 
imported, they are remapped into the range from 232-N to 
232.

So the fact that we didn't remap the offset of the built-ins buffer meant it 
was not considered part of the imported module. This made 
`isInTheSameTranslationUnit(, module.h)` return `[false, false]` and 
the special case in `isBeforeInTranslationUnit()` would make sure to return 
`true` in order to enforce that the `` buffer always compares `<` to 
other offsets in the module.

It took me a while to figure out why that is desired. My understanding is that 
whether one offset is before another can't be just simple comparison of the 
numeric values. (More details on this are in `isInTheSameTranslationUnit()`.)

Here's a little example of how this ends up working with modules:

| entry   | ID [t] | offset| `isBefore(?, t(250))` | 
`isBefore(t(250), ?)` |
|-||---|---|---|
| dummy   | 0 [-6] | <0,1) | yes   | no 
   |
| modulemap   | 1 [-5] | <1,10)| yes   | no 
   |
| `` | 2 [-4] | <10,100)  | yes   | no 
   |
| ``| 3 [-3] | <100,200) | no| yes
   |
| header  | 4 [-2] | <200,300) | yes -> no | no -> yes  
   |

The `` buffer is parent of the header, but the `` 
buffer ends up being created in between them. This causes the `isBefore` 
relationships to not be sorted/partitioned, as one might expect.

This is the reason the binary search in 
`ASTReader::findPreprocessedEntitiesInRange()` broke.

So the special case in `isBeforeInTranslationUnit()` that moves `` 
before all other entries makes sense, but now needs to happen earlier, before 
calling `isInTheSameTranslationUnit()`. That's what I've done in the latest 
commit, and it actually fixes "Index/annotate-module.m". (I guess we should 
also consider merging both of those functions, because 
`isInTheSameTranslationUnit()` doesn't really stand on its own now.) 
Unfortunately, "SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp" started 
failing, so I'll look into that next.

@benlangmuir, let me know if this explanation makes sense.

Also adding @sam-mccall as a reviewer, since he's worked in this area recently.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-27 Thread Ben Langmuir via cfe-commits

benlangmuir wrote:

> I changed SourceManager::isInTheSameTranslationUnit() to check for that case 
> explicitly instead of relying on the fact/bug that built-ins buffers happened 
> to be assigned an untranslated include SourceLocation.

What is the translated location you get here?  It's hard for me to tell if the 
comment in `isBeforeInTranslationUnit` is describing the desired behaviour or 
just a consequence of how it happened to work at the time.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-25 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 resolved 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-25 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 1e36c68229a98ffbdb61ea652077bc3356fc177f Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/3] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-25 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 updated 
https://github.com/llvm/llvm-project/pull/66962

>From 1e36c68229a98ffbdb61ea652077bc3356fc177f Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH 1/2] [clang][modules] Remove preloaded SLocEntries from PCM
 files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-25 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

Ok, the comment below seems to suggests that the built-ins buffers are not 
considered to be part of any TU:

```c++
bool SourceManager::isBeforeInTranslationUnit(SourceLocation LHS,
  SourceLocation RHS) const {
  // ...

  if (isInTheSameTranslationUnit(LHS, RHS))
return X;

  // If we arrived here, the location is either in a built-ins buffer or
  // associated with global inline asm. PR5662 and PR22576 are examples.

  // ...
}
```

I changed `SourceManager::isInTheSameTranslationUnit()` to check for that case 
explicitly instead of relying on the fact/bug that built-ins buffers happened 
to be assigned an untranslated include `SourceLocation`.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-25 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

The "Index/annotate-module.m" test fails with this patch. For run line 29, the 
previous input is:
```
Punctuation: "#" [1:1 - 1:2] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Identifier: "include" [1:2 - 1:9] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Punctuation: "<" [1:10 - 1:11] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Identifier: "Module" [1:11 - 1:17] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Punctuation: "/" [1:17 - 1:18] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Identifier: "Sub2" [1:18 - 1:22] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Punctuation: "." [1:22 - 1:23] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Identifier: "h" [1:23 - 1:24] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Punctuation: ">" [1:24 - 1:25] inclusion directive=Module/Sub2.h 
(/Users/Jan/Code/upstream-llvm/clang/test/Index/../Modules/Inputs/Module.framework/Headers/Sub2.h)
Keyword: "int" [2:1 - 2:4] VarDecl=Module_Sub:2:6
Punctuation: "*" [2:5 - 2:6] VarDecl=Module_Sub:2:6
Identifier: "Module_Sub" [2:6 - 2:16] VarDecl=Module_Sub:2:6
Punctuation: ";" [2:16 - 2:17]
...
```
while the current output is:
```
Punctuation: "#" [1:1 - 1:2] preprocessing directive=
Identifier: "include" [1:2 - 1:9] preprocessing directive=
Punctuation: "<" [1:10 - 1:11] preprocessing directive=
Identifier: "Module" [1:11 - 1:17] preprocessing directive=
Punctuation: "/" [1:17 - 1:18] preprocessing directive=
Identifier: "Sub2" [1:18 - 1:22] preprocessing directive=
Punctuation: "." [1:22 - 1:23] preprocessing directive=
Identifier: "h" [1:23 - 1:24] preprocessing directive=
Punctuation: ">" [1:24 - 1:25] preprocessing directive=
Keyword: "int" [2:1 - 2:4] VarDecl=Module_Sub:2:6
Punctuation: "*" [2:5 - 2:6] VarDecl=Module_Sub:2:6
Identifier: "Module_Sub" [2:6 - 2:16] VarDecl=Module_Sub:2:6
Punctuation: ";" [2:16 - 2:17]
...
```

This seems to be the stack trace where things go wrong:
```
clang_annotateTokens()
  PreprocessingRecord::getPreprocessedEntitiesInRange(Slow)
ASTReader::findPreprocessedEntitiesInRange()
  ASTReader::findPreprocessedEntity()
SourceManager::isBeforeInTranslationUnit()
  SourceManager::isInTheSameTranslationUnit()
MoveUpIncludeHierarchy()
  SourceManager::getDecomposedIncludedLoc()
```

Here's my current understanding:
* previously:
  * `ModuleFile` gets created with **untranslated** `IncludeLoc`
  * built-in buffer is preloaded and inherits the **untranslated** `IncludeLoc`
  * `ModuleFile::IncludeLoc` gets translated
  * `SourceManager::getDecomposedIncludedLoc()` reports the incorrect 
`IncludeLoc` for the built-in buffer
* currently:
  * `ModuleFile` gets created with **untranslated** `IncludeLoc`
  * `ModuleFile::IncludeLoc` gets translated
  * built-in buffer is loaded on-demand and inherits the **translated** 
`IncludeLoc`
  * `SourceManager::getDecomposedIncludedLoc()` reports the correct `IncludeLoc`

I need to dig a bit deeper to figure out what exactly needs fixing here.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-21 Thread Ben Langmuir via cfe-commits

https://github.com/benlangmuir approved this pull request.

LGTM other than my nit about the comment

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-21 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> > in my experience, it's not actually referenced in most modules
> 
> Can you clarify what you looked at here? I assume you checked non-scanner 
> modules as well?

I took a typical Xcode project and chose one TU that happens to transitively 
depend on 37 modules. I built it with implicit modules and scanned its 
dependencies while counting all `SourceManager` requests specifically asking 
for the predefines `SLocEntry`:

| workload   | cache | before | after | change [%] |
||---||---||
| implicit build | clean |   2077 |  1197 |  -42.4 |
| implicit build |  full | 84 |19 |  -77.3 |
| scan   | clean |   1015 |   568 |  -44.0 |
| scan   |  full | 39 |18 |  -53.8 |

So saying *most* was incorrect. The data suggests that importers don't care 
about the predefines of ~43% of their transitive dependencies. Note that this 
number is higher when the cache is full, where the percentage only represents 
requests issued from the main TU. So it makes sense that deserializing 
predefines `SLocEntry` on-demand is a performance win, especially since it 
shouldn't have any overhead compared to the preloading approach.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-21 Thread Ben Langmuir via cfe-commits

https://github.com/benlangmuir edited 
https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-21 Thread Ben Langmuir via cfe-commits

https://github.com/benlangmuir commented:

> in my experience, it's not actually referenced in most modules

Can you clarify what you looked at here? I assume you checked non-scanner 
modules as well?

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-21 Thread Ben Langmuir via cfe-commits


@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.

benlangmuir wrote:

Nit: this should be a normal `//` comment, because it doesn't attach to 
anything as a doc comment.

https://github.com/llvm/llvm-project/pull/66962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-20 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-modules


Changes

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.


---
Full diff: https://github.com/llvm/llvm-project/pull/66962.diff


4 Files Affected:

- (modified) clang/include/clang/Serialization/ASTBitCodes.h (+2-8) 
- (modified) clang/include/clang/Serialization/ModuleFile.h (-3) 
- (modified) clang/lib/Serialization/ASTReader.cpp (-23) 
- (modified) clang/lib/Serialization/ASTWriter.cpp (-8) 


``diff
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index = int(F.PreloadSLocEntries[I] - 1) + F.SLocEntryBaseID;
-  // Load it through the SourceManager and don't call ReadSLocEntry()
-  // directly because the entry may have already been loaded in which case
-  // calling ReadSLocEntry() directly would trigger an assertion in
-  // SourceManager.
-  SourceMgr.getLoadedSLocEntryByID(Index);
-}
-
 // Map the original source file ID into the ID space of the current
 // compilation.
 if (F.OriginalSourceFileID.isValid())
diff --git a/clang/lib/Serialization/ASTWriter.cpp 
b/clang/lib/Serialization/ASTWriter.cpp
index 65bee806d2c5571..3392243d7ac4ba7 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -838,7 +838,6 @@ void 

[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

2023-09-20 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 created 
https://github.com/llvm/llvm-project/pull/66962

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.


>From 1e36c68229a98ffbdb61ea652077bc3356fc177f Mon Sep 17 00:00:00 2001
From: Jan Svoboda 
Date: Wed, 20 Sep 2023 16:50:48 -0700
Subject: [PATCH] [clang][modules] Remove preloaded SLocEntries from PCM files

This commit removes the list of SLocEntry offsets to preload eagerly from PCM 
files. Commit introducing this functionality (258ae54a) doesn't clarify why 
this would be more performant than the lazy approach used regularly.

Currently, the only SLocEntry the reader is supposed to preload is the 
predefines buffer, but in my experience, it's not actually referenced in most 
modules, so the time spent deserializing its SLocEntry is wasted. This is 
especially noticeable in the dependency scanner, where this change brings 4.56% 
speedup on my benchmark.
---
 .../include/clang/Serialization/ASTBitCodes.h | 10 ++--
 .../include/clang/Serialization/ModuleFile.h  |  3 ---
 clang/lib/Serialization/ASTReader.cpp | 23 ---
 clang/lib/Serialization/ASTWriter.cpp |  8 ---
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 9e115f2a5cce3f9..6cbaff71cc33960 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -51,7 +51,7 @@ const unsigned VERSION_MAJOR = 29;
 /// for the previous version could still support reading the new
 /// version by ignoring new kinds of subblocks), this number
 /// should be increased.
-const unsigned VERSION_MINOR = 0;
+const unsigned VERSION_MINOR = 1;
 
 /// An ID number that refers to an identifier in an AST file.
 ///
@@ -524,13 +524,7 @@ enum ASTRecordTypes {
   /// of source-location information.
   SOURCE_LOCATION_OFFSETS = 14,
 
-  /// Record code for the set of source location entries
-  /// that need to be preloaded by the AST reader.
-  ///
-  /// This set contains the source location entry for the
-  /// predefines buffer and for any file entries that need to be
-  /// preloaded.
-  SOURCE_LOCATION_PRELOADS = 15,
+  /// ID 15 used to be for source location entry preloads.
 
   /// Record code for the set of ext_vector type names.
   EXT_VECTOR_DECLS = 16,
diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 0af5cae6aebc375..48be8676cc26a4c 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -292,9 +292,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// SLocEntries that we're going to preload.
-  SmallVector PreloadSLocEntries;
-
   /// Remapping table for source locations in this module.
   ContinuousRangeMap
   SLocRemap;
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0952244d037a77c..e796617455ac390 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3226,7 +3226,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   case SOURCE_LOCATION_OFFSETS:
   case MODULE_OFFSET_MAP:
   case SOURCE_MANAGER_LINE_TABLE:
-  case SOURCE_LOCATION_PRELOADS:
   case PPD_ENTITIES_OFFSETS:
   case HEADER_SEARCH_TABLE:
   case IMPORTED_MODULES:
@@ -3576,18 +3575,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile ,
   ParseLineTable(F, Record);
   break;
 
-case SOURCE_LOCATION_PRELOADS: {
-  // Need to transform from the local view (1-based IDs) to the global 
view,
-  // which is based off F.SLocEntryBaseID.
-  if (!F.PreloadSLocEntries.empty())
-return llvm::createStringError(
-std::errc::illegal_byte_sequence,
-"Multiple SOURCE_LOCATION_PRELOADS records in AST file");
-
-  F.PreloadSLocEntries.swap(Record);
-  break;
-}
-
 case EXT_VECTOR_DECLS:
   for (unsigned I = 0, N = Record.size(); I != N; ++I)
 ExtVectorDecls.push_back(getGlobalDeclID(F, Record[I]));
@@ -4417,16 +4404,6 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
   for (ImportedModule  : Loaded) {
 ModuleFile  = *M.Mod;
 
-// Preload SLocEntries.
-for (unsigned I = 0, N = F.PreloadSLocEntries.size(); I != N; ++I) {
-  int Index =