Author: Chuanqi Xu
Date: 2024-05-06T13:35:16+08:00
New Revision: 947b06282324db8fe2784c4054af9de493a876af

URL: 
https://github.com/llvm/llvm-project/commit/947b06282324db8fe2784c4054af9de493a876af
DIFF: 
https://github.com/llvm/llvm-project/commit/947b06282324db8fe2784c4054af9de493a876af.diff

LOG: Reland "[Modules] No transitive source location change (#86912)"

This relands 6c31104.

The patch was reverted due to incorrectly introduced alignment. And the
patch was re-commited after fixing the alignment issue.

Following off are the original message:

This is part of "no transitive change" patch series, "no transitive
source location change". I talked this with @Bigcheese in the tokyo's
WG21 meeting.

The idea comes from @jyknight posted on LLVM discourse. That for:

```
// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import C;
```

Almost every time A.cppm changes, we need to recompile `B`. Due to we
think the source location is significant to the semantics. But it may be
good if we can avoid recompiling `C` if the change from `A` wouldn't
change the BMI of B.

This patch only cares source locations. So let's focus on source
location's example. We can see the full example from the attached test.

```
//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}
```

Here the only difference between `A.cppm` and `A.v1.cppm` is that
`A.v1.cppm` has an additional blank line. Then the test shows that two
BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other
specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same
contents.

However, it is a different story for C, since C instantiates templates
from A, and the instantiation records the source information from module
A, which is different from `A` and `A.v1`, so it is expected that the
BMI `C.pcm` and `C.v1.pcm` can and should differ.

To fully understand the patch, we need to understand how we encodes
source locations and how we serialize and deserialize them.

For source locations, we encoded them as:

```
|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```

As the diagram shows, we encode the local (unloaded) source location
from 0 to higher bits. And we allocate the space for source locations
from the loaded modules from high bits to 0. Then the source locations
from the loaded modules will be mapped to our source location space
according to the allocated offset.

For example, for,

```
// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...
```

Assuming the offset of a source location (let's name the location as
`S`) in a.cppm is 45 and we will record the value `45` into the BMI
`a.pcm`. Then in b.cppm, when we import a, the source manager will
allocate a space for module 'a' (according to the recorded number of
source locations) as the base offset of module 'a' in the current source
location spaces. Let's assume the allocated base offset as 90 in this
example. Then when we want to get the location in the current source
location space for `S`, we can get it simply by adding `45` to `90` to
`135`. Finally we can get the source location for `S` in module B as
`135`.

And when we want to write module `b`, we would also write the source
location of `S` as `135` directly in the BMI. And to clarify the
location `S` comes from module `a`, we also need to record the base
offset of module `a`, 90 in the BMI of `b`.

Then the problem comes. Since the base offset of module 'a' is computed
by the number source locations in module 'a'. In module 'b', the
recorded base offset of module 'a' will change every time the number of
source locations in module 'a' increase or decrease. In other words, the
contents of BMI of B will change every time the number of locations in
module 'a' changes. This is pretty sensitive. Almost every change will
change the number of locations. So this is the problem this patch want
to solve.

Let's continue with the existing design to understand what's going on.
Another interesting case is:

```
// c.cppm
export module c;
import whatever;
import a;
import b;
...
```

In `c.cppm`, when we import `a`, we still need to allocate a base
location offset for it, let's say the value becomes to `200` somehow.
Then when we reach the location `S` recorded in module `b`, we need to
translate it into the current source location space. The solution is
quite simple, we can get it by `135 + (200 - 90) = 245`. In another
word, the offset of a source location in current module can be computed
as `Recorded Offset + Base Offset of the its module file - Recorded Base
Offset`.

Then we're almost done about how we handle the offset of source
locations in serializers.

>From the abstract level, what we want to do is to remove the hardcoded
base offset of imported modules and remain the ability to calculate the
source location in a new module unit. To achieve this, we need to be
able to find the module file owning a source location from the encoding
of the source location.

So in this patch, for each source location, we will store the local
offset of the location and the module file index. For the above example,
in `b.pcm`, the source location of `S` will be recorded as `135`
directly. And in the new design, the source location of `S` will be
recorded as `<1, 45>`. Here `1` stands for the module file index of `a`
in module `b`. And `45` means the offset of `S` to the base offset of
module `a`.

So the trade-off here is that, to make the BMI more independent, we need
to record more abstract information. And I feel it is worthy. The
recompilation problem of modules is really annoying and there are still
people complaining this. But if we can make this (including stopping
other changes transitively), I think this may be a killer feature for
modules. And from @Bigcheese , this should be helpful for clang explicit
modules too.

And the benchmarking side, I tested this patch against
https://github.com/alibaba/async_simple/tree/CXX20Modules. No
significant change on compilation time. The size of .pcm files becomes
to 204M from 200M. I think the trade-off is pretty fair.

I didn't use another slot to record the module file index. I tried to
use the higher 32 bits of the existing source location encodings to
store that information. This design may be safe. Since we use `unsigned`
to store source locations but we use uint64_t in serialization. And
generally `unsigned` is 32 bit width in most platforms. So it might not
be a safe problem. Since all the bits we used to store the module file
index is not used before. So the new encodings may be:

```
   |-----------------------|-----------------------|
   |           A           |         B         | C |

  * A: 32 bit. The index of the module file in the module manager + 1.
  * The +1
          here is necessary since we wish 0 stands for the current
module file.
  * B: 31 bit. The offset of the source location to the module file
  * containing it.
  * C: The macro bit. We rotate it to the lowest bit so that we can save
  * some
          space in case the index of the module file is 0.
```

(The B and C is the existing raw encoding for source locations)

Another reason to reuse the same slot of the source location is to
reduce the impact of the patch. Since there are a lot of places assuming
we can store and get a source location from a slot. And if I tried to
add another slot, a lot of codes breaks. I don't feel it is worhty.

Another impact of this decision is that, the existing small
optimizations for encoding source location may be invalided. The key of
the optimization is that we can turn large values into small values then
we can use VBR6 format to reduce the size. But if we decided to put the
module file index into the higher bits, then maybe it simply doesn't
work. An example may be the `SourceLocationSequence` optimization.

This will only affect the size of on-disk .pcm files. I don't expect
this impact the speed and memory use of compilations. And seeing my
small experiments above, I feel this trade off is worthy.

The mental model for handling source location offsets is not so complex
and I believe we can solve it by adding module file index to each stored
source location.

For the practical side, since the source location is pretty sensitive,
and the patch can pass all the in-tree tests and a small scale projects,
I feel it should be correct.

I'll continue to work on no transitive decl change and no transitive
identifier change (if matters) to achieve the goal to stop the
propagation of unnecessary changes. But all of this depends on this
patch. Since, clearly, the source locations are the most sensitive
thing.

---

The release nots and documentation will be added seperately.

Added: 
    clang/test/Modules/no-transitive-source-location-change.cppm

Modified: 
    clang/include/clang/Basic/SourceLocation.h
    clang/include/clang/Serialization/ASTBitCodes.h
    clang/include/clang/Serialization/ASTReader.h
    clang/include/clang/Serialization/ASTWriter.h
    clang/include/clang/Serialization/ModuleFile.h
    clang/include/clang/Serialization/SourceLocationEncoding.h
    clang/lib/Frontend/ASTUnit.cpp
    clang/lib/Serialization/ASTReader.cpp
    clang/lib/Serialization/ASTReaderDecl.cpp
    clang/lib/Serialization/ASTWriter.cpp
    clang/lib/Serialization/ASTWriterDecl.cpp
    clang/lib/Serialization/ModuleFile.cpp
    clang/unittests/Serialization/SourceLocationEncodingTest.cpp

Removed: 
    


################################################################################
diff  --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait<SourceLocation, void>;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;

diff  --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index b7aaceacc00627..ae9521e4270991 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -23,6 +23,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include <cassert>
@@ -165,99 +166,95 @@ using SubmoduleID = uint32_t;
 /// The number of predefined submodule IDs.
 const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
+/// 32 aligned uint64_t in the AST file. Use splitted 64-bit integer into
+/// low/high parts to keep structure alignment 32-bit (it is important
+/// because blobs in bitstream are 32-bit aligned). This structure is
+/// serialized "as is" to the AST file.
+class UnalignedUInt64 {
+  uint32_t BitLow = 0;
+  uint32_t BitHigh = 0;
+
+public:
+  UnalignedUInt64() = default;
+  UnalignedUInt64(uint64_t BitOffset) { set(BitOffset); }
+
+  void set(uint64_t Offset) {
+    BitLow = Offset;
+    BitHigh = Offset >> 32;
+  }
+
+  uint64_t get() const { return BitLow | (uint64_t(BitHigh) << 32); }
+};
+
 /// Source range/offset of a preprocessed entity.
-struct PPEntityOffset {
+class PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  UnalignedUInt64 Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  UnalignedUInt64 End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-        BitOffset(BitOffset) {}
+public:
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+      : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
+  RawLocEncoding getBegin() const { return Begin.get(); }
+  RawLocEncoding getEnd() const { return End.get(); }
 
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  uint32_t getOffset() const { return BitOffset; }
 };
 
 /// Source range of a skipped preprocessor region
-struct PPSkippedRange {
+class PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  UnalignedUInt64 Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  UnalignedUInt64 End;
 
-  PPSkippedRange(SourceRange R)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+public:
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+      : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin.get(); }
+  RawLocEncoding getEnd() const { return End.get(); }
 };
 
-/// Offset in the AST file. Use splitted 64-bit integer into low/high
-/// parts to keep structure alignment 32-bit (it is important because
-/// blobs in bitstream are 32-bit aligned). This structure is serialized
-/// "as is" to the AST file.
-struct UnderalignedInt64 {
-  uint32_t BitOffsetLow = 0;
-  uint32_t BitOffsetHigh = 0;
-
-  UnderalignedInt64() = default;
-  UnderalignedInt64(uint64_t BitOffset) { setBitOffset(BitOffset); }
-
-  void setBitOffset(uint64_t Offset) {
-    BitOffsetLow = Offset;
-    BitOffsetHigh = Offset >> 32;
-  }
-
-  uint64_t getBitOffset() const {
-    return BitOffsetLow | (uint64_t(BitOffsetHigh) << 32);
-  }
-};
+/// Source location and bit offset of a declaration. Keep
+/// structure alignment 32-bit since the blob is assumed as 32-bit aligned.
+class DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
 
-/// Source location and bit offset of a declaration.
-struct DeclOffset {
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  UnalignedUInt64 RawLoc;
 
-  /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
-  /// structure alignment 32-bit and avoid padding gap because undefined
-  /// value in the padding affects AST hash.
-  UnderalignedInt64 BitOffset;
+  /// Offset relative to the start of the DECLTYPES_BLOCK block.
+  UnalignedUInt64 BitOffset;
 
+public:
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
-             uint64_t DeclTypesBlockStartOffset) {
-    setLocation(Loc);
+  DeclOffset(RawLocEncoding RawLoc, uint64_t BitOffset,
+             uint64_t DeclTypesBlockStartOffset)
+      : RawLoc(RawLoc) {
     setBitOffset(BitOffset, DeclTypesBlockStartOffset);
   }
 
-  void setLocation(SourceLocation L) { Loc = L.getRawEncoding(); }
+  void setRawLoc(RawLocEncoding Loc) { RawLoc = Loc; }
 
-  SourceLocation getLocation() const {
-    return SourceLocation::getFromRawEncoding(Loc);
-  }
+  RawLocEncoding getRawLoc() const { return RawLoc.get(); }
 
   void setBitOffset(uint64_t Offset, const uint64_t DeclTypesBlockStartOffset) 
{
-    BitOffset.setBitOffset(Offset - DeclTypesBlockStartOffset);
+    BitOffset.set(Offset - DeclTypesBlockStartOffset);
   }
 
   uint64_t getBitOffset(const uint64_t DeclTypesBlockStartOffset) const {
-    return BitOffset.getBitOffset() + DeclTypesBlockStartOffset;
+    return BitOffset.get() + DeclTypesBlockStartOffset;
   }
 };
 

diff  --git a/clang/include/clang/Serialization/ASTReader.h 
b/clang/include/clang/Serialization/ASTReader.h
index 64f1ebc117b327..e24fa121528f3f 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -1771,6 +1771,7 @@ class ASTReader
 
   /// Retrieve the module manager.
   ModuleManager &getModuleManager() { return ModuleMgr; }
+  const ModuleManager &getModuleManager() const { return ModuleMgr; }
 
   /// Retrieve the preprocessor.
   Preprocessor &getPreprocessor() const { return PP; }
@@ -2177,8 +2178,8 @@ class ASTReader
 
   /// Retrieve the global submodule ID given a module and its local ID
   /// number.
-  serialization::SubmoduleID
-  getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID);
+  serialization::SubmoduleID getGlobalSubmoduleID(ModuleFile &M,
+                                                  unsigned LocalID) const;
 
   /// Retrieve the submodule that corresponds to a global submodule ID.
   ///
@@ -2191,7 +2192,7 @@ class ASTReader
 
   /// Retrieve the module file with a given local ID within the specified
   /// ModuleFile.
-  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID);
+  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID) const;
 
   /// Get an ID for the given module file.
   unsigned getModuleFileID(ModuleFile *M);
@@ -2227,33 +2228,46 @@ class ASTReader
     return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-                                                LocSeq *Seq = nullptr) const {
+  std::pair<SourceLocation, unsigned>
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+                                 LocSeq *Seq = nullptr) const {
     return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile &ModuleFile,
-                                    SourceLocation::UIntTy Raw,
+  SourceLocation ReadSourceLocation(ModuleFile &MF, RawLocEncoding Raw,
                                     LocSeq *Seq = nullptr) const {
-    SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-    return TranslateSourceLocation(ModuleFile, Loc);
+    if (!MF.ModuleOffsetMap.empty())
+      ReadModuleOffsetMap(MF);
+
+    auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+    ModuleFile *OwningModuleFile =
+        ModuleFileIndex == 0 ? &MF : MF.DependentModules[ModuleFileIndex - 1];
+
+    assert(!SourceMgr.isLoadedSourceLocation(Loc) &&
+           "Run out source location space");
+
+    return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile &ModuleFile,
                                          SourceLocation Loc) const {
-    if (!ModuleFile.ModuleOffsetMap.empty())
-      ReadModuleOffsetMap(ModuleFile);
-    assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-               ModuleFile.SLocRemap.end() &&
-           "Cannot find offset to remap.");
-    SourceLocation::IntTy Remap =
-        ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-    return Loc.getLocWithOffset(Remap);
+    if (Loc.isInvalid())
+      return Loc;
+
+    // FIXME: TranslateSourceLocation is not re-enterable. It is problematic
+    // to call TranslateSourceLocation on a translated source location.
+    // We either need a method to know whether or not a source location is
+    // translated or refactor the code to make it clear that
+    // TranslateSourceLocation won't be called with translated source location.
+
+    return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2);
   }
 
   /// Read a source location.

diff  --git a/clang/include/clang/Serialization/ASTWriter.h 
b/clang/include/clang/Serialization/ASTWriter.h
index a55dfd327670f6..6847c1db39c8ac 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -274,7 +274,7 @@ class ASTWriter : public ASTDeserializationListener,
 
   /// Offset of each type in the bitstream, indexed by
   /// the type's ID.
-  std::vector<serialization::UnderalignedInt64> TypeOffsets;
+  std::vector<serialization::UnalignedUInt64> TypeOffsets;
 
   /// The first ID number we can use for our own identifiers.
   serialization::IdentID FirstIdentID = serialization::NUM_PREDEF_IDENT_IDS;
@@ -676,6 +676,10 @@ class ASTWriter : public ASTDeserializationListener,
   void AddSourceLocation(SourceLocation Loc, RecordDataImpl &Record,
                          LocSeq *Seq = nullptr);
 
+  /// Return the raw encodings for source locations.
+  SourceLocationEncoding::RawLocEncoding
+  getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq = nullptr);
+
   /// Emit a source range.
   void AddSourceRange(SourceRange Range, RecordDataImpl &Record,
                       LocSeq *Seq = nullptr);

diff  --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 25f644e76edb1a..8e0fce1fd48ce4 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -295,10 +295,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// Remapping table for source locations in this module.
-  ContinuousRangeMap<SourceLocation::UIntTy, SourceLocation::IntTy, 2>
-      SLocRemap;
-
   // === Identifiers ===
 
   /// The number of identifiers in this AST file.
@@ -495,7 +491,7 @@ class ModuleFile {
 
   /// Offset of each type within the bitstream, indexed by the
   /// type ID, or the representation of a Type*.
-  const UnderalignedInt64 *TypeOffsets = nullptr;
+  const UnalignedUInt64 *TypeOffsets = nullptr;
 
   /// Base type ID for types local to this module as represented in
   /// the global type ID space.
@@ -512,9 +508,17 @@ class ModuleFile {
   /// List of modules which depend on this module
   llvm::SetVector<ModuleFile *> ImportedBy;
 
-  /// List of modules which this module depends on
+  /// List of modules which this module directly imported
   llvm::SetVector<ModuleFile *> Imports;
 
+  /// List of modules which this modules dependent on. Different
+  /// from `Imports`, this includes indirectly imported modules too.
+  /// The order of DependentModules is significant. It should keep
+  /// the same order with that module file manager when we write
+  /// the current module file. The value of the member will be initialized
+  /// in `ASTReader::ReadModuleOffsetMap`.
+  llvm::SmallVector<ModuleFile *, 16> DependentModules;
+
   /// Determine whether this module was directly imported at
   /// any point during translation.
   bool isDirectlyImported() const { return DirectlyImported; }

diff  --git a/clang/include/clang/Serialization/SourceLocationEncoding.h 
b/clang/include/clang/Serialization/SourceLocationEncoding.h
index 9bb0dbe2e4d6f6..33ca1728fa4797 100644
--- a/clang/include/clang/Serialization/SourceLocationEncoding.h
+++ b/clang/include/clang/Serialization/SourceLocationEncoding.h
@@ -6,28 +6,33 @@
 //
 
//===----------------------------------------------------------------------===//
 //
-// Source locations are stored pervasively in the AST, making up a third of
-// the size of typical serialized files. Storing them efficiently is important.
+// We wish to encode the SourceLocation from other module file not dependent
+// on the other module file. So that the source location changes from other
+// module file may not affect the contents of the current module file. Then the
+// users don't need to recompile the whole project due to a new line in a 
module
+// unit in the root of the dependency graph.
 //
-// We use integers optimized by VBR-encoding, because:
-//  - when abbreviations cannot be used, VBR6 encoding is our only choice
-//  - in the worst case a SourceLocation can be ~any 32-bit number, but in
-//    practice they are highly predictable
+// To achieve this, we need to encode the index of the module file into the
+// encoding of the source location. The encoding of the source location may be:
 //
-// We encode the integer so that likely values encode as small numbers that
-// turn into few VBR chunks:
-//  - the invalid sentinel location is a very common value: it encodes as 0
-//  - the "macro or not" bit is stored at the bottom of the integer
-//    (rather than at the top, as in memory), so macro locations can have
-//    small representations.
-//  - related locations (e.g. of a left and right paren pair) are usually
-//    similar, so when encoding a sequence of locations we store only
-//    
diff erences between successive elements.
+//      |-----------------------|-----------------------|
+//      |          A            |         B         | C |
+//
+//  * A: 32 bit. The index of the module file in the module manager + 1. The +1
+//  here is necessary since we wish 0 stands for the current module file.
+//  * B: 31 bit. The offset of the source location to the module file 
containing
+//  it.
+//  * C: The macro bit. We rotate it to the lowest bit so that we can save some
+//  space in case the index of the module file is 0.
+//
+// Specially, if the index of the module file is 0, we allow to encode a
+// sequence of locations we store only 
diff erences between successive elements.
 //
 
//===----------------------------------------------------------------------===//
 
-#include <climits>
 #include "clang/Basic/SourceLocation.h"
+#include "llvm/Support/MathExtras.h"
+#include <climits>
 
 #ifndef LLVM_CLANG_SERIALIZATION_SOURCELOCATIONENCODING_H
 #define LLVM_CLANG_SERIALIZATION_SOURCELOCATIONENCODING_H
@@ -52,9 +57,13 @@ class SourceLocationEncoding {
   friend SourceLocationSequence;
 
 public:
-  static uint64_t encode(SourceLocation Loc,
-                         SourceLocationSequence * = nullptr);
-  static SourceLocation decode(uint64_t, SourceLocationSequence * = nullptr);
+  using RawLocEncoding = uint64_t;
+
+  static RawLocEncoding encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence * = nullptr);
+  static std::pair<SourceLocation, unsigned>
+  decode(RawLocEncoding, SourceLocationSequence * = nullptr);
 };
 
 /// Serialized encoding of a sequence of SourceLocations.
@@ -149,14 +158,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return &Seq; }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-                                               SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the 
diff erences.
+  if (!BaseOffset)
+    return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+    return 0;
+
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+
+  // 16 bits should be sufficient to store the module file index.
+  assert(BaseModuleFileIndex < (1 << 16));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;
+  return Encoded;
 }
-inline SourceLocation
-SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) {
-  return Seq ? Seq->decode(Encoded)
-             : SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+inline std::pair<SourceLocation, unsigned>
+SourceLocationEncoding::decode(RawLocEncoding Encoded,
+                               SourceLocationSequence *Seq) {
+  unsigned ModuleFileIndex = Encoded >> 32;
+
+  if (!ModuleFileIndex)
+    return {Seq ? Seq->decode(Encoded)
+                : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)),
+            ModuleFileIndex};
+
+  Encoded &= llvm::maskTrailingOnes<RawLocEncoding>(32);
+  SourceLocation Loc = SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+
+  return {Loc, ModuleFileIndex};
 }
 
 } // namespace clang

diff  --git a/clang/lib/Frontend/ASTUnit.cpp b/clang/lib/Frontend/ASTUnit.cpp
index 1b93588553a276..755aaddc0ad747 100644
--- a/clang/lib/Frontend/ASTUnit.cpp
+++ b/clang/lib/Frontend/ASTUnit.cpp
@@ -2374,8 +2374,6 @@ bool ASTUnit::serialize(raw_ostream &OS) {
   return serializeUnit(Writer, Buffer, getSema(), OS);
 }
 
-using SLocRemap = ContinuousRangeMap<unsigned, int, 2>;
-
 void ASTUnit::TranslateStoredDiagnostics(
                           FileManager &FileMgr,
                           SourceManager &SrcMgr,

diff  --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 36ec6d6b2eb7ea..4050d66d45170c 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3038,8 +3038,10 @@ ASTReader::ReadControlBlock(ModuleFile &F,
         // The import location will be the local one for now; we will adjust
         // all import locations of module imports after the global source
         // location info are setup, in ReadAST.
-        SourceLocation ImportLoc =
+        auto [ImportLoc, ImportModuleFileIndex] =
             ReadUntranslatedSourceLocation(Record[Idx++]);
+        // The import location must belong to the current module file itself.
+        assert(ImportModuleFileIndex == 0);
         off_t StoredSize = !IsImportingStdCXXModule ? (off_t)Record[Idx++] : 0;
         time_t StoredModTime =
             !IsImportingStdCXXModule ? (time_t)Record[Idx++] : 0;
@@ -3347,7 +3349,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
         return llvm::createStringError(
             std::errc::illegal_byte_sequence,
             "duplicate TYPE_OFFSET record in AST file");
-      F.TypeOffsets = reinterpret_cast<const UnderalignedInt64 *>(Blob.data());
+      F.TypeOffsets = reinterpret_cast<const UnalignedUInt64 *>(Blob.data());
       F.LocalNumTypes = Record[0];
       unsigned LocalBaseTypeIndex = Record[1];
       F.BaseTypeIndex = getTotalNumTypes();
@@ -3661,13 +3663,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
           std::make_pair(SourceManager::MaxLoadedOffset - F.SLocEntryBaseOffset
                            - SLocSpaceSize,&F));
 
-      // Initialize the remapping table.
-      // Invalid stays invalid.
-      F.SLocRemap.insertOrReplace(std::make_pair(0U, 0));
-      // This module. Base was 2 when being compiled.
-      F.SLocRemap.insertOrReplace(std::make_pair(
-          2U, static_cast<SourceLocation::IntTy>(F.SLocEntryBaseOffset - 2)));
-
       TotalNumSLocEntries += F.LocalNumSLocEntries;
       break;
     }
@@ -4055,18 +4050,7 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const 
{
   const unsigned char *DataEnd = Data + F.ModuleOffsetMap.size();
   F.ModuleOffsetMap = StringRef();
 
-  // If we see this entry before SOURCE_LOCATION_OFFSETS, add placeholders.
-  if (F.SLocRemap.find(0) == F.SLocRemap.end()) {
-    F.SLocRemap.insert(std::make_pair(0U, 0));
-    F.SLocRemap.insert(std::make_pair(2U, 1));
-  }
-
-  // Continuous range maps we may be updating in our module.
-  using SLocRemapBuilder =
-      ContinuousRangeMap<SourceLocation::UIntTy, SourceLocation::IntTy,
-                         2>::Builder;
   using RemapBuilder = ContinuousRangeMap<uint32_t, int, 2>::Builder;
-  SLocRemapBuilder SLocRemap(F.SLocRemap);
   RemapBuilder IdentifierRemap(F.IdentifierRemap);
   RemapBuilder MacroRemap(F.MacroRemap);
   RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
@@ -4075,6 +4059,9 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
   RemapBuilder DeclRemap(F.DeclRemap);
   RemapBuilder TypeRemap(F.TypeRemap);
 
+  auto &ImportedModuleVector = F.DependentModules;
+  assert(ImportedModuleVector.empty());
+
   while (Data < DataEnd) {
     // FIXME: Looking up dependency modules by filename is horrible. Let's
     // start fixing this with prebuilt, explicit and implicit modules and see
@@ -4090,15 +4077,14 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) 
const {
                           ? ModuleMgr.lookupByModuleName(Name)
                           : ModuleMgr.lookupByFileName(Name));
     if (!OM) {
-      std::string Msg =
-          "SourceLocation remap refers to unknown module, cannot find ";
+      std::string Msg = "refers to unknown module, cannot find ";
       Msg.append(std::string(Name));
       Error(Msg);
       return;
     }
 
-    SourceLocation::UIntTy SLocOffset =
-        endian::readNext<uint32_t, llvm::endianness::little>(Data);
+    ImportedModuleVector.push_back(OM);
+
     uint32_t IdentifierIDOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t MacroIDOffset =
@@ -4122,13 +4108,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const 
{
                                     static_cast<int>(BaseOffset - Offset)));
     };
 
-    constexpr SourceLocation::UIntTy SLocNone =
-        std::numeric_limits<SourceLocation::UIntTy>::max();
-    if (SLocOffset != SLocNone)
-      SLocRemap.insert(std::make_pair(
-          SLocOffset, static_cast<SourceLocation::IntTy>(
-                          OM->SLocEntryBaseOffset - SLocOffset)));
-
     mapOffset(IdentifierIDOffset, OM->BaseIdentifierID, IdentifierRemap);
     mapOffset(MacroIDOffset, OM->BaseMacroID, MacroRemap);
     mapOffset(PreprocessedEntityIDOffset, OM->BasePreprocessedEntityID,
@@ -6264,8 +6243,8 @@ SourceRange ASTReader::ReadSkippedRange(unsigned 
GlobalIndex) {
   unsigned LocalIndex = GlobalIndex - M->BasePreprocessedSkippedRangeID;
   assert(LocalIndex < M->NumPreprocessedSkippedRanges);
   PPSkippedRange RawRange = M->PreprocessedSkippedRangeOffsets[LocalIndex];
-  SourceRange Range(TranslateSourceLocation(*M, RawRange.getBegin()),
-                    TranslateSourceLocation(*M, RawRange.getEnd()));
+  SourceRange Range(ReadSourceLocation(*M, RawRange.getBegin()),
+                    ReadSourceLocation(*M, RawRange.getEnd()));
   assert(Range.isValid());
   return Range;
 }
@@ -6284,7 +6263,7 @@ PreprocessedEntity 
*ASTReader::ReadPreprocessedEntity(unsigned Index) {
 
   SavedStreamPosition SavedPosition(M.PreprocessorDetailCursor);
   if (llvm::Error Err = M.PreprocessorDetailCursor.JumpToBit(
-          M.MacroOffsetsBase + PPOffs.BitOffset)) {
+          M.MacroOffsetsBase + PPOffs.getOffset())) {
     Error(std::move(Err));
     return nullptr;
   }
@@ -6301,8 +6280,8 @@ PreprocessedEntity 
*ASTReader::ReadPreprocessedEntity(unsigned Index) {
     return nullptr;
 
   // Read the record.
-  SourceRange Range(TranslateSourceLocation(M, PPOffs.getBegin()),
-                    TranslateSourceLocation(M, PPOffs.getEnd()));
+  SourceRange Range(ReadSourceLocation(M, PPOffs.getBegin()),
+                    ReadSourceLocation(M, PPOffs.getEnd()));
   PreprocessingRecord &PPRec = *PP.getPreprocessingRecord();
   StringRef Blob;
   RecordData Record;
@@ -6414,7 +6393,7 @@ struct PPEntityComp {
   }
 
   SourceLocation getLoc(const PPEntityOffset &PPE) const {
-    return Reader.TranslateSourceLocation(M, PPE.getBegin());
+    return Reader.ReadSourceLocation(M, PPE.getBegin());
   }
 };
 
@@ -6458,7 +6437,7 @@ PreprocessedEntityID 
ASTReader::findPreprocessedEntity(SourceLocation Loc,
       PPI = First;
       std::advance(PPI, Half);
       if (SourceMgr.isBeforeInTranslationUnit(
-              TranslateSourceLocation(M, PPI->getEnd()), Loc)) {
+              ReadSourceLocation(M, PPI->getEnd()), Loc)) {
         First = PPI;
         ++First;
         Count = Count - Half - 1;
@@ -6499,7 +6478,7 @@ std::optional<bool> 
ASTReader::isPreprocessedEntityInFileID(unsigned Index,
   unsigned LocalIndex = PPInfo.second;
   const PPEntityOffset &PPOffs = M.PreprocessedEntityOffsets[LocalIndex];
 
-  SourceLocation Loc = TranslateSourceLocation(M, PPOffs.getBegin());
+  SourceLocation Loc = ReadSourceLocation(M, PPOffs.getBegin());
   if (Loc.isInvalid())
     return false;
 
@@ -6690,9 +6669,8 @@ ASTReader::RecordLocation 
ASTReader::TypeCursorForIndex(unsigned Index) {
   GlobalTypeMapType::iterator I = GlobalTypeMap.find(Index);
   assert(I != GlobalTypeMap.end() && "Corrupted global type map");
   ModuleFile *M = I->second;
-  return RecordLocation(
-      M, M->TypeOffsets[Index - M->BaseTypeIndex].getBitOffset() +
-             M->DeclsBlockStartOffset);
+  return RecordLocation(M, M->TypeOffsets[Index - M->BaseTypeIndex].get() +
+                               M->DeclsBlockStartOffset);
 }
 
 static std::optional<Type::TypeClass> getTypeClassForCode(TypeCode code) {
@@ -8975,7 +8953,7 @@ MacroID ASTReader::getGlobalMacroID(ModuleFile &M, 
unsigned LocalID) {
 }
 
 serialization::SubmoduleID
-ASTReader::getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID) {
+ASTReader::getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID) const {
   if (LocalID < NUM_PREDEF_SUBMODULE_IDS)
     return LocalID;
 
@@ -9008,7 +8986,7 @@ Module *ASTReader::getModule(unsigned ID) {
   return getSubmodule(ID);
 }
 
-ModuleFile *ASTReader::getLocalModuleFile(ModuleFile &M, unsigned ID) {
+ModuleFile *ASTReader::getLocalModuleFile(ModuleFile &M, unsigned ID) const {
   if (ID & 1) {
     // It's a module, look it up by submodule ID.
     auto I = GlobalSubmoduleMap.find(getGlobalSubmoduleID(M, ID >> 1));

diff  --git a/clang/lib/Serialization/ASTReaderDecl.cpp 
b/clang/lib/Serialization/ASTReaderDecl.cpp
index 6afbbfe20f26b7..089ede4f49260b 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -3247,7 +3247,7 @@ ASTReader::RecordLocation 
ASTReader::DeclCursorForID(GlobalDeclID ID,
   ModuleFile *M = I->second;
   const DeclOffset &DOffs =
       M->DeclOffsets[ID.get() - M->BaseDeclID - NUM_PREDEF_DECL_IDS];
-  Loc = TranslateSourceLocation(*M, DOffs.getLocation());
+  Loc = ReadSourceLocation(*M, DOffs.getRawLoc());
   return RecordLocation(M, DOffs.getBitOffset(M->DeclsBlockStartOffset));
 }
 

diff  --git a/clang/lib/Serialization/ASTWriter.cpp 
b/clang/lib/Serialization/ASTWriter.cpp
index afe98e803b6c24..cf77b4c0df2bbb 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -2729,8 +2729,10 @@ void 
ASTWriter::WritePreprocessorDetail(PreprocessingRecord &PPRec,
 
     uint64_t Offset = Stream.GetCurrentBitNo() - MacroOffsetsBase;
     assert((Offset >> 32) == 0 && "Preprocessed entity offset too large");
-    PreprocessedEntityOffsets.push_back(
-        PPEntityOffset(getAdjustedRange((*E)->getSourceRange()), Offset));
+    SourceRange R = getAdjustedRange((*E)->getSourceRange());
+    PreprocessedEntityOffsets.emplace_back(
+        getRawSourceLocationEncoding(R.getBegin()),
+        getRawSourceLocationEncoding(R.getEnd()), Offset);
 
     if (auto *MD = dyn_cast<MacroDefinitionRecord>(*E)) {
       // Record this macro definition's ID.
@@ -2797,7 +2799,9 @@ void 
ASTWriter::WritePreprocessorDetail(PreprocessingRecord &PPRec,
     std::vector<PPSkippedRange> SerializedSkippedRanges;
     SerializedSkippedRanges.reserve(SkippedRanges.size());
     for (auto const& Range : SkippedRanges)
-      SerializedSkippedRanges.emplace_back(Range);
+      SerializedSkippedRanges.emplace_back(
+          getRawSourceLocationEncoding(Range.getBegin()),
+          getRawSourceLocationEncoding(Range.getEnd()));
 
     using namespace llvm;
     auto Abbrev = std::make_shared<BitCodeAbbrev>();
@@ -2963,8 +2967,8 @@ void ASTWriter::WriteSubmodules(Module *WritingModule) {
       ParentID = SubmoduleIDs[Mod->Parent];
     }
 
-    uint64_t DefinitionLoc =
-        
SourceLocationEncoding::encode(getAdjustedLocation(Mod->DefinitionLoc));
+    SourceLocationEncoding::RawLocEncoding DefinitionLoc =
+        getRawSourceLocationEncoding(getAdjustedLocation(Mod->DefinitionLoc));
 
     // Emit the definition of the block.
     {
@@ -3247,7 +3251,7 @@ void ASTWriter::WriteType(QualType T) {
     TypeOffsets.emplace_back(Offset);
   else if (TypeOffsets.size() < Index) {
     TypeOffsets.resize(Index + 1);
-    TypeOffsets[Index].setBitOffset(Offset);
+    TypeOffsets[Index].set(Offset);
   } else {
     llvm_unreachable("Types emitted in wrong order");
   }
@@ -5380,7 +5384,6 @@ ASTFileSignature ASTWriter::WriteASTCore(Sema &SemaRef, 
StringRef isysroot,
 
         // These values should be unique within a chain, since they will be 
read
         // as keys into ContinuousRangeMaps.
-        writeBaseIDOrNone(M.SLocEntryBaseOffset, M.LocalNumSLocEntries);
         writeBaseIDOrNone(M.BaseIdentifierID, M.LocalNumIdentifiers);
         writeBaseIDOrNone(M.BaseMacroID, M.LocalNumMacros);
         writeBaseIDOrNone(M.BasePreprocessedEntityID,
@@ -5873,10 +5876,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl 
&Record) {
   Record.push_back(getAdjustedFileID(FID).getOpaqueValue());
 }
 
+SourceLocationEncoding::RawLocEncoding
+ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) {
+  unsigned BaseOffset = 0;
+  unsigned ModuleFileIndex = 0;
+
+  // See SourceLocationEncoding.h for the encoding details.
+  if (Context->getSourceManager().isLoadedSourceLocation(Loc) &&
+      Loc.isValid()) {
+    assert(getChain());
+    auto SLocMapI = getChain()->GlobalSLocOffsetMap.find(
+        SourceManager::MaxLoadedOffset - Loc.getOffset() - 1);
+    assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() &&
+           "Corrupted global sloc offset map");
+    ModuleFile *F = SLocMapI->second;
+    BaseOffset = F->SLocEntryBaseOffset - 2;
+    // 0 means the location is not loaded. So we need to add 1 to the index to
+    // make it clear.
+    ModuleFileIndex = F->Index + 1;
+    assert(&getChain()->getModuleManager()[F->Index] == F);
+  }
+
+  return SourceLocationEncoding::encode(Loc, BaseOffset, ModuleFileIndex, Seq);
+}
+
 void ASTWriter::AddSourceLocation(SourceLocation Loc, RecordDataImpl &Record,
                                   SourceLocationSequence *Seq) {
   Loc = getAdjustedLocation(Loc);
-  Record.push_back(SourceLocationEncoding::encode(Loc, Seq));
+  Record.push_back(getRawSourceLocationEncoding(Loc, Seq));
 }
 
 void ASTWriter::AddSourceRange(SourceRange Range, RecordDataImpl &Record,

diff  --git a/clang/lib/Serialization/ASTWriterDecl.cpp 
b/clang/lib/Serialization/ASTWriterDecl.cpp
index 0edc4feda3ef23..6201d284f0e025 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -2809,14 +2809,16 @@ void ASTWriter::WriteDecl(ASTContext &Context, Decl *D) 
{
 
   // Record the offset for this declaration
   SourceLocation Loc = D->getLocation();
+  SourceLocationEncoding::RawLocEncoding RawLoc =
+      getRawSourceLocationEncoding(getAdjustedLocation(Loc));
+
   unsigned Index = ID.get() - FirstDeclID.get();
   if (DeclOffsets.size() == Index)
-    DeclOffsets.emplace_back(getAdjustedLocation(Loc), Offset,
-                             DeclTypesBlockStartOffset);
+    DeclOffsets.emplace_back(RawLoc, Offset, DeclTypesBlockStartOffset);
   else if (DeclOffsets.size() < Index) {
     // FIXME: Can/should this happen?
     DeclOffsets.resize(Index+1);
-    DeclOffsets[Index].setLocation(getAdjustedLocation(Loc));
+    DeclOffsets[Index].setRawLoc(RawLoc);
     DeclOffsets[Index].setBitOffset(Offset, DeclTypesBlockStartOffset);
   } else {
     llvm_unreachable("declarations should be emitted in ID order");

diff  --git a/clang/lib/Serialization/ModuleFile.cpp 
b/clang/lib/Serialization/ModuleFile.cpp
index db896fd361159d..2c42d33a8f5dd3 100644
--- a/clang/lib/Serialization/ModuleFile.cpp
+++ b/clang/lib/Serialization/ModuleFile.cpp
@@ -59,7 +59,6 @@ LLVM_DUMP_METHOD void ModuleFile::dump() {
   // Remapping tables.
   llvm::errs() << "  Base source location offset: " << SLocEntryBaseOffset
                << '\n';
-  dumpLocalRemap("Source location offset local -> global map", SLocRemap);
 
   llvm::errs() << "  Base identifier ID: " << BaseIdentifierID << '\n'
                << "  Number of identifiers: " << LocalNumIdentifiers << '\n';

diff  --git a/clang/test/Modules/no-transitive-source-location-change.cppm 
b/clang/test/Modules/no-transitive-source-location-change.cppm
new file mode 100644
index 00000000000000..303142a1af890b
--- /dev/null
+++ b/clang/test/Modules/no-transitive-source-location-change.cppm
@@ -0,0 +1,87 @@
+// Testing that adding a new line in a module interface unit won't cause the 
BMI
+// of consuming module unit changes.
+//
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+//
+// RUN: %clang_cc1 -std=c++20 %t/A.cppm -emit-module-interface -o %t/A.pcm
+// RUN: %clang_cc1 -std=c++20 %t/A.v1.cppm -emit-module-interface -o 
%t/A.v1.pcm
+//
+// The BMI may not be the same since the source location 
diff ers.
+// RUN: not 
diff  %t/A.pcm %t/A.v1.pcm &> /dev/null
+//
+// The BMI of B shouldn't change since all the locations remain the same.
+// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-module-interface 
-fmodule-file=A=%t/A.pcm \
+// RUN:     -o %t/B.pcm
+// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-module-interface 
-fmodule-file=A=%t/A.v1.pcm \
+// RUN:     -o %t/B.v1.pcm
+// RUN: 
diff  %t/B.v1.pcm %t/B.pcm  &> /dev/null
+//
+// The BMI of C may change since the locations for instantiations changes.
+// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-module-interface 
-fmodule-file=A=%t/A.pcm \
+// RUN:     -o %t/C.pcm
+// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-module-interface 
-fmodule-file=A=%t/A.v1.pcm \
+// RUN:     -o %t/C.v1.pcm
+// RUN: not 
diff  %t/C.v1.pcm %t/C.pcm  &> /dev/null
+//
+// Test again with reduced BMI.
+// RUN: %clang_cc1 -std=c++20 %t/A.cppm -emit-reduced-module-interface -o 
%t/A.pcm
+// RUN: %clang_cc1 -std=c++20 %t/A.v1.cppm -emit-reduced-module-interface -o 
%t/A.v1.pcm
+//
+// The BMI may not be the same since the source location 
diff ers.
+// RUN: not 
diff  %t/A.pcm %t/A.v1.pcm &> /dev/null
+//
+// The BMI of B shouldn't change since all the locations remain the same.
+// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-reduced-module-interface 
-fmodule-file=A=%t/A.pcm \
+// RUN:     -o %t/B.pcm
+// RUN: %clang_cc1 -std=c++20 %t/B.cppm -emit-reduced-module-interface 
-fmodule-file=A=%t/A.v1.pcm \
+// RUN:     -o %t/B.v1.pcm
+// RUN: 
diff  %t/B.v1.pcm %t/B.pcm  &> /dev/null
+//
+// The BMI of C may change since the locations for instantiations changes.
+// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-reduced-module-interface 
-fmodule-file=A=%t/A.pcm \
+// RUN:     -o %t/C.pcm
+// RUN: %clang_cc1 -std=c++20 %t/C.cppm -emit-reduced-module-interface 
-fmodule-file=A=%t/A.v1.pcm \
+// RUN:     -o %t/C.v1.pcm
+// RUN: not 
diff  %t/C.v1.pcm %t/C.pcm  &> /dev/null
+
+//--- A.cppm
+export module A;
+export template <class T>
+struct C {
+    T func() {
+        return T(43);
+    }
+};
+export int funcA() {
+    return 43;
+}
+
+//--- A.v1.cppm
+export module A;
+
+export template <class T>
+struct C {
+    T func() {
+        return T(43);
+    }
+};
+export int funcA() {
+    return 43;
+}
+
+//--- B.cppm
+export module B;
+import A;
+
+export int funcB() {
+    return funcA();
+}
+
+//--- C.cppm
+export module C;
+import A;
+export inline void testD() {
+    C<int> c;
+    c.func();
+}

diff  --git a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp 
b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
index 141da4c27f8d0b..c80a8fd0e52b17 100644
--- a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
+++ b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
@@ -23,13 +23,14 @@ using LocSeq = SourceLocationSequence;
 // Loc is the raw (in-memory) form of SourceLocation.
 void roundTrip(SourceLocation::UIntTy Loc,
                std::optional<uint64_t> ExpectedEncoded = std::nullopt) {
-  uint64_t ActualEncoded =
-      SourceLocationEncoding::encode(SourceLocation::getFromRawEncoding(Loc));
+  uint64_t ActualEncoded = SourceLocationEncoding::encode(
+      SourceLocation::getFromRawEncoding(Loc), /*BaseOffset=*/0,
+      /*BaseModuleFileIndex=*/0);
   if (ExpectedEncoded) {
     ASSERT_EQ(ActualEncoded, *ExpectedEncoded) << "Encoding " << Loc;
   }
   SourceLocation::UIntTy DecodedEncoded =
-      SourceLocationEncoding::decode(ActualEncoded).getRawEncoding();
+      SourceLocationEncoding::decode(ActualEncoded).first.getRawEncoding();
   ASSERT_EQ(DecodedEncoded, Loc) << "Decoding " << ActualEncoded;
 }
 
@@ -41,7 +42,8 @@ void roundTrip(std::vector<SourceLocation::UIntTy> Locs,
     LocSeq::State Seq;
     for (auto L : Locs)
       ActualEncoded.push_back(SourceLocationEncoding::encode(
-          SourceLocation::getFromRawEncoding(L), Seq));
+          SourceLocation::getFromRawEncoding(L), /*BaseOffset=*/0,
+          /*BaseModuleFileIndex=*/0, Seq));
     if (!ExpectedEncoded.empty()) {
       ASSERT_EQ(ActualEncoded, ExpectedEncoded)
           << "Encoding " << testing::PrintToString(Locs);
@@ -51,7 +53,7 @@ void roundTrip(std::vector<SourceLocation::UIntTy> Locs,
   {
     LocSeq::State Seq;
     for (auto L : ActualEncoded) {
-      SourceLocation Loc = SourceLocationEncoding::decode(L, Seq);
+      SourceLocation Loc = SourceLocationEncoding::decode(L, Seq).first;
       DecodedEncoded.push_back(Loc.getRawEncoding());
     }
     ASSERT_EQ(DecodedEncoded, Locs)


        
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to