[clang] [Modules] No transitive source location change (PR #86912)

2024-05-05 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

I've relanded this in 
https://github.com/llvm/llvm-project/commit/947b06282324db8fe2784c4054af9de493a876af.
Let's see what happens.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-05-01 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

thanks,  it is pretty helpful. 

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-05-01 Thread Rainer Orth via cfe-commits

rorth wrote:

This is certainly a case of unaligned access.  In a local build, I've run the 
first failing `clang` invocation under `truss` (the Solaris syscall tracer).  
For
```
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang
 -cc1 -internal-isystem 
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include
 -nostdsysteminc -fmodules -Wno-private-module -fimplicit-module-maps 
-fmodules-cache-path=/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/APINotes/Output/availability.m.tmp/ModulesCache
 -fapinotes-modules -fsyntax-only -I 
/vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Headers -F 
/vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Frameworks 
/vol/llvm/src/llvm-project/dist/clang/test/APINotes/availability.m -verify
```
this reveals
```
14552:  Incurred fault #5, FLTACCESS  %pc = 0x1083824E0
14552:siginfo: SIGBUS BUS_ADRALN addr=0x7F5E7B6C
14552:  Received signal #10, SIGBUS [caught]
14552:siginfo: SIGBUS BUS_ADRALN addr=0x7F5E7B6C
```
`gdb` shows exactly that:
```
Thread 2 received signal SIGBUS, Bus error.
[Switching to Thread 1 (LWP 1)]
0x0001083824e0 in clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, 
clang::SourceLocation&) ()
1: x/i $pc
=> 0x1083824e0 
<_ZN5clang9ASTReader15DeclCursorForIDENS_12GlobalDeclIDERNS_14SourceLocationE+168>:
  ldx  [ %i5 + %i1 ], %o2
(gdb) p/x $i5
$1 = 0x7f5e7b4c
(gdb) p/x $i1
$2 = 0x20
```
The `ldx` insn (Load Extended Word) takes a doubleword address with natural 
(64-bit) alignment.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

Oh, maybe I found the reason. It is because my patch breaks the alignments of 
`DeclOffset`: 
https://github.com/llvm/llvm-project/blob/8d28e5861f8b117a547850ffbb9a332aa6e91459/clang/include/clang/Serialization/ASTBitCodes.h#L237-L240

then it explains why it work well in some platforms but not in other platforms.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> > I'll revert this. Due to I can't reproduce this. When the bot gets stable, 
> > please tell if it is the real problem.
> 
> You can reproduce this: the [GCC compile farm](https://portal.cfarm.net/) 
> does have a Solaris/sparcv9 system (`cfarm215`) which is perfectly equipped 
> to run LLVM builds (I've tried).
> 
> I think the stack traces from the bot are a pretty strong indication that 
> your patch is the culprit:
> 
> ```
> Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH 
> or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
> 0  clang-19  0x0001076d87b8 
> llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 36
> 1  clang-19  0x0001076d910c SignalHandler(int) + 896
> 2  libc.so.1 0x7f0c62a8 __sighndlr + 12
> 3  libc.so.1 0x7f0b8b50 call_user_handler + 1024
> 4  libc.so.1 0x7f0b8f10 sigacthandler + 160
> 5  clang-19  0x0001083824e0 
> clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, 
> clang::SourceLocation&) + 168
> 6  clang-19  0x00010838aca0 
> clang::ASTReader::ReadDeclRecord(clang::GlobalDeclID) + 48
> 7  clang-19  0x0001082fb4ec 
> clang::ASTReader::GetDecl(clang::GlobalDeclID) + 232
> 8  clang-19  0x0001082cb820 
> clang::ASTReader::SetGloballyVisibleDecls(clang::IdentifierInfo*, 
> llvm::SmallVectorImpl const&, 
> llvm::SmallVectorImpl*) + 252
> 9  clang-19  0x0001083144a0 clang::ASTReader::finishPendingActions() + 572
> 10 clang-19  0x000108319e10 clang::ASTReader::FinishedDeserializing() + 92
> 11 clang-19  0x00010830dbf4 clang::ASTReader::get(llvm::StringRef) + 680
> 12 clang-19  0x0001078a84fc clang::IdentifierTable::get(llvm::StringRef) 
> + 84
> 13 clang-19  0x00010a130fcc clang::Sema::Initialize() + 1208
> 14 clang-19  0x000109fd1814 clang::Parser::Initialize() + 1260
> 15 clang-19  0x000109fccb68 clang::ParseAST(clang::Sema&, bool, bool) + 
> 556
> 16 clang-19  0x0001081b10d8 clang::ASTFrontendAction::ExecuteAction() + 
> 248
> 17 clang-19  0x0001081b06f8 clang::FrontendAction::Execute() + 92
> 18 clang-19  0x0001081196c8 
> clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1572
> 19 clang-19  0x0001082b87b8 
> clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 684
> 20 clang-19  0x0001048a2980 cc1_main(llvm::ArrayRef, char 
> const*, void*) + 4296
> 21 clang-19  0x00010489f6f8 ExecuteCC1Tool(llvm::SmallVectorImpl const*>&, llvm::ToolContext const&) + 1184
> 22 clang-19  0x00010489e018 clang_main(int, char**, llvm::ToolContext 
> const&) + 4424
> 23 clang-19  0x0001048aee0c main + 60
> 24 clang-19  0x00010489c904 _start + 100
> /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.script:
>  line 2: 12701 Bus Error   
> /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang
>  -cc1 -internal-isystem 
> /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include
>  -nostdsysteminc -include-pch 
> /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.tmp
>  -fsyntax-only 
> /vol/llvm/src/llvm-project/dist/clang/test/PCH/opencl-extensions.cl -triple 
> spir-unknown-unknown
> ```
> 
> One thing I see immediately that this uses a triple the bot is not configured 
> to handle. Nonetheless clang shouldn't die with `SIGBUS` in such as case.

Reverted. It looks like the configuration isn't in our bots actually. I can't 
open that site. I need to take another look at the code though.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Rainer Orth via cfe-commits

rorth wrote:

> I'll revert this. Due to I can't reproduce this. When the bot gets stable, 
> please tell if it is the real problem.

You can reproduce this: the [GCC compile farm](https://portal.cfarm.net/) does 
have a Solaris/sparcv9 system (`cfarm215`) which is perfectly equipped to run 
LLVM builds (I've tried).

I think the stack traces from the bot are a pretty strong indication that your 
patch is the culprit:
```
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH 
or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-19  0x0001076d87b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, 
int) + 36
1  clang-19  0x0001076d910c SignalHandler(int) + 896
2  libc.so.1 0x7f0c62a8 __sighndlr + 12
3  libc.so.1 0x7f0b8b50 call_user_handler + 1024
4  libc.so.1 0x7f0b8f10 sigacthandler + 160
5  clang-19  0x0001083824e0 
clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, clang::SourceLocation&) 
+ 168
6  clang-19  0x00010838aca0 
clang::ASTReader::ReadDeclRecord(clang::GlobalDeclID) + 48
7  clang-19  0x0001082fb4ec clang::ASTReader::GetDecl(clang::GlobalDeclID) 
+ 232
8  clang-19  0x0001082cb820 
clang::ASTReader::SetGloballyVisibleDecls(clang::IdentifierInfo*, 
llvm::SmallVectorImpl const&, 
llvm::SmallVectorImpl*) + 252
9  clang-19  0x0001083144a0 clang::ASTReader::finishPendingActions() + 572
10 clang-19  0x000108319e10 clang::ASTReader::FinishedDeserializing() + 92
11 clang-19  0x00010830dbf4 clang::ASTReader::get(llvm::StringRef) + 680
12 clang-19  0x0001078a84fc clang::IdentifierTable::get(llvm::StringRef) + 
84
13 clang-19  0x00010a130fcc clang::Sema::Initialize() + 1208
14 clang-19  0x000109fd1814 clang::Parser::Initialize() + 1260
15 clang-19  0x000109fccb68 clang::ParseAST(clang::Sema&, bool, bool) + 556
16 clang-19  0x0001081b10d8 clang::ASTFrontendAction::ExecuteAction() + 248
17 clang-19  0x0001081b06f8 clang::FrontendAction::Execute() + 92
18 clang-19  0x0001081196c8 
clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1572
19 clang-19  0x0001082b87b8 
clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 684
20 clang-19  0x0001048a2980 cc1_main(llvm::ArrayRef, char 
const*, void*) + 4296
21 clang-19  0x00010489f6f8 ExecuteCC1Tool(llvm::SmallVectorImpl&, llvm::ToolContext const&) + 1184
22 clang-19  0x00010489e018 clang_main(int, char**, llvm::ToolContext 
const&) + 4424
23 clang-19  0x0001048aee0c main + 60
24 clang-19  0x00010489c904 _start + 100
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.script:
 line 2: 12701 Bus Error   
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang
 -cc1 -internal-isystem 
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include
 -nostdsysteminc -include-pch 
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.tmp
 -fsyntax-only 
/vol/llvm/src/llvm-project/dist/clang/test/PCH/opencl-extensions.cl -triple 
spir-unknown-unknown
```
One thing I see immediately that this uses a triple the bot is not configured 
to handle.  Nonetheless clang shouldn't die with `SIGBUS` in such as case.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> I strongly suspect that this patch badly broke the [Solaris/sparcv9 
> buildbot](https://lab.llvm.org/buildbot/#/builders/72/builds/4046): it 
> introduced more than 1000 failures.
> 
> Please fix or revert.

I'll revert this. Due to I can't reproduce this. When the bot gets stable, 
please tell if it is the real problem.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Rainer Orth via cfe-commits

rorth wrote:

I strongly suspect that this patch badly broke the [Solaris/sparcv9 
buildbot](https://lab.llvm.org/buildbot/#/builders/72/builds/4046): it 
introduced more than 1000 failures.

Please fix or revert.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 closed 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-30 Thread Michael Spencer via cfe-commits

https://github.com/Bigcheese approved this pull request.

I did some testing today and this change seems fine. For scanning modules I 
actually saw some get smaller with your change.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-25 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> The changes LGTM, don't want to block this on my remaining nits.

Thanks for reviewing this.

> 
> I believe @Bigcheese wanted to test test impact on PCM size on our side 
> before this lands.

I've rebased this with main. I'll wait for the results from @Bigcheese 

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-25 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From 2c20a6200fb2790b3a891ffc8c43682c113c7e8a Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 +---
 clang/include/clang/Serialization/ASTReader.h | 48 ++
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  | 14 ++-
 .../Serialization/SourceLocationEncoding.h| 91 +--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 57 
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 ++
 clang/test/Modules/pr61067.cppm   | 25 -
 .../SourceLocationEncodingTest.cpp| 12 ++-
 15 files changed, 269 insertions(+), 162 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 186c3b722ced16..94a3d24d47926b 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -23,6 +23,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -167,45 +168,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -231,8 +225,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -240,17 +236,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
- 

[clang] [Modules] No transitive source location change (PR #86912)

2024-04-24 Thread Jan Svoboda via cfe-commits

https://github.com/jansvoboda11 approved this pull request.

The changes LGTM, don't want to block this on my remaining nits.

I believe @Bigcheese wanted to test test impact on PCM size on our side before 
this lands.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-24 Thread Jan Svoboda via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;

jansvoboda11 wrote:

I see, thanks for explaining that.

I think we should still attempt to unify things so that 
`TranslateSourceLocation()` is only called from `ReadSourceLocation()`. Even if 
it means carrying something like `std::variant` (or `, UntranslatedSourceLocation>`) 
in few places. Juggling translated and untranslated `SourceLocation` instances 
and making functions resilient to both seems unnecessarily complex. I'm fine 
with leaving a FIXME in this patch and doing a follow-up.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-24 Thread Jan Svoboda via cfe-commits


@@ -2221,33 +,45 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex == 0 ?  : MF.DependentModules[ModuleFileIndex - 1];
+
+assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location 
space");

jansvoboda11 wrote:

What does `Loc` being invalid has to do with this?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-22 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

@jansvoboda11 @Bigcheese ping

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From ddb4074b0460daf7b42531ec62e97347b3f2e14d Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH 1/4] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 ++--
 clang/include/clang/Serialization/ASTReader.h | 54 +++-
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  |  4 -
 .../Serialization/SourceLocationEncoding.h| 88 +--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 84 +++---
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 +++
 clang/test/Modules/pr61067.cppm   | 25 --
 .../SourceLocationEncodingTest.cpp| 12 +--
 15 files changed, 275 insertions(+), 176 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 500098dd3dab1d..eca776a77e4557 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
- 

[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Chuanqi Xu via cfe-commits


@@ -4082,14 +4069,14 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) 
const {
   : ModuleMgr.lookupByFileName(Name));
 if (!OM) {
   std::string Msg =
-  "SourceLocation remap refers to unknown module, cannot find ";
+  "cannot find module ";

ChuanqiXu9 wrote:

Done

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Chuanqi Xu via cfe-commits


@@ -2221,33 +,45 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex == 0 ?  : MF.DependentModules[ModuleFileIndex - 1];
+
+assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location 
space");

ChuanqiXu9 wrote:

But the value of `Loc`  may not be valid. Also I feel it is fine to have some 
redundant assertions. It helps the reader to understand the codes better.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Chuanqi Xu via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;

ChuanqiXu9 wrote:

> Now that TranslateSourceLocation() is only called from ReadSourceLocation()

Sadly, this is not true. `TranslateSourceLocation()` may be called in 
`ASTReader::ReadAST()`:

https://github.com/llvm/llvm-project/blob/aac4d03423dd6b7bdef0f2eb03c570f3e2ca6630/clang/lib/Serialization/ASTReader.cpp#L4588-L4591

The input value of  `TranslateSourceLocation()` there may come from a reading 
of untranslated source location in `ASTReader::ReadControlBlock` when reading 
imported modules. Or the input value may come from the argument of  
`ASTReader::ReadAST()`, where must be a translated source location. 

Then it looks really dangerous to me. So I add the FIXME. We may not be able to 
change the signature of the argument of `TranslateSourceLocation()` to 
`UntranslatedSourceLocation` since that will require us to change the signature 
of `ASTReader::ReadAST()`.

The reason why actually it works, is that, in the case the input value comes 
from a translated source location (passed directly in `ASTReader::ReadAST()`), 
the value of `M.ImportedBy` may always be null **now**.  But I feel it is 
dangerous if someone changes it suddenly.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Jan Svoboda via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;

jansvoboda11 wrote:

I don't think this an issue. Now that `TranslateSourceLocation()` is only 
called from `ReadSourceLocation()` there's no reason for taking care to handle 
already-translated source locations correctly. Maybe now that it doesn't use 
any `ASTWriter` members we can make the function a non-member and make it 
static `ASTWriter.cpp`. In my mind that's enough to leave out the FIXME and 
just assume it only works when used on an untranslated source location read in 
`ReadSourceLocation()`.

As an optional clarification we can return a special type (e.g. 
`UntranslatedSourceLocation`) from `ReadUntranslatedSourceLocation()` and make 
it so that only `TranslateSourceLocation()` can transform that into the regular 
(now always-translated) `SourceLocation`.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Jan Svoboda via cfe-commits


@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl 
) {
   Record.push_back(getAdjustedFileID(FID).getOpaqueValue());
 }
 
+SourceLocationEncoding::RawLocEncoding
+ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) {
+  unsigned BaseOffset = 0;
+  unsigned ModuleFileIndex = 0;
+
+  // See SourceLocationEncoding.h for the encoding details.
+  if (Context->getSourceManager().isLoadedSourceLocation(Loc) &&
+  Loc.isValid()) {
+assert(getChain());
+auto SLocMapI = getChain()->GlobalSLocOffsetMap.find(
+SourceManager::MaxLoadedOffset - Loc.getOffset() - 1);
+assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() &&
+   "Corrupted global sloc offset map");
+ModuleFile *F = SLocMapI->second;
+BaseOffset = F->SLocEntryBaseOffset - 2;
+// 0 means the location is not loaded. So we need to add 1 to the index to
+// make it clear.
+ModuleFileIndex = F->Index + 1;
+assert(()->getModuleManager()[F->Index] == F);

jansvoboda11 wrote:

I don't think `ASTWriter` should be re-checking `ModuleManager` invariants 
either way.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Jan Svoboda via cfe-commits


@@ -4082,14 +4069,14 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) 
const {
   : ModuleMgr.lookupByFileName(Name));
 if (!OM) {
   std::string Msg =
-  "SourceLocation remap refers to unknown module, cannot find ";
+  "cannot find module ";

jansvoboda11 wrote:

Would be good to keep some context in the error message, maybe remove just the 
`"SourceLocation"` part and leave the rest as it was?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Jan Svoboda via cfe-commits


@@ -2221,33 +,45 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex == 0 ?  : MF.DependentModules[ModuleFileIndex - 1];
+
+assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location 
space");

jansvoboda11 wrote:

This is already guarded against in `ReadASTBlock()`, no need to have redundant 
checks.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-18 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

Fix conflicts and rebase with main.

@Bigcheese @jansvoboda11  ping~

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));

ChuanqiXu9 wrote:

Oh, the compiler may be too smart : )

My intention was like, if someday we turned 64 bit source location on, this 
assertion can make us remember  to update here.

But given the warning, I don't know how to write the assertion here. So I just 
removed it.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 

ChuanqiXu9 wrote:

Done.

I feel the idea to store `MF` itself at index `0` may be slightly tricky. 

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;

ChuanqiXu9 wrote:

My thought was like "all the recorded source location are local to its module 
file", so it won't be translated. But after I review how we judge whether a 
source location is loaded in source manager, I find this may be dangerous in 
extreme cases.  We judge if a source location is loaded by comparing the offset 
with the current loaded offset. So it may be dangerous if we have too many 
loaded source location in the TU and the source location we read is too big, 
then we may misread it as loaded source location.

My intention for this code is to make `TranslateSourceLocation` to be symmetric 
with the original one. The original `TranslateSourceLocation` is re-enterable. 
We could call `TranslateSourceLocation`  on a translated source location. But 
we can't do this for the `TranslateSourceLocation`   in the current patch.

So I just removed the check and leave a FIXME comment. I'd like to fix this as 
a NFC patch after we land this.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH 1/3] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 ++--
 clang/include/clang/Serialization/ASTReader.h | 54 +++-
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  |  4 -
 .../Serialization/SourceLocationEncoding.h| 88 +--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 84 +++---
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 +++
 clang/test/Modules/pr61067.cppm   | 25 --
 .../SourceLocationEncodingTest.cpp| 12 +--
 15 files changed, 275 insertions(+), 176 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index f762116fea956c..f94b32d762effc 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
- 

[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Jan Svoboda via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*OwningModuleFile, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;

jansvoboda11 wrote:

How can we get here with a `SourceLocation` that's already been translated?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Jan Svoboda via cfe-commits


@@ -2220,33 +2221,40 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *OwningModuleFile =
+ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : 

jansvoboda11 wrote:

Since the type of `ModuleFileIndex` is not obvious due to `auto`, I suggest 
comparing it to `0` explicitly.

```suggestion
ModuleFileIndex == 0 ?  : MF.DependentModules[ModuleFileIndex - 1];
```

Or maybe the element at index `0` in `DependentModules` could be `MF` itself to 
simplify this further?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Jan Svoboda via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));

jansvoboda11 wrote:

No, I'm referring to this line. I assume the compiler sees through 
`RawLocEncoding` and knows it's being initialized from `SourceLocation::UIntTy`.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> I have a few minor comments on the patch. I want to do some additional perf 
> testing on module scanning perf because I'm a bit concerned about the cost of 
> `ASTWriter::getRawSourceLocationEncoding`.

What's the perf results about scanning?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =
+ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*ModuleFileHomingLoc, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;
+
+return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2);

ChuanqiXu9 wrote:

There are a lot `- 2` in Serialization and SourceManager. I just tried to 
follow that. Sadly I didn't find clear comments explaining this. By reading the 
codes, I think it is trying to skip 2 dummy SLocEntries (the invalid source 
location and invalid expansion location, see `SourceManager::clearIDTables()`) 
since they are not sensitive to the context.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =
+ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : 

ChuanqiXu9 wrote:

We have a bounds check in the `operator[]` of SmallVector. So I guess it may be 
fine.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -4078,8 +4065,8 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const {
   return;

ChuanqiXu9 wrote:

Done.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =

ChuanqiXu9 wrote:

Done.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;
+  return Encoded;
 }
-inline SourceLocation
-SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) {
-  return Seq ? Seq->decode(Encoded)
- : SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+inline std::pair
+SourceLocationEncoding::decode(RawLocEncoding Encoded,
+   SourceLocationSequence *Seq) {
+  unsigned ModuleFileIndex = Encoded >> 32;
+
+  if (!ModuleFileIndex)
+return {Seq ? Seq->decode(Encoded)
+ : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)),
+ModuleFileIndex};
+
+  Encoded &= ((RawLocEncoding)1 << 33) - 1;

ChuanqiXu9 wrote:

Done.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;

ChuanqiXu9 wrote:

Now I assert it should be less than `(1<<16)`

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -696,7 +696,7 @@ class ASTReader
   /// Mapping from global submodule IDs to the module file in which the
   /// submodule resides along with the offset that should be added to the
   /// global submodule ID to produce a local ID.
-  GlobalSubmoduleMapType GlobalSubmoduleMap;
+  mutable GlobalSubmoduleMapType GlobalSubmoduleMap;

ChuanqiXu9 wrote:

Oh, I don't know why it remains. Maybe I added during the work progress.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,

ChuanqiXu9 wrote:

Done.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl 
) {
   Record.push_back(getAdjustedFileID(FID).getOpaqueValue());
 }
 
+SourceLocationEncoding::RawLocEncoding
+ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) {
+  unsigned BaseOffset = 0;
+  unsigned ModuleFileIndex = 0;
+
+  // See SourceLocationEncoding.h for the encoding details.
+  if (Context->getSourceManager().isLoadedSourceLocation(Loc) &&
+  Loc.isValid()) {
+assert(getChain());
+auto SLocMapI = getChain()->GlobalSLocOffsetMap.find(
+SourceManager::MaxLoadedOffset - Loc.getOffset() - 1);
+assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() &&
+   "Corrupted global sloc offset map");
+ModuleFile *F = SLocMapI->second;
+BaseOffset = F->SLocEntryBaseOffset - 2;
+// 0 means the location is not loaded. So we need to add 1 to the index to
+// make it clear.
+ModuleFileIndex = F->Index + 1;
+assert(()->getModuleManager()[F->Index] == F);

ChuanqiXu9 wrote:

Yes but I'd like to remain it. The assertions make me feel better.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -942,6 +942,12 @@ class ASTReader
   /// Sema tracks these to emit deferred diags.
   llvm::SmallSetVector DeclsToCheckForDeferredDiags;
 
+  /// The module files imported by different module files. Indirectly imported
+  /// module files are included too. The information comes from
+  /// ReadModuleOffsetMap(ModuleFile&).
+  mutable llvm::DenseMap>
+  ImportedModuleFiles;

ChuanqiXu9 wrote:

Done in the new commit.

There was a similar member `Imports` in `ModuleFile`, which records the 
directly imported module files from the coding. But, yes, it is good to remove 
the map lookups and let's try to handle the ambiguous by commenting.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));

ChuanqiXu9 wrote:

I guess you may refer to the next line and that one is indeed comparing 
unsigned with `(1<<32)`, and I fixed that. This line should be fine since 
Encoded has type `RawLocEncoding `.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-15 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH 1/2] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 ++--
 clang/include/clang/Serialization/ASTReader.h | 54 +++-
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  |  4 -
 .../Serialization/SourceLocationEncoding.h| 88 +--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 84 +++---
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 +++
 clang/test/Modules/pr61067.cppm   | 25 --
 .../SourceLocationEncodingTest.cpp| 12 +--
 15 files changed, 275 insertions(+), 176 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index f762116fea956c..f94b32d762effc 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
- 

[clang] [Modules] No transitive source location change (PR #86912)

2024-04-10 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> Could you share the rest of the "no transitive change" patches, or at least 
> their outline? I'm curious what else goes into making that work.

Oh, I haven't started them. The patch in my mind includes two at least:
(1) Add a new block recording the semantical hash value of the current named 
module.
(2) A similar patch for DeclID to refactor the existing linear offset to 
{ModuleFileIndex, LocalDeclID} patch.

BTW, with (2), maybe we can refactor the existing super large vector of Decls, 
`DeclsLoaded`, in the ASTReader into a array of vectors or into a map. But this 
may not be included in the series of the patches.

The motivation for (1) is that:

```
export module a;
export import b;
...
```

We want the BMI of `a` to change all the time if `b` changes. 

Also for the motivation example, if we changed the definition of `funcB` into 
`inline`,

```
//--- A.cppm
export module A;
export inline int funcA() {
return 43;
}

//--- B.cppm
export module B;
import A;

export inline int funcB() {
return funcA();
}
```

We hope the BMI of module B will change after the function A changes. But the 
declaration level hash may be tricky, so probably we need to change the BMI of 
the module B after the module A changes as the initial step.

So the result may be, we need to add a new hash or signature value to the BMI 
for named modules:

```
signature = hash_combine(hash value of the current buffer, combined hash value 
of the signature export imported modules, combined hash value of touched 
non-export import modules)
```

(Maybe we need to do this for all module types)

Maybe we can reuse the existing ASTSignature or we need to add a new block. But 
I am sure about this point now.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =

Bigcheese wrote:

`ModuleFileHomingLoc` sounds a bit confusing to me. Would a better name here be 
`OwningModuleFile`?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =
+ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : 

Bigcheese wrote:

Should have at least a bounds check assert on the ModuleFileIndex value here.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,
+   LocSeq *Seq = nullptr) const {
+if (!MF.ModuleOffsetMap.empty())
+  ReadModuleOffsetMap(MF);
+
+auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+ModuleFile *ModuleFileHomingLoc =
+ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : 
+return TranslateSourceLocation(*ModuleFileHomingLoc, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile ,
  SourceLocation Loc) const {
-if (!ModuleFile.ModuleOffsetMap.empty())
-  ReadModuleOffsetMap(ModuleFile);
-assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-   ModuleFile.SLocRemap.end() &&
-   "Cannot find offset to remap.");
-SourceLocation::IntTy Remap =
-ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-return Loc.getLocWithOffset(Remap);
+if (Loc.isInvalid())
+  return Loc;
+
+// It implies that the Loc is already translated.
+if (SourceMgr.isLoadedSourceLocation(Loc))
+  return Loc;
+
+return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2);

Bigcheese wrote:

Why the -2 here?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;

Bigcheese wrote:

Were you going to change this to only reserve 16 bits for module index?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -4078,8 +4065,8 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const {
   return;

Bigcheese wrote:

There's no more SourceLocation remap so I believe this error message is wrong 
now.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;
+  return Encoded;
 }
-inline SourceLocation
-SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) {
-  return Seq ? Seq->decode(Encoded)
- : SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+inline std::pair
+SourceLocationEncoding::decode(RawLocEncoding Encoded,
+   SourceLocationSequence *Seq) {
+  unsigned ModuleFileIndex = Encoded >> 32;
+
+  if (!ModuleFileIndex)
+return {Seq ? Seq->decode(Encoded)
+ : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)),
+ModuleFileIndex};
+
+  Encoded &= ((RawLocEncoding)1 << 33) - 1;

Bigcheese wrote:

We have `maskTrailingOnes(32)` which is a bit clearer than raw 
bit math.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits

https://github.com/Bigcheese edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits

https://github.com/Bigcheese commented:

I have a few minor comments on the patch. I want to do some additional perf 
testing on module scanning perf because I'm a bit concerned about the cost of 
`ASTWriter::getRawSourceLocationEncoding`.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

I left a couple of initial comments. With the 64-bit `SourceLocation` aspect of 
this change solved and with just 2% size increase, I think this approach makes 
sense. Could you share the rest of the "no transitive change" patches, or at 
least their outline? I'm curious what else goes into making that work.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits


@@ -696,7 +696,7 @@ class ASTReader
   /// Mapping from global submodule IDs to the module file in which the
   /// submodule resides along with the offset that should be added to the
   /// global submodule ID to produce a local ID.
-  GlobalSubmoduleMapType GlobalSubmoduleMap;
+  mutable GlobalSubmoduleMapType GlobalSubmoduleMap;

jansvoboda11 wrote:

Is `mutable` necessary here?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits


@@ -2220,40 +2227,47 @@ class ASTReader
 return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
+  std::pair
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+ LocSeq *Seq = nullptr) const {
 return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile ,
-SourceLocation::UIntTy Raw,
-LocSeq *Seq = nullptr) const {
-SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw,

jansvoboda11 wrote:

Can we undo the rename? This is named after the return type (what the client 
expects) not after the argument (the raw location). Also, `ASTReader` will 
never read a non-"raw" source location.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits


@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl 
) {
   Record.push_back(getAdjustedFileID(FID).getOpaqueValue());
 }
 
+SourceLocationEncoding::RawLocEncoding
+ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) {
+  unsigned BaseOffset = 0;
+  unsigned ModuleFileIndex = 0;
+
+  // See SourceLocationEncoding.h for the encoding details.
+  if (Context->getSourceManager().isLoadedSourceLocation(Loc) &&
+  Loc.isValid()) {
+assert(getChain());
+auto SLocMapI = getChain()->GlobalSLocOffsetMap.find(
+SourceManager::MaxLoadedOffset - Loc.getOffset() - 1);
+assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() &&
+   "Corrupted global sloc offset map");
+ModuleFile *F = SLocMapI->second;
+BaseOffset = F->SLocEntryBaseOffset - 2;
+// 0 means the location is not loaded. So we need to add 1 to the index to
+// make it clear.
+ModuleFileIndex = F->Index + 1;
+assert(()->getModuleManager()[F->Index] == F);

jansvoboda11 wrote:

This seems a bit tautological.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits


@@ -942,6 +942,12 @@ class ASTReader
   /// Sema tracks these to emit deferred diags.
   llvm::SmallSetVector DeclsToCheckForDeferredDiags;
 
+  /// The module files imported by different module files. Indirectly imported
+  /// module files are included too. The information comes from
+  /// ReadModuleOffsetMap(ModuleFile&).
+  mutable llvm::DenseMap>
+  ImportedModuleFiles;

jansvoboda11 wrote:

Why does this live in `ASTReader`? If we moved it to `ModuleFile`, we could 
remove `mutable` and the map lookups.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Jan Svoboda via cfe-commits


@@ -149,14 +157,44 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return  }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-   SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+   unsigned BaseModuleFileIndex,
+   SourceLocationSequence *Seq) {
+  // If the source location is a local source location, we can try to optimize
+  // the similar sequences to only record the differences.
+  if (!BaseOffset)
+return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+
+  if (Loc.isInvalid())
+return 0;
+  
+  // Otherwise, the higher bits are used to store the module file index,
+  // so it is meaningless to optimize the source locations into small
+  // integers. Let's try to always use the raw encodings.
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));

jansvoboda11 wrote:

My compiler says:
```
warning: result of comparison of constant 4294967296 with expression of type 
'unsigned int' is always true [-Wtautological-constant-out-of-range-compare]
```

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-09 Thread Michael Spencer via cfe-commits

https://github.com/Bigcheese commented:

I think the general approach makes sense. I'll take a closer look at the 
specific changes.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-06 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

Rebase with main and enabling the sequence optimization when the module file 
index is 0. Now the size of the PCMs goes to 204M (originally it is 200M). It 
looks better.

@jansvoboda11 @Bigcheese ping

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-06 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-04-06 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 ++--
 clang/include/clang/Serialization/ASTReader.h | 54 +++-
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  |  4 -
 .../Serialization/SourceLocationEncoding.h| 88 +--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 84 +++---
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 +++
 clang/test/Modules/pr61067.cppm   | 25 --
 .../SourceLocationEncodingTest.cpp| 12 +--
 15 files changed, 275 insertions(+), 176 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index f762116fea956c..f94b32d762effc 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
- 

[clang] [Modules] No transitive source location change (PR #86912)

2024-04-02 Thread Simon Tatham via cfe-commits

statham-arm wrote:

> Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` 
> typedefs) wants to weight in here.

I'm afraid my knowledge of C++ modules is very close to zero. They were 
mentioned in a training course I did last year, but not in much detail.

On 64-bit SourceLocation in general: our patch series to implement those as an 
option in clang was never fully landed, because the second half of it stalled 
in review. I'd still like to see it finished off, though I'm sure it would need 
some vigorous rebasing and retesting by now. The 32-bit SourceLocation limit is 
a problem for at least some of our users, apparently because there's a library 
of header files that break the limit all by themselves. (I'm not sure how; I 
haven't seen them. Maybe by including each other multiple times with different 
`#define`s?)

But I have no idea whether the same considerations would apply to modules, 
because I don't really know enough about modules, sorry!

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-31 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> > If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits 
> > for serialization. We can at most use 64 bits for a slot. In that case, we 
> > can only assume the offset of source location **in its own module** (not 
> > the global offset!) is not large than 2^32. I feel this assumption may be 
> > valid in a lot of places.
> > Or otherwise we can use less bits for module file index (32 bits seems to 
> > be too much honestly), then we can use higher 16 bits to store the module 
> > file index, and leave the lower 48 bits to store the source location. In 
> > this case, the assumption becomes to "the offset of the source location may 
> > not large than 2^48". But it is slightly hard to believe we can reach such 
> > extreme cases.
> 
> Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` 
> typedefs) wants to weight in here.
> 
> > I **tried** to encode this as two separate 32bit values. But it will break 
> > too many codes. Since a lot of places assume that we can encode the source 
> > location as an uint64_t.
> > What I mean is, with VBR6 format 
> > (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can 
> > save more space for small integers in **on-disk** .pcm files (the memory 
> > representation should be the same). For example, for a 64 bits unsigned int 
> > `1`, VBR6 can use only 6 bits to store that `01` to represent the 64 
> > bits value `1` in the on-disk representations. So that even if I don't use 
> > more slots to store the module file index, the size of the .pcm files will 
> > increase after all.
> 
> Right. My thinking was that single 64bit value with the module file index in 
> the upper 32 bits would basically disable VBR6 encoding for the lower 32 
> bits. If we split this thing into two separate 32bit values, we are more 
> likely to VBR6 encode both of them. But this would actually increase size for 
> (what I assume is the most common case) local source locations. 

Yes, this is the trade offs.

> Still, I think having a rough idea of how alternative implementations compare 
> would be great.
> 
> Do you have any data on how much recompilation this can save for real world 
> projects?

I don't have a specific data though. But I think it is understandable that the 
feature is super important. And we can save countless unnecessary compilations 
after this feature.

After the patch, for reduced BMI, we're already able to avoid changing the BMI 
if we only change the definitions of non-inline function bodies in the module 
unit.

Further more, based on this patch, we can do further optimizations. e.g., if we 
split declaration ID like we did in this patch, we may be able to avoid 
recompilations if a unreferenced module unit adds or deletes declarations. This 
may be pretty common in practice:

```
export module interface:part1;
...

//--- 
export module interface:part2;
...

//--- export module interface;
export import :part1;
export import :part2;

//--- consumer.cppm
export module consumer;
import interface;
// only used declarations in interface:part1;
```

Then if the user only changes module unit `interface:part2`, then we can keep 
the BMI for `consumer.cppm` the same. That implies every user of 
`consumer.cppm` can avoid recompilations.

---

I see this is a killer feature for C++20 modules. I think it is significantly 
valuable. So I really want to make this (including later optimizations) into 
clang19.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-29 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits for 
> serialization. We can at most use 64 bits for a slot. In that case, we can 
> only assume the offset of source location **in its own module** (not the 
> global offset!) is not large than 2^32. I feel this assumption may be valid 
> in a lot of places.
> 
> Or otherwise we can use less bits for module file index (32 bits seems to be 
> too much honestly), then we can use higher 16 bits to store the module file 
> index, and leave the lower 48 bits to store the source location. In this 
> case, the assumption becomes to "the offset of the source location may not 
> large than 2^48". But it is slightly hard to believe we can reach such 
> extreme cases.

Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` 
typedefs) wants to weight in here.

> I **tried** to encode this as two separate 32bit values. But it will break 
> too many codes. Since a lot of places assume that we can encode the source 
> location as an uint64_t.
> 
> What I mean is, with VBR6 format 
> (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can 
> save more space for small integers in **on-disk** .pcm files (the memory 
> representation should be the same). For example, for a 64 bits unsigned int 
> `1`, VBR6 can use only 6 bits to store that `01` to represent the 64 bits 
> value `1` in the on-disk representations. So that even if I don't use more 
> slots to store the module file index, the size of the .pcm files will 
> increase after all.

Right. My thinking was that single 64bit value with the module file index in 
the upper 32 bits would basically disable VBR6 encoding for the lower 32 bits. 
If we split this thing into two separate 32 bit values, we are more likely to 
VBR6 encode both of them. But this might actually increase size for (what I 
assume is the most common case) local source locations. Still, I think having a 
rough idea of how alternative implementations compare would be great.

Do you have any data on how much recompilation this can save for real world 
projects?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> > Yes, I explained this in Some low level details section. The size of source 
> > location won't be affected. Since the size of source location is unsigned 
> > (practically, it is 32 bits in most platforms). And we use uint64_t as a 
> > unit in the serializer. So there are 32 bit not used completely. The plan 
> > is to store the module file index in the higher 32 bits and it shouldn't be 
> > a safe problem. Maybe the original wording is not so clear. I've updated it.
> 
> Thank you, using 64 bits in the serialization format makes sense! This also 
> means that whenever Clang is configured with 64 bit `SourceLocation`, we 
> should be using 96 bits for serialization: 32 bits for the module file index 
> and 64 bits for the offset itself, correct?

If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits for 
serialization. We can at most use 64 bits for a slot. In that case, we can only 
assume the offset of source location **in its own module** (not the global 
offset!) is not large than 2^32. I hope this may not be true.

> 
> > The only trade-off I saw about this change is that it may increase the size 
> > of **on-disk** .pcm files due to we use VBR6 format to decrease the size of 
> > small numbers. But on the one side, we still need to pay for more spaces if 
> > we want to use `{local-module-index, offset-within-module} pair` (Thanks 
> > for the good name suggestion). On the other hand, from the experiment, it 
> > shows the overhead is acceptable.
> 
> Sorry, I don't quite understand. Are you saying you did or did not try to 
> encode this as two separate 32bit values?

I **tried** to encode this as two separate 32bit values. But it will break too 
many codes. Since a lot of places assume that we can encode the source location 
as an uint64_t.

What I mean is, with VBR6 format 
(https://llvm.org/docs/BitCodeFormat.html#variable-width-integer),  we can save 
more space for small integers in **on-disk** .pcm files (the memory 
representation should be the same). For example, for a 64 bits unsigned int 
`1`, VBR6 can use only 6 bits to store that `01` to represent the 64 bits 
value `1` in the on-disk representations. So that even if I don't use more 
slots to store the module file index, the size of the .pcm files will increase 
after all.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

> Yes, I explained this in Some low level details section. The size of source 
> location won't be affected. Since the size of source location is unsigned 
> (practically, it is 32 bits in most platforms). And we use uint64_t as a unit 
> in the serializer. So there are 32 bit not used completely. The plan is to 
> store the module file index in the higher 32 bits and it shouldn't be a safe 
> problem. Maybe the original wording is not so clear. I've updated it.

Thank you, using 64 bits in the serialization format makes sense! This also 
means that whenever Clang is configured with 64 bit `SourceLocation`, we should 
be using 96 bits for serialization: 32 bits for the module file index and 64 
bits for the offset itself, correct?

> The only trade-off I saw about this change is that it may increase the size 
> of **on-disk** .pcm files due to we use VBR6 format to decrease the size of 
> small numbers. But on the one side, we still need to pay for more spaces if 
> we want to use `{local-module-index, offset-within-module} pair` (Thanks for 
> the good name suggestion). On the other hand, from the experiment, it shows 
> the overhead is acceptable.

Sorry, I don't quite understand. Are you saying you did or did not try to 
encode this as two separate 32bit values?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> By default, `SourceLocation` is 32 bits. One bit is used to distinguish macro 
> expansions. Looking at libc++'s module map, it contains 999 top-level modules 
> at this moment. That's 10 bits just to be able to import the (entire) 
> standard library. That leaves 21 bits, restricting local `SourceLocation` 
> space to 2 MB. This doesn't sound feasible. Did I misunderstand?

Yes, I explained this in `Some low level details` section. The size of source 
location won't be affected. Since the size of source location is unsigned 
(practically, it is 32 bits in most platforms). And we use uint64_t as a unit 
in the serializer. So there are 32 bit not used completely. The plan is to 
store the module file index in the higher 32 bits and it shouldn't be a safe 
problem.

The only trade-off I saw about this change is that it may increase the size of 
**on-disk** .pcm files due to we use VBR6 format to decrease the size of small 
numbers. But on the one side, we still need to pay for more spaces if we want 
to use `{local-module-index, offset-within-module} pair` (Thanks for the good 
name suggestion). On the other hand, from the experiment, it shows the overhead 
is acceptable.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Jan Svoboda via cfe-commits

jansvoboda11 wrote:

By default, `SourceLocation` is 32 bits. One bit is used to distinguish macro 
expansions. Looking at libc++'s module map, it contains 999 top-level modules 
at this moment. That's 10 bits just to be able to import the (entire) standard 
library. That leaves 21 bits, restricting local `SourceLocation` space to 2 MB. 
This doesn't sound feasible. Did I misunderstand?

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread James Y Knight via cfe-commits

jyknight wrote:

+1 on the high-level plan. Switching from a linear offset to a 
{local-module-index, offset-within-module} pair sounds great!

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

 BTW, after this patch, with reduced BMI 
(https://github.com/llvm/llvm-project/pull/85050), we can already do something 
more interesting than reformating:

```
//--- A.cppm
export module A;
int funcA0();
int funcA1();
export int funcA() {
return funcA0();
}

//--- A.v1.cppm
export module A;

int funcA0();
int funcA1();
export int funcA() {
return funcA0() + funcA1();
}

//--- B.cppm
export module B;
import A;

export int funcB() {
return funcA();
}
```

Now the B.pcm will keep unchanged  with `A.pcm` from `A.cppm` and `A.v1.pcm` 
from `A.v1.cppm`. We changed the implementation of `funcA()` from `return 
funcA0();` to `return funcA0() + funcA1();`. And the `B.pcm` can still get the 
same content.

https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Chuanqi Xu (ChuanqiXu9)


Changes

This is part of "no transitive change" patch series, "no transitive source 
location change". I talked this with @Bigcheese in the tokyo's WG21 
meeting.

The idea comes from @jyknight posted on LLVM discourse. That for:

```
// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import C;
```

Almost every time A.cppm changes, we need to recompile `C`. Due to we think the 
source location is significant to the semantics.  But it may be good if we can 
avoid recompiling `C` if the change from `A` wouldn't change the BMI of B.

# Motivation Example

This patch only cares source locations. So let's focus on source location's 
example. We can see the full example from the attached test.

```
//--- A.cppm
export module A;
export template class T
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}

//--- A.v1.cppm
export module A;

export template class T
struct C {
T func() {
return T(43);
}
};
export int funcA() {
return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
Cint c;
c.func();
}
```

Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` 
has an additional blank line. Then the test shows that two BMI of `B.cppm`, one 
specified `-fmodule-file=A=A.pcm` and the other specified 
`-fmodule-file=A=A.v1.pcm`, should have the bit-wise same contents. 

However, it is a different story for C, since C instantiates templates from A, 
and the instantiation records the source information from module A, which is 
different from `A` and `A.v1`, so it is expected that the BMI `C.pcm` and 
`C.v1.pcm` can and should differ.

# Internal perspective of status quo

To fully understand the patch, we need to understand how we encodes source 
locations and how we serialize and deserialize them.

For source locations, we encoded them as:

```
|
|
| _ base offset of an imported module
|
|
|
|_ base offset of another imported module
|
|
|
|
| ___ 0
```

As the diagram shows, we encode the local (unloaded) source location from 0 to 
higher bits. And we allocate the space for source locations from the loaded 
modules from high bits to 0. Then the source locations from the loaded modules 
will be mapped to our source location space according to the allocated offset.

For example, for,

```
// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...
```

Assuming the offset of a source location (let's name the location as `S`) in 
a.cppm is 45 and we will record the value `45` into the BMI `a.pcm`. Then in 
b.cppm, when we import a, the source manager will allocate a space for module 
'a' (according to the recorded number of source locations) as the base offset 
of module 'a' in the current source location spaces. Let's assume the allocated 
base offset as 90 in this example.  Then when we want to get the location in 
the current source location space for `S`, we can get it simply by adding `45` 
to `90` to `135`. Finally we can get the source location for `S` in module B as 
`135`.

And when we want to write module `b`, we would also write the source location 
of `S` as `135` directly in the BMI. And to clarify the location `S` comes from 
module `a`, we also need to record the base offset of  module `a`, 90 in the 
BMI of `b`.

Then the problem comes. Since the base offset of module 'a' is computed by the 
number source locations in module 'a'. In module 'b', the recorded base offset 
of module 'a' will change every time the number of source locations in module 
'a' increase or decrease. In other words, the contents of BMI of B will change 
every time the number of locations in module 'a' changes. This is pretty 
sensitive. Almost every change will change the number of locations. So this is 
the problem this patch want to solve.

Let's continue with the existing design to understand what's going on. Another 
interesting case is:

```
// c.cppm
export module c;
import a;
import b;
...
```

In `c.cppm`, when we import `a`, we still need to allocate a base location 
offset for it, let's say the value becomes to `200` somehow. Then when we reach 
the location `S` recorded in module `b`, we need to translate it into the 
current source location space. The solution is quite simple, we can get it by 
`135 + (200 - 90) = 245`. In another word, the offset of a source location in 
current module can be computed as `Recorded Offset + Base Offset of the its 
module file - Recorded Base Offset`.

Then we're almost done about how we handle the offset of source locations in 
serializers.

# The high level design of current patch

>From the abstract level, what we want to do  is to remove the hardcoded base 
>offset of imported modules and remain the ability to calculate the source 

[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 ready_for_review 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/86912
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] No transitive source location change (PR #86912)

2024-03-28 Thread Chuanqi Xu via cfe-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/86912

>From 8d4e349710f4ca84a7ad2dd8aa71afa50b730dbb Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Mon, 18 Mar 2024 08:36:55 +0800
Subject: [PATCH] [Modules] No transitive source location change

---
 clang/include/clang/Basic/SourceLocation.h|  1 +
 .../include/clang/Serialization/ASTBitCodes.h | 56 ++--
 clang/include/clang/Serialization/ASTReader.h | 54 +++-
 clang/include/clang/Serialization/ASTWriter.h |  4 +
 .../include/clang/Serialization/ModuleFile.h  |  4 -
 .../Serialization/SourceLocationEncoding.h| 78 +++--
 clang/lib/Frontend/ASTUnit.cpp|  2 -
 clang/lib/Serialization/ASTReader.cpp | 86 +++
 clang/lib/Serialization/ASTReaderDecl.cpp |  2 +-
 clang/lib/Serialization/ASTWriter.cpp | 41 +++--
 clang/lib/Serialization/ASTWriterDecl.cpp |  8 +-
 clang/lib/Serialization/ModuleFile.cpp|  1 -
 .../no-transitive-source-location-change.cppm | 69 +++
 clang/test/Modules/pr61067.cppm   | 25 --
 .../SourceLocationEncodingTest.cpp| 66 ++
 15 files changed, 264 insertions(+), 233 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm

diff --git a/clang/include/clang/Basic/SourceLocation.h 
b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index f31efa5117f0d1..628ce03572fea6 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include 
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+  : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-  : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) 
{
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+  : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,