[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: I've relanded this in https://github.com/llvm/llvm-project/commit/947b06282324db8fe2784c4054af9de493a876af. Let's see what happens. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: thanks, it is pretty helpful. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
rorth wrote: This is certainly a case of unaligned access. In a local build, I've run the first failing `clang` invocation under `truss` (the Solaris syscall tracer). For ``` /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang -cc1 -internal-isystem /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include -nostdsysteminc -fmodules -Wno-private-module -fimplicit-module-maps -fmodules-cache-path=/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/APINotes/Output/availability.m.tmp/ModulesCache -fapinotes-modules -fsyntax-only -I /vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Headers -F /vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Frameworks /vol/llvm/src/llvm-project/dist/clang/test/APINotes/availability.m -verify ``` this reveals ``` 14552: Incurred fault #5, FLTACCESS %pc = 0x1083824E0 14552:siginfo: SIGBUS BUS_ADRALN addr=0x7F5E7B6C 14552: Received signal #10, SIGBUS [caught] 14552:siginfo: SIGBUS BUS_ADRALN addr=0x7F5E7B6C ``` `gdb` shows exactly that: ``` Thread 2 received signal SIGBUS, Bus error. [Switching to Thread 1 (LWP 1)] 0x0001083824e0 in clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, clang::SourceLocation&) () 1: x/i $pc => 0x1083824e0 <_ZN5clang9ASTReader15DeclCursorForIDENS_12GlobalDeclIDERNS_14SourceLocationE+168>: ldx [ %i5 + %i1 ], %o2 (gdb) p/x $i5 $1 = 0x7f5e7b4c (gdb) p/x $i1 $2 = 0x20 ``` The `ldx` insn (Load Extended Word) takes a doubleword address with natural (64-bit) alignment. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: Oh, maybe I found the reason. It is because my patch breaks the alignments of `DeclOffset`: https://github.com/llvm/llvm-project/blob/8d28e5861f8b117a547850ffbb9a332aa6e91459/clang/include/clang/Serialization/ASTBitCodes.h#L237-L240 then it explains why it work well in some platforms but not in other platforms. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > > I'll revert this. Due to I can't reproduce this. When the bot gets stable, > > please tell if it is the real problem. > > You can reproduce this: the [GCC compile farm](https://portal.cfarm.net/) > does have a Solaris/sparcv9 system (`cfarm215`) which is perfectly equipped > to run LLVM builds (I've tried). > > I think the stack traces from the bot are a pretty strong indication that > your patch is the culprit: > > ``` > Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH > or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): > 0 clang-19 0x0001076d87b8 > llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 36 > 1 clang-19 0x0001076d910c SignalHandler(int) + 896 > 2 libc.so.1 0x7f0c62a8 __sighndlr + 12 > 3 libc.so.1 0x7f0b8b50 call_user_handler + 1024 > 4 libc.so.1 0x7f0b8f10 sigacthandler + 160 > 5 clang-19 0x0001083824e0 > clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, > clang::SourceLocation&) + 168 > 6 clang-19 0x00010838aca0 > clang::ASTReader::ReadDeclRecord(clang::GlobalDeclID) + 48 > 7 clang-19 0x0001082fb4ec > clang::ASTReader::GetDecl(clang::GlobalDeclID) + 232 > 8 clang-19 0x0001082cb820 > clang::ASTReader::SetGloballyVisibleDecls(clang::IdentifierInfo*, > llvm::SmallVectorImpl const&, > llvm::SmallVectorImpl*) + 252 > 9 clang-19 0x0001083144a0 clang::ASTReader::finishPendingActions() + 572 > 10 clang-19 0x000108319e10 clang::ASTReader::FinishedDeserializing() + 92 > 11 clang-19 0x00010830dbf4 clang::ASTReader::get(llvm::StringRef) + 680 > 12 clang-19 0x0001078a84fc clang::IdentifierTable::get(llvm::StringRef) > + 84 > 13 clang-19 0x00010a130fcc clang::Sema::Initialize() + 1208 > 14 clang-19 0x000109fd1814 clang::Parser::Initialize() + 1260 > 15 clang-19 0x000109fccb68 clang::ParseAST(clang::Sema&, bool, bool) + > 556 > 16 clang-19 0x0001081b10d8 clang::ASTFrontendAction::ExecuteAction() + > 248 > 17 clang-19 0x0001081b06f8 clang::FrontendAction::Execute() + 92 > 18 clang-19 0x0001081196c8 > clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1572 > 19 clang-19 0x0001082b87b8 > clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 684 > 20 clang-19 0x0001048a2980 cc1_main(llvm::ArrayRef, char > const*, void*) + 4296 > 21 clang-19 0x00010489f6f8 ExecuteCC1Tool(llvm::SmallVectorImpl const*>&, llvm::ToolContext const&) + 1184 > 22 clang-19 0x00010489e018 clang_main(int, char**, llvm::ToolContext > const&) + 4424 > 23 clang-19 0x0001048aee0c main + 60 > 24 clang-19 0x00010489c904 _start + 100 > /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.script: > line 2: 12701 Bus Error > /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang > -cc1 -internal-isystem > /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include > -nostdsysteminc -include-pch > /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.tmp > -fsyntax-only > /vol/llvm/src/llvm-project/dist/clang/test/PCH/opencl-extensions.cl -triple > spir-unknown-unknown > ``` > > One thing I see immediately that this uses a triple the bot is not configured > to handle. Nonetheless clang shouldn't die with `SIGBUS` in such as case. Reverted. It looks like the configuration isn't in our bots actually. I can't open that site. I need to take another look at the code though. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
rorth wrote: > I'll revert this. Due to I can't reproduce this. When the bot gets stable, > please tell if it is the real problem. You can reproduce this: the [GCC compile farm](https://portal.cfarm.net/) does have a Solaris/sparcv9 system (`cfarm215`) which is perfectly equipped to run LLVM builds (I've tried). I think the stack traces from the bot are a pretty strong indication that your patch is the culprit: ``` Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 clang-19 0x0001076d87b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 36 1 clang-19 0x0001076d910c SignalHandler(int) + 896 2 libc.so.1 0x7f0c62a8 __sighndlr + 12 3 libc.so.1 0x7f0b8b50 call_user_handler + 1024 4 libc.so.1 0x7f0b8f10 sigacthandler + 160 5 clang-19 0x0001083824e0 clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, clang::SourceLocation&) + 168 6 clang-19 0x00010838aca0 clang::ASTReader::ReadDeclRecord(clang::GlobalDeclID) + 48 7 clang-19 0x0001082fb4ec clang::ASTReader::GetDecl(clang::GlobalDeclID) + 232 8 clang-19 0x0001082cb820 clang::ASTReader::SetGloballyVisibleDecls(clang::IdentifierInfo*, llvm::SmallVectorImpl const&, llvm::SmallVectorImpl*) + 252 9 clang-19 0x0001083144a0 clang::ASTReader::finishPendingActions() + 572 10 clang-19 0x000108319e10 clang::ASTReader::FinishedDeserializing() + 92 11 clang-19 0x00010830dbf4 clang::ASTReader::get(llvm::StringRef) + 680 12 clang-19 0x0001078a84fc clang::IdentifierTable::get(llvm::StringRef) + 84 13 clang-19 0x00010a130fcc clang::Sema::Initialize() + 1208 14 clang-19 0x000109fd1814 clang::Parser::Initialize() + 1260 15 clang-19 0x000109fccb68 clang::ParseAST(clang::Sema&, bool, bool) + 556 16 clang-19 0x0001081b10d8 clang::ASTFrontendAction::ExecuteAction() + 248 17 clang-19 0x0001081b06f8 clang::FrontendAction::Execute() + 92 18 clang-19 0x0001081196c8 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1572 19 clang-19 0x0001082b87b8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 684 20 clang-19 0x0001048a2980 cc1_main(llvm::ArrayRef, char const*, void*) + 4296 21 clang-19 0x00010489f6f8 ExecuteCC1Tool(llvm::SmallVectorImpl&, llvm::ToolContext const&) + 1184 22 clang-19 0x00010489e018 clang_main(int, char**, llvm::ToolContext const&) + 4424 23 clang-19 0x0001048aee0c main + 60 24 clang-19 0x00010489c904 _start + 100 /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.script: line 2: 12701 Bus Error /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang -cc1 -internal-isystem /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include -nostdsysteminc -include-pch /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.tmp -fsyntax-only /vol/llvm/src/llvm-project/dist/clang/test/PCH/opencl-extensions.cl -triple spir-unknown-unknown ``` One thing I see immediately that this uses a triple the bot is not configured to handle. Nonetheless clang shouldn't die with `SIGBUS` in such as case. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > I strongly suspect that this patch badly broke the [Solaris/sparcv9 > buildbot](https://lab.llvm.org/buildbot/#/builders/72/builds/4046): it > introduced more than 1000 failures. > > Please fix or revert. I'll revert this. Due to I can't reproduce this. When the bot gets stable, please tell if it is the real problem. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
rorth wrote: I strongly suspect that this patch badly broke the [Solaris/sparcv9 buildbot](https://lab.llvm.org/buildbot/#/builders/72/builds/4046): it introduced more than 1000 failures. Please fix or revert. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 closed https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/Bigcheese approved this pull request. I did some testing today and this change seems fine. For scanning modules I actually saw some get smaller with your change. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > The changes LGTM, don't want to block this on my remaining nits. Thanks for reviewing this. > > I believe @Bigcheese wanted to test test impact on PCM size on our side > before this lands. I've rebased this with main. I'll wait for the results from @Bigcheese https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From 2c20a6200fb2790b3a891ffc8c43682c113c7e8a Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 +--- clang/include/clang/Serialization/ASTReader.h | 48 ++ clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 14 ++- .../Serialization/SourceLocationEncoding.h| 91 +-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 57 clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 ++ clang/test/Modules/pr61067.cppm | 25 - .../SourceLocationEncodingTest.cpp| 12 ++- 15 files changed, 269 insertions(+), 162 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index 186c3b722ced16..94a3d24d47926b 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -23,6 +23,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -167,45 +168,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -231,8 +225,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -240,17 +236,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset, -
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/jansvoboda11 approved this pull request. The changes LGTM, don't want to block this on my remaining nits. I believe @Bigcheese wanted to test test impact on PCM size on our side before this lands. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : +return TranslateSourceLocation(*OwningModuleFile, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; jansvoboda11 wrote: I see, thanks for explaining that. I think we should still attempt to unify things so that `TranslateSourceLocation()` is only called from `ReadSourceLocation()`. Even if it means carrying something like `std::variant` (or `, UntranslatedSourceLocation>`) in few places. Juggling translated and untranslated `SourceLocation` instances and making functions resilient to both seems unnecessarily complex. I'm fine with leaving a FIXME in this patch and doing a follow-up. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2221,33 +,45 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex == 0 ? : MF.DependentModules[ModuleFileIndex - 1]; + +assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location space"); jansvoboda11 wrote: What does `Loc` being invalid has to do with this? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: @jansvoboda11 @Bigcheese ping https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From ddb4074b0460daf7b42531ec62e97347b3f2e14d Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH 1/4] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 ++-- clang/include/clang/Serialization/ASTReader.h | 54 +++- clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 4 - .../Serialization/SourceLocationEncoding.h| 88 +-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 84 +++--- clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 +++ clang/test/Modules/pr61067.cppm | 25 -- .../SourceLocationEncodingTest.cpp| 12 +-- 15 files changed, 275 insertions(+), 176 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index 500098dd3dab1d..eca776a77e4557 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -22,6 +22,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -239,8 +233,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -248,17 +244,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset, -
[clang] [Modules] No transitive source location change (PR #86912)
@@ -4082,14 +4069,14 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const { : ModuleMgr.lookupByFileName(Name)); if (!OM) { std::string Msg = - "SourceLocation remap refers to unknown module, cannot find "; + "cannot find module "; ChuanqiXu9 wrote: Done https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2221,33 +,45 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex == 0 ? : MF.DependentModules[ModuleFileIndex - 1]; + +assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location space"); ChuanqiXu9 wrote: But the value of `Loc` may not be valid. Also I feel it is fine to have some redundant assertions. It helps the reader to understand the codes better. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : +return TranslateSourceLocation(*OwningModuleFile, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; ChuanqiXu9 wrote: > Now that TranslateSourceLocation() is only called from ReadSourceLocation() Sadly, this is not true. `TranslateSourceLocation()` may be called in `ASTReader::ReadAST()`: https://github.com/llvm/llvm-project/blob/aac4d03423dd6b7bdef0f2eb03c570f3e2ca6630/clang/lib/Serialization/ASTReader.cpp#L4588-L4591 The input value of `TranslateSourceLocation()` there may come from a reading of untranslated source location in `ASTReader::ReadControlBlock` when reading imported modules. Or the input value may come from the argument of `ASTReader::ReadAST()`, where must be a translated source location. Then it looks really dangerous to me. So I add the FIXME. We may not be able to change the signature of the argument of `TranslateSourceLocation()` to `UntranslatedSourceLocation` since that will require us to change the signature of `ASTReader::ReadAST()`. The reason why actually it works, is that, in the case the input value comes from a translated source location (passed directly in `ASTReader::ReadAST()`), the value of `M.ImportedBy` may always be null **now**. But I feel it is dangerous if someone changes it suddenly. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : +return TranslateSourceLocation(*OwningModuleFile, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; jansvoboda11 wrote: I don't think this an issue. Now that `TranslateSourceLocation()` is only called from `ReadSourceLocation()` there's no reason for taking care to handle already-translated source locations correctly. Maybe now that it doesn't use any `ASTWriter` members we can make the function a non-member and make it static `ASTWriter.cpp`. In my mind that's enough to leave out the FIXME and just assume it only works when used on an untranslated source location read in `ReadSourceLocation()`. As an optional clarification we can return a special type (e.g. `UntranslatedSourceLocation`) from `ReadUntranslatedSourceLocation()` and make it so that only `TranslateSourceLocation()` can transform that into the regular (now always-translated) `SourceLocation`. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl ) { Record.push_back(getAdjustedFileID(FID).getOpaqueValue()); } +SourceLocationEncoding::RawLocEncoding +ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) { + unsigned BaseOffset = 0; + unsigned ModuleFileIndex = 0; + + // See SourceLocationEncoding.h for the encoding details. + if (Context->getSourceManager().isLoadedSourceLocation(Loc) && + Loc.isValid()) { +assert(getChain()); +auto SLocMapI = getChain()->GlobalSLocOffsetMap.find( +SourceManager::MaxLoadedOffset - Loc.getOffset() - 1); +assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() && + "Corrupted global sloc offset map"); +ModuleFile *F = SLocMapI->second; +BaseOffset = F->SLocEntryBaseOffset - 2; +// 0 means the location is not loaded. So we need to add 1 to the index to +// make it clear. +ModuleFileIndex = F->Index + 1; +assert(()->getModuleManager()[F->Index] == F); jansvoboda11 wrote: I don't think `ASTWriter` should be re-checking `ModuleManager` invariants either way. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -4082,14 +4069,14 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const { : ModuleMgr.lookupByFileName(Name)); if (!OM) { std::string Msg = - "SourceLocation remap refers to unknown module, cannot find "; + "cannot find module "; jansvoboda11 wrote: Would be good to keep some context in the error message, maybe remove just the `"SourceLocation"` part and leave the rest as it was? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2221,33 +,45 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex == 0 ? : MF.DependentModules[ModuleFileIndex - 1]; + +assert(!SourceMgr.isLoadedSourceLocation(Loc) && "Run out source location space"); jansvoboda11 wrote: This is already guarded against in `ReadASTBlock()`, no need to have redundant checks. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: Fix conflicts and rebase with main. @Bigcheese @jansvoboda11 ping~ https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); ChuanqiXu9 wrote: Oh, the compiler may be too smart : ) My intention was like, if someday we turned 64 bit source location on, this assertion can make us remember to update here. But given the warning, I don't know how to write the assertion here. So I just removed it. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : ChuanqiXu9 wrote: Done. I feel the idea to store `MF` itself at index `0` may be slightly tricky. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : +return TranslateSourceLocation(*OwningModuleFile, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; ChuanqiXu9 wrote: My thought was like "all the recorded source location are local to its module file", so it won't be translated. But after I review how we judge whether a source location is loaded in source manager, I find this may be dangerous in extreme cases. We judge if a source location is loaded by comparing the offset with the current loaded offset. So it may be dangerous if we have too many loaded source location in the TU and the source location we read is too big, then we may misread it as loaded source location. My intention for this code is to make `TranslateSourceLocation` to be symmetric with the original one. The original `TranslateSourceLocation` is re-enterable. We could call `TranslateSourceLocation` on a translated source location. But we can't do this for the `TranslateSourceLocation` in the current patch. So I just removed the check and leave a FIXME comment. I'd like to fix this as a NFC patch after we land this. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH 1/3] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 ++-- clang/include/clang/Serialization/ASTReader.h | 54 +++- clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 4 - .../Serialization/SourceLocationEncoding.h| 88 +-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 84 +++--- clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 +++ clang/test/Modules/pr61067.cppm | 25 -- .../SourceLocationEncodingTest.cpp| 12 +-- 15 files changed, 275 insertions(+), 176 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index f762116fea956c..f94b32d762effc 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -22,6 +22,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -239,8 +233,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -248,17 +244,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset, -
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : +return TranslateSourceLocation(*OwningModuleFile, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; jansvoboda11 wrote: How can we get here with a `SourceLocation` that's already been translated? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,33 +2221,40 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *OwningModuleFile = +ModuleFileIndex ? MF.DependentModules[ModuleFileIndex - 1] : jansvoboda11 wrote: Since the type of `ModuleFileIndex` is not obvious due to `auto`, I suggest comparing it to `0` explicitly. ```suggestion ModuleFileIndex == 0 ? : MF.DependentModules[ModuleFileIndex - 1]; ``` Or maybe the element at index `0` in `DependentModules` could be `MF` itself to simplify this further? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); jansvoboda11 wrote: No, I'm referring to this line. I assume the compiler sees through `RawLocEncoding` and knows it's being initialized from `SourceLocation::UIntTy`. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > I have a few minor comments on the patch. I want to do some additional perf > testing on module scanning perf because I'm a bit concerned about the cost of > `ASTWriter::getRawSourceLocationEncoding`. What's the perf results about scanning? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = +ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : +return TranslateSourceLocation(*ModuleFileHomingLoc, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; + +return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2); ChuanqiXu9 wrote: There are a lot `- 2` in Serialization and SourceManager. I just tried to follow that. Sadly I didn't find clear comments explaining this. By reading the codes, I think it is trying to skip 2 dummy SLocEntries (the invalid source location and invalid expansion location, see `SourceManager::clearIDTables()`) since they are not sensitive to the context. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = +ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : ChuanqiXu9 wrote: We have a bounds check in the `operator[]` of SmallVector. So I guess it may be fine. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -4078,8 +4065,8 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const { return; ChuanqiXu9 wrote: Done. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = ChuanqiXu9 wrote: Done. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); + + assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32)); + Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32; + return Encoded; } -inline SourceLocation -SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) { - return Seq ? Seq->decode(Encoded) - : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)); +inline std::pair +SourceLocationEncoding::decode(RawLocEncoding Encoded, + SourceLocationSequence *Seq) { + unsigned ModuleFileIndex = Encoded >> 32; + + if (!ModuleFileIndex) +return {Seq ? Seq->decode(Encoded) + : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)), +ModuleFileIndex}; + + Encoded &= ((RawLocEncoding)1 << 33) - 1; ChuanqiXu9 wrote: Done. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); + + assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32)); + Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32; ChuanqiXu9 wrote: Now I assert it should be less than `(1<<16)` https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -696,7 +696,7 @@ class ASTReader /// Mapping from global submodule IDs to the module file in which the /// submodule resides along with the offset that should be added to the /// global submodule ID to produce a local ID. - GlobalSubmoduleMapType GlobalSubmoduleMap; + mutable GlobalSubmoduleMapType GlobalSubmoduleMap; ChuanqiXu9 wrote: Oh, I don't know why it remains. Maybe I added during the work progress. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, ChuanqiXu9 wrote: Done. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl ) { Record.push_back(getAdjustedFileID(FID).getOpaqueValue()); } +SourceLocationEncoding::RawLocEncoding +ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) { + unsigned BaseOffset = 0; + unsigned ModuleFileIndex = 0; + + // See SourceLocationEncoding.h for the encoding details. + if (Context->getSourceManager().isLoadedSourceLocation(Loc) && + Loc.isValid()) { +assert(getChain()); +auto SLocMapI = getChain()->GlobalSLocOffsetMap.find( +SourceManager::MaxLoadedOffset - Loc.getOffset() - 1); +assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() && + "Corrupted global sloc offset map"); +ModuleFile *F = SLocMapI->second; +BaseOffset = F->SLocEntryBaseOffset - 2; +// 0 means the location is not loaded. So we need to add 1 to the index to +// make it clear. +ModuleFileIndex = F->Index + 1; +assert(()->getModuleManager()[F->Index] == F); ChuanqiXu9 wrote: Yes but I'd like to remain it. The assertions make me feel better. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -942,6 +942,12 @@ class ASTReader /// Sema tracks these to emit deferred diags. llvm::SmallSetVector DeclsToCheckForDeferredDiags; + /// The module files imported by different module files. Indirectly imported + /// module files are included too. The information comes from + /// ReadModuleOffsetMap(ModuleFile&). + mutable llvm::DenseMap> + ImportedModuleFiles; ChuanqiXu9 wrote: Done in the new commit. There was a similar member `Imports` in `ModuleFile`, which records the directly imported module files from the coding. But, yes, it is good to remove the map lookups and let's try to handle the ambiguous by commenting. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); ChuanqiXu9 wrote: I guess you may refer to the next line and that one is indeed comparing unsigned with `(1<<32)`, and I fixed that. This line should be fine since Encoded has type `RawLocEncoding `. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH 1/2] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 ++-- clang/include/clang/Serialization/ASTReader.h | 54 +++- clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 4 - .../Serialization/SourceLocationEncoding.h| 88 +-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 84 +++--- clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 +++ clang/test/Modules/pr61067.cppm | 25 -- .../SourceLocationEncodingTest.cpp| 12 +-- 15 files changed, 275 insertions(+), 176 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index f762116fea956c..f94b32d762effc 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -22,6 +22,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -239,8 +233,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -248,17 +244,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset, -
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > Could you share the rest of the "no transitive change" patches, or at least > their outline? I'm curious what else goes into making that work. Oh, I haven't started them. The patch in my mind includes two at least: (1) Add a new block recording the semantical hash value of the current named module. (2) A similar patch for DeclID to refactor the existing linear offset to {ModuleFileIndex, LocalDeclID} patch. BTW, with (2), maybe we can refactor the existing super large vector of Decls, `DeclsLoaded`, in the ASTReader into a array of vectors or into a map. But this may not be included in the series of the patches. The motivation for (1) is that: ``` export module a; export import b; ... ``` We want the BMI of `a` to change all the time if `b` changes. Also for the motivation example, if we changed the definition of `funcB` into `inline`, ``` //--- A.cppm export module A; export inline int funcA() { return 43; } //--- B.cppm export module B; import A; export inline int funcB() { return funcA(); } ``` We hope the BMI of module B will change after the function A changes. But the declaration level hash may be tricky, so probably we need to change the BMI of the module B after the module A changes as the initial step. So the result may be, we need to add a new hash or signature value to the BMI for named modules: ``` signature = hash_combine(hash value of the current buffer, combined hash value of the signature export imported modules, combined hash value of touched non-export import modules) ``` (Maybe we need to do this for all module types) Maybe we can reuse the existing ASTSignature or we need to add a new block. But I am sure about this point now. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = Bigcheese wrote: `ModuleFileHomingLoc` sounds a bit confusing to me. Would a better name here be `OwningModuleFile`? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = +ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : Bigcheese wrote: Should have at least a bounds check assert on the ModuleFileIndex value here. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { +if (!MF.ModuleOffsetMap.empty()) + ReadModuleOffsetMap(MF); + +auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq); +ModuleFile *ModuleFileHomingLoc = +ModuleFileIndex ? ImportedModuleFiles[][ModuleFileIndex - 1] : +return TranslateSourceLocation(*ModuleFileHomingLoc, Loc); } /// Translate a source location from another module file's source /// location space into ours. SourceLocation TranslateSourceLocation(ModuleFile , SourceLocation Loc) const { -if (!ModuleFile.ModuleOffsetMap.empty()) - ReadModuleOffsetMap(ModuleFile); -assert(ModuleFile.SLocRemap.find(Loc.getOffset()) != - ModuleFile.SLocRemap.end() && - "Cannot find offset to remap."); -SourceLocation::IntTy Remap = -ModuleFile.SLocRemap.find(Loc.getOffset())->second; -return Loc.getLocWithOffset(Remap); +if (Loc.isInvalid()) + return Loc; + +// It implies that the Loc is already translated. +if (SourceMgr.isLoadedSourceLocation(Loc)) + return Loc; + +return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2); Bigcheese wrote: Why the -2 here? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); + + assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32)); + Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32; Bigcheese wrote: Were you going to change this to only reserve 16 bits for module index? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -4078,8 +4065,8 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile ) const { return; Bigcheese wrote: There's no more SourceLocation remap so I believe this error message is wrong now. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); + + assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32)); + Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32; + return Encoded; } -inline SourceLocation -SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) { - return Seq ? Seq->decode(Encoded) - : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)); +inline std::pair +SourceLocationEncoding::decode(RawLocEncoding Encoded, + SourceLocationSequence *Seq) { + unsigned ModuleFileIndex = Encoded >> 32; + + if (!ModuleFileIndex) +return {Seq ? Seq->decode(Encoded) + : SourceLocation::getFromRawEncoding(decodeRaw(Encoded)), +ModuleFileIndex}; + + Encoded &= ((RawLocEncoding)1 << 33) - 1; Bigcheese wrote: We have `maskTrailingOnes(32)` which is a bit clearer than raw bit math. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/Bigcheese edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/Bigcheese commented: I have a few minor comments on the patch. I want to do some additional perf testing on module scanning perf because I'm a bit concerned about the cost of `ASTWriter::getRawSourceLocationEncoding`. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
jansvoboda11 wrote: I left a couple of initial comments. With the 64-bit `SourceLocation` aspect of this change solved and with just 2% size increase, I think this approach makes sense. Could you share the rest of the "no transitive change" patches, or at least their outline? I'm curious what else goes into making that work. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -696,7 +696,7 @@ class ASTReader /// Mapping from global submodule IDs to the module file in which the /// submodule resides along with the offset that should be added to the /// global submodule ID to produce a local ID. - GlobalSubmoduleMapType GlobalSubmoduleMap; + mutable GlobalSubmoduleMapType GlobalSubmoduleMap; jansvoboda11 wrote: Is `mutable` necessary here? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -2220,40 +2227,47 @@ class ASTReader return Sema::AlignPackInfo::getFromRawEncoding(Raw); } + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Read a source location from raw form and return it in its /// originating module file's source location space. - SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { + std::pair + ReadUntranslatedSourceLocation(RawLocEncoding Raw, + LocSeq *Seq = nullptr) const { return SourceLocationEncoding::decode(Raw, Seq); } /// Read a source location from raw form. - SourceLocation ReadSourceLocation(ModuleFile , -SourceLocation::UIntTy Raw, -LocSeq *Seq = nullptr) const { -SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq); -return TranslateSourceLocation(ModuleFile, Loc); + SourceLocation ReadRawSourceLocation(ModuleFile , RawLocEncoding Raw, jansvoboda11 wrote: Can we undo the rename? This is named after the return type (what the client expects) not after the argument (the raw location). Also, `ASTReader` will never read a non-"raw" source location. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -5574,10 +5577,34 @@ void ASTWriter::AddFileID(FileID FID, RecordDataImpl ) { Record.push_back(getAdjustedFileID(FID).getOpaqueValue()); } +SourceLocationEncoding::RawLocEncoding +ASTWriter::getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq) { + unsigned BaseOffset = 0; + unsigned ModuleFileIndex = 0; + + // See SourceLocationEncoding.h for the encoding details. + if (Context->getSourceManager().isLoadedSourceLocation(Loc) && + Loc.isValid()) { +assert(getChain()); +auto SLocMapI = getChain()->GlobalSLocOffsetMap.find( +SourceManager::MaxLoadedOffset - Loc.getOffset() - 1); +assert(SLocMapI != getChain()->GlobalSLocOffsetMap.end() && + "Corrupted global sloc offset map"); +ModuleFile *F = SLocMapI->second; +BaseOffset = F->SLocEntryBaseOffset - 2; +// 0 means the location is not loaded. So we need to add 1 to the index to +// make it clear. +ModuleFileIndex = F->Index + 1; +assert(()->getModuleManager()[F->Index] == F); jansvoboda11 wrote: This seems a bit tautological. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -942,6 +942,12 @@ class ASTReader /// Sema tracks these to emit deferred diags. llvm::SmallSetVector DeclsToCheckForDeferredDiags; + /// The module files imported by different module files. Indirectly imported + /// module files are included too. The information comes from + /// ReadModuleOffsetMap(ModuleFile&). + mutable llvm::DenseMap> + ImportedModuleFiles; jansvoboda11 wrote: Why does this live in `ASTReader`? If we moved it to `ModuleFile`, we could remove `mutable` and the map lookups. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
@@ -149,14 +157,44 @@ class SourceLocationSequence::State { operator SourceLocationSequence *() { return } }; -inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, - SourceLocationSequence *Seq) { - return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); +inline SourceLocationEncoding::RawLocEncoding +SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, + unsigned BaseModuleFileIndex, + SourceLocationSequence *Seq) { + // If the source location is a local source location, we can try to optimize + // the similar sequences to only record the differences. + if (!BaseOffset) +return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); + + if (Loc.isInvalid()) +return 0; + + // Otherwise, the higher bits are used to store the module file index, + // so it is meaningless to optimize the source locations into small + // integers. Let's try to always use the raw encodings. + assert(Loc.getOffset() >= BaseOffset); + Loc = Loc.getLocWithOffset(-BaseOffset); + RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); + assert(Encoded < ((RawLocEncoding)1 << 32)); jansvoboda11 wrote: My compiler says: ``` warning: result of comparison of constant 4294967296 with expression of type 'unsigned int' is always true [-Wtautological-constant-out-of-range-compare] ``` https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/Bigcheese commented: I think the general approach makes sense. I'll take a closer look at the specific changes. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: Rebase with main and enabling the sequence optimization when the module file index is 0. Now the size of the PCMs goes to 204M (originally it is 200M). It looks better. @jansvoboda11 @Bigcheese ping https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From 8e07cbdd0f348484d532d7827e4b4a7888e70978 Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 ++-- clang/include/clang/Serialization/ASTReader.h | 54 +++- clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 4 - .../Serialization/SourceLocationEncoding.h| 88 +-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 84 +++--- clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 +++ clang/test/Modules/pr61067.cppm | 25 -- .../SourceLocationEncodingTest.cpp| 12 +-- 15 files changed, 275 insertions(+), 176 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index f762116fea956c..f94b32d762effc 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -22,6 +22,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -239,8 +233,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -248,17 +244,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset, -
[clang] [Modules] No transitive source location change (PR #86912)
statham-arm wrote: > Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` > typedefs) wants to weight in here. I'm afraid my knowledge of C++ modules is very close to zero. They were mentioned in a training course I did last year, but not in much detail. On 64-bit SourceLocation in general: our patch series to implement those as an option in clang was never fully landed, because the second half of it stalled in review. I'd still like to see it finished off, though I'm sure it would need some vigorous rebasing and retesting by now. The 32-bit SourceLocation limit is a problem for at least some of our users, apparently because there's a library of header files that break the limit all by themselves. (I'm not sure how; I haven't seen them. Maybe by including each other multiple times with different `#define`s?) But I have no idea whether the same considerations would apply to modules, because I don't really know enough about modules, sorry! https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > > If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits > > for serialization. We can at most use 64 bits for a slot. In that case, we > > can only assume the offset of source location **in its own module** (not > > the global offset!) is not large than 2^32. I feel this assumption may be > > valid in a lot of places. > > Or otherwise we can use less bits for module file index (32 bits seems to > > be too much honestly), then we can use higher 16 bits to store the module > > file index, and leave the lower 48 bits to store the source location. In > > this case, the assumption becomes to "the offset of the source location may > > not large than 2^48". But it is slightly hard to believe we can reach such > > extreme cases. > > Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` > typedefs) wants to weight in here. > > > I **tried** to encode this as two separate 32bit values. But it will break > > too many codes. Since a lot of places assume that we can encode the source > > location as an uint64_t. > > What I mean is, with VBR6 format > > (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can > > save more space for small integers in **on-disk** .pcm files (the memory > > representation should be the same). For example, for a 64 bits unsigned int > > `1`, VBR6 can use only 6 bits to store that `01` to represent the 64 > > bits value `1` in the on-disk representations. So that even if I don't use > > more slots to store the module file index, the size of the .pcm files will > > increase after all. > > Right. My thinking was that single 64bit value with the module file index in > the upper 32 bits would basically disable VBR6 encoding for the lower 32 > bits. If we split this thing into two separate 32bit values, we are more > likely to VBR6 encode both of them. But this would actually increase size for > (what I assume is the most common case) local source locations. Yes, this is the trade offs. > Still, I think having a rough idea of how alternative implementations compare > would be great. > > Do you have any data on how much recompilation this can save for real world > projects? I don't have a specific data though. But I think it is understandable that the feature is super important. And we can save countless unnecessary compilations after this feature. After the patch, for reduced BMI, we're already able to avoid changing the BMI if we only change the definitions of non-inline function bodies in the module unit. Further more, based on this patch, we can do further optimizations. e.g., if we split declaration ID like we did in this patch, we may be able to avoid recompilations if a unreferenced module unit adds or deletes declarations. This may be pretty common in practice: ``` export module interface:part1; ... //--- export module interface:part2; ... //--- export module interface; export import :part1; export import :part2; //--- consumer.cppm export module consumer; import interface; // only used declarations in interface:part1; ``` Then if the user only changes module unit `interface:part2`, then we can keep the BMI for `consumer.cppm` the same. That implies every user of `consumer.cppm` can avoid recompilations. --- I see this is a killer feature for C++20 modules. I think it is significantly valuable. So I really want to make this (including later optimizations) into clang19. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
jansvoboda11 wrote: > If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits for > serialization. We can at most use 64 bits for a slot. In that case, we can > only assume the offset of source location **in its own module** (not the > global offset!) is not large than 2^32. I feel this assumption may be valid > in a lot of places. > > Or otherwise we can use less bits for module file index (32 bits seems to be > too much honestly), then we can use higher 16 bits to store the module file > index, and leave the lower 48 bits to store the source location. In this > case, the assumption becomes to "the offset of the source location may not > large than 2^48". But it is slightly hard to believe we can reach such > extreme cases. Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` typedefs) wants to weight in here. > I **tried** to encode this as two separate 32bit values. But it will break > too many codes. Since a lot of places assume that we can encode the source > location as an uint64_t. > > What I mean is, with VBR6 format > (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can > save more space for small integers in **on-disk** .pcm files (the memory > representation should be the same). For example, for a 64 bits unsigned int > `1`, VBR6 can use only 6 bits to store that `01` to represent the 64 bits > value `1` in the on-disk representations. So that even if I don't use more > slots to store the module file index, the size of the .pcm files will > increase after all. Right. My thinking was that single 64bit value with the module file index in the upper 32 bits would basically disable VBR6 encoding for the lower 32 bits. If we split this thing into two separate 32 bit values, we are more likely to VBR6 encode both of them. But this might actually increase size for (what I assume is the most common case) local source locations. Still, I think having a rough idea of how alternative implementations compare would be great. Do you have any data on how much recompilation this can save for real world projects? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > > Yes, I explained this in Some low level details section. The size of source > > location won't be affected. Since the size of source location is unsigned > > (practically, it is 32 bits in most platforms). And we use uint64_t as a > > unit in the serializer. So there are 32 bit not used completely. The plan > > is to store the module file index in the higher 32 bits and it shouldn't be > > a safe problem. Maybe the original wording is not so clear. I've updated it. > > Thank you, using 64 bits in the serialization format makes sense! This also > means that whenever Clang is configured with 64 bit `SourceLocation`, we > should be using 96 bits for serialization: 32 bits for the module file index > and 64 bits for the offset itself, correct? If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits for serialization. We can at most use 64 bits for a slot. In that case, we can only assume the offset of source location **in its own module** (not the global offset!) is not large than 2^32. I hope this may not be true. > > > The only trade-off I saw about this change is that it may increase the size > > of **on-disk** .pcm files due to we use VBR6 format to decrease the size of > > small numbers. But on the one side, we still need to pay for more spaces if > > we want to use `{local-module-index, offset-within-module} pair` (Thanks > > for the good name suggestion). On the other hand, from the experiment, it > > shows the overhead is acceptable. > > Sorry, I don't quite understand. Are you saying you did or did not try to > encode this as two separate 32bit values? I **tried** to encode this as two separate 32bit values. But it will break too many codes. Since a lot of places assume that we can encode the source location as an uint64_t. What I mean is, with VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can save more space for small integers in **on-disk** .pcm files (the memory representation should be the same). For example, for a 64 bits unsigned int `1`, VBR6 can use only 6 bits to store that `01` to represent the 64 bits value `1` in the on-disk representations. So that even if I don't use more slots to store the module file index, the size of the .pcm files will increase after all. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
jansvoboda11 wrote: > Yes, I explained this in Some low level details section. The size of source > location won't be affected. Since the size of source location is unsigned > (practically, it is 32 bits in most platforms). And we use uint64_t as a unit > in the serializer. So there are 32 bit not used completely. The plan is to > store the module file index in the higher 32 bits and it shouldn't be a safe > problem. Maybe the original wording is not so clear. I've updated it. Thank you, using 64 bits in the serialization format makes sense! This also means that whenever Clang is configured with 64 bit `SourceLocation`, we should be using 96 bits for serialization: 32 bits for the module file index and 64 bits for the offset itself, correct? > The only trade-off I saw about this change is that it may increase the size > of **on-disk** .pcm files due to we use VBR6 format to decrease the size of > small numbers. But on the one side, we still need to pay for more spaces if > we want to use `{local-module-index, offset-within-module} pair` (Thanks for > the good name suggestion). On the other hand, from the experiment, it shows > the overhead is acceptable. Sorry, I don't quite understand. Are you saying you did or did not try to encode this as two separate 32bit values? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: > By default, `SourceLocation` is 32 bits. One bit is used to distinguish macro > expansions. Looking at libc++'s module map, it contains 999 top-level modules > at this moment. That's 10 bits just to be able to import the (entire) > standard library. That leaves 21 bits, restricting local `SourceLocation` > space to 2 MB. This doesn't sound feasible. Did I misunderstand? Yes, I explained this in `Some low level details` section. The size of source location won't be affected. Since the size of source location is unsigned (practically, it is 32 bits in most platforms). And we use uint64_t as a unit in the serializer. So there are 32 bit not used completely. The plan is to store the module file index in the higher 32 bits and it shouldn't be a safe problem. The only trade-off I saw about this change is that it may increase the size of **on-disk** .pcm files due to we use VBR6 format to decrease the size of small numbers. But on the one side, we still need to pay for more spaces if we want to use `{local-module-index, offset-within-module} pair` (Thanks for the good name suggestion). On the other hand, from the experiment, it shows the overhead is acceptable. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
jansvoboda11 wrote: By default, `SourceLocation` is 32 bits. One bit is used to distinguish macro expansions. Looking at libc++'s module map, it contains 999 top-level modules at this moment. That's 10 bits just to be able to import the (entire) standard library. That leaves 21 bits, restricting local `SourceLocation` space to 2 MB. This doesn't sound feasible. Did I misunderstand? https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
jyknight wrote: +1 on the high-level plan. Switching from a linear offset to a {local-module-index, offset-within-module} pair sounds great! https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
ChuanqiXu9 wrote: BTW, after this patch, with reduced BMI (https://github.com/llvm/llvm-project/pull/85050), we can already do something more interesting than reformating: ``` //--- A.cppm export module A; int funcA0(); int funcA1(); export int funcA() { return funcA0(); } //--- A.v1.cppm export module A; int funcA0(); int funcA1(); export int funcA() { return funcA0() + funcA1(); } //--- B.cppm export module B; import A; export int funcB() { return funcA(); } ``` Now the B.pcm will keep unchanged with `A.pcm` from `A.cppm` and `A.v1.pcm` from `A.v1.cppm`. We changed the implementation of `funcA()` from `return funcA0();` to `return funcA0() + funcA1();`. And the `B.pcm` can still get the same content. https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Chuanqi Xu (ChuanqiXu9) Changes This is part of "no transitive change" patch series, "no transitive source location change". I talked this with @Bigcheese in the tokyo's WG21 meeting. The idea comes from @jyknight posted on LLVM discourse. That for: ``` // A.cppm export module A; ... // B.cppm export module B; import A; ... //--- C.cppm export module C; import C; ``` Almost every time A.cppm changes, we need to recompile `C`. Due to we think the source location is significant to the semantics. But it may be good if we can avoid recompiling `C` if the change from `A` wouldn't change the BMI of B. # Motivation Example This patch only cares source locations. So let's focus on source location's example. We can see the full example from the attached test. ``` //--- A.cppm export module A; export template class T struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- A.v1.cppm export module A; export template class T struct C { T func() { return T(43); } }; export int funcA() { return 43; } //--- B.cppm export module B; import A; export int funcB() { return funcA(); } //--- C.cppm export module C; import A; export void testD() { Cint c; c.func(); } ``` Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. Then the test shows that two BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same contents. However, it is a different story for C, since C instantiates templates from A, and the instantiation records the source information from module A, which is different from `A` and `A.v1`, so it is expected that the BMI `C.pcm` and `C.v1.pcm` can and should differ. # Internal perspective of status quo To fully understand the patch, we need to understand how we encodes source locations and how we serialize and deserialize them. For source locations, we encoded them as: ``` | | | _ base offset of an imported module | | | |_ base offset of another imported module | | | | | ___ 0 ``` As the diagram shows, we encode the local (unloaded) source location from 0 to higher bits. And we allocate the space for source locations from the loaded modules from high bits to 0. Then the source locations from the loaded modules will be mapped to our source location space according to the allocated offset. For example, for, ``` // a.cppm export module a; ... // b.cppm export module b; import a; ... ``` Assuming the offset of a source location (let's name the location as `S`) in a.cppm is 45 and we will record the value `45` into the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager will allocate a space for module 'a' (according to the recorded number of source locations) as the base offset of module 'a' in the current source location spaces. Let's assume the allocated base offset as 90 in this example. Then when we want to get the location in the current source location space for `S`, we can get it simply by adding `45` to `90` to `135`. Finally we can get the source location for `S` in module B as `135`. And when we want to write module `b`, we would also write the source location of `S` as `135` directly in the BMI. And to clarify the location `S` comes from module `a`, we also need to record the base offset of module `a`, 90 in the BMI of `b`. Then the problem comes. Since the base offset of module 'a' is computed by the number source locations in module 'a'. In module 'b', the recorded base offset of module 'a' will change every time the number of source locations in module 'a' increase or decrease. In other words, the contents of BMI of B will change every time the number of locations in module 'a' changes. This is pretty sensitive. Almost every change will change the number of locations. So this is the problem this patch want to solve. Let's continue with the existing design to understand what's going on. Another interesting case is: ``` // c.cppm export module c; import a; import b; ... ``` In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it, let's say the value becomes to `200` somehow. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple, we can get it by `135 + (200 - 90) = 245`. In another word, the offset of a source location in current module can be computed as `Recorded Offset + Base Offset of the its module file - Recorded Base Offset`. Then we're almost done about how we handle the offset of source locations in serializers. # The high level design of current patch >From the abstract level, what we want to do is to remove the hardcoded base >offset of imported modules and remain the ability to calculate the source
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 ready_for_review https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/86912 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] No transitive source location change (PR #86912)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/86912 >From 8d4e349710f4ca84a7ad2dd8aa71afa50b730dbb Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Mon, 18 Mar 2024 08:36:55 +0800 Subject: [PATCH] [Modules] No transitive source location change --- clang/include/clang/Basic/SourceLocation.h| 1 + .../include/clang/Serialization/ASTBitCodes.h | 56 ++-- clang/include/clang/Serialization/ASTReader.h | 54 +++- clang/include/clang/Serialization/ASTWriter.h | 4 + .../include/clang/Serialization/ModuleFile.h | 4 - .../Serialization/SourceLocationEncoding.h| 78 +++-- clang/lib/Frontend/ASTUnit.cpp| 2 - clang/lib/Serialization/ASTReader.cpp | 86 +++ clang/lib/Serialization/ASTReaderDecl.cpp | 2 +- clang/lib/Serialization/ASTWriter.cpp | 41 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 8 +- clang/lib/Serialization/ModuleFile.cpp| 1 - .../no-transitive-source-location-change.cppm | 69 +++ clang/test/Modules/pr61067.cppm | 25 -- .../SourceLocationEncodingTest.cpp| 66 ++ 15 files changed, 264 insertions(+), 233 deletions(-) create mode 100644 clang/test/Modules/no-transitive-source-location-change.cppm diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h index 00b1e0fa855b7a..7a0f5ba8d1270b 100644 --- a/clang/include/clang/Basic/SourceLocation.h +++ b/clang/include/clang/Basic/SourceLocation.h @@ -90,6 +90,7 @@ class SourceLocation { friend class ASTWriter; friend class SourceManager; friend struct llvm::FoldingSetTrait; + friend class SourceLocationEncoding; public: using UIntTy = uint32_t; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index f31efa5117f0d1..628ce03572fea6 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -22,6 +22,7 @@ #include "clang/Basic/IdentifierTable.h" #include "clang/Basic/OperatorKinds.h" #include "clang/Basic/SourceLocation.h" +#include "clang/Serialization/SourceLocationEncoding.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/Bitstream/BitCodes.h" #include @@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1; /// Source range/offset of a preprocessed entity. struct PPEntityOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase. uint32_t BitOffset; - PPEntityOffset(SourceRange R, uint32_t BitOffset) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()), -BitOffset(BitOffset) {} - - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } + PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset) + : Begin(Begin), End(End), BitOffset(BitOffset) {} - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Source range of a skipped preprocessor region struct PPSkippedRange { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location of beginning of range. - SourceLocation::UIntTy Begin; + RawLocEncoding Begin; /// Raw source location of end of range. - SourceLocation::UIntTy End; + RawLocEncoding End; - PPSkippedRange(SourceRange R) - : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) { - } + PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End) + : Begin(Begin), End(End) {} - SourceLocation getBegin() const { -return SourceLocation::getFromRawEncoding(Begin); - } - SourceLocation getEnd() const { -return SourceLocation::getFromRawEncoding(End); - } + RawLocEncoding getBegin() const { return Begin; } + RawLocEncoding getEnd() const { return End; } }; /// Offset in the AST file. Use splitted 64-bit integer into low/high @@ -239,8 +233,10 @@ struct UnderalignedInt64 { /// Source location and bit offset of a declaration. struct DeclOffset { + using RawLocEncoding = SourceLocationEncoding::RawLocEncoding; + /// Raw source location. - SourceLocation::UIntTy Loc = 0; + RawLocEncoding RawLoc = 0; /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep /// structure alignment 32-bit and avoid padding gap because undefined @@ -248,17 +244,15 @@ struct DeclOffset { UnderalignedInt64 BitOffset; DeclOffset() = default; - DeclOffset(SourceLocation Loc, uint64_t BitOffset,