[clang] [Serialization] Storing DeclID separately (PR #95897)
ChuanqiXu9 wrote: > Thanks for this patch, I will try it on our codebase to see the effects of it > and report back to you. I am also trying to understand in which cases this > would be a win and when not. IIUC, this is always a win when `ModuleFileIndex > != 0` and a pessimization otherwise. Yes, I think. > > When is `ModuleFileIndex == 0`? Does this only happen for predefined decls > (that we never deserialize) or also for the declarations owned by the > currently written module? Yes, this is what I thought. https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
ilya-biryukov wrote: Thanks for this patch, I will try it on our codebase to see the effects of it and report back to you. I am also trying to understand in which cases this would be a win and when not. IIUC, this is always a win when `ModuleFileIndex != 0` and a pessimization otherwise. When is `ModuleFileIndex == 0`? Does this only happen for predefined decls (that we never deserialize) or also for the declarations owned by the currently written module? https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
ilya-biryukov wrote: Sorry for not getting to it today, I'll send a review tomorrow. https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
ChuanqiXu9 wrote: @ilya-biryukov I think this is ready to review and test. I think this can mitigate the size increase with the DeclID change. https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Chuanqi Xu (ChuanqiXu9) Changes In https://github.com/llvm/llvm-project/pull/92083, there are some unaddressed question about the increased size of PCMs. And I prepared this draft to try to address this. This is patch is not ready for review yet. There are many places to polish. But I believe it is functional correct for how sensitive the DeclID is. So this is majorly for testing and discussing higher level ideas. (I may land some cleanup codes directly. So this may have conflict with main) To understand the problem better, we need to understand how we encodes values in PCM. We're using VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer) now to record values except blobs. In VBR6 format, the value '1' is stored as `0x1` directly no matter its type. And the value `0xFF` will be stored as `0b000111'11`. The first `1` in the right hand side of `'` indicates the higher bits still contirbute to the current value. And the 12th bit is 0 so that the higher value doesn't relate to the current value. Then we can combine the lower 5 bits and the higher 6 bits to get the value `0xFF`. And for my patch, it changes the DeclID from consecutive numbers to `ModuleFileIndex, LocalDeclID` pairs. So that when we record a reference to an external declaration, the serialized data in the disk may be much larger. But it is not a 100% regression, when we record a reference to a local declaration (where `ModuleFileIndex` is `0`), the serialized data in the disk may be smaller since the `LocalDeclID` won't be based on the sum of number of imported declarations. So for the question, the un-strict upper bound in **theory** may be the case that we have many modules (more than 2^16) but super less declarations (less than 2^5). Then the estimated upper bound may be near 10x. Since we may need more than 50 bits to store the index now but previously we can only use 6 bits to do that. But I guess this won't be true in practice. In practice, how the size changes with the last patch should be highly dependent to the code bases. Then for this patch, what I did is, to store the `ModuleFileIndex, LocalDeclID` pair separately instead of putting them into the one slot. This should align to the design more naturally. But from I mentioned above, we can see this may not be 100% pure win. Since for local declarations, now we need additional 6 bits to present the index. And for my local use cases, since we don't have so many modules, I see the size of PCMs increases very slightly after this patch. For landing plans, this patch doesn't have a lot priority in my side, I am chasing https://github.com/llvm/llvm-project/pull/92085 https://github.com/llvm/llvm-project/pull/92511 more now. And from practice, I feel the actual increased size doesn't affect us practically. So I'd like to reduce the size more after landing above patches. Also we need to do the similar things for source locations, identifier IDs and type IDs. But all of them may not have such a strong impact as DeclID has. --- Patch is 34.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95897.diff 11 Files Affected: - (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3) - (modified) clang/include/clang/Serialization/ASTReader.h (+2-3) - (modified) clang/include/clang/Serialization/ASTRecordReader.h (+19) - (modified) clang/include/clang/Serialization/ASTRecordWriter.h (+27) - (modified) clang/lib/Serialization/ASTReader.cpp (+23-21) - (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+8-10) - (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+5-4) - (modified) clang/lib/Serialization/ASTWriter.cpp (+19-9) - (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+60-52) - (modified) clang/test/Modules/codegen-nodep.test (+1-1) - (modified) clang/test/Modules/decl-params-determinisim.m (+8-8) ``diff diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index 38502a23f805e..977fa7359ef1b 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -285,6 +285,9 @@ using unaligned_decl_id_t = serialization::DeclID, llvm::endianness::native, llvm::support::unaligned>; +/// The number of slots needed to record a DeclID in bitstreams. +const unsigned int DeclIDSerialiazedSize = 2; + /// The number of predefined preprocessed entity IDs. const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1; diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h index f41c473c97cd9..ab257314e2c74 100644 --- a/clang/include/clang/Serialization/ASTReader.h +++ b/clang/include/clang/Serialization/ASTReader.h @@ -592,7 +592,7 @@ class ASTReader /// An array of lexical contents of a declaration context, as a sequence
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 ready_for_review https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/95897 >From 0e6b0ee59605d28bb031d8c2fb70fb853d853605 Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Tue, 18 Jun 2024 11:28:03 +0800 Subject: [PATCH] [Serialization] Store DeclID in two slots to utilize VBR6 format --- .../include/clang/Serialization/ASTBitCodes.h | 3 + clang/include/clang/Serialization/ASTReader.h | 5 +- .../clang/Serialization/ASTRecordReader.h | 19 +++ .../clang/Serialization/ASTRecordWriter.h | 27 + clang/lib/Serialization/ASTReader.cpp | 44 +++ clang/lib/Serialization/ASTReaderDecl.cpp | 18 ++- clang/lib/Serialization/ASTReaderStmt.cpp | 9 +- clang/lib/Serialization/ASTWriter.cpp | 28 +++-- clang/lib/Serialization/ASTWriterDecl.cpp | 112 ++ clang/test/Modules/codegen-nodep.test | 2 +- clang/test/Modules/decl-params-determinisim.m | 16 +-- 11 files changed, 175 insertions(+), 108 deletions(-) diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index 38502a23f805e..977fa7359ef1b 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -285,6 +285,9 @@ using unaligned_decl_id_t = serialization::DeclID, llvm::endianness::native, llvm::support::unaligned>; +/// The number of slots needed to record a DeclID in bitstreams. +const unsigned int DeclIDSerialiazedSize = 2; + /// The number of predefined preprocessed entity IDs. const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1; diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h index f41c473c97cd9..ab257314e2c74 100644 --- a/clang/include/clang/Serialization/ASTReader.h +++ b/clang/include/clang/Serialization/ASTReader.h @@ -592,7 +592,7 @@ class ASTReader /// An array of lexical contents of a declaration context, as a sequence of /// Decl::Kind, DeclID pairs. - using LexicalContents = ArrayRef; + using LexicalContents = ArrayRef; /// Map from a DeclContext to its lexical contents. llvm::DenseMap> @@ -945,8 +945,7 @@ class ASTReader SmallVector DelayedDeleteExprs; // A list of late parsed template function data with their module files. - SmallVector>, 4> - LateParsedTemplates; + SmallVector, 4> LateParsedTemplates; /// The IDs of all decls to be checked for deferred diags. /// diff --git a/clang/include/clang/Serialization/ASTRecordReader.h b/clang/include/clang/Serialization/ASTRecordReader.h index 2561418b78ca7..ae0254214d0c9 100644 --- a/clang/include/clang/Serialization/ASTRecordReader.h +++ b/clang/include/clang/Serialization/ASTRecordReader.h @@ -86,6 +86,11 @@ class ASTRecordReader /// Skips the specified number of values. void skipInts(unsigned N) { Idx += N; } + /// Skips the specified number of DeclIDs. + void skipDeclRefs(unsigned N) { +Idx += N * serialization::DeclIDSerialiazedSize; + } + /// Retrieve the global submodule ID its local ID number. serialization::SubmoduleID getGlobalSubmoduleID(unsigned LocalID) { @@ -187,12 +192,26 @@ class ASTRecordReader /// Reads a declaration from the given position in a record in the /// given module, advancing Idx. Decl *readDecl() { +#ifndef NDEBUG +unsigned OldIdx = Idx; +Decl *D = Reader->ReadDecl(*F, Record, Idx); +assert(Idx - OldIdx == serialization::DeclIDSerialiazedSize); +return D; +#endif return Reader->ReadDecl(*F, Record, Idx); } Decl *readDeclRef() { return readDecl(); } + template void readDeclArray(Func &) { +unsigned LengthOfArray = readInt(); +unsigned End = Idx + LengthOfArray; + +while (Idx < End) + ConsumeFunc(readDeclAs()); + } + /// Reads a declaration from the given position in the record, /// advancing Idx. /// diff --git a/clang/include/clang/Serialization/ASTRecordWriter.h b/clang/include/clang/Serialization/ASTRecordWriter.h index 0c8ac75fc40f4..30e786510ac14 100644 --- a/clang/include/clang/Serialization/ASTRecordWriter.h +++ b/clang/include/clang/Serialization/ASTRecordWriter.h @@ -234,12 +234,39 @@ class ASTRecordWriter /// Emit a reference to a declaration. void AddDeclRef(const Decl *D) { +#ifndef NDEBUG +unsigned OldSize = size(); +Writer->AddDeclRef(D, *Record); +assert(size() - OldSize == serialization::DeclIDSerialiazedSize); +return; +#endif return Writer->AddDeclRef(D, *Record); } void writeDeclRef(const Decl *D) { AddDeclRef(D); } + void writeNullDeclRef() { +#ifndef NDEBUG +unsigned OldSize = size(); +#endif + +push_back(0); +push_back(0); + +#ifndef NDEBUG +assert(size() - OldSize == serialization::DeclIDSerialiazedSize); +#endif + } + + template void writeDeclArray(ArrayRef Array) { +unsigned ElementNum = Array.size(); +push_back(ElementNum *
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/95897 >From b6d1326fdee4f31c6f6e32783c690b7ae2a4dedb Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Tue, 18 Jun 2024 11:28:03 +0800 Subject: [PATCH] Draft --- clang/include/clang/AST/DeclID.h | 2 + .../include/clang/Serialization/ASTBitCodes.h | 3 + clang/include/clang/Serialization/ASTReader.h | 5 +- .../clang/Serialization/ASTRecordReader.h | 14 ++ .../clang/Serialization/ASTRecordWriter.h | 27 +++ clang/lib/Serialization/ASTReader.cpp | 84 - clang/lib/Serialization/ASTReaderDecl.cpp | 18 +- clang/lib/Serialization/ASTReaderStmt.cpp | 6 +- clang/lib/Serialization/ASTWriter.cpp | 28 ++- clang/lib/Serialization/ASTWriterDecl.cpp | 173 +++--- 10 files changed, 233 insertions(+), 127 deletions(-) diff --git a/clang/include/clang/AST/DeclID.h b/clang/include/clang/AST/DeclID.h index e8f4860e13f1f..5f7c362c6701a 100644 --- a/clang/include/clang/AST/DeclID.h +++ b/clang/include/clang/AST/DeclID.h @@ -197,6 +197,8 @@ class LocalDeclID : public DeclIDBase { static LocalDeclID get(ASTReader , serialization::ModuleFile , DeclID ID); + static LocalDeclID get(ASTReader , serialization::ModuleFile , + unsigned ModuleFileIndex, unsigned LocalDeclID); LocalDeclID ++() { ++ID; diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index 9f0f900a02914..8d789683b3164 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -261,6 +261,9 @@ using unaligned_decl_id_t = serialization::DeclID, llvm::endianness::native, llvm::support::unaligned>; +/// The number of slots needed to record a DeclID in bitstreams. +const unsigned int DeclIDRefSize = 2; + /// The number of predefined preprocessed entity IDs. const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1; diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h index 480f852e3bf07..980c43c7db2cb 100644 --- a/clang/include/clang/Serialization/ASTReader.h +++ b/clang/include/clang/Serialization/ASTReader.h @@ -600,7 +600,7 @@ class ASTReader /// An array of lexical contents of a declaration context, as a sequence of /// Decl::Kind, DeclID pairs. - using LexicalContents = ArrayRef; + using LexicalContents = ArrayRef; /// Map from a DeclContext to its lexical contents. llvm::DenseMap> @@ -961,8 +961,7 @@ class ASTReader SmallVector DelayedDeleteExprs; // A list of late parsed template function data with their module files. - SmallVector>, 4> - LateParsedTemplates; + SmallVector, 4> LateParsedTemplates; /// The IDs of all decls to be checked for deferred diags. /// diff --git a/clang/include/clang/Serialization/ASTRecordReader.h b/clang/include/clang/Serialization/ASTRecordReader.h index d00fb182f05f4..003bb592d188b 100644 --- a/clang/include/clang/Serialization/ASTRecordReader.h +++ b/clang/include/clang/Serialization/ASTRecordReader.h @@ -187,12 +187,26 @@ class ASTRecordReader /// Reads a declaration from the given position in a record in the /// given module, advancing Idx. Decl *readDecl() { +#ifndef NDEBUG +unsigned OldIdx = Idx; +Decl *D = Reader->ReadDecl(*F, Record, Idx); +assert(Idx - OldIdx == serialization::DeclIDRefSize); +return D; +#endif return Reader->ReadDecl(*F, Record, Idx); } Decl *readDeclRef() { return readDecl(); } + template void readDeclArray(Func &) { +unsigned LengthOfArray = readInt(); +unsigned End = Idx + LengthOfArray; + +while (Idx < End) + ConsumeFunc(readDeclAs()); + } + /// Reads a declaration from the given position in the record, /// advancing Idx. /// diff --git a/clang/include/clang/Serialization/ASTRecordWriter.h b/clang/include/clang/Serialization/ASTRecordWriter.h index 0c8ac75fc40f4..ed96fbc310096 100644 --- a/clang/include/clang/Serialization/ASTRecordWriter.h +++ b/clang/include/clang/Serialization/ASTRecordWriter.h @@ -234,12 +234,39 @@ class ASTRecordWriter /// Emit a reference to a declaration. void AddDeclRef(const Decl *D) { +#ifndef NDEBUG +unsigned OldSize = size(); +Writer->AddDeclRef(D, *Record); +assert(size() - OldSize == serialization::DeclIDRefSize); +return; +#endif return Writer->AddDeclRef(D, *Record); } void writeDeclRef(const Decl *D) { AddDeclRef(D); } + void writeNullDeclRef() { +#ifndef NDEBUG +unsigned OldSize = size(); +#endif + +push_back(0); +push_back(0); + +#ifndef NDEBUG +assert(size() - OldSize == serialization::DeclIDRefSize); +#endif + } + + template void writeDeclArray(ArrayRef Array) { +unsigned ElementNum = Array.size(); +push_back(ElementNum * serialization::DeclIDRefSize); + +for
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 edited https://github.com/llvm/llvm-project/pull/95897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Serialization] Storing DeclID separately (PR #95897)
https://github.com/ChuanqiXu9 created https://github.com/llvm/llvm-project/pull/95897 In https://github.com/llvm/llvm-project/pull/92083, there are some unaddressed question about the increased size of PCMs. And I prepared this draft to try to address this. This is patch is not ready for review yet. There are many places to polish. But I believe it is functional correct for how sensitive the DeclID is. So this is majorly for testing and discussing higher level ideas. To understand the problem better, we need to understand how we encodes values in PCM now. We're using VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer) now to record values except blobs. In VBR6 format, the value '1' is stored as `0x1` directly no matter its type. And the value `0xFF` will be stored as `0b000111'11`. The first `1` in the right hand side of `'` indicates the higher bits still contirbute to the current value. And the 12th bit is 0 so that the higher value doesn't relate to the current value. Then we can combine the lower 5 bits and the higher 6 bits to get the value `0xFF`. And for my patch, it changes the DeclID from consecutive numbers to `` pairs. So that when we record a reference to an external declaration, the serialized data in the disk may be much larger. But it is not a 100% regression, when we record a reference to a local declaration (where `ModuleFileIndex` is `0`), the serialized data in the disk may be smaller since the `LocalDeclID` won't be based on the sum of number of imported declarations. So for the question, the un-strict upper bound in **theory** may be the case that we have many modules (more than 2^16) but super less declarations (less than 2^5). Then the estimated upper bound may be near 10x. Since we may need more than 50 bits to store the index now but previously we can only use 6 bits to do that. But I guess this won't be true in practice. In practice, how the size changes with the last patch should be highly dependent to the code bases. Then for this patch, what I did is, to store the `` pair separately instead of putting them into the one slot. This should align to the design more naturally. But from I mentioned above, we can see this may not be 100% pure win. Since for local declarations, now we need additional 6 bits to present the index. And for my local use cases, since we don't have so many modules, I see the size of PCMs increases very slightly after this patch. For landing plans, this patch doesn't have a lot priority in my side, I am chasing https://github.com/llvm/llvm-project/pull/92085 https://github.com/llvm/llvm-project/pull/92511 more now. And from practice, I feel the actual increased size doesn't affect us practically. So I'd like to reduce the size more after landing above patches. Also we need to do the similar things for source locations, identifier IDs and type IDs. But all of them may not have such a strong impact as DeclID has. >From 9c1755c55509f38c256b7f0191b01d674b7cb619 Mon Sep 17 00:00:00 2001 From: Chuanqi Xu Date: Tue, 18 Jun 2024 11:28:03 +0800 Subject: [PATCH] Draft --- .../include/clang/Serialization/ASTBitCodes.h | 3 + clang/include/clang/Serialization/ASTReader.h | 10 +- .../clang/Serialization/ASTRecordReader.h | 15 ++ .../clang/Serialization/ASTRecordWriter.h | 28 +++ clang/lib/Serialization/ASTReader.cpp | 159 clang/lib/Serialization/ASTReaderDecl.cpp | 29 +-- clang/lib/Serialization/ASTReaderStmt.cpp | 5 +- clang/lib/Serialization/ASTWriter.cpp | 29 ++- clang/lib/Serialization/ASTWriterDecl.cpp | 169 -- clang/utils/TableGen/ClangAttrEmitter.cpp | 4 +- 10 files changed, 280 insertions(+), 171 deletions(-) diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h index a4728b1c06b3f..d1f8545052fe8 100644 --- a/clang/include/clang/Serialization/ASTBitCodes.h +++ b/clang/include/clang/Serialization/ASTBitCodes.h @@ -261,6 +261,9 @@ using unaligned_decl_id_t = serialization::DeclID, llvm::endianness::native, llvm::support::unaligned>; +/// The number of slots needed to record a DeclID in bitstreams. +const unsigned int DeclIDRefSize = 2; + /// The number of predefined preprocessed entity IDs. const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1; diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h index 08f302c01f538..c6c4a82bcddec 100644 --- a/clang/include/clang/Serialization/ASTReader.h +++ b/clang/include/clang/Serialization/ASTReader.h @@ -599,7 +599,7 @@ class ASTReader /// An array of lexical contents of a declaration context, as a sequence of /// Decl::Kind, DeclID pairs. - using LexicalContents = ArrayRef; + using LexicalContents = ArrayRef; /// Map from a DeclContext to its lexical contents. llvm::DenseMap> @@ -960,7 +960,7 @@ class