[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-06-03 Thread Chuanqi Xu via llvm-branch-commits

ChuanqiXu9 wrote:

@jansvoboda11 ping. Did my answers resolve your concerns?

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-27 Thread Chuanqi Xu via llvm-branch-commits


@@ -3896,7 +3903,7 @@ void ASTWriter::WriteIdentifierTable(Preprocessor ,
 
   // Write out identifiers if either the ID is local or the identifier has
   // changed since it was loaded.
-  if (ID >= FirstIdentID || !Chain || !II->isFromAST() ||
+  if (isLocalIdentifierID(ID) || !Chain || !II->isFromAST() ||

ChuanqiXu9 wrote:

Maybe it is because the name `FirstIdentID` is confusing. The proper name may 
be `FirstLocalIdentID`.

Previously, before this patch, `FirstIdentID` will be `Number of identifier ID` 
+ 1. This is the reason why the encoding of Identifier IDs may be affected by 
imported modules. This is also the major motivation of the patch. 

So all the ID bigger or equal to FirstIdentID must be the ID for local 
identifiers previously. But it won't be the case we make this patch.

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-27 Thread Chuanqi Xu via llvm-branch-commits


@@ -918,7 +918,7 @@ ASTSelectorLookupTrait::ReadKey(const unsigned char* d, 
unsigned) {
   SelectorTable  = Reader.getContext().Selectors;
   unsigned N = endian::readNext(d);
   const IdentifierInfo *FirstII = Reader.getLocalIdentifier(
-  F, endian::readNext(d));
+  F, endian::readNext(d));

ChuanqiXu9 wrote:

If `IdentifierID` is not integral type, the code can't compile 
(`endian::readNext` won't accept that). So I feel it might not be so useful.

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-27 Thread Chuanqi Xu via llvm-branch-commits


@@ -124,7 +124,7 @@ struct HeaderFileInfo {
   /// This ID number will be non-zero when there is a controlling
   /// macro whose IdentifierInfo may not yet have been loaded from
   /// external storage.
-  unsigned ControllingMacroID = 0;
+  uint64_t ControllingMacroID = 0;

ChuanqiXu9 wrote:

Done by adding `LazyIdentifierInfoPtr` to `ExternalPreprocessorSource.h`.

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-27 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/92085

>From a2fb7f50161932a9557a22a4ba23f827e80a4d6b Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Tue, 14 May 2024 15:33:12 +0800
Subject: [PATCH] [Serialization] No transitive identifier change

---
 .../clang/Lex/ExternalPreprocessorSource.h|  54 -
 clang/include/clang/Lex/HeaderSearch.h|  12 +-
 .../include/clang/Serialization/ASTBitCodes.h |   2 +-
 clang/include/clang/Serialization/ASTReader.h |  19 ++-
 .../include/clang/Serialization/ModuleFile.h  |   3 -
 clang/lib/Lex/HeaderSearch.cpp|  33 +++---
 clang/lib/Serialization/ASTReader.cpp |  98 
 clang/lib/Serialization/ASTWriter.cpp |  63 ++
 clang/lib/Serialization/GlobalModuleIndex.cpp |   3 +-
 clang/lib/Serialization/ModuleFile.cpp|   1 -
 .../no-transitive-identifier-change.cppm  | 110 ++
 11 files changed, 286 insertions(+), 112 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-identifier-change.cppm

diff --git a/clang/include/clang/Lex/ExternalPreprocessorSource.h 
b/clang/include/clang/Lex/ExternalPreprocessorSource.h
index 6775841860373..48429948dbffe 100644
--- a/clang/include/clang/Lex/ExternalPreprocessorSource.h
+++ b/clang/include/clang/Lex/ExternalPreprocessorSource.h
@@ -36,12 +36,64 @@ class ExternalPreprocessorSource {
   /// Return the identifier associated with the given ID number.
   ///
   /// The ID 0 is associated with the NULL identifier.
-  virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;
+  virtual IdentifierInfo *GetIdentifier(uint64_t ID) = 0;
 
   /// Map a module ID to a module.
   virtual Module *getModule(unsigned ModuleID) = 0;
 };
 
+// Either a pointer to an IdentifierInfo of the controlling macro or the ID
+// number of the controlling macro.
+class LazyIdentifierInfoPtr {
+  // If the low bit is clear, a pointer to the IdentifierInfo. If the low
+  // bit is set, the upper 63 bits are the ID number.
+  mutable uint64_t Ptr = 0;
+
+public:
+  LazyIdentifierInfoPtr() = default;
+
+  explicit LazyIdentifierInfoPtr(const IdentifierInfo *Ptr)
+  : Ptr(reinterpret_cast(Ptr)) {}
+
+  explicit LazyIdentifierInfoPtr(uint64_t ID) : Ptr((ID << 1) | 0x01) {
+assert((ID << 1 >> 1) == ID && "ID must require < 63 bits");
+if (ID == 0)
+  Ptr = 0;
+  }
+
+  LazyIdentifierInfoPtr =(const IdentifierInfo *Ptr) {
+this->Ptr = reinterpret_cast(Ptr);
+return *this;
+  }
+
+  LazyIdentifierInfoPtr =(uint64_t ID) {
+assert((ID << 1 >> 1) == ID && "IDs must require < 63 bits");
+if (ID == 0)
+  Ptr = 0;
+else
+  Ptr = (ID << 1) | 0x01;
+
+return *this;
+  }
+
+  /// Whether this pointer is non-NULL.
+  ///
+  /// This operation does not require the AST node to be deserialized.
+  bool isValid() const { return Ptr != 0; }
+
+  /// Whether this pointer is currently stored as ID.
+  bool isID() const { return Ptr & 0x01; }
+
+  IdentifierInfo *getPtr() const {
+assert(!isID());
+return reinterpret_cast(Ptr);
+  }
+
+  uint64_t getID() const {
+assert(isID());
+return Ptr >> 1;
+  }
+};
 }
 
 #endif
diff --git a/clang/include/clang/Lex/HeaderSearch.h 
b/clang/include/clang/Lex/HeaderSearch.h
index 5ac634d4e..65700b8f9dc11 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -16,6 +16,7 @@
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/SourceManager.h"
 #include "clang/Lex/DirectoryLookup.h"
+#include "clang/Lex/ExternalPreprocessorSource.h"
 #include "clang/Lex/HeaderMap.h"
 #include "clang/Lex/ModuleMap.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -119,13 +120,6 @@ struct HeaderFileInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsValid : 1;
 
-  /// The ID number of the controlling macro.
-  ///
-  /// This ID number will be non-zero when there is a controlling
-  /// macro whose IdentifierInfo may not yet have been loaded from
-  /// external storage.
-  unsigned ControllingMacroID = 0;
-
   /// If this file has a \#ifndef XXX (or equivalent) guard that
   /// protects the entire contents of the file, this is the identifier
   /// for the macro that controls whether or not it has any effect.
@@ -134,7 +128,7 @@ struct HeaderFileInfo {
   /// the controlling macro of this header, since
   /// getControllingMacro() is able to load a controlling macro from
   /// external storage.
-  const IdentifierInfo *ControllingMacro = nullptr;
+  LazyIdentifierInfoPtr LazyControllingMacro;
 
   /// If this header came from a framework include, this is the name
   /// of the framework.
@@ -580,7 +574,7 @@ class HeaderSearch {
   /// no-op \#includes.
   void SetFileControllingMacro(FileEntryRef File,
const IdentifierInfo *ControllingMacro) {
-getFileInfo(File).ControllingMacro = ControllingMacro;
+getFileInfo(File).LazyControllingMacro = ControllingMacro;
   }
 
   /// Determine 

[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-24 Thread Jan Svoboda via llvm-branch-commits


@@ -3896,7 +3903,7 @@ void ASTWriter::WriteIdentifierTable(Preprocessor ,
 
   // Write out identifiers if either the ID is local or the identifier has
   // changed since it was loaded.
-  if (ID >= FirstIdentID || !Chain || !II->isFromAST() ||
+  if (isLocalIdentifierID(ID) || !Chain || !II->isFromAST() ||

jansvoboda11 wrote:

This diff makes it seem that `ID >= FirstIdentID` is equivalent to 
`isLocalIdentifierID(ID)`, which I don't think is the case. Can you explain 
what's going on here?

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-24 Thread Jan Svoboda via llvm-branch-commits


@@ -918,7 +918,7 @@ ASTSelectorLookupTrait::ReadKey(const unsigned char* d, 
unsigned) {
   SelectorTable  = Reader.getContext().Selectors;
   unsigned N = endian::readNext(d);
   const IdentifierInfo *FirstII = Reader.getLocalIdentifier(
-  F, endian::readNext(d));
+  F, endian::readNext(d));

jansvoboda11 wrote:

I think having some kind of `static_assert` that `IdentifierID` is an integral 
type would be helpful. Maybe that'd be useful even within `endian::readNext()`?

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-24 Thread Jan Svoboda via llvm-branch-commits


@@ -124,7 +124,7 @@ struct HeaderFileInfo {
   /// This ID number will be non-zero when there is a controlling
   /// macro whose IdentifierInfo may not yet have been loaded from
   /// external storage.
-  unsigned ControllingMacroID = 0;
+  uint64_t ControllingMacroID = 0;

jansvoboda11 wrote:

This increases the size of `HeaderFileInfo` from 32 to 40 bytes. Can we squash 
this member with `ControllingMacro` into some kind of tagged pointer & 
`uint64_t` union to save space?

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-19 Thread Chuanqi Xu via llvm-branch-commits

ChuanqiXu9 wrote:

@jansvoboda11 @Bigcheese gentle ping

https://github.com/llvm/llvm-project/pull/92085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-14 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/92085

>From c612b56dec8bfc7c1612e94be8876316f14ea8ea Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Tue, 14 May 2024 15:33:12 +0800
Subject: [PATCH] [Serialization] No transitive identifier change

---
 .../clang/Lex/ExternalPreprocessorSource.h|   2 +-
 clang/include/clang/Lex/HeaderSearch.h|   2 +-
 .../include/clang/Serialization/ASTBitCodes.h |   2 +-
 clang/include/clang/Serialization/ASTReader.h |  19 ++-
 .../include/clang/Serialization/ModuleFile.h  |   3 -
 clang/lib/Serialization/ASTReader.cpp |  96 ---
 clang/lib/Serialization/ASTWriter.cpp |  61 ++
 clang/lib/Serialization/GlobalModuleIndex.cpp |   3 +-
 clang/lib/Serialization/ModuleFile.cpp|   1 -
 .../no-transitive-identifier-change.cppm  | 110 ++
 10 files changed, 214 insertions(+), 85 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-identifier-change.cppm

diff --git a/clang/include/clang/Lex/ExternalPreprocessorSource.h 
b/clang/include/clang/Lex/ExternalPreprocessorSource.h
index 6775841860373..bd5c11a4577f5 100644
--- a/clang/include/clang/Lex/ExternalPreprocessorSource.h
+++ b/clang/include/clang/Lex/ExternalPreprocessorSource.h
@@ -36,7 +36,7 @@ class ExternalPreprocessorSource {
   /// Return the identifier associated with the given ID number.
   ///
   /// The ID 0 is associated with the NULL identifier.
-  virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;
+  virtual IdentifierInfo *GetIdentifier(uint64_t ID) = 0;
 
   /// Map a module ID to a module.
   virtual Module *getModule(unsigned ModuleID) = 0;
diff --git a/clang/include/clang/Lex/HeaderSearch.h 
b/clang/include/clang/Lex/HeaderSearch.h
index 5ac634d4e..cb75dd429c448 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -124,7 +124,7 @@ struct HeaderFileInfo {
   /// This ID number will be non-zero when there is a controlling
   /// macro whose IdentifierInfo may not yet have been loaded from
   /// external storage.
-  unsigned ControllingMacroID = 0;
+  uint64_t ControllingMacroID = 0;
 
   /// If this file has a \#ifndef XXX (or equivalent) guard that
   /// protects the entire contents of the file, this is the identifier
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h 
b/clang/include/clang/Serialization/ASTBitCodes.h
index 772452e3afc55..1fd482b5aff0e 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -59,7 +59,7 @@ const unsigned VERSION_MINOR = 1;
 ///
 /// The ID numbers of identifiers are consecutive (in order of discovery)
 /// and start at 1. 0 is reserved for NULL.
-using IdentifierID = uint32_t;
+using IdentifierID = uint64_t;
 
 /// The number of predefined identifier IDs.
 const unsigned int NUM_PREDEF_IDENT_IDS = 1;
diff --git a/clang/include/clang/Serialization/ASTReader.h 
b/clang/include/clang/Serialization/ASTReader.h
index d028d52fc5ef1..1da9123280f26 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -657,14 +657,6 @@ class ASTReader
   /// been loaded.
   std::vector IdentifiersLoaded;
 
-  using GlobalIdentifierMapType =
-  ContinuousRangeMap;
-
-  /// Mapping from global identifier IDs to the module in which the
-  /// identifier resides along with the offset that should be added to the
-  /// global identifier ID to produce a local ID.
-  GlobalIdentifierMapType GlobalIdentifierMap;
-
   /// A vector containing macros that have already been
   /// loaded.
   ///
@@ -1536,6 +1528,11 @@ class ASTReader
   /// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
   unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;
 
+  /// Translate an \param IdentifierID ID to the index of IdentifiersLoaded
+  /// array and the corresponding module file.
+  std::pair
+  translateIdentifierIDToIndex(serialization::IdentifierID ID) const;
+
 public:
   /// Load the AST file and validate its contents against the given
   /// Preprocessor.
@@ -2120,7 +2117,7 @@ class ASTReader
   /// Load a selector from disk, registering its ID if it exists.
   void LoadSelector(Selector Sel);
 
-  void SetIdentifierInfo(unsigned ID, IdentifierInfo *II);
+  void SetIdentifierInfo(serialization::IdentifierID ID, IdentifierInfo *II);
   void SetGloballyVisibleDecls(IdentifierInfo *II,
const SmallVectorImpl ,
SmallVectorImpl *Decls = nullptr);
@@ -2145,10 +2142,10 @@ class ASTReader
 return DecodeIdentifierInfo(ID);
   }
 
-  IdentifierInfo *getLocalIdentifier(ModuleFile , unsigned LocalID);
+  IdentifierInfo *getLocalIdentifier(ModuleFile , uint64_t LocalID);
 
   serialization::IdentifierID getGlobalIdentifierID(ModuleFile ,
-unsigned LocalID);
+

[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-14 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)


Changes

Following of https://github.com/llvm/llvm-project/pull/92083

The motivation is still cutting of the unnecessary change in the dependency 
chain. See the above link (recursively) for details.

After this patch, (and the above patch), we can already do something pretty 
interesting. For example,

 Motivation example

```

//--- m-partA.cppm
export module m:partA;

export inline int getA() {
return 43;
}

export class A {
public:
int getMem();
};

export template typename T
class ATempl {
public:
T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
return 88;
}

export class A {
public:
int getMem();
// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to 
be
// not changed.
int getMem2();
};

export template typename T
class ATempl {
public:
T getT();
// Add a new declaration without introducing a new type.
T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
return getB();
}
```

In this example, module `m` exports two partitions `:partA` and `:partB`. And a 
consumer `useBOnly` only consumes the entities from `:partB`. So we don't hope 
the BMI of `useBOnly` changes if only `:partA` changes. After this patch, we 
can make it if the change of `:partA` doesn't introduce new types. (And we can 
get rid of this if we make no-transitive-type-change).

As the example shows, when we change the implementation of `:partA` from 
`m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration `getA2()` 
at the global namespace, add a new member function `getMem2()` to class `A` and 
add a new member function to  `getT2()` to class template `ATempl`. And since 
`:partA` is not used by `useBOnly` completely, the BMI of  `useBOnly`  won't 
change after we made above changes.

 Design details

Method used in this patch is similar with 
https://github.com/llvm/llvm-project/pull/92083 and 
https://github.com/llvm/llvm-project/pull/86912. It extends the 32 bit 
IdentifierID to 64 bits and use the higher 32 bits to store the module file 
index. So that the encoding of the identifier won't get affected by other 
modules.

 Overhead

Similar with https://github.com/llvm/llvm-project/pull/92083 and 
https://github.com/llvm/llvm-project/pull/86912. The change is only expected to 
increase the size of the on-disk .pcm files and not affect the compile-time 
performances. And from my experiment, the size of the on-disk change only 
increase 1%+ and observe no compile-time impacts.

 Future Plans

I'll try to do the same thing for type ids. IIRC, it won't change the 
dependency graph if we add a new type in an unused units. I do think this is a 
significant win. And this will be a pretty good answer to "why modules are 
better than headers."




---

Patch is 27.17 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/92085.diff


10 Files Affected:

- (modified) clang/include/clang/Lex/ExternalPreprocessorSource.h (+1-1) 
- (modified) clang/include/clang/Lex/HeaderSearch.h (+1-1) 
- (modified) clang/include/clang/Serialization/ASTBitCodes.h (+1-1) 
- (modified) clang/include/clang/Serialization/ASTReader.h (+8-11) 
- (modified) clang/include/clang/Serialization/ModuleFile.h (-3) 
- (modified) clang/lib/Serialization/ASTReader.cpp (+51-45) 
- (modified) clang/lib/Serialization/ASTWriter.cpp (+40-21) 
- (modified) clang/lib/Serialization/GlobalModuleIndex.cpp (+2-1) 
- (modified) clang/lib/Serialization/ModuleFile.cpp (-1) 
- (added) clang/test/Modules/no-transitive-identifier-change.cppm (+102) 


``diff
diff --git a/clang/include/clang/Lex/ExternalPreprocessorSource.h 
b/clang/include/clang/Lex/ExternalPreprocessorSource.h
index 6775841860373..bd5c11a4577f5 100644
--- a/clang/include/clang/Lex/ExternalPreprocessorSource.h
+++ b/clang/include/clang/Lex/ExternalPreprocessorSource.h
@@ -36,7 +36,7 @@ class ExternalPreprocessorSource {
   /// Return the identifier associated with the given ID number.
   ///
   /// The ID 0 is associated with the NULL identifier.
-  virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;
+  virtual IdentifierInfo *GetIdentifier(uint64_t ID) = 0;
 
   /// Map a module ID to a module.
   virtual Module *getModule(unsigned ModuleID) = 0;
diff --git a/clang/include/clang/Lex/HeaderSearch.h 
b/clang/include/clang/Lex/HeaderSearch.h
index 5ac634d4e..cb75dd429c448 100644

[llvm-branch-commits] [clang] [Serialization] No transitive identifier change (PR #92085)

2024-05-14 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 created 
https://github.com/llvm/llvm-project/pull/92085

Following of https://github.com/llvm/llvm-project/pull/92083

The motivation is still cutting of the unnecessary change in the dependency 
chain. See the above link (recursively) for details.

After this patch, (and the above patch), we can already do something pretty 
interesting. For example,

 Motivation example

```

//--- m-partA.cppm
export module m:partA;

export inline int getA() {
return 43;
}

export class A {
public:
int getMem();
};

export template 
class ATempl {
public:
T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
return 88;
}

export class A {
public:
int getMem();
// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to 
be
// not changed.
int getMem2();
};

export template 
class ATempl {
public:
T getT();
// Add a new declaration without introducing a new type.
T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
return getB();
}
```

In this example, module `m` exports two partitions `:partA` and `:partB`. And a 
consumer `useBOnly` only consumes the entities from `:partB`. So we don't hope 
the BMI of `useBOnly` changes if only `:partA` changes. After this patch, we 
can make it if the change of `:partA` doesn't introduce new types. (And we can 
get rid of this if we make no-transitive-type-change).

As the example shows, when we change the implementation of `:partA` from 
`m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration `getA2()` 
at the global namespace, add a new member function `getMem2()` to class `A` and 
add a new member function to  `getT2()` to class template `ATempl`. And since 
`:partA` is not used by `useBOnly` completely, the BMI of  `useBOnly`  won't 
change after we made above changes.

 Design details

Method used in this patch is similar with 
https://github.com/llvm/llvm-project/pull/92083 and 
https://github.com/llvm/llvm-project/pull/86912. It extends the 32 bit 
IdentifierID to 64 bits and use the higher 32 bits to store the module file 
index. So that the encoding of the identifier won't get affected by other 
modules.

 Overhead

Similar with https://github.com/llvm/llvm-project/pull/92083 and 
https://github.com/llvm/llvm-project/pull/86912. The change is only expected to 
increase the size of the on-disk .pcm files and not affect the compile-time 
performances. And from my experiment, the size of the on-disk change only 
increase 1%+ and observe no compile-time impacts.

 Future Plans

I'll try to do the same thing for type ids. IIRC, it won't change the 
dependency graph if we add a new type in an unused units. I do think this is a 
significant win. And this will be a pretty good answer to "why modules are 
better than headers."




>From d32a410d4847333870dff5cefb0b93c0d7facc80 Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Tue, 14 May 2024 15:33:12 +0800
Subject: [PATCH] [Serialization] No transitive identifier change

---
 .../clang/Lex/ExternalPreprocessorSource.h|   2 +-
 clang/include/clang/Lex/HeaderSearch.h|   2 +-
 .../include/clang/Serialization/ASTBitCodes.h |   2 +-
 clang/include/clang/Serialization/ASTReader.h |  19 ++--
 .../include/clang/Serialization/ModuleFile.h  |   3 -
 clang/lib/Serialization/ASTReader.cpp |  96 +
 clang/lib/Serialization/ASTWriter.cpp |  61 +++
 clang/lib/Serialization/GlobalModuleIndex.cpp |   3 +-
 clang/lib/Serialization/ModuleFile.cpp|   1 -
 .../no-transitive-identifier-change.cppm  | 102 ++
 10 files changed, 206 insertions(+), 85 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-identifier-change.cppm

diff --git a/clang/include/clang/Lex/ExternalPreprocessorSource.h 
b/clang/include/clang/Lex/ExternalPreprocessorSource.h
index 6775841860373..bd5c11a4577f5 100644
--- a/clang/include/clang/Lex/ExternalPreprocessorSource.h
+++ b/clang/include/clang/Lex/ExternalPreprocessorSource.h
@@ -36,7 +36,7 @@ class ExternalPreprocessorSource {
   /// Return the identifier associated with the given ID number.
   ///
   /// The ID 0 is associated with the NULL identifier.
-  virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;
+  virtual IdentifierInfo *GetIdentifier(uint64_t ID) = 0;
 
   /// Map a module ID to a module.
   virtual Module *getModule(unsigned ModuleID) = 0;
diff --git a/clang/include/clang/Lex/HeaderSearch.h