[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-07-04 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

I finally got around to testing this patch on compiles of explicit modules. The 
number of elapsed CPU cycles decreased (by up to 1.8% for large modules) and 
the cumulative size of PCM files decreased by 2% (though small modules are a 
few bytes larger). These findings give me confidence to land this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-07-04 Thread Jan Svoboda via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc68ba12abf49: [clang][modules] Mark fewer identifiers as 
out-of-date (authored by jansvoboda11).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp

Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -3573,14 +3573,16 @@
 // the mapping from persistent IDs to strings.
 Writer.SetIdentifierOffset(II, Out.tell());
 
+auto MacroOffset = Writer.getMacroDirectivesOffset(II);
+
 // Emit the offset of the key/data length information to the interesting
 // identifiers table if necessary.
-if (InterestingIdentifierOffsets && isInterestingIdentifier(II))
+if (InterestingIdentifierOffsets &&
+isInterestingIdentifier(II, MacroOffset))
   InterestingIdentifierOffsets->push_back(Out.tell());
 
 unsigned KeyLen = II->getLength() + 1;
 unsigned DataLen = 4; // 4 bytes for the persistent ID << 1
-auto MacroOffset = Writer.getMacroDirectivesOffset(II);
 if (isInterestingIdentifier(II, MacroOffset)) {
   DataLen += 2; // 2 bytes for builtin ID
   DataLen += 2; // 2 bytes for flags
@@ -3659,9 +3661,8 @@
   // strings.
   {
 llvm::OnDiskChainedHashTableGenerator Generator;
-ASTIdentifierTableTrait Trait(
-*this, PP, IdResolver, IsModule,
-(getLangOpts().CPlusPlus && IsModule) ? &InterestingIdents : nullptr);
+ASTIdentifierTableTrait Trait(*this, PP, IdResolver, IsModule,
+  IsModule ? &InterestingIdents : nullptr);
 
 // Look for any identifiers that were named while processing the
 // headers, but are otherwise not needed. We add these to the hash
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -4386,27 +4386,56 @@
 if (F.OriginalSourceFileID.isValid())
   F.OriginalSourceFileID = TranslateFileID(F, F.OriginalSourceFileID);
 
-// Preload all the pending interesting identifiers by marking them out of
-// date.
 for (auto Offset : F.PreloadIdentifierOffsets) {
   const unsigned char *Data = F.IdentifierTableData + Offset;
 
   ASTIdentifierLookupTrait Trait(*this, F);
   auto KeyDataLen = Trait.ReadKeyDataLength(Data);
   auto Key = Trait.ReadKey(Data, KeyDataLen.first);
-  auto &II = PP.getIdentifierTable().getOwn(Key);
-  II.setOutOfDate(true);
+
+  IdentifierInfo *II;
+  if (!PP.getLangOpts().CPlusPlus) {
+// Identifiers present in both the module file and the importing
+// instance are marked out-of-date so that they can be deserialized
+// on next use via ASTReader::updateOutOfDateIdentifier().
+// Identifiers present in the module file but not in the importing
+// instance are ignored for now, preventing growth of the identifier
+// table. They will be deserialized on first use via ASTReader::get().
+auto It = PP.getIdentifierTable().find(Key);
+if (It == PP.getIdentifierTable().end())
+  continue;
+II = It->second;
+  } else {
+// With C++ modules, not many identifiers are considered interesting.
+// All identifiers in the module file can be placed into the identifier
+// table of the importing instance and marked as out-of-date. This makes
+// ASTReader::get() a no-op, and deserialization will take place on
+// first/next use via ASTReader::updateOutOfDateIdentifier().
+II = &PP.getIdentifierTable().getOwn(Key);
+  }
+
+  II->setOutOfDate(true);
 
   // Mark this identifier as being from an AST file so that we can track
   // whether we need to serialize it.
-  markIdentifierFromAST(*this, II);
+  markIdentifierFromAST(*this, *II);
 
   // Associate the ID with the identifier so that the writer can reuse it.
   auto ID = Trait.ReadIdentifierID(Data + KeyDataLen.first);
-  SetIdentifierInfo(ID, &II);
+  SetIdentifierInfo(ID, II);
 }
   }
 
+  // Builtins and library builtins have already been initialized. Mark all
+  // identifiers as out-of-date, so that they are deserialized on first use.
+  if (Type == MK_PCH || Type == MK_Preamble || Type == MK_MainFile)
+for (auto &Id : PP.getIdentifierTable())
+  Id.second->setOutOfDate(true);
+
+  // Mark selectors as out of date.
+  for (const auto &Sel : SelectorGeneration)
+SelectorOutOfDate[Sel.first] = true;
+
   // Setup the import locations and notify the module manager that we've
   // commit

[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-06-06 Thread Michael Spencer via Phabricator via cfe-commits
Bigcheese accepted this revision.
Bigcheese added a comment.

Looks good, although I'm not an expert on this bit either.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-06-01 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a project: clang-modules.
jansvoboda11 added a comment.

Thanks, Ben. Maybe someone subscribed to `clang-modules` could take a look?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-06-01 Thread Ben Langmuir via Phabricator via cfe-commits
benlangmuir accepted this revision.
benlangmuir added a comment.
This revision is now accepted and ready to land.

LGTM, but I would prefer at least one more person who understands the 
identifier handling code review this. It's been years since I thought about 
this code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-06-01 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 updated this revision to Diff 527634.
jansvoboda11 added a comment.

Rebase and switch back to `IdentifierTable::getOwn()`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp

Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -3546,14 +3546,16 @@
 // the mapping from persistent IDs to strings.
 Writer.SetIdentifierOffset(II, Out.tell());
 
+auto MacroOffset = Writer.getMacroDirectivesOffset(II);
+
 // Emit the offset of the key/data length information to the interesting
 // identifiers table if necessary.
-if (InterestingIdentifierOffsets && isInterestingIdentifier(II))
+if (InterestingIdentifierOffsets &&
+isInterestingIdentifier(II, MacroOffset))
   InterestingIdentifierOffsets->push_back(Out.tell());
 
 unsigned KeyLen = II->getLength() + 1;
 unsigned DataLen = 4; // 4 bytes for the persistent ID << 1
-auto MacroOffset = Writer.getMacroDirectivesOffset(II);
 if (isInterestingIdentifier(II, MacroOffset)) {
   DataLen += 2; // 2 bytes for builtin ID
   DataLen += 2; // 2 bytes for flags
@@ -3632,9 +3634,8 @@
   // strings.
   {
 llvm::OnDiskChainedHashTableGenerator Generator;
-ASTIdentifierTableTrait Trait(
-*this, PP, IdResolver, IsModule,
-(getLangOpts().CPlusPlus && IsModule) ? &InterestingIdents : nullptr);
+ASTIdentifierTableTrait Trait(*this, PP, IdResolver, IsModule,
+  IsModule ? &InterestingIdents : nullptr);
 
 // Look for any identifiers that were named while processing the
 // headers, but are otherwise not needed. We add these to the hash
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -4386,27 +4386,56 @@
 if (F.OriginalSourceFileID.isValid())
   F.OriginalSourceFileID = TranslateFileID(F, F.OriginalSourceFileID);
 
-// Preload all the pending interesting identifiers by marking them out of
-// date.
 for (auto Offset : F.PreloadIdentifierOffsets) {
   const unsigned char *Data = F.IdentifierTableData + Offset;
 
   ASTIdentifierLookupTrait Trait(*this, F);
   auto KeyDataLen = Trait.ReadKeyDataLength(Data);
   auto Key = Trait.ReadKey(Data, KeyDataLen.first);
-  auto &II = PP.getIdentifierTable().getOwn(Key);
-  II.setOutOfDate(true);
+
+  IdentifierInfo *II;
+  if (!PP.getLangOpts().CPlusPlus) {
+// Identifiers present in both the module file and the importing
+// instance are marked out-of-date so that they can be deserialized
+// on next use via ASTReader::updateOutOfDateIdentifier().
+// Identifiers present in the module file but not in the importing
+// instance are ignored for now, preventing growth of the identifier
+// table. They will be deserialized on first use via ASTReader::get().
+auto It = PP.getIdentifierTable().find(Key);
+if (It == PP.getIdentifierTable().end())
+  continue;
+II = It->second;
+  } else {
+// With C++ modules, not many identifiers are considered interesting.
+// All identifiers in the module file can be placed into the identifier
+// table of the importing instance and marked as out-of-date. This makes
+// ASTReader::get() a no-op, and deserialization will take place on
+// first/next use via ASTReader::updateOutOfDateIdentifier().
+II = &PP.getIdentifierTable().getOwn(Key);
+  }
+
+  II->setOutOfDate(true);
 
   // Mark this identifier as being from an AST file so that we can track
   // whether we need to serialize it.
-  markIdentifierFromAST(*this, II);
+  markIdentifierFromAST(*this, *II);
 
   // Associate the ID with the identifier so that the writer can reuse it.
   auto ID = Trait.ReadIdentifierID(Data + KeyDataLen.first);
-  SetIdentifierInfo(ID, &II);
+  SetIdentifierInfo(ID, II);
 }
   }
 
+  // Builtins and library builtins have already been initialized. Mark all
+  // identifiers as out-of-date, so that they are deserialized on first use.
+  if (Type == MK_PCH || Type == MK_Preamble || Type == MK_MainFile)
+for (auto &Id : PP.getIdentifierTable())
+  Id.second->setOutOfDate(true);
+
+  // Mark selectors as out of date.
+  for (const auto &Sel : SelectorGeneration)
+SelectorOutOfDate[Sel.first] = true;
+
   // Setup the import locations and notify the module manager that we've
   // committed to these module files.
   for (ImportedModule &M : Loaded) {
@@ -4424,25 +4453,6 @@
   F.ImportLoc = T

[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-05-31 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:4413
+// first/next use via ASTReader::updateOutOfDateIdentifier().
+II = &PP.getIdentifierTable().get(Key);
+  }

benlangmuir wrote:
> Why did this change from `getOwn` to `get`?
That's a fluke on my part, this should've remained `getOwn()`. These don't 
differ very much for C++ modules, but `getOwn()` is the intended functionality 
here for sure. I'll fix that in next revision.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-05-31 Thread Ben Langmuir via Phabricator via cfe-commits
benlangmuir added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:4413
+// first/next use via ASTReader::updateOutOfDateIdentifier().
+II = &PP.getIdentifierTable().get(Key);
+  }

Why did this change from `getOwn` to `get`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-05-30 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151277/new/

https://reviews.llvm.org/D151277

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151277: [clang][modules] Mark fewer identifiers as out-of-date

2023-05-23 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 created this revision.
jansvoboda11 added reviewers: Bigcheese, benlangmuir.
Herald added a subscriber: ributzka.
Herald added a project: All.
jansvoboda11 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

In `clang-scan-deps` contexts, the number of interesting identifiers in PCM 
files is fairly low (only macros), while the number of identifiers in the 
importing instance is high (builtins). Marking the whole identifier table 
out-of-date triggers lots of benign and expensive calls to 
`ASTReader::updateOutOfDateIdentifiers()`. (That unfortunately happens even for 
unused identifiers due to `SemaRef.IdResolver.begin(II)` line in 
`ASTWriter::WriteASTCore()`.)

This patch makes the main code path more similar to C++ modules, where the PCM 
files have `INTERESTING_IDENTIFIERS` section which lists identifiers that get 
created in the identifier table of the importing instance and marked as 
out-of-date. The only difference is that the main code path doesn't *create* 
identifiers in the table and relies on the importing instance calling 
`ASTReader::get()` when creating new identifier on-demand. It only marks 
existing identifiers as out-of-date.

This speeds up `clang-scan-deps` by 5-10%. Impact on explicitly built modules 
TBD.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D151277

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp

Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -3544,14 +3544,16 @@
 // the mapping from persistent IDs to strings.
 Writer.SetIdentifierOffset(II, Out.tell());
 
+auto MacroOffset = Writer.getMacroDirectivesOffset(II);
+
 // Emit the offset of the key/data length information to the interesting
 // identifiers table if necessary.
-if (InterestingIdentifierOffsets && isInterestingIdentifier(II))
+if (InterestingIdentifierOffsets &&
+isInterestingIdentifier(II, MacroOffset))
   InterestingIdentifierOffsets->push_back(Out.tell());
 
 unsigned KeyLen = II->getLength() + 1;
 unsigned DataLen = 4; // 4 bytes for the persistent ID << 1
-auto MacroOffset = Writer.getMacroDirectivesOffset(II);
 if (isInterestingIdentifier(II, MacroOffset)) {
   DataLen += 2; // 2 bytes for builtin ID
   DataLen += 2; // 2 bytes for flags
@@ -3630,9 +3632,8 @@
   // strings.
   {
 llvm::OnDiskChainedHashTableGenerator Generator;
-ASTIdentifierTableTrait Trait(
-*this, PP, IdResolver, IsModule,
-(getLangOpts().CPlusPlus && IsModule) ? &InterestingIdents : nullptr);
+ASTIdentifierTableTrait Trait(*this, PP, IdResolver, IsModule,
+  IsModule ? &InterestingIdents : nullptr);
 
 // Look for any identifiers that were named while processing the
 // headers, but are otherwise not needed. We add these to the hash
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -4385,27 +4385,56 @@
 if (F.OriginalSourceFileID.isValid())
   F.OriginalSourceFileID = TranslateFileID(F, F.OriginalSourceFileID);
 
-// Preload all the pending interesting identifiers by marking them out of
-// date.
 for (auto Offset : F.PreloadIdentifierOffsets) {
   const unsigned char *Data = F.IdentifierTableData + Offset;
 
   ASTIdentifierLookupTrait Trait(*this, F);
   auto KeyDataLen = Trait.ReadKeyDataLength(Data);
   auto Key = Trait.ReadKey(Data, KeyDataLen.first);
-  auto &II = PP.getIdentifierTable().getOwn(Key);
-  II.setOutOfDate(true);
+
+  IdentifierInfo *II;
+  if (!PP.getLangOpts().CPlusPlus) {
+// Identifiers present in both the module file and the importing
+// instance are marked out-of-date so that they can be deserialized
+// on next use via ASTReader::updateOutOfDateIdentifier().
+// Identifiers present in the module file but not in the importing
+// instance are ignored for now, preventing growth of the identifier
+// table. They will be deserialized on first use via ASTReader::get().
+auto It = PP.getIdentifierTable().find(Key);
+if (It == PP.getIdentifierTable().end())
+  continue;
+II = It->second;
+  } else {
+// With C++ modules, not many identifiers are considered interesting.
+// All identifiers in the module file can be placed into the identifier
+// table of the importing instance and marked as out-of-date. This makes
+// ASTReader::get() a no-op, and deserialization will take place on
+// first/next use via ASTReader::updateOutOfDateIdentifier().
+II = &PP.getIdentifierTable().get(Key);