[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGdcd3a0c9f13b: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile`… (authored by jansvoboda11). Changed prior to commit: https://reviews.llvm.org/D157066?vs=547097=548666#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 Files: clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/ModuleFile.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp Index: clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp === --- clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp +++ clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp @@ -459,18 +459,19 @@ serialization::ModuleFile *MF = MDC.ScanInstance.getASTReader()->getModuleManager().lookup( M->getASTFile()); - MDC.ScanInstance.getASTReader()->visitInputFiles( - *MF, true, true, [&](const serialization::InputFile , bool isSystem) { + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { // __inferred_module.map is the result of the way in which an implicit // module build handles inferred modules. It adds an overlay VFS with // this file in the proper directory and relies on the rest of Clang to // handle it like normal. With explicitly built modules we don't need // to play VFS tricks, so replace it with the correct module map. -if (IF.getFile()->getName().endswith("__inferred_module.map")) { +if (StringRef(IFI.Filename).endswith("__inferred_module.map")) { MDC.addFileDep(MD, ModuleMap->getName()); return; } -MDC.addFileDep(MD, IF.getFile()->getName()); +MDC.addFileDep(MD, IFI.Filename); }); llvm::DenseSet SeenDeps; @@ -478,11 +479,15 @@ addAllSubmoduleDeps(M, MD, SeenDeps); addAllAffectingClangModules(M, MD, SeenDeps); - MDC.ScanInstance.getASTReader()->visitTopLevelModuleMaps( - *MF, [&](FileEntryRef FE) { -if (FE.getNameAsRequested().endswith("__inferred_module.map")) + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { +if (!(IFI.TopLevel && IFI.ModuleMap)) return; -MD.ModuleMapFileDeps.emplace_back(FE.getNameAsRequested()); +if (StringRef(IFI.FilenameAsRequested) +.endswith("__inferred_module.map")) + return; +MD.ModuleMapFileDeps.emplace_back(IFI.FilenameAsRequested); }); CompilerInvocation CI = MDC.makeInvocationForModuleBuildWithoutOutputs( Index: clang/lib/Serialization/ASTWriter.cpp === --- clang/lib/Serialization/ASTWriter.cpp +++ clang/lib/Serialization/ASTWriter.cpp @@ -1525,7 +1525,8 @@ bool IsSystemFile; bool IsTransient; bool BufferOverridden; - bool IsTopLevelModuleMap; + bool IsTopLevel; + bool IsModuleMap; uint32_t ContentHash[2]; InputFileEntry(FileEntryRef File) : File(File) {} @@ -1547,8 +1548,10 @@ IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 32)); // Modification time IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Overridden IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Transient + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Top-level IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Module map - IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // File name + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 16)); // Name as req. len + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Name as req. + name unsigned IFAbbrevCode = Stream.EmitAbbrev(std::move(IFAbbrev)); // Create input file hash abbreviation. @@ -1582,8 +1585,8 @@ Entry.IsSystemFile = isSystem(File.getFileCharacteristic()); Entry.IsTransient = Cache->IsTransient; Entry.BufferOverridden = Cache->BufferOverridden; -Entry.IsTopLevelModuleMap = isModuleMap(File.getFileCharacteristic()) && -File.getIncludeLoc().isInvalid(); +Entry.IsTopLevel = File.getIncludeLoc().isInvalid(); +Entry.IsModuleMap = isModuleMap(File.getFileCharacteristic()); auto ContentHash = hash_code(-1); if (PP->getHeaderSearchInfo() @@ -1631,6 +1634,15 @@ // Emit size/modification time for this file. // And whether this file was overridden. { + SmallString<128>
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
jansvoboda11 added a comment. Landing with one more use of `Filename` converted to `FilenameAsRequested` (in call to `Listener.visitInputFile()`). The only remaining usages of `Filename` is now in the scanner (intentional) and in `ASTReader` when deciding whether an `InputFileInfo` has already been deserialized or not (default-initialized `InputFileInfo` has an empty `Filename`). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
vsapsai accepted this revision. vsapsai added a comment. The existing `FilenameAsRequested` usage looks good to me. Unfortunately, don't know how to make sure all places where `FilenameAsRequested` is required instead of `Filename` are updated. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
jansvoboda11 added inline comments. Comment at: clang/lib/Serialization/ASTReader.cpp:2411 bool Transient = FI.Transient; - StringRef Filename = FI.Filename; + StringRef Filename = FI.FilenameAsRequested; uint64_t StoredContentHash = FI.ContentHash; jansvoboda11 wrote: > benlangmuir wrote: > > It's not clear to me why this one changed > This actually maintains the same semantics - `FI.Filename` was previously the > as-requested path, now it's the on-disk path. Without changing this line, > `FileManager::getFileRef()` would get called with the on-disk path, meaning > consumers of this function would get the incorrect path when calling > `FileEntryRef::getNameAsRequested()` on the returned file. I recall one > clang-scan-deps test failing because of it. Just remembered: in `ModuleDepCollector.cpp`, we call `ModuleMap::getModuleMapFileForUniquing()`. This calls `SourceMgr.getFileEntryRefForID()` based on `Module::DefinitionLoc`, which triggers deserialization of the associated source location entry and ends up calling this function right here. We'd end up with the on-disk module map path for modular dependencies. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
benlangmuir accepted this revision. benlangmuir added inline comments. This revision is now accepted and ready to land. Comment at: clang/include/clang/Serialization/ModuleFile.h:72 + bool TopLevel; + bool ModuleMap; }; jansvoboda11 wrote: > benlangmuir wrote: > > Is there something using this new split? It seems like every user is using > > `&&` for these > You're right. I recall needing this on (now probably abandoned) patch, so I > thought I might as well add the extra bit now, sine I'm changing the format > anyways. I'm fine with removing this. It's fine, I was just making sure I didn't miss something. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
jansvoboda11 added inline comments. Comment at: clang/include/clang/Serialization/ModuleFile.h:72 + bool TopLevel; + bool ModuleMap; }; benlangmuir wrote: > Is there something using this new split? It seems like every user is using > `&&` for these You're right. I recall needing this on (now probably abandoned) patch, so I thought I might as well add the extra bit now, sine I'm changing the format anyways. I'm fine with removing this. Comment at: clang/lib/Serialization/ASTReader.cpp:2411 bool Transient = FI.Transient; - StringRef Filename = FI.Filename; + StringRef Filename = FI.FilenameAsRequested; uint64_t StoredContentHash = FI.ContentHash; benlangmuir wrote: > It's not clear to me why this one changed This actually maintains the same semantics - `FI.Filename` was previously the as-requested path, now it's the on-disk path. Without changing this line, `FileManager::getFileRef()` would get called with the on-disk path, meaning consumers of this function would get the incorrect path when calling `FileEntryRef::getNameAsRequested()` on the returned file. I recall one clang-scan-deps test failing because of it. Comment at: clang/lib/Serialization/ASTReader.cpp:9323 +void ASTReader::visitInputFileInfos( +serialization::ModuleFile , bool IncludeSystem, benlangmuir wrote: > Can we rewrite `visitInputFiles` on top of this? I can try consolidating these in a follow-up if you're fine with that. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
benlangmuir added a subscriber: vsapsai. benlangmuir added a comment. CC @vsapsai since he was also making name vs. name-as-requested change in modules Comment at: clang/include/clang/Serialization/ModuleFile.h:72 + bool TopLevel; + bool ModuleMap; }; Is there something using this new split? It seems like every user is using `&&` for these Comment at: clang/lib/Serialization/ASTReader.cpp:2411 bool Transient = FI.Transient; - StringRef Filename = FI.Filename; + StringRef Filename = FI.FilenameAsRequested; uint64_t StoredContentHash = FI.ContentHash; It's not clear to me why this one changed Comment at: clang/lib/Serialization/ASTReader.cpp:9323 +void ASTReader::visitInputFileInfos( +serialization::ModuleFile , bool IncludeSystem, Can we rewrite `visitInputFiles` on top of this? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
jansvoboda11 updated this revision to Diff 547097. jansvoboda11 added a comment. Remove leftover `std::string` constructor Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D157066/new/ https://reviews.llvm.org/D157066 Files: clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/ModuleFile.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp Index: clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp === --- clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp +++ clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp @@ -459,18 +459,19 @@ serialization::ModuleFile *MF = MDC.ScanInstance.getASTReader()->getModuleManager().lookup( M->getASTFile()); - MDC.ScanInstance.getASTReader()->visitInputFiles( - *MF, true, true, [&](const serialization::InputFile , bool isSystem) { + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { // __inferred_module.map is the result of the way in which an implicit // module build handles inferred modules. It adds an overlay VFS with // this file in the proper directory and relies on the rest of Clang to // handle it like normal. With explicitly built modules we don't need // to play VFS tricks, so replace it with the correct module map. -if (IF.getFile()->getName().endswith("__inferred_module.map")) { +if (StringRef(IFI.Filename).endswith("__inferred_module.map")) { MDC.addFileDep(MD, ModuleMap->getName()); return; } -MDC.addFileDep(MD, IF.getFile()->getName()); +MDC.addFileDep(MD, IFI.Filename); }); llvm::DenseSet SeenDeps; @@ -478,11 +479,15 @@ addAllSubmoduleDeps(M, MD, SeenDeps); addAllAffectingClangModules(M, MD, SeenDeps); - MDC.ScanInstance.getASTReader()->visitTopLevelModuleMaps( - *MF, [&](FileEntryRef FE) { -if (FE.getNameAsRequested().endswith("__inferred_module.map")) + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { +if (!(IFI.TopLevel && IFI.ModuleMap)) return; -MD.ModuleMapFileDeps.emplace_back(FE.getNameAsRequested()); +if (StringRef(IFI.FilenameAsRequested) +.endswith("__inferred_module.map")) + return; +MD.ModuleMapFileDeps.emplace_back(IFI.FilenameAsRequested); }); CompilerInvocation CI = MDC.makeInvocationForModuleBuildWithoutOutputs( Index: clang/lib/Serialization/ASTWriter.cpp === --- clang/lib/Serialization/ASTWriter.cpp +++ clang/lib/Serialization/ASTWriter.cpp @@ -1525,7 +1525,8 @@ bool IsSystemFile; bool IsTransient; bool BufferOverridden; - bool IsTopLevelModuleMap; + bool IsTopLevel; + bool IsModuleMap; uint32_t ContentHash[2]; InputFileEntry(FileEntryRef File) : File(File) {} @@ -1547,8 +1548,10 @@ IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 32)); // Modification time IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Overridden IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Transient + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Top-level IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Module map - IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // File name + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 16)); // Name as req. len + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Name as req. + name unsigned IFAbbrevCode = Stream.EmitAbbrev(std::move(IFAbbrev)); // Create input file hash abbreviation. @@ -1582,8 +1585,8 @@ Entry.IsSystemFile = isSystem(File.getFileCharacteristic()); Entry.IsTransient = Cache->IsTransient; Entry.BufferOverridden = Cache->BufferOverridden; -Entry.IsTopLevelModuleMap = isModuleMap(File.getFileCharacteristic()) && -File.getIncludeLoc().isInvalid(); +Entry.IsTopLevel = File.getIncludeLoc().isInvalid(); +Entry.IsModuleMap = isModuleMap(File.getFileCharacteristic()); auto ContentHash = hash_code(-1); if (PP->getHeaderSearchInfo() @@ -1631,6 +1634,15 @@ // Emit size/modification time for this file. // And whether this file was overridden. { + SmallString<128> NameAsRequested = Entry.File.getNameAsRequested(); + SmallString<128> Name = Entry.File.getName(); + + PreparePathForOutput(NameAsRequested); + PreparePathForOutput(Name); + + if (Name == NameAsRequested) +
[PATCH] D157066: [clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
jansvoboda11 created this revision. jansvoboda11 added a reviewer: benlangmuir. Herald added a subscriber: ributzka. Herald added a project: All. jansvoboda11 requested review of this revision. Herald added a project: clang. Herald added a subscriber: cfe-commits. The current `ASTReader::visitInputFiles()` function calls into `FileManager` to create `FileEntryRef` objects. This ends up being fairly costly in `clang-scan-deps`, where we mostly only care about file paths. This patch introduces new `ASTReader` API that gives clients access to just the serialized paths. Since the scanner needs both the as-requested path and the on-disk one (and doesn't want to transform the former into the latter via `FileManager`), this patch starts serializing both of them into the PCM file if they differ. This increases the size of scanning PCMs by 0.1% and speeds up scanning by 5%. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D157066 Files: clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/ModuleFile.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp Index: clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp === --- clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp +++ clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp @@ -459,18 +459,19 @@ serialization::ModuleFile *MF = MDC.ScanInstance.getASTReader()->getModuleManager().lookup( M->getASTFile()); - MDC.ScanInstance.getASTReader()->visitInputFiles( - *MF, true, true, [&](const serialization::InputFile , bool isSystem) { + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { // __inferred_module.map is the result of the way in which an implicit // module build handles inferred modules. It adds an overlay VFS with // this file in the proper directory and relies on the rest of Clang to // handle it like normal. With explicitly built modules we don't need // to play VFS tricks, so replace it with the correct module map. -if (IF.getFile()->getName().endswith("__inferred_module.map")) { +if (StringRef(IFI.Filename).endswith("__inferred_module.map")) { MDC.addFileDep(MD, ModuleMap->getName()); return; } -MDC.addFileDep(MD, IF.getFile()->getName()); +MDC.addFileDep(MD, IFI.Filename); }); llvm::DenseSet SeenDeps; @@ -478,11 +479,15 @@ addAllSubmoduleDeps(M, MD, SeenDeps); addAllAffectingClangModules(M, MD, SeenDeps); - MDC.ScanInstance.getASTReader()->visitTopLevelModuleMaps( - *MF, [&](FileEntryRef FE) { -if (FE.getNameAsRequested().endswith("__inferred_module.map")) + MDC.ScanInstance.getASTReader()->visitInputFileInfos( + *MF, /*IncludeSystem=*/true, + [&](const serialization::InputFileInfo , bool IsSystem) { +if (!(IFI.TopLevel && IFI.ModuleMap)) return; -MD.ModuleMapFileDeps.emplace_back(FE.getNameAsRequested()); +if (StringRef(IFI.FilenameAsRequested) +.endswith("__inferred_module.map")) + return; +MD.ModuleMapFileDeps.emplace_back(IFI.FilenameAsRequested); }); CompilerInvocation CI = MDC.makeInvocationForModuleBuildWithoutOutputs( Index: clang/lib/Serialization/ASTWriter.cpp === --- clang/lib/Serialization/ASTWriter.cpp +++ clang/lib/Serialization/ASTWriter.cpp @@ -1525,7 +1525,8 @@ bool IsSystemFile; bool IsTransient; bool BufferOverridden; - bool IsTopLevelModuleMap; + bool IsTopLevel; + bool IsModuleMap; uint32_t ContentHash[2]; InputFileEntry(FileEntryRef File) : File(File) {} @@ -1547,8 +1548,10 @@ IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 32)); // Modification time IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Overridden IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Transient + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Top-level IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Module map - IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // File name + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 16)); // Name as req. len + IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Name as req. + name unsigned IFAbbrevCode = Stream.EmitAbbrev(std::move(IFAbbrev)); // Create input file hash abbreviation. @@ -1582,8 +1585,8 @@ Entry.IsSystemFile = isSystem(File.getFileCharacteristic()); Entry.IsTransient = Cache->IsTransient; Entry.BufferOverridden = Cache->BufferOverridden; -Entry.IsTopLevelModuleMap =