jansvoboda11 added inline comments.
================ Comment at: clang/lib/Basic/FileManager.cpp:625-627 + Entries.push_back( + FileEntryRef(*reinterpret_cast<const FileEntryRef::MapEntry *>( + Value->V.get<const void *>()))); ---------------- rmaz wrote: > jansvoboda11 wrote: > > Why is this necessary? IIUC, `FileEntryRef` has this logic in > > `getBaseMapEntry()`. I would expect `FileEntryRef(Entry)` to be correct > > regardless of what's in `Value->V`. > I borrowed this from: > https://github.com/llvm/llvm-project/blob/main/clang/lib/Basic/FileManager.cpp#L399-L408 > > I assumed this was necessary from @benlangmuir 's comment on D142780: > > > Also, it will never give you a vfs mapped path since it's skipping those > > (V.dyn_cast<FileEntry *>()). > > Are you saying we should just do something like: > ``` > for (const auto &Entry : SeenFileEntries) { > if (llvm::ErrorOr<FileEntryRef::MapValue> Value = Entry.getValue()) > Entries.push_back(FileEntryRef(Entry)); > } > ``` > ? Oh, okay, so the goal is to skip VFS-mapped file entries and only serialize the on-disk ones into PCMs. That makes sense given we do actual `FileEntry` comparison in `HeaderFileInfoTrait::EqualKey()` (instead of plain comparison of filenames that I assumed). I think that for VFS-mapped entries, we always have a separate element in the `SeenFileEntries` map that represents the on-disk file entry. So we should be skipping the VFS-mapped entries entirely instead of considering their remapping target entries again. If my understanding is correct, we should do something like this: ``` if (llvm::ErrorOr<FileEntryRef::MapValue> Value = Entry.getValue()) // Ignore VFS-mapped entries. if (Value->V.dyn_cast<FileEntry *>()) Entries.push_back(FileEntryRef(Entry)); ``` ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:176 + for (auto &File : FileEntries) { const HeaderFileInfo *HFI = ---------------- rmaz wrote: > jansvoboda11 wrote: > > I'm afraid that iterating over file entries in "non-deterministic" order > > can cause non-determinism down the line. For example, calling > > `HeaderSearch::findAllModulesForHeader()` in the loop can trigger > > deserialization of `HeaderFileInfo` from loaded PCMs. That code fills the > > `Module::Headers` vectors, which we serialize without sorting first. > Do you think we should move the sorting into the `getSeenFileEntries()` > method? I think so, yes. ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:1939 // Massage the file path into an appropriate form. StringRef Filename = File->getName(); SmallString<128> FilenameTmp(Filename); ---------------- rmaz wrote: > jansvoboda11 wrote: > > This will insert duplicate entries (with the same key) into the on-disk > > hash table. I don't know if that's problematic, just thought I'd call it > > out. > Not sure I follow, why would we have `FileEntry`s with duplicate names? Sorry, I had assumed we only do lookups in the on-disk hash table based on filenames (instead of full `FileEntry` comparison) and that for that reason, we should serialize all `FileEntryRef` into the PCM, even the VFS-mapped ones. But I now see that you're trying to only serialize the on-disk `FileEntryRef`s, which will not cause duplication. Ignore my comment here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D143414/new/ https://reviews.llvm.org/D143414 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits