[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-16 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D97850#2629464 , @ilyakuteev wrote:

> If a fix will be in ModuleManager and only for ModuleCache the problem with 
> symlinks and path will not affect it as ModuleCache is managed by clang and 
> we can rely on it.
> I agree that using `FileMgr.getBypassFile` is not the best way to solve this 
> problem, we need to replace `FileMgr.getFileRef` with some other method but I 
> did not found such method in FileManager. Maybe we need to add one or not use 
> FileManager as was mentioned. Not sure which way is better and safer.

FileManager plays two roles (unless I'm missing a third?):

- Establish an identity for multiple paths that should be treated as "the same" 
(address of FileEntry).
- Cache stat information (the content of FileEntry).

This patch is predicated on it being safe to skip the former (makes sense to 
me, although I think clang is a bit inconsistent about using relative paths for 
the module cache, so there might be some work to do). We already skip / avoid 
the latter (ModuleManager is the only caller of 
FileManager::getNoncachedStatValue). Seems like skipping the FileManager will 
simplify both the ModuleManager and the FileManager.

Might be worth an RFC on cfe-dev?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-16 Thread Ilya Kuteev via Phabricator via cfe-commits
ilyakuteev added a comment.

If a fix will be in ModuleManager and only for ModuleCache the problem with 
symlinks and path will not affect it as ModuleCache is managed by clang and we 
can rely on it.
I agree that using `FileMgr.getBypassFile` is not the best way to solve this 
problem, we need to replace `FileMgr.getFileRef` with some other method but I 
did not found such method in FileManager. Maybe we need to add one or not use 
FileManager as was mentioned. Not sure which way is better and safer.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-15 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.

Agreed that the right fix is to remove/replace the inode-based cache, but it's 
not safe to just drop it. Consider the following filesystem layout:

  /dir/file.h
  /dir/symlink.h -> file.h

the inode-based cache is what tells us that the following all point at the same 
file (given a working directory of `/dir`):

- `/dir/file.h`
- `/dir/./file.h`
- `../dir/file.h`
- `./file.h`
- `file.h`
- `symlink.h`
- `./symlink.h`
- `../dir/symlink.h`
- `/dir/symlink.h`
- `/dir/./symlink.h`

We can't drop the inode cache until there's another solution in place for this.

(One roundabout solution would be to keep this unchanged, and rely on the VFS 
to provide stable inodes. As it happens, I'm prototyping an on-disk caching 
`FileSystem` implementation that assigns stable/virtual inodes for each real 
path (hard links get distinct inodes); doesn't seem to have any overhead over 
the "real" filesystem (at least not at scale in clang-scan-deps). We could 
change the main path in Clang to use something like that, and change the PCM 
cache to use different logic (it's the only filemanager client that doesn't 
want/expect a stable view of the FS). I don't think it'd be too hard.)

Marking this as "request changes" since `getBypassFile()` is unsound and I'd 
prefer its use not be proliferated; I have some WIP to remove its current use.

Maybe there's a simpler solution though. If it's safe to use `getBypassFile()` 
every time a PCM file is opened, then we're not getting any value out of the 
FileManager. Maybe the module manager could skip the FileManager entirely...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-09 Thread Robert Widmann via Phabricator via cfe-commits
CodaFi added a comment.

I believe removing inode numbers is the correct fix, yes. The workaround I 
applied, and the one here, are both insufficient in the general case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-08 Thread Raphael Isemann via Phabricator via cfe-commits
teemperor added a comment.

It seems we already hit this issue before and decided to add a workaround: 
https://reviews.llvm.org/D86823


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-05 Thread Ilya Kuteev via Phabricator via cfe-commits
ilyakuteev added a comment.

@teemperor 's test shows the problem correctly.

In my case I am working on a dist-compilation system (similar to distcc) for 
objective-c with `-fmodules`. Our previous generation used tmpfs for module 
cache and was ephemeral (Unique temp module cache per compilation). Now we want 
to move module cache to ext4 filesystem to make it actually cache modules 
across compilations. This bug is blocking us from doing this change, so I came 
up with this draft fix.

I think that removing inode cache for PCMs may be some kind of final solution. 
(For example we can make another `FileManager.getNoncacedFileRef` method).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-04 Thread Raphael Isemann via Phabricator via cfe-commits
teemperor added a comment.

If I understand this correctly, then the inode caching logic in FileManager 
isn't just breaking PCMs but all file operations:

  TEST_F(FileManagerTest, InodeReuse) {
{
  std::ofstream myfile;
  myfile.open("a.cpp");
  myfile << "content\n";
}
llvm::ErrorOr fe1 = manager.getFile("a.cpp");
EXPECT_TRUE(fe1);
const FileEntry *f1 = *fe1;
remove("a.cpp");
{
  std::ofstream myfile;
  myfile.open("b.cpp");
  myfile << "different content\n";
}
llvm::ErrorOr fe2 = manager.getFile("b.cpp");
EXPECT_TRUE(fe2);
const FileEntry *f2 = *fe2;
EXPECT_NE(f2->getSize(), f1->getSize());
EXPECT_NE(f2->getUniqueID().getFile(), f1->getUniqueID().getFile());
  }

This fails consistently for me when running in an empty ext4 directory with:

  Expected: (f2->getSize()) != (f1->getSize()), actual: 8 vs 8
  Expected: (f2->getUniqueID().getFile()) != (f1->getUniqueID().getFile()), 
actual: 57855544 vs 57855544

I guess this wasn't considered a valid use case for the normal `#include` logic 
within Clang (which I believe is the primary beneficiary of this cache and 
doesn't really care about a changing file system). But with PCMs and Clang 
REPLs this is probably causing some strange bugs.

Anyway, I don't think I know the FileManager code well enough to come up with a 
real fix (beside just removing the inode cache).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97850/new/

https://reviews.llvm.org/D97850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97850: Fix PCM read from ModuleCache for ext4 filesystem

2021-03-03 Thread Ilya Kuteev via Phabricator via cfe-commits
ilyakuteev created this revision.
ilyakuteev requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D97850

Files:
  clang/lib/Serialization/ModuleManager.cpp


Index: clang/lib/Serialization/ModuleManager.cpp
===
--- clang/lib/Serialization/ModuleManager.cpp
+++ clang/lib/Serialization/ModuleManager.cpp
@@ -470,6 +470,12 @@
   Optional FileOrErr =
   expectedToOptional(FileMgr.getFileRef(FileName, /*OpenFile=*/true,
 /*CacheFailure=*/false));
+  if (FileOrErr) {
+// On Linux ext4 FileManager's inode caching system does not
+// provide us correct behaviour for ModuleCache directories.
+// inode can be reused after PCM delete resulting in cache misleading.
+FileOrErr = FileMgr.getBypassFile(*FileOrErr);
+  }
   if (!FileOrErr)
 return false;
 


Index: clang/lib/Serialization/ModuleManager.cpp
===
--- clang/lib/Serialization/ModuleManager.cpp
+++ clang/lib/Serialization/ModuleManager.cpp
@@ -470,6 +470,12 @@
   Optional FileOrErr =
   expectedToOptional(FileMgr.getFileRef(FileName, /*OpenFile=*/true,
 /*CacheFailure=*/false));
+  if (FileOrErr) {
+// On Linux ext4 FileManager's inode caching system does not
+// provide us correct behaviour for ModuleCache directories.
+// inode can be reused after PCM delete resulting in cache misleading.
+FileOrErr = FileMgr.getBypassFile(*FileOrErr);
+  }
   if (!FileOrErr)
 return false;
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits