dexonsmith added a comment.

In D135220#3862058 <https://reviews.llvm.org/D135220#3862058>, @hans wrote:

> The build system folks replied saying they're not using symlinks, but hard 
> links for compiler inputs. Dropping the `-s` flag in the repro above 
> demonstrates this.

I think this only happened to work before. If `/tmp/a.cc` changes to these file 
contents:

  int foo_table[] = {
    #include "/tmp/foo.h"
  };
  int bar_table[] = {
    #include "/tmp/bar.h"
  };
  int foo2_table[] = {
    #include "/tmp/foo.h"
  };

then I'm getting the following with my system clang:

  # 1 "/tmp/a.cc"
  # 1 "<built-in>" 1
  # 1 "<built-in>" 3
  # 414 "<built-in>" 3
  # 1 "<command line>" 1
  # 1 "<built-in>" 2
  # 1 "/tmp/a.cc" 2
   int foo_table[] = {
  # 1 "/tmp/foo.h" 1
  1, 2, 3
  # 3 "/tmp/a.cc" 2
   };
   int bar_table[] = {
  # 1 "/tmp/bar.h" 1
  1, 2, 3
  # 6 "/tmp/a.cc" 2
   };
   int foo2_table[] = {
  # 1 "/tmp/bar.h" 1
  1, 2, 3
  # 9 "/tmp/a.cc" 2
   };

Note that `foo2_table` now points at `bar.h` but should point at `foo.h`. I 
presume after the present patch it'll point at `foo.h` (although I admit I 
didn't test).

The root cause is the same, and matches @benlangmuir's analysis in response to 
the testcase failure on Windows:

- The FileManager is merging files based on their observed inode (regardless of 
symlink or hardlink).
- Previously, filenames were reported based on the most recent access path: 
"last-ref".
- Now, they're reported based on the original access path: "first-ref".
- In neither case is it sound.

Here are a few solutions I can see:

1. Change FileManager to only merge files with matching "realpath", seeing 
through symlinks but not hardlinks (could use observed inode collisions as a 
hint to do an `lstat` to avoid running an `lstat` every time). Then 
SourceManager will give a different FileID to these files.
2. Change SourceManager to have different FileID to these files, even though 
they were merged by FileManager.
3. Change the `#include` stack to track the accessed filename (the 
`FileEntryRef`) in addition to the SLoc, even though they have the same FileID.
4. Require clients to keep contents distinct if they it matters (the 
(surprising) status quo).

What's in tree (and has been for a long time) is (4). (1) seems like the right 
solution to me.

As a heuristic on top of (4), "last-ref" (before this patch) probably gets the 
answer the user expects more often than "first-ref" (this patch) does, which I 
assume is why it was originally implemented. However, it's unpredictable and 
requires mutating a bunch of state where mutations are hard to reason about.

IMO, "first-ref" is easier to make sound, since it's an immutable property, so 
this patch is an incremental improvement; it makes the fuzzy modeling easier to 
observe but also perhaps more predictable. If we care about fixing hardlink 
include names ("correct-ref") we should fix them in all cases (ideally with 
(1)), but I wonder how urgent it is.

@hans, WDYT?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135220/new/

https://reviews.llvm.org/D135220

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to