> This is why I said: what is the same file if you can't rely on inodes
> working?

I don't have a good answer for such a case. Of course, no matter how one
approaches #pragma once there will be cases that aren't handled.

The criterion to optimize for, imo, is which approach has the clearest
failure mode. Contents happening to match could occur naturally without
anyone realizing, which is hard to triage. Mtimes colliding could just as
easily happen unnoticed, which is also hard to triage and reproduce. Path
issues pop up because real build systems use links. Mtime can fail on
multiple mounts, and path certainly will. In my opinion, the failure
modes for contents and mtime are far from ideal. Path isn't adequate
either, since supporting links is clearly an important goal.
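
To make the mtime + contents failure mode concrete, here is a minimal
C++17 sketch of a hybrid check as it has been described in this thread:
compare mtimes first, and only compare contents when the mtimes collide.
This is not GCC's actual code; `read_all` and
`same_file_by_mtime_and_contents` are hypothetical names made up for
illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: read a whole file into memory as bytes.
static std::string read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical sketch of an "mtime first, contents second" identity test.
// Assumption: this mirrors the behavior described in the thread, not any
// real compiler's implementation.
bool same_file_by_mtime_and_contents(const fs::path& a, const fs::path& b) {
    std::error_code ec_a, ec_b;
    const auto ta = fs::last_write_time(a, ec_a);
    const auto tb = fs::last_write_time(b, ec_b);
    if (ec_a || ec_b) return false;     // unreadable: treat as different
    if (ta != tb) return false;         // mtimes differ: different files
    return read_all(a) == read_all(b);  // mtimes collide: compare contents
}
```

With a check like this, two byte-identical copies count as the same file
only when their mtimes happen to match, so the answer can flip after a
fresh checkout or a `touch`.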

To level-set: I don't think it's reasonable to expect #pragma once to
handle multiple distinct copies of the same file. Especially given that
contents isn't an option.

The failure mode of inodes, however, is a lot clearer. It breaks with
things like multiple mounts and filesystems that don't have inodes. The
way I see it, advice to users becomes straightforward since it's evident
exactly how and why #pragma once might break.

> Early 2000s vs now have a different landscape when it comes to file systems.

Given the landscape today, could it make sense to re-evaluate mtime + content?

Cheers
Jeremy

On Sep 6 2024, at 10:29 pm, Andrew Pinski <pins...@gmail.com> wrote:

>> On Fri, Sep 6, 2024, 7:42 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>  
>>> Hi Andrew,
>>> Thanks for the thoughts and quick reply.
>>>  
>>>> Not always, because inodes are not always stable on some file systems.
>>>> It also does not work with multi-mounted devices.
>>>  
>>> Unusual filesystems and multiple mounts are indeed the failing. As I
>>> mentioned, there's no silver bullet; they each have pitfalls. I do,
>>> however, think this is a less surprising failure mode than GCC's which
>>> rears its head in surprising and inconsistent cases.
>>>  
>>>> I say if the file has the same content, then it is the same file and
>>>> GCC uses that definition.
>>>  
>>> GCC doesn't use this definition, really. It's relying primarily on the
>>> mtime check and only falling back to contents in case of collision.
>>>  
>>> The point on same contents == same file is well received. When
>>> I wrote the first draft of my paper I wrote it proposing this, however,
>>> I have become convinced this isn't the right approach based on examples
>>> where you could intend to include two files with the same contents that
>>> actually mean different things (such as Example 1).
>>>  
>>> GCC's approach is hybrid, half relying on something from the filesystem
>>> and half relying on the contents. As far as I can tell this can lead to
>>> a worst of both worlds.
>>>  
>>>> GCC definition is the only one which supports all issues described
>>>> here dealing with inodes (sometimes being non-stable), canonical paths
>>>> and both kinds of links and even re-mounted file systems.
>>>  
>>> I'd initially been thinking of a content-based solution in order to
>>> avoid any filesystem reliance and support multiple mounts etc. The
>>> problem currently is even GCC's approach, which has the best chance of
>>> working on multiple mounts, doesn't work consistently due to potential
>>> differences in mtime resolution. 
>>>  
>>>> What do the other implementations say about changing their
>>>> definition of what "the same file is"? Have you asked clang and MSVC
>>>> folks?
>>>  
>>> I've not yet asked. If I proceed with a proposal paper what I'll most
>>> likely be proposing is what Clang does, worded in terms of same device
>>> same location. I started here since GCC's approach is less similar to
>>> that than MSVC's is. It's also easier to reach out to developers on
>>> open source projects.
>  
> Except the clang solution does not work for some file systems and is
> broken when used on them. Maybe those file systems are not in use as
> they once were and that is why clang didn't run into folks asking to
> fix it.
>  
> Early 2000s vs now have a different landscape when it comes to file
> systems. This is why I said: what is the same file if you can't rely
> on inodes working?
>  
> Thanks,
> Andrew
>  
>  
>  
>  
>>  
>>>  
>>>  
>>> Thanks,
>>> Jeremy
>>>  
>>>  
>>> On Sep 6 2024, at 8:16 pm, Andrew Pinski <pins...@gmail.com> wrote:
>>>  
>>>> On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>>>>  
>>>>> Thanks Andrew, I appreciate the context and links. It looks like the
>>>>> prior implementation failed to handle links due to being based on file
>>>>> path, given cpp_simplify_pathname. Do you have thoughts on the use of
>>>>> device ID + inode as a way to also accommodate symbolic links and hard
>>>>> links without the fickleness of mtime?
>>>>  
>>>> Not always, because inodes are not always stable on some file systems.
>>>> It also does not work with multi-mounted devices.
>>>> The whole definition of what is the same file is really up for
>>>> debate here.
>>>> I say if the file has the same content, then it is the same file and
>>>> GCC uses that definition. While clang says it is based on if it is the
>>>> same inode which is not always true because of file systems which
>>>> don't use an inode number. While MSVC says it is based on the path but
>>>> what is the canonical path to a file, is a hard link to the same file
>>>> the same file or not; what about symbolic links? How about overlays
>>>> and mounted directories are they the same then?
>>>> GCC definition is the only one which supports all issues described
>>>> here dealing with inodes (sometimes being non-stable), canonical paths
>>>> and both kinds of links and even re-mounted file systems.
>>>>  
>>>> What do the other implementations say about changing their
>>>> definition of what "the same file is"? Have you asked clang and MSVC
>>>> folks?
>>>> Anyways GCC has an optimization already for #ifdef/#define/#endif (and
>>>> that is documented here:
>>>> https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html) so does
>>>> it make sense to really standardize `#pragma once` here or just push
>>>> other implementations to add a similar optimization instead?
>>>>  
>>>> Thanks,
>>>> Andrew Pinski
>>>>  
>>>>>  
>>>>> Cheers,
>>>>> Jeremy
>>>>>  
>>>>> On Sep 6 2024, at 12:25 am, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>  
>>>>> > On Thu, Sep 5, 2024 at 10:04 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>>>> >>
>>>>> >> Hello,
>>>>> >>
>>>>> >> I'm looking at #pragma once behavior among the major C/C++ compilers as
>>>>> >> part of a proposal paper for standardizing #pragma once. (This is
>>>>> >> apparently a very controversial topic.)
>>>>> >>
>>>>> >> To put my question up-front: Would GCC ever be open to altering its
>>>>> >> #pragma once behavior to bring it more in-line with behavior from other
>>>>> >> compilers and possibly more in-line with what users expect?
>>>>> >>
>>>>> >> To elaborate more:
>>>>> >>
>>>>> >> Design decisions for #pragma once essentially boil down to a file-based
>>>>> >> definition vs a content-based definition of "same file".
>>>>> >>
>>>>> >> A file-based definition is easier to reason about and more in-line with
>>>>> >> what users expect, however, distinct copies of headers can't be handled
>>>>> >> and multiple mount points are problematic.
>>>>> >>
>>>>> >> A content-based definition works for distinct copies, multiple mount
>>>>> >> points, and is completely sufficient 99% of the time, however, it could
>>>>> >> potentially break in hard-to-debug ways in a few notable cases (more
>>>>> >> information later).
>>>>> >>
>>>>> >> Currently the three major C/C++ compilers treat #pragma once very differently:
>>>>> >> - GCC uses file mtime + file contents
>>>>> >> - Clang uses inodes
>>>>> >> - MSVC uses file path
>>>>> >
>>>>> > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52566#c2 .
>>>>> > Note this was changed specifically in GCC 3.4 to fix the issue around
>>>>> > symlinks and hard links.
>>>>> > See https://gcc.gnu.org/pipermail/gcc-patches/2003-July/111203.html
>>>>> > for more information on the fixes.
>>>>> >
>>>>> > In fact `#pragma once` was deprecated before GCC 3.4 because it would
>>>>> > do incorrectly what clang and MSVC are doing and that was considered
>>>>> > wrong.
>>>>> > So GCC behavior has been this way before clang was even written.
>>>>> >
>>>>> > Thanks,
>>>>> > Andrew
>>>>> >
>>>>> >>
>>>>> >> None of the major compilers have documented their #pragma once semantics.
>>>>> >>
>>>>> >> In practice all three of these approaches work pretty well most of the
>>>>> >> time (which is why people feel comfortable using #pragma once). However,
>>>>> >> they can each break in their own ways.
>>>>> >>
>>>>> >> As mentioned earlier, clang and MSVC's file-based definitions of "same
>>>>> >> file" break for multiple mount points and multiple copies of the same
>>>>> >> header. MSVC's approach breaks for symbolic links and hard links.
>>>>> >>
>>>>> >> GCC's hybrid approach can break in surprising ways. I have three
>>>>> >> examples to share:
>>>>> >>
>>>>> >> Example 1:
>>>>> >>
>>>>> >> Consider a scenario such as:
>>>>> >>
>>>>> >> usr/
>>>>> >>   include/
>>>>> >>     library_a/
>>>>> >>       library_main.hpp
>>>>> >>       foo.hpp
>>>>> >>     library_b/
>>>>> >>       library_main.hpp
>>>>> >>       foo.hpp
>>>>> >> src/
>>>>> >>   main.cpp
>>>>> >>
>>>>> >> main.cpp:
>>>>> >> #include "library_a/library_main.hpp"
>>>>> >> #include "library_b/library_main.hpp"
>>>>> >>
>>>>> >> And both library_main.hpp's have:
>>>>> >> #pragma once
>>>>> >> #include "foo.hpp"
>>>>> >>
>>>>> >> Example 2:
>>>>> >>
>>>>> >> namespace v1 {
>>>>> >>     #include "library_v1.hpp"
>>>>> >> }
>>>>> >> namespace v2 {
>>>>> >>     #include "library_v2.hpp"
>>>>> >> }
>>>>> >>
>>>>> >> Where both library headers include their own copy of a shared header
>>>>> >> using #pragma once.
>>>>> >>
>>>>> >> Example 3:
>>>>> >>
>>>>> >> usr/
>>>>> >>   include/
>>>>> >>     library/
>>>>> >>       library.hpp
>>>>> >>       vendored-dependency.hpp
>>>>> >> src/
>>>>> >>   main.cpp
>>>>> >>   vendored-dependency.hpp
>>>>> >>
>>>>> >> main.cpp:
>>>>> >> #include "vendored-dependency.hpp"
>>>>> >> #include <library/library.hpp>
>>>>> >>
>>>>> >> library.hpp:
>>>>> >> #pragma once
>>>>> >> #include "vendored-dependency.hpp"
>>>>> >>
>>>>> >> This assumes both copies of vendored-dependency.hpp have the same
>>>>> >> contents byte-for-byte and use #pragma once.
>>>>> >>
>>>>> >> Each of these examples is a plausible scenario where two files with the
>>>>> >> same contents could be #included. In each example, on GCC, the code can
>>>>> >> work or break based on mtime:
>>>>> >> - Example 1: Breaks if mtimes for library_main.hpp happen to be the same
>>>>> >> - Example 2: Breaks if mtimes for the shared dependency copies happen to
>>>>> >>   be the same
>>>>> >> - Example 3: Only works if mtimes are the same
>>>>> >>
>>>>> >> File mtimes can happen to match sometimes, e.g. in a fresh git clone.
>>>>> >> However, this is a rather fickle criterion to rely on and could easily
>>>>> >> diverge in the middle of development. Notably, Example 2 was shared with
>>>>> >> me as an example where #pragma once worked great in development and
>>>>> >> broke in CI.
>>>>> >>
>>>>> >> Additionally, while GCC's approach might be able to handle multiple
>>>>> >> mounts better than other approaches, it can still break under multiple
>>>>> >> mounts if mtime resolution differs.
>>>>> >>
>>>>> >> Obviously there is no silver bullet for making #pragma once work
>>>>> >> perfectly all the time, however, I think it's easier to provide clear
>>>>> >> guarantees for #pragma once behavior when the definition of "same file"
>>>>> >> is based on file identity on device, i.e. device id + inode.
>>>>> >>
>>>>> >> Would GCC ever consider using device id + inode instead of mtime +
>>>>> >> contents for #pragma once?
>>>>> >>
>>>>> >> I presume the primary reason against changing the mtime + file contents
>>>>> >> approach in GCC would be caution over breaking any existing use. While
>>>>> >> the three examples above are cases where fickle mtime can be
>>>>> >> problematic, and I can't imagine any situations where mtime could
>>>>> >> reliably be relied upon, I do understand the degree of caution required
>>>>> >> for changes like this.
>>>>> >>
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Jeremy
>>>>> >
>>>>
