> On 07.09.2024 at 07:27, Jeremy Rifkin <jer...@rifkin.dev> wrote:
> 
> 
>> 
>> This is why I asked: what is the same file if you can't rely on inodes
>> working?
> 
> I don't have a good answer for such a case. Of course, no matter how one
> approaches #pragma once there will be cases that aren't handled.
> 
> The criterion to optimize for, imo, is which approach has the clearest
> failure mode. Contents happening to match could occur naturally without
> anyone realizing, which is hard to triage. Mtimes colliding could easily
> happen without anyone realizing, which is also hard to triage and
> reproduce. Path issues pop up because real build systems use links. Mtime
> can fail on multiple mounts; path certainly will. In my opinion, the
> failure modes for contents and mtime are far from ideal. Path isn't
> adequate; it seems clear that supporting links is an important goal.
> 
> To level-set: I don't think it's reasonable to expect #pragma once to
> handle multiple distinct copies of the same file, especially given that
> contents isn't an option.
> 
> The failure mode of inodes, however, is a lot clearer. It breaks with
> things like multiple mounts and filesystems that don't have inodes. The
> way I see it, advice to users becomes simpler, since it's much clearer
> exactly how and why #pragma once might break.
> 
>> The file system landscape in the early 2000s was different from today's.
> 
> Given the landscape today, could it make sense to re-evaluate mtime + content?

I'd rather drop the feature. IIRC it's already deprecated, and I'd oppose
standardization, as doing so doesn't make much sense given all the above issues
and there is a working solution: include guards. What can pragma once do that
include guards can't? What's the issue to solve? Include guard collisions?
That's a much better understood failure mode than inodes, hard links or mtime.
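For reference, a minimal sketch of the include-guard idiom mentioned here. `widget.hpp`, `WIDGET_HPP` and `widget_answer` are made-up names; the header body is written out twice to stand in for the same header being #included twice into one translation unit:

```cpp
#include <cassert>

// Contents of a hypothetical header "widget.hpp", first inclusion.
#ifndef WIDGET_HPP
#define WIDGET_HPP
inline int widget_answer() { return 42; }
#endif  // WIDGET_HPP

// Second "inclusion": WIDGET_HPP is already defined, so the preprocessor
// skips everything up to the matching #endif and widget_answer() is not
// redefined (which would otherwise be a compile error).
#ifndef WIDGET_HPP
#define WIDGET_HPP
inline int widget_answer() { return 42; }
#endif  // WIDGET_HPP
```

The collision failure mode is visible in the same sketch: if an unrelated header happened to pick `WIDGET_HPP` as its guard, its contents would be skipped in exactly the same way.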

Richard

> Cheers
> Jeremy
> 
> On Sep 6 2024, at 10:29 pm, Andrew Pinski <pins...@gmail.com> wrote:
> 
>>>> On Fri, Sep 6, 2024, 7:42 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>> 
>>>> Hi Andrew,
>>>> Thanks for the thoughts and quick reply.
>>>> 
>>>>> Not always, because inodes are not always stable on some file systems.
>>>>> It also does not work with multi-mounted devices.
>>>> 
>>>> Unusual filesystems and multiple mounts are indeed the failure cases. As
>>>> I mentioned, there's no silver bullet; each approach has pitfalls. I do,
>>>> however, think this is a less surprising failure mode than GCC's, which
>>>> shows up in unexpected and inconsistent cases.
>>>> 
>>>>> I say that if the file has the same content, then it is the same file,
>>>>> and GCC uses that definition.
>>>> 
>>>> GCC doesn't really use this definition. It relies primarily on the
>>>> mtime check and only falls back to comparing contents when mtimes collide.
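A simplified model of the mtime-first, contents-second check described here, assuming (as this thread states) that GCC compares modification times before it compares bytes. The file-size comparison is an assumed cheap pre-filter, and `read_all` / `same_by_mtime_and_contents` are hypothetical names, not libcpp's:

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: read a whole file into memory.
std::string read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Simplified model of "mtime + contents" sameness. The modification time is
// checked first; the bytes are only compared when the metadata collides.
// The size check is an assumed pre-filter, not taken from libcpp.
bool same_by_mtime_and_contents(const fs::path& a, const fs::path& b) {
    if (fs::last_write_time(a) != fs::last_write_time(b)) return false;
    if (fs::file_size(a) != fs::file_size(b)) return false;
    return read_all(a) == read_all(b);
}
```

Under this model, two byte-identical copies of a header are merged only while their mtimes agree; touching one copy splits them again, which is the fickleness the examples later in the thread run into.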
>>>> 
>>>> The point that same contents == same file is well taken. The first draft
>>>> of my paper proposed exactly this; however, I have become convinced this
>>>> isn't the right approach, based on examples where you could intend to
>>>> include two files with the same contents that actually mean different
>>>> things (such as Example 1).
>>>> 
>>>> GCC's approach is hybrid, half relying on something from the filesystem
>>>> and half relying on the contents. As far as I can tell, this can lead to
>>>> the worst of both worlds.
>>>> 
>>>>> GCC's definition is the only one that handles all the issues described
>>>>> here: inodes (sometimes being non-stable), canonical paths, both kinds
>>>>> of links, and even re-mounted file systems.
>>>> 
>>>> I'd initially been thinking of a content-based solution in order to
>>>> avoid any filesystem reliance and to support multiple mounts, etc. The
>>>> problem is that even GCC's approach, which has the best chance of
>>>> working across multiple mounts, doesn't work consistently due to
>>>> potential differences in mtime resolution.
>>>> 
>>>>> What do the other implementations say about changing their
>>>>> definition of what "the same file" is? Have you asked the clang and MSVC
>>>>> folks?
>>>> 
>>>> I've not yet asked. If I proceed with a proposal paper, what I'll most
>>>> likely be proposing is what Clang does, worded in terms of same device,
>>>> same location. I started here since GCC's approach is further from
>>>> that than MSVC's is. It's also easier to reach out to developers on
>>>> open source projects.
>> 
>> Except the clang solution does not work for some file systems and is
>> broken when used on them. Maybe those file systems are not as widely
>> used as they once were, and that is why clang hasn't had folks asking
>> for a fix.
>> 
>> The file system landscape in the early 2000s was different from today's.
>> This is why I asked: what is the same file if you can't rely on inodes
>> working?
>> 
>> Thanks,
>> Andrew
>> 
>> 
>> 
>> 
>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Jeremy
>>>> 
>>>> 
>>>> On Sep 6 2024, at 8:16 pm, Andrew Pinski <pins...@gmail.com> wrote:
>>>> 
>>>>> On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>>>>>  
>>>>>> Thanks Andrew, I appreciate the context and links. It looks like the
>>>>>> prior implementation failed to handle links because it was based on the
>>>>>> file path, given cpp_simplify_pathname. Do you have thoughts on the use
>>>>>> of device ID + inode as a way to also accommodate symbolic links and
>>>>>> hard links without the fickleness of mtime?
>>>>>  
>>>>> Not always, because inodes are not always stable on some file systems.
>>>>> It also does not work with multi-mounted devices.
>>>>> The whole definition of what is the same file is really up for
>>>>> debate here.
>>>>> I say that if the file has the same content, then it is the same file,
>>>>> and GCC uses that definition. Clang says it is based on whether it is
>>>>> the same inode, which is not always reliable because some file systems
>>>>> don't use an inode number. MSVC says it is based on the path, but what
>>>>> is the canonical path to a file? Is a hard link to the same file the
>>>>> same file or not? What about symbolic links? How about overlays and
>>>>> mounted directories: are they the same then?
>>>>> GCC's definition is the only one that handles all the issues described
>>>>> here: inodes (sometimes being non-stable), canonical paths, both kinds
>>>>> of links, and even re-mounted file systems.
>>>>>  
>>>>> What do the other implementations say about changing their
>>>>> definition of what "the same file" is? Have you asked the clang and MSVC
>>>>> folks?
>>>>> Anyway, GCC already has an optimization for #ifndef/#define/#endif (it
>>>>> is documented here:
>>>>> https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html), so does
>>>>> it really make sense to standardize `#pragma once`, or should we just
>>>>> push other implementations to add a similar optimization instead?
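A toy model of the kind of multiple-include optimization described at that link, as a rough illustration only. `IncludeTracker`, `record_guard` and `needs_reread` are hypothetical names, not cpplib's, and a plain string stands in for whatever file key the preprocessor really uses:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model: after a header has been processed once, remember the macro
// (if any) that guards its whole body. On a later #include, if that macro
// is still defined, the header can be skipped without reopening it.
struct IncludeTracker {
    std::set<std::string> defined_macros;              // macros currently #defined
    std::map<std::string, std::string> guard_of_file;  // file key -> guard macro

    // Record that `file` was wrapped entirely in #ifndef `macro` ... #endif.
    void record_guard(const std::string& file, const std::string& macro) {
        guard_of_file[file] = macro;
    }

    // True if the header must be read again, false if it can be skipped.
    bool needs_reread(const std::string& file) const {
        auto it = guard_of_file.find(file);
        if (it == guard_of_file.end()) return true;     // no known guard
        return defined_macros.count(it->second) == 0;  // guard was #undef'd
    }
};
```

If the file key errs toward treating two files as different, the worst case here is a redundant re-read, after which the `#ifndef` still skips the body; `#pragma once` has no such fallback.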
>>>>>  
>>>>> Thanks,
>>>>> Andrew Pinski
>>>>>  
>>>>>>  
>>>>>> Cheers,
>>>>>> Jeremy
>>>>>>  
>>>>>> On Sep 6 2024, at 12:25 am, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>>  
>>>>>>> On Thu, Sep 5, 2024 at 10:04 PM Jeremy Rifkin <jer...@rifkin.dev> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> I'm looking at #pragma once behavior among the major C/C++ compilers as
>>>>>>>> part of a proposal paper for standardizing #pragma once. (This is
>>>>>>>> apparently a very controversial topic.)
>>>>>>>> 
>>>>>>>> To put my question up-front: Would GCC ever be open to altering its
>>>>>>>> #pragma once behavior to bring it more in line with the behavior of
>>>>>>>> other compilers, and possibly more in line with what users expect?
>>>>>>>> 
>>>>>>>> To elaborate more:
>>>>>>>> 
>>>>>>>> Design decisions for #pragma once essentially boil down to a file-based
>>>>>>>> definition vs. a content-based definition of "same file".
>>>>>>>> 
>>>>>>>> A file-based definition is easier to reason about and more in line with
>>>>>>>> what users expect; however, distinct copies of headers can't be handled
>>>>>>>> and multiple mount points are problematic.
>>>>>>>> 
>>>>>>>> A content-based definition works for distinct copies and multiple mount
>>>>>>>> points, and is completely sufficient 99% of the time; however, it could
>>>>>>>> break in hard-to-debug ways in a few notable cases (more information
>>>>>>>> later).
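A minimal sketch of a purely content-based definition of "same file", with hypothetical names `read_all` and `same_by_contents`. It deliberately ignores where a file lives, which is why the two `library_main.hpp` files of Example 1 below would be merged:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: read a whole file into memory.
std::string read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Purely content-based sameness: location and timestamps are ignored;
// two files are "the same" exactly when their bytes are identical.
bool same_by_contents(const fs::path& a, const fs::path& b) {
    return fs::file_size(a) == fs::file_size(b) && read_all(a) == read_all(b);
}
```

Distinct copies and multiple mounts are handled for free, but two headers that happen to share bytes while meaning different things (because they `#include` different neighbours) become indistinguishable.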
>>>>>>>> 
>>>>>>>> Currently the three major C/C++ compilers treat #pragma once very
>>>>>>>> differently:
>>>>>>>> - GCC uses file mtime + file contents
>>>>>>>> - Clang uses inodes
>>>>>>>> - MSVC uses file path
>>>>>>> 
>>>>>>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52566#c2 .
>>>>>>> Note this was changed specifically in GCC 3.4 to fix the issue around
>>>>>>> symlinks and hard links.
>>>>>>> See https://gcc.gnu.org/pipermail/gcc-patches/2003-July/111203.html
>>>>>>> for more information on the fixes.
>>>>>>> 
>>>>>>> In fact, `#pragma once` was deprecated before GCC 3.4 because it used
>>>>>>> to do what clang and MSVC are doing, and that was considered wrong.
>>>>>>> So GCC's behavior has been this way since before clang was even written.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>> 
>>>>>>>> 
>>>>>>>> None of the major compilers have documented their #pragma once 
>>>>>>>> semantics.
>>>>>>>> 
>>>>>>>> In practice all three of these approaches work pretty well most of the
>>>>>>>> time (which is why people feel comfortable using #pragma once). However,
>>>>>>>> they can each break in their own ways.
>>>>>>>> 
>>>>>>>> As mentioned earlier, clang's and MSVC's file-based definitions of "same
>>>>>>>> file" break for multiple mount points and multiple copies of the same
>>>>>>>> header. MSVC's approach also breaks for symbolic links and hard links.
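A minimal sketch of a purely path-based check, to show why links defeat it. This is not MSVC's actual code (its semantics are undocumented, as noted below); `same_by_path` is a hypothetical name and lexical normalization is an assumed stand-in for whatever canonicalization a compiler might do:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Purely path-based sameness: both paths are made absolute and lexically
// normalized ("a/./b" and "a/x/../b" both become "a/b"), then compared
// element by element. Nothing on disk is consulted, so two different names
// for one file (a hard link or a symlink) compare as different files.
bool same_by_path(const fs::path& a, const fs::path& b) {
    return fs::absolute(a).lexically_normal() == fs::absolute(b).lexically_normal();
}
```

Swapping in `fs::canonical` would fold symbolic links together, but a hard link is a second, equally real directory entry, so it would still show up as a different path.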
>>>>>>>> 
>>>>>>>> GCC's hybrid approach can break in surprising ways. I have three
>>>>>>>> examples to share:
>>>>>>>> 
>>>>>>>> Example 1:
>>>>>>>> 
>>>>>>>> Consider a scenario such as:
>>>>>>>> 
>>>>>>>> usr/
>>>>>>>>    include/
>>>>>>>>      library_a/
>>>>>>>>        library_main.hpp
>>>>>>>>        foo.hpp
>>>>>>>>      library_b/
>>>>>>>>        library_main.hpp
>>>>>>>>        foo.hpp
>>>>>>>> src/
>>>>>>>>    main.cpp
>>>>>>>> 
>>>>>>>> main.cpp:
>>>>>>>> #include "library_a/library_main.hpp"
>>>>>>>> #include "library_b/library_main.hpp"
>>>>>>>> 
>>>>>>>> And both library_main.hpp files contain:
>>>>>>>> #pragma once
>>>>>>>> #include "foo.hpp"
>>>>>>>> 
>>>>>>>> Example 2:
>>>>>>>> 
>>>>>>>> namespace v1 {
>>>>>>>>      #include "library_v1.hpp"
>>>>>>>> }
>>>>>>>> namespace v2 {
>>>>>>>>      #include "library_v2.hpp"
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Where both library headers include their own copy of a shared header
>>>>>>>> using #pragma once.
>>>>>>>> 
>>>>>>>> Example 3:
>>>>>>>> 
>>>>>>>> usr/
>>>>>>>>    include/
>>>>>>>>      library/
>>>>>>>>        library.hpp
>>>>>>>>        vendored-dependency.hpp
>>>>>>>> src/
>>>>>>>>    main.cpp
>>>>>>>>    vendored-dependency.hpp
>>>>>>>> 
>>>>>>>> main.cpp:
>>>>>>>> #include "vendored-dependency.hpp"
>>>>>>>> #include <library/library.hpp>
>>>>>>>> 
>>>>>>>> library.hpp:
>>>>>>>> #pragma once
>>>>>>>> #include "vendored-dependency.hpp"
>>>>>>>> 
>>>>>>>> This assumes the two copies of vendored-dependency.hpp are byte-for-byte
>>>>>>>> identical and that it uses #pragma once.
>>>>>>>> 
>>>>>>>> Each of these examples is a plausible scenario where two files with the
>>>>>>>> same contents could be #included. In each example, on GCC, the code can
>>>>>>>> work or break based on mtime:
>>>>>>>> - Example 1: Breaks if the mtimes of the two library_main.hpp files
>>>>>>>>   happen to be the same
>>>>>>>> - Example 2: Breaks if the mtimes of the shared dependency copies
>>>>>>>>   happen to be the same
>>>>>>>> - Example 3: Only works if the mtimes are the same
>>>>>>>> 
>>>>>>>> File mtimes can happen to match sometimes, e.g. in a fresh git clone.
>>>>>>>> However, this is a rather fickle criterion to rely on, and mtimes could
>>>>>>>> easily diverge in the middle of development. Notably, Example 2 was
>>>>>>>> shared with me as an example where #pragma once worked great in
>>>>>>>> development and broke in CI.
>>>>>>>> 
>>>>>>>> Additionally, while GCC's approach might be able to handle multiple
>>>>>>>> mounts better than other approaches, it can still break under multiple
>>>>>>>> mounts if mtime resolution differs.
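A small sketch of why an exact mtime comparison can disagree across mounts: the same write instant, stored at two different granularities, no longer compares equal. `stored_mtime` is a hypothetical helper, and the granularities are illustrative; the real value depends on the filesystem and how it is mounted:

```cpp
#include <cassert>
#include <chrono>

using std::chrono::nanoseconds;

// Hypothetical helper: the timestamp a filesystem would keep for a write at
// `actual`, given the granularity it records mtimes with (e.g. 1 ns, 1 s,
// or 2 s as on FAT). `granularity` must be positive.
nanoseconds stored_mtime(nanoseconds actual, nanoseconds granularity) {
    return (actual / granularity) * granularity;
}
```

Assuming one mount reports the nanosecond value and another reports it truncated to whole seconds, the same header seen through both would fail an exact mtime match and be treated as two different files.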
>>>>>>>> 
>>>>>>>> Obviously there is no silver bullet for making #pragma once work
>>>>>>>> perfectly all the time; however, I think it's easier to provide clear
>>>>>>>> guarantees for #pragma once behavior when the definition of "same file"
>>>>>>>> is based on file identity on a device, i.e. device id + inode.
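A POSIX-only sketch of the device id + inode notion of "same file" proposed here; `same_by_dev_inode` is a hypothetical name, not Clang's implementation. `std::filesystem::equivalent` answers the same question portably:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sys/stat.h>

// Two paths name the same file if stat() reports the same device id and
// inode number. stat() follows symbolic links, and hard links share an
// inode, so both count as the same file; an independent copy does not.
// Returns false if either path cannot be stat'ed.
bool same_by_dev_inode(const std::filesystem::path& a,
                       const std::filesystem::path& b) {
    struct stat sa {};
    struct stat sb {};
    if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

The failure modes stay easy to state: a second mount of the same device may report a different `st_dev`, and filesystems without stable inode numbers give no reliable answer.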
>>>>>>>> 
>>>>>>>> Would GCC ever consider using device id + inode instead of mtime +
>>>>>>>> contents for #pragma once?
>>>>>>>> 
>>>>>>>> I presume the primary reason against changing the mtime + file contents
>>>>>>>> approach in GCC would be caution over breaking any existing use. While
>>>>>>>> the three examples above are cases where fickle mtime can be
>>>>>>>> problematic, and I can't imagine any situation where mtime could
>>>>>>>> safely be relied upon, I do understand the degree of caution required
>>>>>>>> for changes like this.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Jeremy
>>>>>>> 
>>>>> 
