Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Wed, Mar 4, 2015 at 12:38 AM, Jeff Law wrote:
> On 03/03/15 12:57, Martin Sebor wrote:
>> As a data point(*) it might be interesting to note that GCC itself
>> relies on memcpy providing stronger guarantees than the C standard
>> requires it to by emitting calls to the function for large structure
>> self-assignments (which are strictly conforming, as discussed in bug
>> 65029).
>
> Right.  I actually spent quite a bit of time struggling with this a while
> back in a different context.  The only case I could come up with where GCC
> would generate an overlapping memcpy was self assignment, but even that was
> bad and while we ultimately punted, I've always considered it a wart.

  struct A { int large[100]; };
  void foo (struct A *x, struct A *y) { *x = *y; }

call it as foo (&a, &a); (on x86 you need -mstringop-strategy=libcall,
even at -O0, to emit a memcpy call)

The self-assignment doesn't have to be visible to the compiler - so to fix
this we'd have to assume pointer equality everywhere and either emit a
conditional call to memcpy or always emit a call to memmove.

Richard.

>> [*] IMO, one in favor of tightening up the memcpy specification
>> to require implementations to provide the expected semantics.
>
> That works for me :-)
>
> The things done in glibc's memcpy are a bit on the absurd side and the pain
> caused by the changes over time is almost impossible to overstate.  If the
> Austin group tightens memcpy to require fewer surprises I think most
> developers would ultimately be happy with the result -- a few would complain
> about the performance impacts for specific workloads, but I suspect they'd
> be in the minority.
>
> jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 03/03/15 12:57, Martin Sebor wrote:
> As a data point(*) it might be interesting to note that GCC itself
> relies on memcpy providing stronger guarantees than the C standard
> requires it to by emitting calls to the function for large structure
> self-assignments (which are strictly conforming, as discussed in bug
> 65029).

Right.  I actually spent quite a bit of time struggling with this a while
back in a different context.  The only case I could come up with where GCC
would generate an overlapping memcpy was self assignment, but even that was
bad and while we ultimately punted, I've always considered it a wart.

> [*] IMO, one in favor of tightening up the memcpy specification
> to require implementations to provide the expected semantics.

That works for me :-)

The things done in glibc's memcpy are a bit on the absurd side and the pain
caused by the changes over time is almost impossible to overstate.  If the
Austin group tightens memcpy to require fewer surprises I think most
developers would ultimately be happy with the result -- a few would complain
about the performance impacts for specific workloads, but I suspect they'd
be in the minority.

jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/2015 10:01 AM, Jeff Law wrote:
> On 02/20/15 05:10, Jakub Jelinek wrote:
>> On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote:
>>> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>>>> Hmmm, Passing the additional option in user code would be one thing,
>>>> but what about library code?  E.g., using memcpy (either explicitly or
>>>> implicitly for a structure copy)?
>>>
>>> The memcpy problem isn't restricted to embedded architectures.
>>>
>>>   size_t size;
>>>   const unsigned char *source;
>>>   std::vector<unsigned char> vec;
>>>   …
>>>   vec.resize(size);
>>>   memcpy(vec.data(), source, size);
>>>
>>> std::vector::data() can return a null pointer if the vector is empty,
>>> which means that this code is invalid for empty inputs.
>>>
>>> I think the C standard is wrong here.  We should extend it, as a QoI
>>> matter, and support null pointers for variable-length inputs and outputs
>>> if the size is 0.  But I suspect this is still a minority view.
>>
>> I disagree.  If you want a function that will have that different
>> property, don't call it memcpy.
>
> Right.  If someone wants to take it up with the Austin group, that's fine.
> But until/unless the Austin group blesses, I don't think we should extend
> as a QoI matter.

As a data point(*) it might be interesting to note that GCC itself relies
on memcpy providing stronger guarantees than the C standard requires it to
by emitting calls to the function for large structure self-assignments
(which are strictly conforming, as discussed in bug 65029).

Martin

[*] IMO, one in favor of tightening up the memcpy specification to require
implementations to provide the expected semantics.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 28/02/2015 9:12 am, Manuel López-Ibáñez wrote:
> On 02/19/15 14:56, Chris Johns wrote:
>> My main concern is not knowing the trap has been added to the code. If I
>> could build an application and audit it somehow then I can manage it. We
>> have a similar issue with the possible use of FP registers being used in
>> general code (ISR save/restore trade off).
>>
>> Can the ELF be annotated in some GCC specific way that makes it to the
>> final executable to flag this is happening ? We can then create tools to
>> audit the executables.
>
> Simply ignore me if I'm misunderstanding the issue: Couldn't GCC generate,
> instead of a trap, a call to a noinline noreturn cold weak function
> __gcc_is_a_trap that by default calls the trap? Then, audit tools can
> inspect the code and see if such a function call appears and even override
> it with something else.

Yes it could and this is a nice idea.

> Chris, wouldn't that be enough for your purposes?

I think it does because we can scan an executable for the call locations
and audit them.

Chris
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/19/15 14:56, Chris Johns wrote:
> My main concern is not knowing the trap has been added to the code. If I
> could build an application and audit it somehow then I can manage it. We
> have a similar issue with the possible use of FP registers being used in
> general code (ISR save/restore trade off).
>
> Can the ELF be annotated in some GCC specific way that makes it to the
> final executable to flag this is happening ? We can then create tools to
> audit the executables.

Simply ignore me if I'm misunderstanding the issue: Couldn't GCC generate,
instead of a trap, a call to a noinline noreturn cold weak function
__gcc_is_a_trap that by default calls the trap? Then, audit tools can
inspect the code and see if such a function call appears and even override
it with something else.

Chris, wouldn't that be enough for your purposes?

Cheers, Manuel.
[RFC/patch for stage1] Embed compiler dumps into generated .o files (was Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference)
On Thu, 2015-02-26 at 11:17 -0500, David Malcolm wrote: > On Fri, 2015-02-20 at 10:29 -0700, Jeff Law wrote: > > On 02/19/15 14:56, Chris Johns wrote: > > > On 20/02/2015 8:23 am, Joel Sherrill wrote: > > >> > > >> On 2/19/2015 2:56 PM, Sandra Loosemore wrote: > > >>> Jakub Jelinek wrote: > > On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: > > > Starting with gcc 4.9, -O2 implicitly invokes > > > > > > -fisolate-erroneous-paths-dereference: > > > > > > which > > > > > > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > > > > > documents as > > > > > > Detect paths that trigger erroneous or undefined behavior due to > > > dereferencing a null pointer. Isolate those paths from the > > > main control > > > flow and turn the statement with erroneous or undefined > > > behavior into a > > > trap. This flag is enabled by default at -O2 and higher. > > > > > > This results in a sizable number of previously working embedded > > > programs mysteriously > > > crashing when recompiled under gcc 4.9. The problem is that embedded > > > programs will often have ram starting at address zero (think > > > hardware-defined > > > interrupt vectors, say) which gets initialized by code which the > > > -fisolate-erroneous-paths-deference logic can recognize as reading > > > and/or > > > writing address zero. > > If you have some pages mapped at address 0, you really should > > compile your > > code with -fno-delete-null-pointer-checks, otherwise you can run > > into tons > > of other issues. > > >>> H, Passing the additional option in user code would be one thing, > > >>> but what about library code? E.g., using memcpy (either explicitly or > > >>> implicitly for a structure copy)? > > >>> > > >>> It looks to me like cr16 and avr are currently the only architectures > > >>> that disable flag_delete_null_pointer_checks entirely, but I am sure > > >>> that this issue affects other embedded targets besides nios2, too. E.g. 
> > >>> scanning Mentor's ARM board support library, I see a whole pile of > > >>> devices that have memory mapped at address zero (TI Stellaris/Tiva, > > >>> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ). Plus our simulator > > >>> BSPs assume a flat address space starting at address 0. > > >> I forwarded this to the RTEMS list and was promptly pointed to a patch > > >> on a Coldfire BSP where someone worked around this behavior. > > >> > > >> We are discussing how to deal with this. It is likely OK in user code but > > >> horrible in BSP and driver code. We don't have a solution ourselves. We > > >> just recognize it impacts a number of targets. > > >> > > > > > > My main concern is not knowing the trap has been added to the code. If I > > > could build an application and audit it somehow then I can manage it. We > > > have a similar issue with the possible use of FP registers being used in > > > general code (ISR save/restore trade off). > > > > > > Can the ELF be annotated in some GCC specific way that makes it to the > > > final executable to flag this is happening ? We can then create tools to > > > audit the executables. > > Not really, for a variety of reasons. > > Is information on this reaching the pass-specific dumpfile? I don't see > any explicit dumping in gimple-ssa-isolate-paths.c, but I guess that > insert_trap_and_remove_trailing_statements could log itself to the > dumpfile, or use the statistics framework (which itself also reaches a > dumpfile). > > Assuming the info is reaching a dumpfile, could gcc have an option to > write its dumpfiles into a special ELF section in the .s, rather than to > disk? > > Then (given a suitable new option to e.g. eu-readelf) you'd be able to > read the dumpfiles from a .o file, and (handwaving about linkage) from > an execuable or library. > > Not that I'm volunteering... 
Perhaps foolishly I had a go at prototyping this; attached is a
proof-of-concept patch (albeit with FIXMEs and no ChangeLog or testsuite
coverage).

When writing out the final asm, each dumpfile is read, and embedded into
its own section.  Manual review of the built .o file shows that the
dumpfiles make it into them, e.g.:

$ eu-readelf -x .note.GNU-dump.tree-switchconv smoketest.o|head

Hex dump of section [28] '.note.GNU-dump.tree-switchconv', 2698 bytes at offset 0xf021:
  0x0000 0a3b3b20 46756e63 74696f6e 20746573 .;; Function tes
  0x0010 745f7068 69202874 6573745f 7068692c t_phi (test_phi,
  0x0020 2066756e 63646566 5f6e6f3d 302c2064  funcdef_no=0, d
  0x0030 65636c5f 7569643d 31383332 2c206367 ecl_uid=1832, cg
  0x0040 72617068 5f756964 3d302c20 73796d62 raph_uid=0, symb
  0x0050 6f6c5f6f 72646572 3d30290a 0a746573 ol_order=0)..tes
  0x0060 745f7068 69202869 6e742069 2c20696e t_phi (int i, in
  0x0070 74206a29 0a7b0a20 20696e74 206b3b0a t j).{.  int k;.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Fri, 2015-02-20 at 10:29 -0700, Jeff Law wrote: > On 02/19/15 14:56, Chris Johns wrote: > > On 20/02/2015 8:23 am, Joel Sherrill wrote: > >> > >> On 2/19/2015 2:56 PM, Sandra Loosemore wrote: > >>> Jakub Jelinek wrote: > On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: > > Starting with gcc 4.9, -O2 implicitly invokes > > > > -fisolate-erroneous-paths-dereference: > > > > which > > > > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > > > documents as > > > > Detect paths that trigger erroneous or undefined behavior due to > > dereferencing a null pointer. Isolate those paths from the > > main control > > flow and turn the statement with erroneous or undefined > > behavior into a > > trap. This flag is enabled by default at -O2 and higher. > > > > This results in a sizable number of previously working embedded > > programs mysteriously > > crashing when recompiled under gcc 4.9. The problem is that embedded > > programs will often have ram starting at address zero (think > > hardware-defined > > interrupt vectors, say) which gets initialized by code which the > > -fisolate-erroneous-paths-deference logic can recognize as reading > > and/or > > writing address zero. > If you have some pages mapped at address 0, you really should > compile your > code with -fno-delete-null-pointer-checks, otherwise you can run > into tons > of other issues. > >>> H, Passing the additional option in user code would be one thing, > >>> but what about library code? E.g., using memcpy (either explicitly or > >>> implicitly for a structure copy)? > >>> > >>> It looks to me like cr16 and avr are currently the only architectures > >>> that disable flag_delete_null_pointer_checks entirely, but I am sure > >>> that this issue affects other embedded targets besides nios2, too. E.g. 
> >>> scanning Mentor's ARM board support library, I see a whole pile of > >>> devices that have memory mapped at address zero (TI Stellaris/Tiva, > >>> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ). Plus our simulator > >>> BSPs assume a flat address space starting at address 0. > >> I forwarded this to the RTEMS list and was promptly pointed to a patch > >> on a Coldfire BSP where someone worked around this behavior. > >> > >> We are discussing how to deal with this. It is likely OK in user code but > >> horrible in BSP and driver code. We don't have a solution ourselves. We > >> just recognize it impacts a number of targets. > >> > > > > My main concern is not knowing the trap has been added to the code. If I > > could build an application and audit it somehow then I can manage it. We > > have a similar issue with the possible use of FP registers being used in > > general code (ISR save/restore trade off). > > > > Can the ELF be annotated in some GCC specific way that makes it to the > > final executable to flag this is happening ? We can then create tools to > > audit the executables. > Not really, for a variety of reasons. Is information on this reaching the pass-specific dumpfile? I don't see any explicit dumping in gimple-ssa-isolate-paths.c, but I guess that insert_trap_and_remove_trailing_statements could log itself to the dumpfile, or use the statistics framework (which itself also reaches a dumpfile). Assuming the info is reaching a dumpfile, could gcc have an option to write its dumpfiles into a special ELF section in the .s, rather than to disk? Then (given a suitable new option to e.g. eu-readelf) you'd be able to read the dumpfiles from a .o file, and (handwaving about linkage) from an execuable or library. Not that I'm volunteering... 
> However, the compiler can do
> better for warning about some of these kinds of things -- but we
> certainly can't guarantee we catch all of them as there are cases where
> the point where we determine a property (such as non-nullness) may be
> very different from the point where we exploit that property.
>
> I did propose some patches to improve the warnings back in the 4.9 time
> frame, but they never got reviewed.  See BZ 16351.  We'll have to
> revisit them during the next open development period.
>
> Jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
> On Feb 20, 2015, at 12:01 PM, Jeff Law wrote:
>
> ...
> Regardless, the right thing to do is to disable elimination of NULL pointer
> checks on targets where page 0 is mapped and thus a reference to *0 may not
> fault.  In my mind this is an attribute of both the processor (see H8 above)
> and/or the target OS.
>
> On those targets the C-runtime had better also ensure that its headers
> aren't decorated with non-null attributes, particularly for the mem* and
> str* functions.

pdp11 is an example of such a target, independent of OS (with only 8 pages,
clearly no one is going to unmap page 0).  Fortunately there one is unlikely
to find C program data structures at 0 (instead, vectors if kernel, stack
limit if user mode).  So no fault is correct there, but no null (0 bits)
pointer is also, for practical purposes, a valid assumption.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
> On Feb 20, 2015, at 12:01 PM, Jeff Law wrote:
>
> On 02/20/15 04:43, Jonathan Wakely wrote:
>>> ...
>>
>> I'm inclined to agree.
>>
>> Most developers aren't aware of the preconditions on memcpy, but GCC
>> optimizes aggressively based on those preconditions, so we have a
>> large and potentially dangerous gap between what developers expect and
>> what actually happens.
>
> But that's always true -- this isn't any different than aliasing,
> arithmetic overflow, etc.  The standards define the contract between the
> compiler/library implementors and the developers.  Once the contract is
> broken, all bets are off.

True.  The unfortunate problem with C, and even more so with C++, is that
the contract is so large and complex that few, if any, are skilled enough
language lawyers to know what exactly it says.  For that matter, the
contract (the standard) is large and complex enough that it has bugs and
ambiguities, so the contract is not in fact precisely defined.  There's a
nice paper that drives this home:
http://people.csail.mit.edu/akcheung/papers/apsys12.pdf

For example, while most people know about the "no overlaps" rule of memcpy,
stuff like aliasing is far more obscure.  Or the exact meaning (if there is
one) of "volatile".

It also doesn't help that a large fraction of the contract is unenforced.
You only find out about it when a new version of the compiler starts using
a particular rule to make an optimization that suddenly breaks 10 year old
code.  I remember some heated debates between Linux folks and compiler
builders when such things strike the Linux kernel.

	paul
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/15 10:09, Florian Weimer wrote:
> On 02/20/2015 06:01 PM, Jeff Law wrote:
>> But that's always true -- this isn't any different than aliasing,
>> arithmetic overflow, etc.  The standards define the contract between the
>> compiler/library implementors and the developers.  Once the contract is
>> broken, all bets are off.
>
> What I don't like about this case (std::vector::data() returning nullptr
> vs memcpy/memcmp/qsort non-null assertions) is that it is internally
> non-composing in a totally non-obvious way.  data() is explicitly intended
> to cover interoperability with these older C functions, and it fails.

And that's precisely why I consider this class of issues the most
problematical.

Jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/19/15 14:56, Chris Johns wrote:
> On 20/02/2015 8:23 am, Joel Sherrill wrote:
>> On 2/19/2015 2:56 PM, Sandra Loosemore wrote:
>>> Jakub Jelinek wrote:
>>>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:
>>>>> Starting with gcc 4.9, -O2 implicitly invokes
>>>>>
>>>>>   -fisolate-erroneous-paths-dereference:
>>>>>
>>>>> which
>>>>>
>>>>>   https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>>>>
>>>>> documents as
>>>>>
>>>>>   Detect paths that trigger erroneous or undefined behavior due to
>>>>>   dereferencing a null pointer.  Isolate those paths from the main
>>>>>   control flow and turn the statement with erroneous or undefined
>>>>>   behavior into a trap.  This flag is enabled by default at -O2 and
>>>>>   higher.
>>>>>
>>>>> This results in a sizable number of previously working embedded
>>>>> programs mysteriously crashing when recompiled under gcc 4.9.  The
>>>>> problem is that embedded programs will often have ram starting at
>>>>> address zero (think hardware-defined interrupt vectors, say) which
>>>>> gets initialized by code which the
>>>>> -fisolate-erroneous-paths-dereference logic can recognize as reading
>>>>> and/or writing address zero.
>>>>
>>>> If you have some pages mapped at address 0, you really should compile
>>>> your code with -fno-delete-null-pointer-checks, otherwise you can run
>>>> into tons of other issues.
>>>
>>> Hmmm, Passing the additional option in user code would be one thing,
>>> but what about library code?  E.g., using memcpy (either explicitly or
>>> implicitly for a structure copy)?
>>>
>>> It looks to me like cr16 and avr are currently the only architectures
>>> that disable flag_delete_null_pointer_checks entirely, but I am sure
>>> that this issue affects other embedded targets besides nios2, too.
>>> E.g., scanning Mentor's ARM board support library, I see a whole pile
>>> of devices that have memory mapped at address zero (TI Stellaris/Tiva,
>>> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ...).  Plus our simulator
>>> BSPs assume a flat address space starting at address 0.
>>
>> I forwarded this to the RTEMS list and was promptly pointed to a patch
>> on a Coldfire BSP where someone worked around this behavior.
>>
>> We are discussing how to deal with this.  It is likely OK in user code
>> but horrible in BSP and driver code.  We don't have a solution
>> ourselves.  We just recognize it impacts a number of targets.
>
> My main concern is not knowing the trap has been added to the code.  If I
> could build an application and audit it somehow then I can manage it.  We
> have a similar issue with the possible use of FP registers being used in
> general code (ISR save/restore trade off).
>
> Can the ELF be annotated in some GCC specific way that makes it to the
> final executable to flag this is happening ?  We can then create tools to
> audit the executables.

Not really, for a variety of reasons.  However, the compiler can do better
for warning about some of these kinds of things -- but we certainly can't
guarantee we catch all of them as there are cases where the point where we
determine a property (such as non-nullness) may be very different from the
point where we exploit that property.

I did propose some patches to improve the warnings back in the 4.9 time
frame, but they never got reviewed.  See BZ 16351.  We'll have to revisit
them during the next open development period.

Jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/15 04:43, Jonathan Wakely wrote:
> On 20 February 2015 at 11:06, Florian Weimer wrote:
>> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>>> Hmmm, Passing the additional option in user code would be one thing,
>>> but what about library code?  E.g., using memcpy (either explicitly or
>>> implicitly for a structure copy)?
>>
>> The memcpy problem isn't restricted to embedded architectures.
>>
>>   size_t size;
>>   const unsigned char *source;
>>   std::vector<unsigned char> vec;
>>   …
>>   vec.resize(size);
>>   memcpy(vec.data(), source, size);
>>
>> std::vector::data() can return a null pointer if the vector is empty,
>> which means that this code is invalid for empty inputs.
>>
>> I think the C standard is wrong here.  We should extend it, as a QoI
>> matter, and support null pointers for variable-length inputs and outputs
>> if the size is 0.  But I suspect this is still a minority view.
>
> I'm inclined to agree.
>
> Most developers aren't aware of the preconditions on memcpy, but GCC
> optimizes aggressively based on those preconditions, so we have a
> large and potentially dangerous gap between what developers expect and
> what actually happens.

But that's always true -- this isn't any different than aliasing,
arithmetic overflow, etc.  The standards define the contract between the
compiler/library implementors and the developers.  Once the contract is
broken, all bets are off.

jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/2015 06:01 PM, Jeff Law wrote:
> But that's always true -- this isn't any different than aliasing,
> arithmetic overflow, etc.  The standards define the contract between the
> compiler/library implementors and the developers.  Once the contract is
> broken, all bets are off.

What I don't like about this case (std::vector::data() returning nullptr
vs memcpy/memcmp/qsort non-null assertions) is that it is internally
non-composing in a totally non-obvious way.  data() is explicitly intended
to cover interoperability with these older C functions, and it fails.

But you are right about overflows.  I think we should give up and just
enable -fwrapv by default in Fedora and downstream.  This issue has been
explicitly documented since 2002 at least (explicitly with security-related
checks in mind), and programmers still write overflow checks which are only
correct with -fwrapv, and it passes code review.  I fear that's not going
to change, ever.

-- 
Florian Weimer / Red Hat Product Security
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/15 05:10, Jakub Jelinek wrote:
> On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote:
>> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>>> Hmmm, Passing the additional option in user code would be one thing,
>>> but what about library code?  E.g., using memcpy (either explicitly or
>>> implicitly for a structure copy)?
>>
>> The memcpy problem isn't restricted to embedded architectures.
>>
>>   size_t size;
>>   const unsigned char *source;
>>   std::vector<unsigned char> vec;
>>   …
>>   vec.resize(size);
>>   memcpy(vec.data(), source, size);
>>
>> std::vector::data() can return a null pointer if the vector is empty,
>> which means that this code is invalid for empty inputs.
>>
>> I think the C standard is wrong here.  We should extend it, as a QoI
>> matter, and support null pointers for variable-length inputs and outputs
>> if the size is 0.  But I suspect this is still a minority view.
>
> I disagree.  If you want a function that will have that different
> property, don't call it memcpy.

Right.  If someone wants to take it up with the Austin group, that's fine.
But until/unless the Austin group blesses, I don't think we should extend
as a QoI matter.

jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/15 04:06, Florian Weimer wrote:
> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>> Hmmm, Passing the additional option in user code would be one thing,
>> but what about library code?  E.g., using memcpy (either explicitly or
>> implicitly for a structure copy)?
>
> The memcpy problem isn't restricted to embedded architectures.
>
>   size_t size;
>   const unsigned char *source;
>   std::vector<unsigned char> vec;
>   …
>   vec.resize(size);
>   memcpy(vec.data(), source, size);
>
> std::vector::data() can return a null pointer if the vector is empty,
> which means that this code is invalid for empty inputs.
>
> I think the C standard is wrong here.  We should extend it, as a QoI
> matter, and support null pointers for variable-length inputs and outputs
> if the size is 0.  But I suspect this is still a minority view.

And it's these kinds of uses that scare me the most.

jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/15 04:45, Florian Weimer wrote:
> On 02/20/2015 10:30 AM, Andrew Haley wrote:
>> I doubt that such a thing is ever going to be safe.  The idea that a
>> null pointer points to nothing is so hard-baked into the design of C
>> that you can't get away from it.  Also, almost every C programmer and
>> especially library writer "knows" that a null pointer points to
>> nothing.
>
> NULL pointer dereferences (or NULL pointer with small offsets) were
> common programming idioms in the DOS days because the interrupt vector
> table was located at this address.
>
> Quite a few systems once had a readable page zero, and (manual, I
> assume) optimizations for list traversal
> (p != NULL && p->next != NULL → p->next != NULL) were commonly used on
> these systems.

True, but thankfully this isn't blessed anymore.

> I think the treatment of pointers not as addresses, but something that
> has type information and provenance associated with it, came much later,
> when most of the design was already settled.

We still have targets where page0 is special.  The H8 for example comes to
mind.  Folks regularly place data into page0 and mark it as special so the
compiler emits more efficient sequences to access that data.

Regardless, the right thing to do is to disable elimination of NULL pointer
checks on targets where page 0 is mapped and thus a reference to *0 may not
fault.  In my mind this is an attribute of both the processor (see H8
above) and/or the target OS.

On those targets the C-runtime had better also ensure that its headers
aren't decorated with non-null attributes, particularly for the mem* and
str* functions.

jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/2015 11:06 AM, Florian Weimer wrote:
> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>> Hmmm, Passing the additional option in user code would be one thing,
>> but what about library code?  E.g., using memcpy (either explicitly or
>> implicitly for a structure copy)?
>
> The memcpy problem isn't restricted to embedded architectures.
>
>   size_t size;
>   const unsigned char *source;
>   std::vector<unsigned char> vec;
>   …
>   vec.resize(size);
>   memcpy(vec.data(), source, size);
>
> std::vector::data() can return a null pointer if the vector is empty,
> which means that this code is invalid for empty inputs.

Sure, but if that's a bug then it's a bug in the definition of memcpy(),
not in the definition of the properties of a null pointer.  If the size is
zero then it really shouldn't matter if the destination address is null.

Andrew.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote:
> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
> > Hmmm, Passing the additional option in user code would be one thing,
> > but what about library code?  E.g., using memcpy (either explicitly or
> > implicitly for a structure copy)?
>
> The memcpy problem isn't restricted to embedded architectures.
>
>   size_t size;
>   const unsigned char *source;
>   std::vector<unsigned char> vec;
>   …
>   vec.resize(size);
>   memcpy(vec.data(), source, size);
>
> std::vector::data() can return a null pointer if the vector is empty,
> which means that this code is invalid for empty inputs.
>
> I think the C standard is wrong here.  We should extend it, as a QoI
> matter, and support null pointers for variable-length inputs and outputs
> if the size is 0.  But I suspect this is still a minority view.

I disagree.  If you want a function that will have that different property,
don't call it memcpy.

	Jakub
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/2015 12:43 PM, Jonathan Wakely wrote:
> On 20 February 2015 at 11:06, Florian Weimer wrote:
>> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>>> Hmmm, Passing the additional option in user code would be one thing,
>>> but what about library code?  E.g., using memcpy (either explicitly or
>>> implicitly for a structure copy)?
>>
>> The memcpy problem isn't restricted to embedded architectures.
>>
>>   size_t size;
>>   const unsigned char *source;
>>   std::vector<unsigned char> vec;
>>   …
>>   vec.resize(size);
>>   memcpy(vec.data(), source, size);
>>
>> std::vector::data() can return a null pointer if the vector is empty,
>> which means that this code is invalid for empty inputs.
>>
>> I think the C standard is wrong here.  We should extend it, as a QoI
>> matter, and support null pointers for variable-length inputs and outputs
>> if the size is 0.  But I suspect this is still a minority view.
>
> I'm inclined to agree.
>
> Most developers aren't aware of the preconditions on memcpy, but GCC
> optimizes aggressively based on those preconditions, so we have a
> large and potentially dangerous gap between what developers expect and
> what actually happens.

Maybe we can add, as a compromise, an always-inline wrapper like this?

  void *memcpy(void *dst, const void *src, size_t size)
  {
    if (__builtin_constant_p(size > 0) && size > 0) {
      // Or whatever else is needed as non-null assertions.
      *(char *)dst;
      *(const char *)src;
    }
    return memcpy_real(dst, src, size); // Without non-null assertion.
  }

Then we'll still get the non-NULL optimization for the common positive
size case.

-- 
Florian Weimer / Red Hat Product Security
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/20/2015 10:30 AM, Andrew Haley wrote:
> I doubt that such a thing is ever going to be safe.  The idea that a
> null pointer points to nothing is so hard-baked into the design of C
> that you can't get away from it.  Also, almost every C programmer and
> especially library writer "knows" that a null pointer points to
> nothing.

NULL pointer dereferences (or NULL pointers with small offsets) were
common programming idioms in the DOS days because the interrupt vector
table was located at this address.  Quite a few systems once had a
readable page zero, and (manual, I assume) optimizations for list
traversal (p != NULL && p->next != NULL → p->next != NULL) were commonly
used on these systems.

I think the treatment of pointers not as addresses, but as something
that has type information and provenance associated with it, came much
later, when most of the design was already settled.

-- Florian Weimer / Red Hat Product Security
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 20 February 2015 at 11:06, Florian Weimer wrote:
> On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
>> Hmm, Passing the additional option in user code would be one thing,
>> but what about library code?  E.g., using memcpy (either explicitly or
>> implicitly for a structure copy)?
>
> The memcpy problem isn't restricted to embedded architectures.
>
>   size_t size;
>   const unsigned char *source;
>   std::vector<unsigned char> vec;
>   …
>   vec.resize(size);
>   memcpy(vec.data(), source, size);
>
> std::vector<unsigned char>::data() can return a null pointer if the
> vector is empty, which means that this code is invalid for empty inputs.
>
> I think the C standard is wrong here.  We should extend it, as a QoI
> matter, and support null pointers for variable-length inputs and outputs
> if the size is 0.  But I suspect this is still a minority view.

I'm inclined to agree.

Most developers aren't aware of the preconditions on memcpy, but GCC
optimizes aggressively based on those preconditions, so we have a
large and potentially dangerous gap between what developers expect and
what actually happens.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/19/2015 09:56 PM, Sandra Loosemore wrote:
> Hmm, Passing the additional option in user code would be one thing,
> but what about library code?  E.g., using memcpy (either explicitly or
> implicitly for a structure copy)?

The memcpy problem isn't restricted to embedded architectures.

  size_t size;
  const unsigned char *source;
  std::vector<unsigned char> vec;
  …
  vec.resize(size);
  memcpy(vec.data(), source, size);

std::vector<unsigned char>::data() can return a null pointer if the
vector is empty, which means that this code is invalid for empty inputs.

I think the C standard is wrong here.  We should extend it, as a QoI
matter, and support null pointers for variable-length inputs and outputs
if the size is 0.  But I suspect this is still a minority view.

-- Florian Weimer / Red Hat Product Security
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 18/02/15 19:21, Jeff Prothero wrote:
> BTW, I'd also be curious to know what is regarded as engineering
> best practice for writing a value to address zero when this is
> architecturally required by the hardware platform at hand.
> Obviously one can do various things to obscure the process
> sufficiently that the current gcc implementation won't detect it and
> complain, but as gcc gets smarter about optimization those are at
> risk of failing in a future release.  It would be nice to have a
> guaranteed-to-work future-proof idiom for doing this.  Do we have
> one, short of retreating to assembly code?

I doubt that such a thing is ever going to be safe.  The idea that a
null pointer points to nothing is so hard-baked into the design of C
that you can't get away from it.  Also, almost every C programmer and
especially library writer "knows" that a null pointer points to nothing.

The only way to initialize an interrupt vector table at address zero is
to go outside the language.  While it may be possible to use some
proprietary GCC options to get around this, it's never going to be a
good idea, because some GCC author (or indeed library author) may make
the "mistake" of assuming that null pointers point to nothing.

Using C to initialize anything at address zero is so dangerous that we
shouldn't tell people that they can use such-and-such a GCC option to
support it; we should warn them never to do it, and provide as many
warnings as we can.

IMO, engineering best practice is to use assembly code.

Andrew.
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
(Thanks to everyone for the helpful feedback!)

Daniel Gutson wrote:
> what about then two warnings (disabled by default), one intended to
> tell the user each time the compiler removes a conditional
> (-fdelete-null-pointer-checks)
> and another intended to tell the user each time the compiler adds a
> trap due to dereference an address 0?
>
> E.g.
>   -Wnull-pointer-check-deleted
>   -Wnull-dereference-considered-erroneous

I very much like the idea of such warnings.  I'm not clear why one would
not warn by default when detecting non-standards-conformant code and
producing code guaranteed not to do what the programmer intended.  But
presumably most sane engineers these days compile with -Wall.  :-)

-Jeff
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 02/19/2015 03:09 PM, Jakub Jelinek wrote:
> If you have hw where NULL is mapped and you know your code violates the
> C/C++ standards by placing objects at that address, simply do use the
> option that is designed for that purpose.

As I pointed out earlier, though, that won't help you if your program
(perhaps implicitly) uses library code that's been built without the
magic option.

I thought that declaring an object explicitly placed at address 0 as
"volatile" ought to be sufficient to prevent writes to it from being
optimized away and replaced with traps, but alas, that is not the case.
Also, a structure copy to such a volatile object is implicitly invoking
the regular memcpy function without complaining about casting away
volatile-ness, as an explicit call would do.

I've attached a couple quick test cases and .s output from
arm-none-eabi-gcc -O2, using a mainline build from last night.

-Sandra

  static int * const x0 = (int *) 0x;
  static volatile int * const xv = (int *) 0x;
  static int * const xn = (int *) 0x1000;

  void f0 (int *data) { *x0 = *data; }
  void fv (int *data) { *xv = *data; }
  void fn (int *data) { *xn = *data; }

	.cpu arm7tdmi
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 34, 0
	.eabi_attribute 18, 4
	.arm
	.syntax divided
	.file	"test.c"
	.text
	.align	2
	.global	f0
	.type	f0, %function
f0:
	@ Function supports interworking.
	@ Volatile: function does not return.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r3, #0
	str	r3, [r3]
	.inst	0xe7f000f0
	.size	f0, .-f0
	.align	2
	.global	fv
	.type	fv, %function
fv:
	@ Function supports interworking.
	@ Volatile: function does not return.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r3, #0
	str	r3, [r3]
	.inst	0xe7f000f0
	.size	fv, .-fv
	.align	2
	.global	fn
	.type	fn, %function
fn:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r3, #268435456
	ldr	r2, [r0]
	str	r2, [r3]
	bx	lr
	.size	fn, .-fn
	.ident	"GCC: (GNU) 5.0.0 20150219 (experimental)"

  struct big { int a, b; char c [100]; };

  static struct big * const x0 = (struct big *) 0x;
  static volatile struct big * const xv = (struct big *) 0x;
  static struct big * const xn = (struct big *) 0x1000;

  void f0 (struct big *data) { *x0 = *data; }
  void fv (struct big *data) { *xv = *data; }
  void fn (struct big *data) { *xn = *data; }

	.cpu arm7tdmi
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 34, 0
	.eabi_attribute 18, 4
	.arm
	.syntax divided
	.file	"test2.c"
	.text
	.align	2
	.global	f0
	.type	f0, %function
f0:
	@ Function supports interworking.
	@ Volatile: function does not return.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	mov	r1, r0
	stmfd	sp!, {r4, lr}
	mov	r2, #108
	mov	r0, #0
	bl	memcpy
	.inst	0xe7f000f0
	.size	f0, .-f0
	.align	2
	.global	fv
	.type	fv, %function
fv:
	@ Function supports interworking.
	@ Volatile: function does not return.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	mov	r1, r0
	stmfd	sp!, {r4, lr}
	mov	r2, #108
	mov	r0, #0
	bl	memcpy
	.inst	0xe7f000f0
	.size	fv, .-fv
	.align	2
	.global	fn
	.type	fn, %function
fn:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	stmfd	sp!, {r4, lr}
	mov	r1, r0
	mov	r2, #108
	mov	r0, #268435456
	bl	memcpy
	ldmfd	sp!, {r4, lr}
	bx	lr
	.size	fn, .-fn
	.ident	"GCC: (GNU) 5.0.0 20150219 (experimental)"
Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Thu, Feb 19, 2015 at 06:16:05PM -0300, Daniel Gutson wrote: > what about then two warnings (disabled by default), one intended to > tell the user each time the compiler removes a conditional > (-fdelete-null-pointer-checks) > and another intended to tell the user each time the compiler adds a trap due > to > dereference an address 0? > > E.g. >-Wnull-pointer-check-deleted >-Wnull-dereference-considered-erroneous > > or alike That would be extremely difficult. The -fdelete-null-pointer-checks option is used in many places, like the path isolation, value range propagation, alias oracle, number of iteration analysis etc. E.g. in case of value range propagation, it is really hard to warn if something has been optimized some way because of it, because you really don't know the reason why after all the propagation some SSA_NAME got certain range, to warn you'd essentially have to do all of VRP twice, once with -fdelete-null-pointer-checks and once without, and then compare that when actually performing optimizations. But some optimizations are also done far later than directly in the VRP pass. If you have hw where NULL is mapped and you know your code violates the C/C++ standards by placing objects at that address, simply do use the option that is designed for that purpose. Jakub
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 20/02/2015 8:23 am, Joel Sherrill wrote:

On 2/19/2015 2:56 PM, Sandra Loosemore wrote:

Jakub Jelinek wrote:

On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:

Starting with gcc 4.9, -O2 implicitly invokes
-fisolate-erroneous-paths-dereference, which
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html documents as:

Detect paths that trigger erroneous or undefined behavior due to
dereferencing a null pointer.  Isolate those paths from the main control
flow and turn the statement with erroneous or undefined behavior into a
trap.  This flag is enabled by default at -O2 and higher.

This results in a sizable number of previously working embedded programs
mysteriously crashing when recompiled under gcc 4.9.  The problem is
that embedded programs will often have ram starting at address zero
(think hardware-defined interrupt vectors, say) which gets initialized
by code which the -fisolate-erroneous-paths-dereference logic can
recognize as reading and/or writing address zero.

If you have some pages mapped at address 0, you really should compile
your code with -fno-delete-null-pointer-checks, otherwise you can run
into tons of other issues.

Hmm, Passing the additional option in user code would be one thing, but
what about library code?  E.g., using memcpy (either explicitly or
implicitly for a structure copy)?

It looks to me like cr16 and avr are currently the only architectures
that disable flag_delete_null_pointer_checks entirely, but I am sure
that this issue affects other embedded targets besides nios2, too.  E.g.
scanning Mentor's ARM board support library, I see a whole pile of
devices that have memory mapped at address zero (TI Stellaris/Tiva,
Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, …).  Plus our simulator BSPs
assume a flat address space starting at address 0.

I forwarded this to the RTEMS list and was promptly pointed to a patch
on a Coldfire BSP where someone worked around this behavior.  We are
discussing how to deal with this.  It is likely OK in user code but
horrible in BSP and driver code.  We don't have a solution ourselves.
We just recognize it impacts a number of targets.

My main concern is not knowing the trap has been added to the code.  If
I could build an application and audit it somehow then I can manage it.
We have a similar issue with the possible use of FP registers being used
in general code (ISR save/restore trade off).

Can the ELF be annotated in some GCC-specific way that makes it to the
final executable to flag this is happening?  We can then create tools to
audit the executables.

Chris
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On 2/19/2015 2:56 PM, Sandra Loosemore wrote:
> Jakub Jelinek wrote:
>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:
>>> Starting with gcc 4.9, -O2 implicitly invokes
>>>
>>> -fisolate-erroneous-paths-dereference:
>>>
>>> which
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>>
>>> documents as
>>>
>>> Detect paths that trigger erroneous or undefined behavior due to
>>> dereferencing a null pointer.  Isolate those paths from the main
>>> control flow and turn the statement with erroneous or undefined
>>> behavior into a trap.  This flag is enabled by default at -O2 and
>>> higher.
>>>
>>> This results in a sizable number of previously working embedded
>>> programs mysteriously crashing when recompiled under gcc 4.9.  The
>>> problem is that embedded programs will often have ram starting at
>>> address zero (think hardware-defined interrupt vectors, say) which
>>> gets initialized by code which the
>>> -fisolate-erroneous-paths-dereference logic can recognize as reading
>>> and/or writing address zero.
>>
>> If you have some pages mapped at address 0, you really should compile
>> your code with -fno-delete-null-pointer-checks, otherwise you can run
>> into tons of other issues.
>
> Hmm, Passing the additional option in user code would be one thing,
> but what about library code?  E.g., using memcpy (either explicitly or
> implicitly for a structure copy)?
>
> It looks to me like cr16 and avr are currently the only architectures
> that disable flag_delete_null_pointer_checks entirely, but I am sure
> that this issue affects other embedded targets besides nios2, too.  E.g.
> scanning Mentor's ARM board support library, I see a whole pile of
> devices that have memory mapped at address zero (TI Stellaris/Tiva,
> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, …).  Plus our simulator
> BSPs assume a flat address space starting at address 0.

I forwarded this to the RTEMS list and was promptly pointed to a patch
on a Coldfire BSP where someone worked around this behavior.  We are
discussing how to deal with this.  It is likely OK in user code but
horrible in BSP and driver code.  We don't have a solution ourselves.
We just recognize it impacts a number of targets.

> I can see both sides of the issue here.  On the one hand, you get
> better code for 99.99% of situations by enabling
> -fdelete-null-pointer-checks, but if it makes GCC unusable in the other
> .01% case, that is a problem for the users for whom that case is
> critical.  :-S  So I think Jeff's request here is something that
> deserves an answer:

Which side is benefiting?  The embedded side or self-hosted side?  What
is the actual benefit?  Can the option be disabled/enabled on a per
target basis?  I honestly could see disabling it for *-rtems* and on
purely embedded architectures.

>>> BTW, I'd also be curious to know what is regarded as engineering best
>>> practice for writing a value to address zero when this is
>>> architecturally required by the hardware platform at hand.  Obviously
>>> one can do various things to obscure the process sufficiently that
>>> the current gcc implementation won't detect it and complain, but as
>>> gcc gets smarter about optimization those are at risk of failing in a
>>> future release.  It would be nice to have a guaranteed-to-work
>>> future-proof idiom for doing this.  Do we have one, short of
>>> retreating to assembly code?

Or GCC specific code.  I considered a special memcpy clone for our BSPs
which was not inlined and used a pragma to turn this optimization off.
Then it would just be a matter of using it in the right places.  But
knowing all the right places is a hard problem by itself.  :(

> -Sandra

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
(Hi Sandra, so long!)

On Thu, Feb 19, 2015 at 5:56 PM, Sandra Loosemore wrote:
> Jakub Jelinek wrote:
>>
>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:
>>>
>>> Starting with gcc 4.9, -O2 implicitly invokes
>>>
>>> -fisolate-erroneous-paths-dereference:
>>>
>>> which
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>>
>>> documents as
>>>
>>> Detect paths that trigger erroneous or undefined behavior due to
>>> dereferencing a null pointer.  Isolate those paths from the main
>>> control flow and turn the statement with erroneous or undefined
>>> behavior into a trap.  This flag is enabled by default at -O2 and
>>> higher.
>>>
>>> This results in a sizable number of previously working embedded
>>> programs mysteriously crashing when recompiled under gcc 4.9.  The
>>> problem is that embedded programs will often have ram starting at
>>> address zero (think hardware-defined interrupt vectors, say) which
>>> gets initialized by code which the
>>> -fisolate-erroneous-paths-dereference logic can recognize as reading
>>> and/or writing address zero.
>>
>> If you have some pages mapped at address 0, you really should compile
>> your code with -fno-delete-null-pointer-checks, otherwise you can run
>> into tons of other issues.
>
> Hmm, Passing the additional option in user code would be one thing, but
> what about library code?  E.g., using memcpy (either explicitly or
> implicitly for a structure copy)?
>
> It looks to me like cr16 and avr are currently the only architectures
> that disable flag_delete_null_pointer_checks entirely, but I am sure
> that this issue affects other embedded targets besides nios2, too.  E.g.
> scanning Mentor's ARM board support library, I see a whole pile of
> devices that have memory mapped at address zero (TI Stellaris/Tiva,
> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, …).  Plus our simulator BSPs
> assume a flat address space starting at address 0.
>
> I can see both sides of the issue here.  On the one hand, you get
> better code for 99.99% of situations by enabling
> -fdelete-null-pointer-checks, but if it makes GCC unusable in the other
> .01% case, that is a problem for the users for whom that case is
> critical.  :-S  So I think Jeff's request here is something that
> deserves an answer:

what about then two warnings (disabled by default), one intended to tell
the user each time the compiler removes a conditional
(-fdelete-null-pointer-checks) and another intended to tell the user
each time the compiler adds a trap due to dereference of address 0?

E.g.
  -Wnull-pointer-check-deleted
  -Wnull-dereference-considered-erroneous

or alike?

Daniel.

>>> BTW, I'd also be curious to know what is regarded as engineering best
>>> practice for writing a value to address zero when this is
>>> architecturally required by the hardware platform at hand.  Obviously
>>> one can do various things to obscure the process sufficiently that
>>> the current gcc implementation won't detect it and complain, but as
>>> gcc gets smarter about optimization those are at risk of failing in a
>>> future release.  It would be nice to have a guaranteed-to-work
>>> future-proof idiom for doing this.  Do we have one, short of
>>> retreating to assembly code?
>
> -Sandra

--
Daniel F. Gutson
Chief Technology Officer, SPD
San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54 351 4217888 / +54 351 4218211
Skype: dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson
Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
Jakub Jelinek wrote:
> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:
>> Starting with gcc 4.9, -O2 implicitly invokes
>> -fisolate-erroneous-paths-dereference, which
>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html documents as:
>>
>> Detect paths that trigger erroneous or undefined behavior due to
>> dereferencing a null pointer.  Isolate those paths from the main
>> control flow and turn the statement with erroneous or undefined
>> behavior into a trap.  This flag is enabled by default at -O2 and
>> higher.
>>
>> This results in a sizable number of previously working embedded
>> programs mysteriously crashing when recompiled under gcc 4.9.  The
>> problem is that embedded programs will often have ram starting at
>> address zero (think hardware-defined interrupt vectors, say) which
>> gets initialized by code which the
>> -fisolate-erroneous-paths-dereference logic can recognize as reading
>> and/or writing address zero.
>
> If you have some pages mapped at address 0, you really should compile
> your code with -fno-delete-null-pointer-checks, otherwise you can run
> into tons of other issues.

Hmm, Passing the additional option in user code would be one thing, but
what about library code?  E.g., using memcpy (either explicitly or
implicitly for a structure copy)?

It looks to me like cr16 and avr are currently the only architectures
that disable flag_delete_null_pointer_checks entirely, but I am sure
that this issue affects other embedded targets besides nios2, too.  E.g.
scanning Mentor's ARM board support library, I see a whole pile of
devices that have memory mapped at address zero (TI Stellaris/Tiva,
Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, …).  Plus our simulator BSPs
assume a flat address space starting at address 0.

I can see both sides of the issue here.  On the one hand, you get better
code for 99.99% of situations by enabling -fdelete-null-pointer-checks,
but if it makes GCC unusable in the other .01% case, that is a problem
for the users for whom that case is critical.  :-S  So I think Jeff's
request here is something that deserves an answer:

> BTW, I'd also be curious to know what is regarded as engineering best
> practice for writing a value to address zero when this is
> architecturally required by the hardware platform at hand.  Obviously
> one can do various things to obscure the process sufficiently that the
> current gcc implementation won't detect it and complain, but as gcc
> gets smarter about optimization those are at risk of failing in a
> future release.  It would be nice to have a guaranteed-to-work
> future-proof idiom for doing this.  Do we have one, short of retreating
> to assembly code?

-Sandra
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Wed, Feb 18, 2015 at 11:21 AM, Jeff Prothero wrote:
>
> Starting with gcc 4.9, -O2 implicitly invokes
>
> -fisolate-erroneous-paths-dereference:
>
> which
>
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> documents as
>
> Detect paths that trigger erroneous or undefined behavior due to
> dereferencing a null pointer.  Isolate those paths from the main control
> flow and turn the statement with erroneous or undefined behavior into a
> trap.  This flag is enabled by default at -O2 and higher.
>
> This results in a sizable number of previously working embedded programs
> mysteriously crashing when recompiled under gcc 4.9.  The problem is
> that embedded programs will often have ram starting at address zero
> (think hardware-defined interrupt vectors, say) which gets initialized
> by code which the -fisolate-erroneous-paths-dereference logic can
> recognize as reading and/or writing address zero.

You should have used -fno-delete-null-pointer-checks, which has been
doing this optimization for a long time now; it just got better with the
-fisolate-erroneous-paths-dereference pass.

Thanks,
Andrew Pinski

> What happens then is that the previously running program compiles
> without any warnings, but then typically locks up mysteriously (often
> disabling the remote debug link) due to the trap not being gracefully
> handled by the embedded runtime.
>
> Granted, such code is out-of-spec wrt the C standards.
>
> Nonetheless, the problem is quite painful to track down and unexpected.
>
> Is there any good reason the
>
> -fisolate-erroneous-paths-dereference
>
> logic could not issue a compile-time warning or error, instead of just
> silently generating code virtually certain to crash at runtime?
>
> Such a warning/error would save a lot of engineers significant amounts
> of time, energy and frustration tracking down this problem.
>
> I would like to think that the spirit of gcc is about helping engineers
> efficiently correct nonstandard pain, rather than inflicting maximal
> pain upon engineers violating C standards.  :-)
>
> -Jeff
>
> BTW, I'd also be curious to know what is regarded as engineering best
> practice for writing a value to address zero when this is
> architecturally required by the hardware platform at hand.  Obviously
> one can do various things to obscure the process sufficiently that the
> current gcc implementation won't detect it and complain, but as gcc
> gets smarter about optimization those are at risk of failing in a
> future release.  It would be nice to have a guaranteed-to-work
> future-proof idiom for doing this.  Do we have one, short of retreating
> to assembly code?
Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote:
> Starting with gcc 4.9, -O2 implicitly invokes
>
> -fisolate-erroneous-paths-dereference:
>
> which
>
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> documents as
>
> Detect paths that trigger erroneous or undefined behavior due to
> dereferencing a null pointer.  Isolate those paths from the main control
> flow and turn the statement with erroneous or undefined behavior into a
> trap.  This flag is enabled by default at -O2 and higher.
>
> This results in a sizable number of previously working embedded programs
> mysteriously crashing when recompiled under gcc 4.9.  The problem is
> that embedded programs will often have ram starting at address zero
> (think hardware-defined interrupt vectors, say) which gets initialized
> by code which the -fisolate-erroneous-paths-dereference logic can
> recognize as reading and/or writing address zero.

If you have some pages mapped at address 0, you really should compile
your code with -fno-delete-null-pointer-checks, otherwise you can run
into tons of other issues.

Also, there is -fsanitize=undefined that allows discovery of such
invalid calls at runtime, though admittedly it isn't supported for all
targets.

	Jakub
Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
Starting with gcc 4.9, -O2 implicitly invokes

  -fisolate-erroneous-paths-dereference:

which

  https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

documents as

  Detect paths that trigger erroneous or undefined behavior due to
  dereferencing a null pointer.  Isolate those paths from the main
  control flow and turn the statement with erroneous or undefined
  behavior into a trap.  This flag is enabled by default at -O2 and
  higher.

This results in a sizable number of previously working embedded programs
mysteriously crashing when recompiled under gcc 4.9.  The problem is
that embedded programs will often have ram starting at address zero
(think hardware-defined interrupt vectors, say) which gets initialized
by code which the -fisolate-erroneous-paths-dereference logic can
recognize as reading and/or writing address zero.

What happens then is that the previously running program compiles
without any warnings, but then typically locks up mysteriously (often
disabling the remote debug link) due to the trap not being gracefully
handled by the embedded runtime.

Granted, such code is out-of-spec wrt the C standards.  Nonetheless, the
problem is quite painful to track down and unexpected.

Is there any good reason the -fisolate-erroneous-paths-dereference logic
could not issue a compile-time warning or error, instead of just
silently generating code virtually certain to crash at runtime?  Such a
warning/error would save a lot of engineers significant amounts of time,
energy and frustration tracking down this problem.

I would like to think that the spirit of gcc is about helping engineers
efficiently correct nonstandard pain, rather than inflicting maximal
pain upon engineers violating C standards.  :-)

-Jeff

BTW, I'd also be curious to know what is regarded as engineering best
practice for writing a value to address zero when this is
architecturally required by the hardware platform at hand.
Obviously one can do various things to obscure the process sufficiently that the current gcc implementation won't detect it and complain, but as gcc gets smarter about optimization those are at risk of failing in a future release. It would be nice to have a guaranteed-to-work future-proof idiom for doing this. Do we have one, short of retreating to assembly code?