[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
See
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Integers-implementation.html#Integers-implementation

The results of some bitwise operations on signed integers (C90 6.3, C99 and C11
6.5).
Bitwise operators act on the representation of the value including both the
sign and value bits, where the sign bit is considered immediately above the
highest-value value bit. Signed ‘>>’ acts on negative numbers by sign
extension.

As an extension to the C language, GCC does not use the latitude given in C99
and C11 only to treat certain aspects of signed ‘<<’ as undefined. However,
-fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are
also diagnosed where constant expressions are required.

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-23 Thread roscaeugeniu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #2 from Eugeniu Rosca  ---
Hi Andrew,

> As an extension to the C language, GCC does not use the latitude given in C99 
> and C11 only to treat certain aspects of signed ‘<<’ as undefined. However, 
> -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases.

I think my report is purely and precisely related to left-shifting of 1 into
the sign bit (i.e. a subset of all possible '<<'-related UB) and  I provide
evidence that UBSAN behaves differently between the C89/C90 and C11/C18
implementations.

I have backed up my questions with real-life motivation (Linux kernel is
compiled with -std=gnu89 while U-Boot/coreboot are compiled with -std=gnu11),
which means my questions do not come from some theoretical curiosity.

Your quote is helpful, but it still doesn't explain why (1 << 31) is defined
behavior in C89/C90, but not in C99/C11/C18, which triggers one
order-of-magnitude more UBSAN warnings in C11-compiled code compared to
C89-compiled one. Can you shed some light on this?

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-23 Thread roscaeugeniu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

Eugeniu Rosca  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #3 from Eugeniu Rosca  ---
Changing the status to UNCONFIRMED since the answer doesn't tackle the
questions posted in the report.

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-23 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from Andrew Pinski  ---
In C90, it was implemented defined behavior (while in C99 and above it is
undefined behavior).
See the thread which added undefined santizer:
https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00275.html

For the "a < 0" here, and signed left shift of a positive value shifting a 
1 into or past the sign bit, I think it should be possible to control the 
checks separately from other checks on shifts - both because those cases 
were implementation-defined in C90, only undefined in C99/C11, and because 
they are widely used in practice.

--- CUT ---

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54027 where a C90 program
is defined but C99 would be defined and there is talk about why too.


>I provide evidence that UBSAN behaves differently between the C89/C90 and 
>C11/C18 implementations.
Yes you did not read the full quote there.  The part about ubsan is with
respect to C99 undefined behavior.  AGain in C90, this is implementation
defined while C99 it is undefined.  YES there is a huge difference between the
languages which cannot be changed now as that would be requiring to change C90.

>ANSI C

ANSI C is ISO C.  ANSI is a member of ISO.  I think you meant to write ANSI C89
there :).

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-24 Thread roscaeugeniu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #5 from Eugeniu Rosca  ---
Hi Andrew,

Thanks for the much more satisfying answer.

> In C90, it was implemented defined behavior (while in C99 and above it is 
> undefined behavior).

I trust you, but why not giving a reference to the source? Based on my reading
of the C89 draft, it should be based on Chapter "A.6.3 Implementation-defined
behavior", more specifically subchapter "A.6.3.4 Integers" saying:

* The results of bitwise operations on signed integers (3.3).

> See the thread which added undefined santizer:
> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00275.html

That turned to be a useful reference. +1.

> YES there is a huge difference between the languages

It's helpful to see you putting such an emphasis on the difference between the
standards. 

> which cannot be changed now as that would be requiring to change C90.

IMHO a valid scenario would also be not touching C89/C90, but correcting C11 so
that "left-shifting into the sign bit" becomes defined or
implementation-defined. To my knowledge, such kind of step actually exists in
the form of DR463 entry in the "Defect Report Summary for C11" (see
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2148.htm#dr_463 ), which tried
"Harmonizing left-shift with C++14". The latter considers "left-shifting into
the sign bit" as defined behavior.

>> ANSI C
> ANSI C is ISO C. ANSI is a member of ISO.  I think you meant to write ANSI 
> C89 there :).

By "ANSI C" I mean C89/C90 and so do references [14-16], so it's quite
surprising we are not on the same page.

Before I consider this topic closed, I kindly ask you to provide below feedback
which I will share with U-Boot people:

- Since U-Boot is compiled using '-std=gnu11' and since UBSAN reports (1 << 31)
as undefined behavior and since C11 standard says that encountering an UB the
compiler may "ignore the situation completely with unpredictable results", what
are there real-life issues expected from shifting signed integers into (*not
past*) the sign bit? What to expect in the worst case?

- Is it possible to provide a sample code which would showcase that
"left-shifting into the sign bit" can lead to program behavior being erroneous,
unexpected or dependent on the gcc optimization level? For example, we know
[17] that "(a + 1 > a)" can lead to surprising results depending on whether the
program is compiled with "-fwrapv" or not. Are there any similar examples
involving '<<' ? This would be of extreme help for U-Boot community.

- What would be your personal choice between '-std=gnu89' (still used by Linux
kernel) and '-std=gnu11' (used in U-Boot/coreboot), given your GCC experience
gathered so far?

Thank you in advance.

Best regards,
Eugeniu.


[14] https://en.wikipedia.org/wiki/ANSI_C 
 This (C89) version of the language is often referred to as "ANSI C"
[15] https://stackoverflow.com/a/3783805/2381750
 In common parlance, the phrase "ANSI C" usually refers to C89.
[16]
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/C-Dialect-Options.html#C-Dialect-Options
 In C mode, this (-ansi) is equivalent to -std=c90.
[17] http://thiemonagel.de/2010/01/signed-integer-overflow/

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-25 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #6 from Jonathan Wakely  ---
(In reply to Eugeniu Rosca from comment #5)
> > In C90, it was implemented defined behavior (while in C99 and above it is 
> > undefined behavior).
> 
> I trust you, but why not giving a reference to the source?

He already did. Comment 1 quotes the GCC manual which references the relevant
sections of the standards.

It takes time and effort to look the references up in the old standards. If you
need more precise answers *you* should look it up.

In C90 it's 6.3.7 (and repeated in G.3.5), in C99 it's 6.5.7 (and J.2).


> - Since U-Boot is compiled using '-std=gnu11' and since UBSAN reports (1 <<
> 31) as undefined behavior and since C11 standard says that encountering an
> UB the compiler may "ignore the situation completely with unpredictable
> results",

It may do that, or it may do other things (like print a UBsan error).

> what are there real-life issues expected from shifting signed
> integers into (*not past*) the sign bit? What to expect in the worst case?

I think with GCC the worst that will happen is you get a UBsan error (which
might make the program abort, depending on the options used to compile it).

> - Is it possible to provide a sample code which would showcase that
> "left-shifting into the sign bit" can lead to program behavior being
> erroneous, unexpected or dependent on the gcc optimization level? For
> example, we know [17] that "(a + 1 > a)" can lead to surprising results
> depending on whether the program is compiled with "-fwrapv" or not. Are
> there any similar examples involving '<<' ? This would be of extreme help
> for U-Boot community.

You've already been given the link to the GCC documentation which says that GCC
doesn't treat it as undefined in C99 and C11, even though the standards (and
UBsan) say it's undefined.

There is no bug here. It's correct for UBsan to depend on the C standard
chosen, because what is and isn't undefined depends on the standard.

> - What would be your personal choice between '-std=gnu89' (still used by
> Linux kernel) and '-std=gnu11' (used in U-Boot/coreboot), given your GCC
> experience gathered so far?

There's no one-size-fits-all answer.

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-25 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #7 from joseph at codesourcery dot com  ---
The implementation-definedness of signed left shift in C90, including 
shifting into or past the sign bit (as long as the shift count isn't too 
large or negative), is stated explicitly in the response to DR#081.

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-26 Thread roscaeugeniu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #8 from Eugeniu Rosca  ---
On 2018-09-25 at 08:53:34 UTC, Jonathan Wakely wrote in comment #6:

> He already did. Comment 1 quotes the GCC manual which references
> the relevant sections of the standards.

Comment 1 does indeed reference "(C90 6.3, C99 and C11 6.5)". But if you check
my first post again, you'll see that neither C89->"3.3.7 Bitwise shift
operators" [18] nor its C90 [19] counterpart C90->"6.3.7 Bitwise shift
operators" says a word about signed integers in the definition of the '<<'
operator. This is in contrast with later C standards which supplement the
definition of "E1 << E2" by "If E1 has a signed type ...". It is simply not
possible to conclude from C89->3.3.7 or from C90->6.3.7 that "left-shifting of
signed integer into the sign bit" is implementation-defined behavior. The
C89/C90 sections conveying this message are:
 - C89: "A.6.3 Implementation-defined behavior" -> "A.6.3.4 Integers"
 - C90: "G.3 Implementation-defined behavior" -> "G.3.5 Integers"

The latter is referenced by you in Comment 6, but not in Comment 1.

> > - Since U-Boot is compiled using '-std=gnu11' and since UBSAN reports
> > (1 << 31) as undefined behavior and since C11 standard says that
> > encountering an UB the compiler may "ignore the situation completely
> > with unpredictable results",

> It may do that, or it may do other things (like print a UBsan error).

I wish there was an "and" relationship between the two, i.e. the error to be
dependent on the UB, with no false positives.

> > what are there real-life issues expected from shifting signed
> > integers into (*not past*) the sign bit? What to expect in the worst case?

> I think with GCC the worst that will happen is you get a UBsan error (which
> might make the program abort, depending on the options used to compile it).

In U-Boot/Linux kernel execution goes on, so the latter is not of concern.
I am quite surprised that, in spite of no harm expected from shifting a signed
integer into the sign bit, GCC (std=c99, std=c11) still decides to fire a
warning. It seems like the idea behind UBSAN is to inform the user about what C
standard says on paper rather than how it is implemented. This is quite
unfortunate.

> You've already been given the link to the GCC documentation which says
> that GCC doesn't treat it as undefined in C99 and C11, even though the
> standards (and UBsan) say it's undefined.

Ok, let's come back to it. Here is what you refer to:

-8<-
As an extension to the C language, GCC does not use the latitude given in C99
and C11 only to treat certain aspects of signed ‘<<’ as undefined. However,
-fsanitize=shift (and -fsanitize=undefined) will diagnose such cases.
-8<-

I am not sure if you or anybody else cares, but this sounds pretty similar to
"Some children study too hard to succeed" [20]. I can understand the above
excerpt in two ways:

1. As an extension to the C language, GCC does not use the latitude given in
C99 and C11 in order to treat only certain aspects of signed ‘<<’ as undefined.
[..]

2. As an extension to the C language, GCC does not use the latitude given in
C99 and C11 of only treating certain aspects of signed ‘<<’ as undefined. [..]

Regardless of the variant I pick, without GCC community support I'm still
clueless about:
- what kind of latitude is provided by C99/C11 standards.
- which gcc aspects of signed ‘<<’ diverge from the C99/C11 standards and which
 do not.

> There is no bug here. It's correct for UBsan to depend on the C standard 
> chosen,
> because what is and isn't undefined depends on the standard.

As expressed above, my expectation was that UBSAN reflects how compiler
implements the standard rather than what standard itself says. I am still
surprised that this expectation is broken, but I will just accept it and keep
going. It's also unfortunate that, as a user, I can't disable the UBSAN warning
of shifting into (not past) the sign bit. Enabling either
-fsanitize=shift-exponent or -fsanitize=shift-base alone would disable other
precious UBSAN warnings as a side effect, which is not acceptable.

All in all, the summary which I take from this discussion is "don't be
concerned about left shifting into the sign bit, even if UBSAN annoys you with
a warning".

If there is no way GCC folks can support communities like U-Boot/coreboot to at
least individually turn off benign UBSAN warnings, I have no choice but to
finally agree with the status of this bug report.

Thanks and regards,
Eugeniu.

[18] http://port70.net/~nsz/c/c89/c89-draft.html
[19]
http://read.pudn.com/downloads133/doc/565041/ANSI_ISO%2B9899-1990%2B[1].pdf
[20]
https://guinlist.wordpress.com/2016/02/08/124-structures-with-a-double-meaning/

[Bug c/87392] UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

2018-09-26 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392

--- Comment #9 from Andrew Pinski  ---
(In reply to Eugeniu Rosca from comment #8)
> On 2018-09-25 at 08:53:34 UTC, Jonathan Wakely wrote in comment #6:
> 
> > He already did. Comment 1 quotes the GCC manual which references
> > the relevant sections of the standards.
> 
> Comment 1 does indeed reference "(C90 6.3, C99 and C11 6.5)". But if you
> check my first post again, you'll see that neither C89->"3.3.7 Bitwise shift
> operators" [18] nor its C90 [19] counterpart C90->"6.3.7 Bitwise shift
> operators" says a word about signed integers in the definition of the '<<'
> operator. This is in contrast with later C standards which supplement the
> definition of "E1 << E2" by "If E1 has a signed type ...". It is simply not
> possible to conclude from C89->3.3.7 or from C90->6.3.7 that "left-shifting
> of signed integer into the sign bit" is implementation-defined behavior. The
> C89/C90 sections conveying this message are:
>  - C89: "A.6.3 Implementation-defined behavior" -> "A.6.3.4 Integers"
>  - C90: "G.3 Implementation-defined behavior" -> "G.3.5 Integers"

But then there is a defect report against C90 which takes exclitly about this:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_081.html

So there is still no bug.  The C90 and C99 standards differ here and that is
plain and simple with that defect report showing their intent for C90 and all.