Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jonathan Wakely
On 25 April 2014 11:22, Andrew Haley wrote:
> Summary: Devirtualization uses type information to determine if a
> virtual method is reachable from a call site.  If type information
> indicates that it is not, devirt marks the site as unreachable.  I
> think this is wrong, and it breaks some programs.  At least, it should
> not do this if the user specifies -fno-strict-aliasing.
>
> Consider this class:
>
> class Container {
>   void *buffer[5];
> public:
>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>   Container() { new (buffer) EmbeddedObject(); }
> };
>
> Placement new is used to embed an object in a buffer inside another
> object.  Its address can be retrieved.  This usage of placement new is
> common, and it even appears as the canonical use of placement new in
> the in the C++ FAQ at
> http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
> this is not strictly legal.  For one thing, the memory at buffer may
> not be suitably aligned.  Please bear with me.)

I think the program is strictly legal if you define Container like this:

class Container {
  alignas(EmbeddedObject) char buffer[sizeof(EmbeddedObject)];
public:
  EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
  Container() { new (buffer) EmbeddedObject(); }
};

But GCC still does the same transformation and prints nothing, which I
agree is wrong (even with -fstrict-aliasing).

With ubsan that program gives:

sa.cc:9:24: runtime error: load of null pointer of type ' *'
Segmentation fault (core dumped)


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Martin Jambor
Hi,


On Fri, Apr 25, 2014 at 11:22:22AM +0100, Andrew Haley wrote:
> Summary: Devirtualization uses type information to determine if a
> virtual method is reachable from a call site.  If type information
> indicates that it is not, devirt marks the site as unreachable.  I
> think this is wrong, and it breaks some programs.  At least, it should
> not do this if the user specifies -fno-strict-aliasing.
> 
> Consider this class:
> 
> class Container {
>   void *buffer[5];
> public:
>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>   Container() { new (buffer) EmbeddedObject(); }
> };
> 
> Placement new is used to embed an object in a buffer inside another
> object.  Its address can be retrieved.  This usage of placement new is
> common, and it even appears as the canonical use of placement new in
> the in the C++ FAQ at
> http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
> this is not strictly legal.  For one thing, the memory at buffer may
> not be suitably aligned.  Please bear with me.)
> 
> The embedded object is an instance of:
> 
> class EmbeddedObject {
> public:
>   virtual int val() { return 2; }
> };
> 
> And it is called like this:
> 
> extern Container o;
> int main() {
> 
>   cout << o.obj()->val() << endl;
> }
> 
> The devirtualization pass looks into the call to val() and the type of
> o, decides that there is no type inside o that is compatible with
> EmbeddedObject, and inserts a call to __builtin_unreachanble().  As a
> result, instead of printing 2, the program does nothing.
> 
> I'm not sure what this transformation is supposed to achieve.  It does
> nothing for strictly compliant code

It can be and often is useful.  Very often such calls are in a branch
that is guarded by a type check and performs a down-cast.
Devirtualization can figure out the branch is unreachable when it
knows that the call in it would be illegal.  (And yes, placement new
has been considered, see PR 45734, especially comment #4.)

In this case, however, I believe that ipa-devirt should not mark its
result as final if the offset with outer_type leads to a
non-artificial field, certainly not if such a field is a POD,
regardless of any compiler option.  I believe you should open a PR for
this.

Martin


> and it silently breaks non-
> compliant code like this.  GCC users would be better off if it were
> not done.  At least an error message could be printed instead of silently
> failing to generate code.
> 
> IMO, this transformation should not be performed when
> -fno-strict-aliasing is used.
> 
> `-fstrict-aliasing'
>  Allow the compiler to assume the strictest aliasing rules
>  applicable to the language being compiled.  For C (and C++), this
>  activates optimizations based on the type of expressions.
> 
> I have appended a suggested patch to this message.
> 
> Andrew.
> 
> 
> Index: gcc/ipa-devirt.c
> ===
> --- gcc/ipa-devirt.c  (revision 209656)
> +++ gcc/ipa-devirt.c  (working copy)
> @@ -1362,8 +1362,9 @@
> 
> /* Only type inconsistent programs can have otr_type that is
>not part of outer type.  */
> -   if (!contains_type_p (TREE_TYPE (base),
> - context->offset + offset2, *otr_type))
> +   if (flag_strict_aliasing
> +   && !contains_type_p (TREE_TYPE (base),
> +context->offset + offset2, 
> *otr_type))
>   {
> /* Use OTR_TOKEN = INT_MAX as a marker of probably type 
> inconsistent
>code sequences; we arrange the calls to be 
> builtin_unreachable
> @@ -1441,7 +1442,8 @@
> gcc_assert (!POINTER_TYPE_P (context->outer_type));
> /* Only type inconsistent programs can have otr_type that is
>not part of outer type.  */
> -   if (!contains_type_p (context->outer_type, context->offset,
> +   if (flag_strict_aliasing
> +   && !contains_type_p (context->outer_type, context->offset,
>   *otr_type))
>   {
> /* Use OTR_TOKEN = INT_MAX as a marker of probably type 
> inconsistent
> 




Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Richard Biener
On Fri, Apr 25, 2014 at 1:18 PM, Jonathan Wakely  wrote:
> On 25 April 2014 11:22, Andrew Haley wrote:
>> Summary: Devirtualization uses type information to determine if a
>> virtual method is reachable from a call site.  If type information
>> indicates that it is not, devirt marks the site as unreachable.  I
>> think this is wrong, and it breaks some programs.  At least, it should
>> not do this if the user specifies -fno-strict-aliasing.
>>
>> Consider this class:
>>
>> class Container {
>>   void *buffer[5];
>> public:
>>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>>   Container() { new (buffer) EmbeddedObject(); }
>> };
>>
>> Placement new is used to embed an object in a buffer inside another
>> object.  Its address can be retrieved.  This usage of placement new is
>> common, and it even appears as the canonical use of placement new in
>> the in the C++ FAQ at
>> http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
>> this is not strictly legal.  For one thing, the memory at buffer may
>> not be suitably aligned.  Please bear with me.)
>
> I think the program is strictly legal if you define Container like this:
>
> class Container {
>   alignas(EmbeddedObject) char buffer[sizeof(EmbeddedObject)];
> public:
>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>   Container() { new (buffer) EmbeddedObject(); }
> };
>
> But GCC still does the same transformation and prints nothing, which I
> agree is wrong (even with -fstrict-aliasing).

I agree, -fstrict-aliasing has nothing to do with this.  Sounds simply like
a genuine bug (please open a bugzilla).

Thanks,
Richard.

> With ubsan that program gives:
>
> sa.cc:9:24: runtime error: load of null pointer of type ' *'
> Segmentation fault (core dumped)


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jonathan Wakely
On 25 April 2014 13:01, Richard Biener wrote:
> On Fri, Apr 25, 2014 at 1:18 PM, Jonathan Wakely  
> wrote:
>> On 25 April 2014 11:22, Andrew Haley wrote:
>>> Summary: Devirtualization uses type information to determine if a
>>> virtual method is reachable from a call site.  If type information
>>> indicates that it is not, devirt marks the site as unreachable.  I
>>> think this is wrong, and it breaks some programs.  At least, it should
>>> not do this if the user specifies -fno-strict-aliasing.
>>>
>>> Consider this class:
>>>
>>> class Container {
>>>   void *buffer[5];
>>> public:
>>>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>>>   Container() { new (buffer) EmbeddedObject(); }
>>> };
>>>
>>> Placement new is used to embed an object in a buffer inside another
>>> object.  Its address can be retrieved.  This usage of placement new is
>>> common, and it even appears as the canonical use of placement new in
>>> the in the C++ FAQ at
>>> http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
>>> this is not strictly legal.  For one thing, the memory at buffer may
>>> not be suitably aligned.  Please bear with me.)
>>
>> I think the program is strictly legal if you define Container like this:
>>
>> class Container {
>>   alignas(EmbeddedObject) char buffer[sizeof(EmbeddedObject)];
>> public:
>>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>>   Container() { new (buffer) EmbeddedObject(); }
>> };
>>
>> But GCC still does the same transformation and prints nothing, which I
>> agree is wrong (even with -fstrict-aliasing).
>
> I agree, -fstrict-aliasing has nothing to do with this.  Sounds simply like
> a genuine bug (please open a bugzilla).
>
> Thanks,
> Richard.
>
>> With ubsan that program gives:
>>
>> sa.cc:9:24: runtime error: load of null pointer of type ' *'
>> Segmentation fault (core dumped)

I created http://gcc.gnu.org/60963 for the ubsan error. The testcase
there is a single file which crashes with ubsan and returns the wrong
value without.  Changing the return statement to
__builtin_printf("%d\n", o.obj()->val()) prints nothing, like Andrew's
original example.

If it's not really a ubsan problem feel free to re-categorise that as
a devirt bug rather than open a new one, whatever's most appropriate.


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Andrew Haley
On 04/25/2014 01:01 PM, Richard Biener wrote:
> I agree, -fstrict-aliasing has nothing to do with this.  Sounds simply like
> a genuine bug (please open a bugzilla).

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965

Andrew.



Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Volker Simonis
On Fri, Apr 25, 2014 at 2:05 PM, Jonathan Wakely  wrote:
>On 25 April 2014 13:01, Richard Biener wrote:
>> On Fri, Apr 25, 2014 at 1:18 PM, Jonathan Wakely  
>> wrote:
>>> On 25 April 2014 11:22, Andrew Haley wrote:
 Summary: Devirtualization uses type information to determine if a
 virtual method is reachable from a call site.  If type information
 indicates that it is not, devirt marks the site as unreachable.  I
 think this is wrong, and it breaks some programs.  At least, it should
 not do this if the user specifies -fno-strict-aliasing.

 Consider this class:

 class Container {
   void *buffer[5];
 public:
   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
   Container() { new (buffer) EmbeddedObject(); }
 };

 Placement new is used to embed an object in a buffer inside another
 object.  Its address can be retrieved.  This usage of placement new is
 common, and it even appears as the canonical use of placement new in
 the in the C++ FAQ at
 http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
 this is not strictly legal.  For one thing, the memory at buffer may
 not be suitably aligned.  Please bear with me.)
>>>
>>> I think the program is strictly legal if you define Container like this:
>>>
>>> class Container {
>>>   alignas(EmbeddedObject) char buffer[sizeof(EmbeddedObject)];
>>> public:
>>>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>>>   Container() { new (buffer) EmbeddedObject(); }
>>> };
>>>
>>> But GCC still does the same transformation and prints nothing, which I
>>> agree is wrong (even with -fstrict-aliasing).
>>
>> I agree, -fstrict-aliasing has nothing to do with this.  Sounds simply like
>> a genuine bug (please open a bugzilla).
>>
>> Thanks,
>> Richard.
>>
>>> With ubsan that program gives:
>>>
>>> sa.cc:9:24: runtime error: load of null pointer of type ' *'
>>> Segmentation fault (core dumped)
>
>I created http://gcc.gnu.org/60963 for the ubsan error. The testcase
>there is a single file which crashes with ubsan and returns the wrong
>value without.  Changing the return statement to
>__builtin_printf("%d\n", o.obj()->val()) prints nothing, like Andrew's
>original example.
>
>If it's not really a ubsan problem feel free to re-categorise that as
>a devirt bug rather than open a new one, whatever's most appropriate.
>

Hi Jonathan,

I don't think this is a ubsan bug. I rather think it is an devirt bug.
If you look at the generated code (taking Andrews initial example) you
can see the following:

00400720 :
main():
/tmp/embed-test/embed.cpp:7
  400720:   48 83 ec 08 sub$0x8,%rsp
/tmp/embed-test/embed.cpp:9
  400724:   48 83 3d b4 07 20 00cmpq   $0x0,0x2007b4(%rip)
   # 600ee0 
  40072b:   00
  40072c:   74 00   je 40072e 
  40072e:   31 f6   xor%esi,%esi
  400730:   bf 80 0d 60 00  mov$0x600d80,%edi
  400735:   e8 d6 ff ff ff  callq  400710
<__ubsan_handle_type_mismatch@plt>
  40073a:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)

So because of the devirt bug, the main method unconditionally falls
into the ubsan handler and misleadingly prints the warning. Afterwards
it crashes, but again not because of an ubsan error, but because it
wrongly entered the ubsan handler et all because of the
devirtualization bug.

Notice that this bug was first detected while compiling the OpenJDK
with gcc 4.9.0 (see mail threads at
http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-April/thread.html#13577
and http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-April/013614.html).
We as well compiled OpenJDK with -fsanitze=undefined and got some
warnings. But after a closer look it turned out these were false
positives as well, for exactly the same reason. As you can read in
this mail 
(http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-April/013602.html)
the OpenJDK code executed several ubsan handlers not because the
corresponding NULL-checks failed, but because the code for the
containing method had nor return at the end and just fell over into
the ubsan handlers.

Could you therefore please re-categorize this as devirt bug.

Thank you and best regards,
Volker


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Andrew Haley
On 04/25/2014 03:14 PM, Volker Simonis wrote:
> Could you therefore please re-categorize this as devirt bug.

It is an IPA bug.  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965

Andrew.



Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jan Hubicka
> Summary: Devirtualization uses type information to determine if a
> virtual method is reachable from a call site.  If type information
> indicates that it is not, devirt marks the site as unreachable.  I
> think this is wrong, and it breaks some programs.  At least, it should
> not do this if the user specifies -fno-strict-aliasing.
> 
> Consider this class:
> 
> class Container {
>   void *buffer[5];
> public:
>   EmbeddedObject *obj() { return (EmbeddedObject*)buffer; }
>   Container() { new (buffer) EmbeddedObject(); }
> };
> 
> Placement new is used to embed an object in a buffer inside another
> object.  Its address can be retrieved.  This usage of placement new is
> common, and it even appears as the canonical use of placement new in
> the in the C++ FAQ at
> http://www.parashift.com/c++-faq/placement-new.html.  (I am aware that
> this is not strictly legal.  For one thing, the memory at buffer may
> not be suitably aligned.  Please bear with me.)
> 
> The embedded object is an instance of:
> 
> class EmbeddedObject {
> public:
>   virtual int val() { return 2; }
> };
> 
> And it is called like this:
> 
> extern Container o;
> int main() {
> 
>   cout << o.obj()->val() << endl;
> }
> 
> The devirtualization pass looks into the call to val() and the type of
> o, decides that there is no type inside o that is compatible with
> EmbeddedObject, and inserts a call to __builtin_unreachanble().  As a
> result, instead of printing 2, the program does nothing.

I see, the rule I am trying to apply here is that by Jason you can not
use placement new to turn one non-POD to another non-POD.
Here you turn char buffer that happens to be at same address as non-POD.
I actually intedned to implement it as non-polymorphic type to  non-polymorphic
type, will make fix in this sense (i.e. not mark context invalid if the outer
type is non-polymorphic). I think that should also fix your container example
and if container had virtual method in it, we will be all fine since buffer
will not be 0th item. Does that sound OK?
> 
> I'm not sure what this transformation is supposed to achieve.  It does
> nothing for strictly compliant code and it silently breaks non-
> compliant code like this.  GCC users would be better off if it were
> not done.  At least an error message could be printed instead of silently
> failing to generate code.

The pattern is relatively common in a valid code when you do 
upcasting/downcasting.
Imagine we had gimple converted to class hiearchy with virtual methods in it.
We had gimple_label type for labels that got converted to gimple and called to
generic function that does type checking (invisible to the pass) and casts to
gimple_assign (only on path where it is indeed assignment). Then you call
lhs method that is not defined for gimple_label.
We want to detect these cases and not work hard to inline them. It is 
surprisingly
common in C++ code, i.e. over 5% on firefox.
> 
> IMO, this transformation should not be performed when
> -fno-strict-aliasing is used.
> 
> `-fstrict-aliasing'
>  Allow the compiler to assume the strictest aliasing rules
>  applicable to the language being compiled.  For C (and C++), this
>  activates optimizations based on the type of expressions.
> 
> I have appended a suggested patch to this message.

I agree with Richard that we want to keep this independent of -fstrict-aliasing.

Honza
> 
> Andrew.
> 
> 
> Index: gcc/ipa-devirt.c
> ===
> --- gcc/ipa-devirt.c  (revision 209656)
> +++ gcc/ipa-devirt.c  (working copy)
> @@ -1362,8 +1362,9 @@
> 
> /* Only type inconsistent programs can have otr_type that is
>not part of outer type.  */
> -   if (!contains_type_p (TREE_TYPE (base),
> - context->offset + offset2, *otr_type))
> +   if (flag_strict_aliasing
> +   && !contains_type_p (TREE_TYPE (base),
> +context->offset + offset2, 
> *otr_type))
>   {
> /* Use OTR_TOKEN = INT_MAX as a marker of probably type 
> inconsistent
>code sequences; we arrange the calls to be 
> builtin_unreachable
> @@ -1441,7 +1442,8 @@
> gcc_assert (!POINTER_TYPE_P (context->outer_type));
> /* Only type inconsistent programs can have otr_type that is
>not part of outer type.  */
> -   if (!contains_type_p (context->outer_type, context->offset,
> +   if (flag_strict_aliasing
> +   && !contains_type_p (context->outer_type, context->offset,
>   *otr_type))
>   {
> /* Use OTR_TOKEN = INT_MAX as a marker of probably type 
> inconsistent
> 




Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jan Hubicka
> On 04/25/2014 03:14 PM, Volker Simonis wrote:
> > Could you therefore please re-categorize this as devirt bug.
> 
> It is an IPA bug.  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965

Now when I have interest from ubsan direction, I wanted to ask. Would it make 
sense to turn
those unreachables into traps with ubsan enabled? (similarly in the loop stuff)

Honza
> 
> Andrew.


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jakub Jelinek
On Fri, Apr 25, 2014 at 08:23:22PM +0200, Jan Hubicka wrote:
> > On 04/25/2014 03:14 PM, Volker Simonis wrote:
> > > Could you therefore please re-categorize this as devirt bug.
> > 
> > It is an IPA bug.  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965
> 
> Now when I have interest from ubsan direction, I wanted to ask. Would it make 
> sense to turn
> those unreachables into traps with ubsan enabled? (similarly in the loop 
> stuff)

With -fsanitize=undefined __builtin_unreachable is folded right away into a
library call that will emit a message and then die.

Jakub


Re: IPA: Devirtualization versus placement new

2014-04-25 Thread Jan Hubicka
> On Fri, Apr 25, 2014 at 08:23:22PM +0200, Jan Hubicka wrote:
> > > On 04/25/2014 03:14 PM, Volker Simonis wrote:
> > > > Could you therefore please re-categorize this as devirt bug.
> > > 
> > > It is an IPA bug.  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965
> > 
> > Now when I have interest from ubsan direction, I wanted to ask. Would it 
> > make sense to turn
> > those unreachables into traps with ubsan enabled? (similarly in the loop 
> > stuff)
> 
> With -fsanitize=undefined __builtin_unreachable is folded right away into a
> library call that will emit a message and then die.

I see, that sounds good.  Who will ensure that all calls to builtin_unreachable 
we introduce
late will get folded?
The testcase dies with -fsanitize=undefined -fdump-ipa-all, I will look into it.
W/o it I get:
.$ ./a.out
t.C:19:41: runtime error: load of null pointer of type ' *'

I would certainly not really parse this message. It goes away with 
-fno-devirtualize

Honza


Re: IPA: Devirtualization versus placement new

2014-04-26 Thread Andrew Haley
On 04/25/2014 07:48 PM, Jakub Jelinek wrote:
> On Fri, Apr 25, 2014 at 08:23:22PM +0200, Jan Hubicka wrote:
>>> On 04/25/2014 03:14 PM, Volker Simonis wrote:
 Could you therefore please re-categorize this as devirt bug.
>>>
>>> It is an IPA bug.  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60965
>>
>> Now when I have interest from ubsan direction, I wanted to ask. Would it 
>> make sense to turn
>> those unreachables into traps with ubsan enabled? (similarly in the loop 
>> stuff)
> 
> With -fsanitize=undefined __builtin_unreachable is folded right away into a
> library call that will emit a message and then die.

In theory, yes.  With this one, though, gcc 4.9-compiled binary just
crashed with a segv.

Andrew.