Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-18 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 05:39:57PM -0800, Marek Polacek wrote:
> > Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
> > it will be just insanely huge, but each SAVE_EXPR appears exactly twice
> > in its containing SAVE_EXPR and the second time cp_genericize_r sees
> > the SAVE_EXPR, it will do the
> > 1138  /* Other than invisiref parms, don't walk the same tree 
> > twice.  */
> > 1139  if (p_set->contains (stmt))
> > 1140{
> > 1141  *walk_subtrees = 0;
> > 1142  return NULL_TREE;
> > 1143}
> > early exit.
>  
> Clearly I misread things here.  I blame the cold I've caught!
> 
> In any case, I'm sorry for wasting your time and I'll just close the PR.

I bet it is a compile-time hog with -fsanitize=undefined
-fdump-tree-original.  So, if anything, we'd need some hack in the SAVE_EXPR
generic printing code, say track the depth of SAVE_EXPRs being printed
concurrently and if we go above a certain level of them (say 10), change
into a different mode where we print the content of SAVE_EXPR just the
first time we encounter it + e.g. the SAVE_EXPR address and then when
encountering the same SAVE_EXPR again, just print the address.  When
the SAVE_EXPR nesting level shrinks below 10, clear the pointer set
of SAVE_EXPRs and resume normal behavior.
-fsanitize=undefined -fdump-tree-gimple is fast.

Jakub


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-17 Thread Marek Polacek
On Thu, Nov 17, 2016 at 09:10:51AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote:
> > Sorry.  So consider the following:
> > 
> > class S
> > {
> >   virtual void foo () = 0;
> > };
> > 
> > struct T {
> >   T  << (const char *s);
> > };
> > 
> > T t;
> > 
> > void
> > S::foo ()
> > {
> >   t << "a" << "b" << "c";
> > }
> > 
> > Before
> > 1498   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> > 1499 ubsan_maybe_instrument_member_call (stmt, is_ctor);
> > 
> > the stmt will be
> > 
> > T::operator<< (T::operator<< (T::operator<< (, "a"), "b"), "c")
> > 
> > after ubsan_maybe_instrument_member_call it will be
> > 
> > T::operator<< (UBSAN_NULL (SAVE_EXPR  > "a"), "b")>, 4B, 0);, SAVE_EXPR  > "b")>;, "c")
> > 
> > and that's what is saved into the hash table.  Then another stmt will be the
> > inner
> > 
> > T::operator<< (T::operator<< (, "a"), "b")
> > 
> > which we instrument and put into the hash table, and so on.  But those
> > SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
> > T::operator<< calls and we kind of recursively instrument all of them.
> 
> But those SAVE_EXPRs are the same.  If you put a breakpoint on the:
> 489 if (op)
> 490   CALL_EXPR_ARG (stmt, 0) = op;
> line 490 in c-ubsan.c, on the above short testcase is called 2 times
> (the T< not to be NULL), rather than 3 times.
> When trying larger testcase like below I count ~ 120 calls (counted by hand,
> so it is possible it was the right 119 times).
> If this has not been linear, the testcase would be compiling for years.
> Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
> it will be just insanely huge, but each SAVE_EXPR appears exactly twice
> in its containing SAVE_EXPR and the second time cp_genericize_r sees
> the SAVE_EXPR, it will do the
> 1138/* Other than invisiref parms, don't walk the same tree twice.  */
> 1139if (p_set->contains (stmt))
> 1140  {
> 1141*walk_subtrees = 0;
> 1142return NULL_TREE;
> 1143  }
> early exit.
 
Clearly I misread things here.  I blame the cold I've caught!

In any case, I'm sorry for wasting your time and I'll just close the PR.

Marek


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-17 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote:
> Sorry.  So consider the following:
> 
> class S
> {
>   virtual void foo () = 0;
> };
> 
> struct T {
>   T  << (const char *s);
> };
> 
> T t;
> 
> void
> S::foo ()
> {
>   t << "a" << "b" << "c";
> }
> 
> Before
> 1498   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> 1499 ubsan_maybe_instrument_member_call (stmt, is_ctor);
> 
> the stmt will be
> 
> T::operator<< (T::operator<< (T::operator<< (, "a"), "b"), "c")
> 
> after ubsan_maybe_instrument_member_call it will be
> 
> T::operator<< (UBSAN_NULL (SAVE_EXPR  "b")>, 4B, 0);, SAVE_EXPR ;, 
> "c")
> 
> and that's what is saved into the hash table.  Then another stmt will be the
> inner
> 
> T::operator<< (T::operator<< (, "a"), "b")
> 
> which we instrument and put into the hash table, and so on.  But those
> SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
> T::operator<< calls and we kind of recursively instrument all of them.

But those SAVE_EXPRs are the same.  If you put a breakpoint on the:
489   if (op)
490 CALL_EXPR_ARG (stmt, 0) = op;
line 490 in c-ubsan.c, on the above short testcase is called 2 times
(the T<contains (stmt))
1140{
1141  *walk_subtrees = 0;
1142  return NULL_TREE;
1143}
early exit.

class S
{
  virtual void foo () = 0;
};

struct T {
  T  << (const char *s);
};

T t;

void
S::foo ()
{
  t << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
<< "s" << "t" << "u" << "v" << "w" << "z"
<< "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
<< "s" << "t" << "u" << "v" << "w" << "z"
<< "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
<< "s" << "t" << "u" << "v" << "w" << "z"
<< "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
<< "s" << "t" << "u" << "v" << "w" << "z"
<< "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
<< "s" << "t" << "u" << "v" << "w" << "z";
}

Jakub


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-16 Thread Marek Polacek
On Fri, Nov 04, 2016 at 05:16:00PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
> > This is a similar case to PR sanitizer/70342.  Here, we were generating 
> > expression
> > in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, 
> > then 
> > UBSAN_NULL >, and then COMPOUND_EXPR of these two and so on.
> > 
> > On this testcase we were instrumention CALL_EXPR that is in fact 
> > operator<<.  I
> > think those always return a reference, so it cannot be NULL, so there's no
> > point in instrumenting those?
> 
> How do you know what is the return type of a user defined overloaded
> operator?
> Even when it returns a reference, I thought the point of -fsanitize=null was
> for all references to verify their addresses are non-null.
> 
> I must say I don't understand the issue, if it is the same SAVE_EXPR in both
> lhs of COMPOUND_EXPR and UBSAN_NULL argument, shouldn't cp_genericize_r use
> of hash table to avoid walking the same tree multiple times avoid the
> exponential compile time/memory?

Sorry.  So consider the following:

class S
{
  virtual void foo () = 0;
};

struct T {
  T  << (const char *s);
};

T t;

void
S::foo ()
{
  t << "a" << "b" << "c";
}

Before
1498   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
1499 ubsan_maybe_instrument_member_call (stmt, is_ctor);

the stmt will be

T::operator<< (T::operator<< (T::operator<< (, "a"), "b"), "c")

after ubsan_maybe_instrument_member_call it will be

T::operator<< (UBSAN_NULL (SAVE_EXPR , 4B, 0);, SAVE_EXPR ;, "c")

and that's what is saved into the hash table.  Then another stmt will be the
inner

T::operator<< (T::operator<< (, "a"), "b")

which we instrument and put into the hash table, and so on.  But those
SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
T::operator<< calls and we kind of recursively instrument all of them.

Not sure if I made this any clearer nor if this can be avoided... :(

Marek


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-04 Thread Jason Merrill
On Fri, Nov 4, 2016 at 12:16 PM, Jakub Jelinek  wrote:
> On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
>> This is a similar case to PR sanitizer/70342.  Here, we were generating 
>> expression
>> in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, 
>> then
>> UBSAN_NULL >, and then COMPOUND_EXPR of these two and so on.
>>
>> On this testcase we were instrumention CALL_EXPR that is in fact operator<<. 
>>  I
>> think those always return a reference, so it cannot be NULL, so there's no
>> point in instrumenting those?
>
> How do you know what is the return type of a user defined overloaded
> operator?

Indeed, there are no constraints on the return type of overloaded operator<<.

Jason


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-04 Thread Jakub Jelinek
On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
> This is a similar case to PR sanitizer/70342.  Here, we were generating 
> expression
> in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, 
> then 
> UBSAN_NULL >, and then COMPOUND_EXPR of these two and so on.
> 
> On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  
> I
> think those always return a reference, so it cannot be NULL, so there's no
> point in instrumenting those?

How do you know what is the return type of a user defined overloaded
operator?
Even when it returns a reference, I thought the point of -fsanitize=null was
for all references to verify their addresses are non-null.

I must say I don't understand the issue, if it is the same SAVE_EXPR in both
lhs of COMPOUND_EXPR and UBSAN_NULL argument, shouldn't cp_genericize_r use
of hash table to avoid walking the same tree multiple times avoid the
exponential compile time/memory?

> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-11-04  Marek Polacek  
> 
>   PR sanitizer/78208
>   * cp-gimplify.c (cp_genericize_r): Don't instrument
>   CALL_EXPR_OPERATOR_SYNTAX.
> 
>   * g++.dg/ubsan/null-8.C: New.
> 
> diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> index 9b9b511..f39e9d5 100644
> --- gcc/cp/cp-gimplify.c
> +++ gcc/cp/cp-gimplify.c
> @@ -1495,7 +1495,8 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
> *data)
>   = TREE_CODE (fn) == ADDR_EXPR
> && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL
> && DECL_CONSTRUCTOR_P (TREE_OPERAND (fn, 0));
> -   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> +   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT)
> +   && !CALL_EXPR_OPERATOR_SYNTAX (stmt))
>   ubsan_maybe_instrument_member_call (stmt, is_ctor);
> if ((flag_sanitize & SANITIZE_VPTR) && !is_ctor)
>   cp_ubsan_maybe_instrument_member_call (stmt);

Jakub


ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-04 Thread Marek Polacek
This is a similar case to PR sanitizer/70342.  Here, we were generating 
expression
in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, then 
UBSAN_NULL >, and then COMPOUND_EXPR of these two and so on.

On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  I
think those always return a reference, so it cannot be NULL, so there's no
point in instrumenting those?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-11-04  Marek Polacek  

PR sanitizer/78208
* cp-gimplify.c (cp_genericize_r): Don't instrument
CALL_EXPR_OPERATOR_SYNTAX.

* g++.dg/ubsan/null-8.C: New.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 9b9b511..f39e9d5 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -1495,7 +1495,8 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
*data)
= TREE_CODE (fn) == ADDR_EXPR
  && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL
  && DECL_CONSTRUCTOR_P (TREE_OPERAND (fn, 0));
- if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
+ if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT)
+ && !CALL_EXPR_OPERATOR_SYNTAX (stmt))
ubsan_maybe_instrument_member_call (stmt, is_ctor);
  if ((flag_sanitize & SANITIZE_VPTR) && !is_ctor)
cp_ubsan_maybe_instrument_member_call (stmt);
diff --git gcc/testsuite/g++.dg/ubsan/null-8.C 
gcc/testsuite/g++.dg/ubsan/null-8.C
index e69de29..0600b93 100644
--- gcc/testsuite/g++.dg/ubsan/null-8.C
+++ gcc/testsuite/g++.dg/ubsan/null-8.C
@@ -0,0 +1,22 @@
+// PR sanitizer/78208
+// { dg-do compile }
+// { dg-options "-fsanitize=null" }
+
+class S
+{
+  virtual void foo () = 0;
+};
+
+struct T {
+  T  << (const char *s);
+};
+
+T t;
+
+void
+S::foo ()
+{
+  t << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
+<< "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
+<< "s" << "t" << "u" << "v" << "w" << "z";
+}

Marek