Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-20 Thread Jeff Law
On 11/19/2017 03:26 PM, Martin Sebor wrote:
>>> I have seen it in the early IL but this late I couldn't come
>>> up with a test case where such a loop would be necessary.
>>> The COMPONENT_REF always refers to the member array.  If you
>>> can show me an example to test with I'll add the handling if
>>> possible.  There are cases where it isn't, such as in
>>>
>>>     struct A { char a[4] __attribute__ ((nonstring)); } *pa;
>>>
>>>     strlen (pa->a + 1);
>>>
>>> where pa->a is represented as a MEM_REF (char[4], pa, 1) with
>>> the member information having been lost in translation.  (This
>>> is the same problem that I had solved for the -Warray-bounds
>>> warning for out-of bounds offsets by implementing it in
>>> tree-ssa-forwprop.c: because that's where the member is folded
>>> into a reference to the base object.)
>> Set up a nested structure then reference it like a.b.c.d.
> 
> I did that.  From my reply to Jakub:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01671.html
> 
> with
> 
>   struct A { char a[4], b[4] __attribute__ ((nonstring)); };
>   struct B { struct A a; };
>   struct C { struct B b; };
> 
>   void f (struct C *p)
>   {
>     __builtin_strcpy (p->b.a.a, p->b.a.b);
>   }
> 
> in get_attr_nonstring_decl() where EXPR is p->a.b.b it is
> a COMPONENT_REF (COMPONENT_REF (..,), FIELD_DECL (b)).  I.e.,
> the outermost FIELD_DECL is the one of interest.
> 
> The code extracts TREE_OPERAND (decl, 1) from this outermost
> COMPONENT_REF, which is FIELD_DECL (b).
> 
> What test case gives a structure where it's necessary to loop
> over the components to get at the field?  I can't think of one.
Bah.  It's so damn easy to forget that the outermost COMPONENT_REF
refers to the last component in the expression.

Can you incorporate the lazy initialization from Jakub and commit?

Jeff


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-19 Thread Martin Sebor

I have seen it in the early IL but this late I couldn't come
up with a test case where such a loop would be necessary.
The COMPONENT_REF always refers to the member array.  If you
can show me an example to test with I'll add the handling if
possible.  There are cases where it isn't, such as in

struct A { char a[4] __attribute__ ((nonstring)); } *pa;

strlen (pa->a + 1);

where pa->a is represented as a MEM_REF (char[4], pa, 1) with
the member information having been lost in translation.  (This
is the same problem that I had solved for the -Warray-bounds
warning for out-of bounds offsets by implementing it in
tree-ssa-forwprop.c: because that's where the member is folded
into a reference to the base object.)

Set up a nested structure then reference it like a.b.c.d.


I did that.  From my reply to Jakub:

  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01671.html

with

  struct A { char a[4], b[4] __attribute__ ((nonstring)); };
  struct B { struct A a; };
  struct C { struct B b; };

  void f (struct C *p)
  {
__builtin_strcpy (p->b.a.a, p->b.a.b);
  }

in get_attr_nonstring_decl() where EXPR is p->a.b.b it is
a COMPONENT_REF (COMPONENT_REF (..,), FIELD_DECL (b)).  I.e.,
the outermost FIELD_DECL is the one of interest.

The code extracts TREE_OPERAND (decl, 1) from this outermost
COMPONENT_REF, which is FIELD_DECL (b).

What test case gives a structure where it's necessary to loop
over the components to get at the field?  I can't think of one.

Martin


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-19 Thread Jeff Law
On 11/19/2017 10:06 AM, Martin Sebor wrote:

>>> +/* If EXPR refers to a character array or pointer declared attribute
>>> +   nonstring return a decl for that array or pointer and set *REF to
>>> +   the referenced enclosing object or pointer.  Otherwise returns
>>> +   null.  */
>> Generally we'd write NULL rather than null for a generic pointer and in
>> this specific case NULL_TREE is even better.
> 
> I agree.  The C NULL macro, C++ nullptr, and GCC's NULL_TREE and
> NULL_RTL are helpful both for type safety and readability of code.
> But they all are implementation details that's ot not relevant to the
> specification
> of an API or its users.
> 
> The established term that every C and C++ programmer is familiar
> with is "null pointer."
> 
>  and exist
> mainly for type safety.  They are not relevant to the specification
> of an API or its users.  I didn't think too hard about it when
> I wrote the comment (it just didn't seem important enough) but
> now
> that I have I believe "null" is better, clearer, and more future-
> proof (in case the return type should ever change to some smart
> pointer class).
> 
> (There also comments in this file and elsewhere in GCC that use
> various forms of null.)
Perhaps so, but please capitalize it -- while there may be "null"
references, I believe in all-caps is by far more common within comments
within the GCC sources.

WRT NULL vs NULL_TREE, I'd tend to disagree.  The function isn't going
to return NULL, it's going to return NULL_TREE.  That's actually already
exposed in the API/ABI by the return value's type.  The fact that NULL
and NULL_TREE are interchangeable is an implementation detail and we
shouldn't encourage folks to rely on that by way of the function comment.





> 
>>> +  if (TREE_CODE (dcl) == SSA_NAME)
>>> +    {
>>> +  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
>>> +    dcl = SSA_NAME_VAR (dcl);
>>> +  else
>>> +    {
>>> +  gimple *def = SSA_NAME_DEF_STMT (dcl);
>>> +
>>> +  if (is_gimple_assign (def))
>>> +    {
>>> +  tree_code code = gimple_assign_rhs_code (def);
>>> +  if (code == ADDR_EXPR
>>> +  || code == COMPONENT_REF
>>> +  || code == VAR_DECL)
>>> +    dcl = gimple_assign_rhs1 (def);
>> Note that COMPONENT_REFs can have arguments that are themselves
>> COMPONENT_REFS.  So you typically want to use a loop to strip all away.
> 
> I have seen it in the early IL but this late I couldn't come
> up with a test case where such a loop would be necessary.
> The COMPONENT_REF always refers to the member array.  If you
> can show me an example to test with I'll add the handling if
> possible.  There are cases where it isn't, such as in
> 
>     struct A { char a[4] __attribute__ ((nonstring)); } *pa;
> 
>     strlen (pa->a + 1);
> 
> where pa->a is represented as a MEM_REF (char[4], pa, 1) with
> the member information having been lost in translation.  (This
> is the same problem that I had solved for the -Warray-bounds
> warning for out-of bounds offsets by implementing it in
> tree-ssa-forwprop.c: because that's where the member is folded
> into a reference to the base object.)
Set up a nested structure then reference it like a.b.c.d.


> 
>>> +  /* It's safe to call strndup and strnlen with a non-string argument
>>> + since the functions provide an explicit bound for this
>>> purpose.  */
>>> +  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_STRNDUP)
>>> +    return;
>> So I see you handle BUILT_IN_STRNDUP here, but not BUILT_IN_STRNLEN.  Is
>> there a reason for the omission?
> 
> strnlen is not a GCC built-in so it's handled implicitly by
> the test for DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL
> just before the if statement above.  The test I added verifies
> that strnlen is excluded.  (Bug 81384 tracks the missing
> intrinsic support for strnlen.)
> 
> That said, more testing revealed that Glibc uses functions like
> strncmp and strncpy to compare and copy non-strings, so I've
> added those to the white list and improved the checking a bit.
> I've also changed the hash_map to a pointer to make Jakub happy.
OK.  Thanks.


> 
> 



Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-19 Thread Martin Sebor

On 11/17/2017 12:47 AM, Jakub Jelinek wrote:

On Thu, Nov 16, 2017 at 05:15:20PM -0700, Jeff Law wrote:

+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == ADDR_EXPR
+ || code == COMPONENT_REF
+ || code == VAR_DECL)
+   dcl = gimple_assign_rhs1 (def);

Note that COMPONENT_REFs can have arguments that are themselves
COMPONENT_REFS.  So you typically want to use a loop to strip all away.


Do you want to handle just COMPONENT_REF, or other handled components
(ARRAY_REFs, etc.)?
We have handled_component_p predicate and get_base_address to strip them
(or get_inner_reference if you want other details about it).


Thanks.  I'm aware of the function and I considered using it here
in a loop as Jeff suggested and as is done elsewhere but couldn't
come up with a test case to exercise it (I tried a few variations
on the one below).  I don't know if that's because of my lack of
imagination or because these more complex structures end up
simplified by the time they reach the expander.  If you can show
me one where the loop is needed I'll adjust the code.

  struct A { char a[4], b[4] __attribute__ ((nonstring)); };
  struct B { struct A a; };
  struct C { struct B b; };

  void f (struct C *p)
  {
__builtin_strcpy (p->b.a.a, p->b.a.b);
  }

Thanks
Martin


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-19 Thread Martin Sebor

On 11/16/2017 05:15 PM, Jeff Law wrote:

On 11/13/2017 06:21 PM, Martin Sebor wrote:

Attached is an improved version that rather than hardcoding
the built-in functions to check uses the constness of each
built-in's char* argument as a trigger to do the checking.

This avoids the problem of the list being incomplete either
by omission or due to getting out of sync, and also makes it
trivial to extend the same approach to user-defined functions
in the future.

I've excluded strndup and strnlen (but no other "bounded"
string functions like strncmp) from the checking.

Martin

On 11/12/2017 05:52 PM, Martin Sebor wrote:

The recently introduced -Wstringop-truncation warning relies
on the new nonstring attribute to allow the historical use case
of calling strncpy to completely fill the destination with a copy
of a string without adding a terminating nul.  Glibc is currently
considering making use of the attribute to decorate some of its
data members.  To help find misuses of such data members in
arguments to string functions like strlen or strdup, the attached
patch adds checking for this new attribute in these contexts.
The checking is intentionally done late so  that uses such arrays
that can be proven to be safe (and thus folded) are not diagnosed.

While testing this simple enhancement I noticed that the handling
I added for the nul termination in cases like

  strncpy (array, s, sizeof array);
  array[sizeof array - 1] = 0;

to avoid the new warning wasn't quite as robust as it could and
arguably should be so I improved it a bit to silently accept
more forms of the idiom.  For instance, this is now correctly
handled (and not diagnosed):

  *stpcpy (array, s, sizeof array - 1) = 0;

Martin

PS A useful future enhancement would be to detect the one byte
overflow in:

  *stpcpy (array, s, sizeof array) = 0;



gcc-82945.diff


PR tree-optimization/82945 - add warning for passing non-strings to functions 
that expect string arguments

gcc/ChangeLog:

PR tree-optimization/82945
* calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
functions.
(initialize_argument_information): Call maybe_warn_nonstring_arg.
* calls.h (get_attr_nonstring_decl): Declare new function.
* doc/extend.texi (attribute nonstring): Update.
* gimple-fold.c (gimple_fold_builtin_strncpy): Call
get_attr_nonstring_decl and handle it.
* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
detection of nul-termination.

gcc/testsuite/ChangeLog:

PR tree-optimization/82945
* c-c++-common/Wstringop-truncation-2.c: New test.
* c-c++-common/Wstringop-truncation.c: Adjust.
* c-c++-common/attr-nonstring-2.c: Adjust.
* c-c++-common/attr-nonstring-3.c: New test.




diff --git a/gcc/calls.c b/gcc/calls.c
index 3730f43..197d907 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1494,6 +1494,108 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree 
args[2], int idx[2])
 }
 }

+/* If EXPR refers to a character array or pointer declared attribute
+   nonstring return a decl for that array or pointer and set *REF to
+   the referenced enclosing object or pointer.  Otherwise returns
+   null.  */

Generally we'd write NULL rather than null for a generic pointer and in
this specific case NULL_TREE is even better.


I agree.  The C NULL macro, C++ nullptr, and GCC's NULL_TREE and
NULL_RTL are helpful both for type safety and readability of code.
But they all are an implementation detail that's not relevant to
the specification of an API or its users. The established term
that every C and C++ programmer is familiar with is "null pointer."
I didn't think too hard about it when I wrote the comment (it just
didn't seem important enough) but now that I have I believe "null"
is better, clearer, and more future- proof (in case the return
type should ever change to some smart pointer class).

(There also comments in this file and elsewhere in GCC that use
various forms of null.)


+tree
+get_attr_nonstring_decl (tree expr, tree *ref)
+{
+  tree dcl = expr;

I think Jakub may have pointed this out, but we generally prefer decl.
Truly a nit though.


Right.  I've changed it to DECL.


+  if (TREE_CODE (dcl) == SSA_NAME)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
+   dcl = SSA_NAME_VAR (dcl);
+  else
+   {
+ gimple *def = SSA_NAME_DEF_STMT (dcl);
+
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == ADDR_EXPR
+ || code == COMPONENT_REF
+ || code == VAR_DECL)
+   dcl = gimple_assign_rhs1 (def);

Note that COMPONENT_REFs can have arguments that are themselves
COMPONENT_REFS.  So you typically want to use a loop to strip all away.


I have seen it in the early IL but this late I couldn't come
up with a test case where such a loop would be necessary.
The COMPONENT_REF always 

Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-19 Thread Martin Sebor

On 11/16/2017 05:15 PM, Jeff Law wrote:

On 11/13/2017 06:21 PM, Martin Sebor wrote:

Attached is an improved version that rather than hardcoding
the built-in functions to check uses the constness of each
built-in's char* argument as a trigger to do the checking.

This avoids the problem of the list being incomplete either
by omission or due to getting out of sync, and also makes it
trivial to extend the same approach to user-defined functions
in the future.

I've excluded strndup and strnlen (but no other "bounded"
string functions like strncmp) from the checking.

Martin

On 11/12/2017 05:52 PM, Martin Sebor wrote:

The recently introduced -Wstringop-truncation warning relies
on the new nonstring attribute to allow the historical use case
of calling strncpy to completely fill the destination with a copy
of a string without adding a terminating nul.  Glibc is currently
considering making use of the attribute to decorate some of its
data members.  To help find misuses of such data members in
arguments to string functions like strlen or strdup, the attached
patch adds checking for this new attribute in these contexts.
The checking is intentionally done late so  that uses such arrays
that can be proven to be safe (and thus folded) are not diagnosed.

While testing this simple enhancement I noticed that the handling
I added for the nul termination in cases like

  strncpy (array, s, sizeof array);
  array[sizeof array - 1] = 0;

to avoid the new warning wasn't quite as robust as it could and
arguably should be so I improved it a bit to silently accept
more forms of the idiom.  For instance, this is now correctly
handled (and not diagnosed):

  *stpcpy (array, s, sizeof array - 1) = 0;

Martin

PS A useful future enhancement would be to detect the one byte
overflow in:

  *stpcpy (array, s, sizeof array) = 0;



gcc-82945.diff


PR tree-optimization/82945 - add warning for passing non-strings to functions 
that expect string arguments

gcc/ChangeLog:

PR tree-optimization/82945
* calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
functions.
(initialize_argument_information): Call maybe_warn_nonstring_arg.
* calls.h (get_attr_nonstring_decl): Declare new function.
* doc/extend.texi (attribute nonstring): Update.
* gimple-fold.c (gimple_fold_builtin_strncpy): Call
get_attr_nonstring_decl and handle it.
* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
detection of nul-termination.

gcc/testsuite/ChangeLog:

PR tree-optimization/82945
* c-c++-common/Wstringop-truncation-2.c: New test.
* c-c++-common/Wstringop-truncation.c: Adjust.
* c-c++-common/attr-nonstring-2.c: Adjust.
* c-c++-common/attr-nonstring-3.c: New test.




diff --git a/gcc/calls.c b/gcc/calls.c
index 3730f43..197d907 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1494,6 +1494,108 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree 
args[2], int idx[2])
 }
 }

+/* If EXPR refers to a character array or pointer declared attribute
+   nonstring return a decl for that array or pointer and set *REF to
+   the referenced enclosing object or pointer.  Otherwise returns
+   null.  */

Generally we'd write NULL rather than null for a generic pointer and in
this specific case NULL_TREE is even better.


I agree.  The C NULL macro, C++ nullptr, and GCC's NULL_TREE and
NULL_RTL are helpful both for type safety and readability of code.
But they all are implementation details that's ot not relevant to the 
specification

of an API or its users.

The established term that every C and C++ programmer is familiar
with is "null pointer."

 and exist
mainly for type safety.  They are not relevant to the specification
of an API or its users.  I didn't think too hard about it when
I wrote the comment (it just didn't seem important enough) but
now
that I have I believe "null" is better, clearer, and more future-
proof (in case the return type should ever change to some smart
pointer class).

(There also comments in this file and elsewhere in GCC that use
various forms of null.)


+tree
+get_attr_nonstring_decl (tree expr, tree *ref)
+{
+  tree dcl = expr;

I think Jakub may have pointed this out, but we generally prefer decl.
Truly a nit though.


Right.  I've changed it to DECL.


+  if (TREE_CODE (dcl) == SSA_NAME)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
+   dcl = SSA_NAME_VAR (dcl);
+  else
+   {
+ gimple *def = SSA_NAME_DEF_STMT (dcl);
+
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == ADDR_EXPR
+ || code == COMPONENT_REF
+ || code == VAR_DECL)
+   dcl = gimple_assign_rhs1 (def);

Note that COMPONENT_REFs can have arguments that are themselves
COMPONENT_REFS.  So you typically want to use a loop to strip all away.


I have seen it in the early IL but 

Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-16 Thread Jakub Jelinek
On Thu, Nov 16, 2017 at 05:15:20PM -0700, Jeff Law wrote:
> > + tree_code code = gimple_assign_rhs_code (def);
> > + if (code == ADDR_EXPR
> > + || code == COMPONENT_REF
> > + || code == VAR_DECL)
> > +   dcl = gimple_assign_rhs1 (def);
> Note that COMPONENT_REFs can have arguments that are themselves
> COMPONENT_REFS.  So you typically want to use a loop to strip all away.

Do you want to handle just COMPONENT_REF, or other handled components
(ARRAY_REFs, etc.)?
We have handled_component_p predicate and get_base_address to strip them
(or get_inner_reference if you want other details about it).

Jakub


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-16 Thread Jeff Law
On 11/13/2017 06:21 PM, Martin Sebor wrote:
> Attached is an improved version that rather than hardcoding
> the built-in functions to check uses the constness of each
> built-in's char* argument as a trigger to do the checking.
> 
> This avoids the problem of the list being incomplete either
> by omission or due to getting out of sync, and also makes it
> trivial to extend the same approach to user-defined functions
> in the future.
> 
> I've excluded strndup and strnlen (but no other "bounded"
> string functions like strncmp) from the checking.
> 
> Martin
> 
> On 11/12/2017 05:52 PM, Martin Sebor wrote:
>> The recently introduced -Wstringop-truncation warning relies
>> on the new nonstring attribute to allow the historical use case
>> of calling strncpy to completely fill the destination with a copy
>> of a string without adding a terminating nul.  Glibc is currently
>> considering making use of the attribute to decorate some of its
>> data members.  To help find misuses of such data members in
>> arguments to string functions like strlen or strdup, the attached
>> patch adds checking for this new attribute in these contexts.
>> The checking is intentionally done late so  that uses such arrays
>> that can be proven to be safe (and thus folded) are not diagnosed.
>>
>> While testing this simple enhancement I noticed that the handling
>> I added for the nul termination in cases like
>>
>>   strncpy (array, s, sizeof array);
>>   array[sizeof array - 1] = 0;
>>
>> to avoid the new warning wasn't quite as robust as it could and
>> arguably should be so I improved it a bit to silently accept
>> more forms of the idiom.  For instance, this is now correctly
>> handled (and not diagnosed):
>>
>>   *stpcpy (array, s, sizeof array - 1) = 0;
>>
>> Martin
>>
>> PS A useful future enhancement would be to detect the one byte
>> overflow in:
>>
>>   *stpcpy (array, s, sizeof array) = 0;
> 
> 
> gcc-82945.diff
> 
> 
> PR tree-optimization/82945 - add warning for passing non-strings to functions 
> that expect string arguments
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/82945
>   * calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
>   functions.
>   (initialize_argument_information): Call maybe_warn_nonstring_arg.
>   * calls.h (get_attr_nonstring_decl): Declare new function.
>   * doc/extend.texi (attribute nonstring): Update.
>   * gimple-fold.c (gimple_fold_builtin_strncpy): Call
>   get_attr_nonstring_decl and handle it.
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
>   detection of nul-termination.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/82945
>   * c-c++-common/Wstringop-truncation-2.c: New test.
>   * c-c++-common/Wstringop-truncation.c: Adjust.
>   * c-c++-common/attr-nonstring-2.c: Adjust.
>   * c-c++-common/attr-nonstring-3.c: New test.

> 
> diff --git a/gcc/calls.c b/gcc/calls.c
> index 3730f43..197d907 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1494,6 +1494,108 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, 
> tree args[2], int idx[2])
>  }
>  }
>  
> +/* If EXPR refers to a character array or pointer declared attribute
> +   nonstring return a decl for that array or pointer and set *REF to
> +   the referenced enclosing object or pointer.  Otherwise returns
> +   null.  */
Generally we'd write NULL rather than null for a generic pointer and in
this specific case NULL_TREE is even better.

> +
> +tree
> +get_attr_nonstring_decl (tree expr, tree *ref)
> +{
> +  tree dcl = expr;
I think Jakub may have pointed this out, but we generally prefer decl.
Truly a nit though.


> +  if (TREE_CODE (dcl) == SSA_NAME)
> +{
> +  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
> + dcl = SSA_NAME_VAR (dcl);
> +  else
> + {
> +   gimple *def = SSA_NAME_DEF_STMT (dcl);
> +
> +   if (is_gimple_assign (def))
> + {
> +   tree_code code = gimple_assign_rhs_code (def);
> +   if (code == ADDR_EXPR
> +   || code == COMPONENT_REF
> +   || code == VAR_DECL)
> + dcl = gimple_assign_rhs1 (def);
Note that COMPONENT_REFs can have arguments that are themselves
COMPONENT_REFS.  So you typically want to use a loop to strip all away.




> +  /* It's safe to call strndup and strnlen with a non-string argument
> + since the functions provide an explicit bound for this purpose.  */
> +  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_STRNDUP)
> +return;
So I see you handle BUILT_IN_STRNDUP here, but not BUILT_IN_STRNLEN.  Is
there a reason for the omission?


Jeff


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-13 Thread Martin Sebor

Attached is an improved version that rather than hardcoding
the built-in functions to check uses the constness of each
built-in's char* argument as a trigger to do the checking.

This avoids the problem of the list being incomplete either
by omission or due to getting out of sync, and also makes it
trivial to extend the same approach to user-defined functions
in the future.

I've excluded strndup and strnlen (but no other "bounded"
string functions like strncmp) from the checking.

Martin

On 11/12/2017 05:52 PM, Martin Sebor wrote:

The recently introduced -Wstringop-truncation warning relies
on the new nonstring attribute to allow the historical use case
of calling strncpy to completely fill the destination with a copy
of a string without adding a terminating nul.  Glibc is currently
considering making use of the attribute to decorate some of its
data members.  To help find misuses of such data members in
arguments to string functions like strlen or strdup, the attached
patch adds checking for this new attribute in these contexts.
The checking is intentionally done late so  that uses such arrays
that can be proven to be safe (and thus folded) are not diagnosed.

While testing this simple enhancement I noticed that the handling
I added for the nul termination in cases like

  strncpy (array, s, sizeof array);
  array[sizeof array - 1] = 0;

to avoid the new warning wasn't quite as robust as it could and
arguably should be so I improved it a bit to silently accept
more forms of the idiom.  For instance, this is now correctly
handled (and not diagnosed):

  *stpcpy (array, s, sizeof array - 1) = 0;

Martin

PS A useful future enhancement would be to detect the one byte
overflow in:

  *stpcpy (array, s, sizeof array) = 0;


PR tree-optimization/82945 - add warning for passing non-strings to functions that expect string arguments

gcc/ChangeLog:

	PR tree-optimization/82945
	* calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
	functions.
	(initialize_argument_information): Call maybe_warn_nonstring_arg.
	* calls.h (get_attr_nonstring_decl): Declare new function.
	* doc/extend.texi (attribute nonstring): Update.
	* gimple-fold.c (gimple_fold_builtin_strncpy): Call
	get_attr_nonstring_decl and handle it.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
	detection of nul-termination.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82945
	* c-c++-common/Wstringop-truncation-2.c: New test.
	* c-c++-common/Wstringop-truncation.c: Adjust.
	* c-c++-common/attr-nonstring-2.c: Adjust.
	* c-c++-common/attr-nonstring-3.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 3730f43..197d907 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1494,6 +1494,108 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree args[2], int idx[2])
 }
 }
 
+/* If EXPR refers to a character array or pointer declared attribute
+   nonstring return a decl for that array or pointer and set *REF to
+   the referenced enclosing object or pointer.  Otherwise returns
+   null.  */
+
+tree
+get_attr_nonstring_decl (tree expr, tree *ref)
+{
+  tree dcl = expr;
+  if (TREE_CODE (dcl) == SSA_NAME)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
+	dcl = SSA_NAME_VAR (dcl);
+  else
+	{
+	  gimple *def = SSA_NAME_DEF_STMT (dcl);
+
+	  if (is_gimple_assign (def))
+	{
+	  tree_code code = gimple_assign_rhs_code (def);
+	  if (code == ADDR_EXPR
+		  || code == COMPONENT_REF
+		  || code == VAR_DECL)
+		dcl = gimple_assign_rhs1 (def);
+	}
+	}
+}
+
+  if (TREE_CODE (dcl) == ADDR_EXPR)
+dcl = TREE_OPERAND (dcl, 0);
+
+  if (ref)
+*ref = dcl;
+
+  if (TREE_CODE (dcl) == COMPONENT_REF)
+dcl = TREE_OPERAND (dcl, 1);
+
+  if (DECL_P (dcl)
+  && lookup_attribute ("nonstring", DECL_ATTRIBUTES (dcl)))
+return dcl;
+
+  return NULL_TREE;
+}
+
+/* Warn about passing a non-string array/pointer to a function that
+   expects a nul-terminated string argument.  */
+
+static void
+maybe_warn_nonstring_arg (tree fndecl, tree exp)
+{
+  if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return;
+
+  /* It's safe to call strndup and strnlen with a non-string argument
+ since the functions provide an explicit bound for this purpose.  */
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_STRNDUP)
+return;
+
+  /* Iterate over the built-in function's formal arguments and check
+ each const char* against the actual argument.  If the actual
+ argument is declared attribute non-string issue a warning.  */
+  function_args_iterator it;
+  function_args_iter_init (, TREE_TYPE (fndecl));
+
+  for (unsigned argno = 0; ; ++argno, function_args_iter_next ())
+{
+  tree argtype = function_args_iter_cond ();
+  if (!argtype)
+	break;
+
+  if (TREE_CODE (argtype) != POINTER_TYPE)
+	continue;
+
+  argtype = TREE_TYPE (argtype);
+
+  if (TREE_CODE (argtype) != INTEGER_TYPE
+	  || !TYPE_READONLY (argtype))
+	continue;
+
+  argtype = TYPE_MAIN_VARIANT 

Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-13 Thread Martin Sebor

On 11/13/2017 12:20 PM, Jakub Jelinek wrote:

On Sun, Nov 12, 2017 at 05:52:41PM -0700, Martin Sebor wrote:

+   the referenced enclosing object or pointer.  Otherwise return
+   null.  */
+
+tree
+get_attr_nonstring_decl (tree expr, tree *ref)
+{
+  tree dcl = expr;


Usually we call vars decl, not dcl.
Or what does it stand for?  In multiple other spots.


+  /* -1 terminated array of zero-based string arguments.  */
+  unsigned argno[] = { -1, -1, -1 };
+
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+case BUILT_IN_STRCASECMP:
+case BUILT_IN_STRCMP:
+case BUILT_IN_STRCSPN:
+case BUILT_IN_STRSPN:
+case BUILT_IN_STRNCMP:
+case BUILT_IN_STRNCASECMP:
+case BUILT_IN_VSSCANF:
+  argno[0] = 0;
+  argno[1] = 1;
+  break;
+
+case BUILT_IN_STPCPY:
+case BUILT_IN_STPNCPY:
+case BUILT_IN_STRCAT:
+case BUILT_IN_STRCPY:
+case BUILT_IN_STRNCAT:
+case BUILT_IN_STRNCPY:
+  argno[0] = 1;
+  break;
+
+case BUILT_IN_FPRINTF:
+case BUILT_IN_FPUTS:
+case BUILT_IN_SPRINTF:
+case BUILT_IN_STPCPY_CHK:
+case BUILT_IN_STPNCPY_CHK:
+case BUILT_IN_STRCAT_CHK:
+case BUILT_IN_STRCPY_CHK:
+case BUILT_IN_STRNCAT_CHK:
+case BUILT_IN_STRNCPY_CHK:
+case BUILT_IN_VFPRINTF:
+case BUILT_IN_VSPRINTF:
+case BUILT_IN_VFSCANF:
+  argno[0] = 1;
+  break;
+
+case BUILT_IN_SNPRINTF:
+case BUILT_IN_VSNPRINTF:
+  argno[0] = 2;
+  break;
+
+case BUILT_IN_PRINTF:
+case BUILT_IN_PRINTF_UNLOCKED:
+case BUILT_IN_PUTS:
+case BUILT_IN_PUTS_UNLOCKED:
+case BUILT_IN_STRCHR:
+case BUILT_IN_STRDUP:
+case BUILT_IN_STRLEN:


How was the above list of builtins chosen?
I don't see why some are included and others that behave similarly aren't.
Say, you have vsscanf and vfscanf in the list, but not vscanf, fscanf,
scanf and sscanf.  Or {f,s,sn,}printf and v{f,s,sn}printf,
but not vprintf, and have printf_unlocked, but not fprintf_unlocked.
And no *printf_chk.


Right.  It occurred to me only after I submitted the patch that
there's a better way to do this than by hardcoding the functions.
Let me post an updated patch.

Martin


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-13 Thread Jakub Jelinek
On Sun, Nov 12, 2017 at 05:52:41PM -0700, Martin Sebor wrote:
> +   the referenced enclosing object or pointer.  Otherwise return
> +   null.  */
> +
> +tree
> +get_attr_nonstring_decl (tree expr, tree *ref)
> +{
> +  tree dcl = expr;

Usually we call vars decl, not dcl.
Or what does it stand for?  In multiple other spots.

> +  /* -1 terminated array of zero-based string arguments.  */
> +  unsigned argno[] = { -1, -1, -1 };
> +
> +  switch (DECL_FUNCTION_CODE (fndecl))
> +{
> +case BUILT_IN_STRCASECMP:
> +case BUILT_IN_STRCMP:
> +case BUILT_IN_STRCSPN:
> +case BUILT_IN_STRSPN:
> +case BUILT_IN_STRNCMP:
> +case BUILT_IN_STRNCASECMP:
> +case BUILT_IN_VSSCANF:
> +  argno[0] = 0;
> +  argno[1] = 1;
> +  break;
> +
> +case BUILT_IN_STPCPY:
> +case BUILT_IN_STPNCPY:
> +case BUILT_IN_STRCAT:
> +case BUILT_IN_STRCPY:
> +case BUILT_IN_STRNCAT:
> +case BUILT_IN_STRNCPY:
> +  argno[0] = 1;
> +  break;
> +
> +case BUILT_IN_FPRINTF:
> +case BUILT_IN_FPUTS:
> +case BUILT_IN_SPRINTF:
> +case BUILT_IN_STPCPY_CHK:
> +case BUILT_IN_STPNCPY_CHK:
> +case BUILT_IN_STRCAT_CHK:
> +case BUILT_IN_STRCPY_CHK:
> +case BUILT_IN_STRNCAT_CHK:
> +case BUILT_IN_STRNCPY_CHK:
> +case BUILT_IN_VFPRINTF:
> +case BUILT_IN_VSPRINTF:
> +case BUILT_IN_VFSCANF:
> +  argno[0] = 1;
> +  break;
> +
> +case BUILT_IN_SNPRINTF:
> +case BUILT_IN_VSNPRINTF:
> +  argno[0] = 2;
> +  break;
> +
> +case BUILT_IN_PRINTF:
> +case BUILT_IN_PRINTF_UNLOCKED:
> +case BUILT_IN_PUTS:
> +case BUILT_IN_PUTS_UNLOCKED:
> +case BUILT_IN_STRCHR:
> +case BUILT_IN_STRDUP:
> +case BUILT_IN_STRLEN:

How was the above list of builtins chosen?
I don't see why some are included and others that behave similarly aren't.
Say, you have vsscanf and vfscanf in the list, but not vscanf, fscanf,
scanf and sscanf.  Or {f,s,sn,}printf and v{f,s,sn}printf,
but not vprintf, and have printf_unlocked, but not fprintf_unlocked.
And no *printf_chk.

Jakub


Re: [PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-13 Thread Bernhard Reutner-Fischer
On 13 November 2017 01:52:41 CET, Martin Sebor  wrote:

s/^\(he array\)/t\1/

thanks, 


[PATCH] detect nonstring arguments to string functions (PR 82945)

2017-11-12 Thread Martin Sebor

The recently introduced -Wstringop-truncation warning relies
on the new nonstring attribute to allow the historical use case
of calling strncpy to completely fill the destination with a copy
of a string without adding a terminating nul.  Glibc is currently
considering making use of the attribute to decorate some of its
data members.  To help find misuses of such data members in
arguments to string functions like strlen or strdup, the attached
patch adds checking for this new attribute in these contexts.
The checking is intentionally done late so  that uses such arrays
that can be proven to be safe (and thus folded) are not diagnosed.

While testing this simple enhancement I noticed that the handling
I added for the nul termination in cases like

  strncpy (array, s, sizeof array);
  array[sizeof array - 1] = 0;

to avoid the new warning wasn't quite as robust as it could and
arguably should be so I improved it a bit to silently accept
more forms of the idiom.  For instance, this is now correctly
handled (and not diagnosed):

  *stpcpy (array, s, sizeof array - 1) = 0;

Martin

PS A useful future enhancement would be to detect the one byte
overflow in:

  *stpcpy (array, s, sizeof array) = 0;
PR tree-optimization/82945 - add warning for passing non-strings to functions that expect string arguments

gcc/ChangeLog:

	PR tree-optimization/82945
	* calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
	functions.
	(initialize_argument_information): Call maybe_warn_nonstring_arg.
	* calls.h (get_attr_nonstring_decl): Declare new function.
	* doc/extend.texi (attribute nonstring): Update.
	* gimple-fold.c (gimple_fold_builtin_strncpy): Call
	get_attr_nonstring_decl and handle it.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
	detection of nul-termination.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82945
	* c-c++-common/Wstringop-truncation-2.c: New test.
	* c-c++-common/Wstringop-truncation.c: Adjust.
	* c-c++-common/attr-nonstring-2.c: Adjust.
	* c-c++-common/attr-nonstring-3.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 3730f43..c555f9a 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1494,6 +1494,139 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree args[2], int idx[2])
 }
 }
 
+/* If EXPR refers to a character array or pointer declared attribute
+   nonstring return a decl for that array or pointer and set *REF to
+   the referenced enclosing object or pointer.  Otherwise return
+   null.  */
+
+tree
+get_attr_nonstring_decl (tree expr, tree *ref)
+{
+  tree dcl = expr;
+  if (TREE_CODE (dcl) == SSA_NAME)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (dcl))
+	dcl = SSA_NAME_VAR (dcl);
+  else
+	{
+	  gimple *def = SSA_NAME_DEF_STMT (dcl);
+
+	  if (is_gimple_assign (def))
+	{
+	  tree_code code = gimple_assign_rhs_code (def);
+	  if (code == ADDR_EXPR
+		  || code == COMPONENT_REF
+		  || code == VAR_DECL)
+		dcl = gimple_assign_rhs1 (def);
+	}
+	}
+}
+
+  if (TREE_CODE (dcl) == ADDR_EXPR)
+dcl = TREE_OPERAND (dcl, 0);
+
+  if (ref)
+*ref = dcl;
+
+  if (TREE_CODE (dcl) == COMPONENT_REF)
+dcl = TREE_OPERAND (dcl, 1);
+
+  if (DECL_P (dcl)
+  && lookup_attribute ("nonstring", DECL_ATTRIBUTES (dcl)))
+return dcl;
+
+  return NULL_TREE;
+}
+
+/* Warn about passing a non-string array/pointer to a function that
+   expects a nul-terminated string argument.  */
+
+static void
+maybe_warn_nonstring_arg (tree fndecl, tree exp)
+{
+  if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return;
+
+  /* -1 terminated array of zero-based string arguments.  */
+  unsigned argno[] = { -1, -1, -1 };
+
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+case BUILT_IN_STRCASECMP:
+case BUILT_IN_STRCMP:
+case BUILT_IN_STRCSPN:
+case BUILT_IN_STRSPN:
+case BUILT_IN_STRNCMP:
+case BUILT_IN_STRNCASECMP:
+case BUILT_IN_VSSCANF:
+  argno[0] = 0;
+  argno[1] = 1;
+  break;
+
+case BUILT_IN_STPCPY:
+case BUILT_IN_STPNCPY:
+case BUILT_IN_STRCAT:
+case BUILT_IN_STRCPY:
+case BUILT_IN_STRNCAT:
+case BUILT_IN_STRNCPY:
+  argno[0] = 1;
+  break;
+
+case BUILT_IN_FPRINTF:
+case BUILT_IN_FPUTS:
+case BUILT_IN_SPRINTF:
+case BUILT_IN_STPCPY_CHK:
+case BUILT_IN_STPNCPY_CHK:
+case BUILT_IN_STRCAT_CHK:
+case BUILT_IN_STRCPY_CHK:
+case BUILT_IN_STRNCAT_CHK:
+case BUILT_IN_STRNCPY_CHK:
+case BUILT_IN_VFPRINTF:
+case BUILT_IN_VSPRINTF:
+case BUILT_IN_VFSCANF:
+  argno[0] = 1;
+  break;
+
+case BUILT_IN_SNPRINTF:
+case BUILT_IN_VSNPRINTF:
+  argno[0] = 2;
+  break;
+
+case BUILT_IN_PRINTF:
+case BUILT_IN_PRINTF_UNLOCKED:
+case BUILT_IN_PUTS:
+case BUILT_IN_PUTS_UNLOCKED:
+case BUILT_IN_STRCHR:
+case BUILT_IN_STRDUP:
+case BUILT_IN_STRLEN:
+  argno[0] = 0;
+  break;
+
+default:
+  return;
+}
+
+  for (unsigned i = 0; argno[i] != -1U; ++i)
+{
+