Patch for PR analyzer/98797

2021-02-09 Thread brian.sobulefsky via Gcc-patches
The attached patch has been bootstrapped and regression tested. It addresses PR 
analyzer/98797, which is built off the expected failures in 
gcc/testsuite/gcc.dg/analyzer/casts-1.c It fixes those failures and additional 
manifestations. That testsuite file has been updated to no longer expect 
failures and to include additional tests addressing the other manifestations I 
found.

Some additional work can probably be done to handle further cases, but this is 
now substantially more robust than the current status.

Thanks.commit f3654c5a322f241c8cc551f88257a495689ed9d9
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, 
as
recorded by the XFAIL, and some simpler and more complex versions found 
during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) 
is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function 
region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the 
function
region_model_manager::get_offset_region was failing to make the needed 
offset
region at all. This was due to the test for a constant 0 pointer that was 
then
returning get_cast_region. A special case is needed for when PARENT is of 
type
array_type whose type matches TYPE. In this case, get_offset_region is 
allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for 
the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)&x;
__analyzer_eval (p->a == 'A');
//etc

This requires a case added to region_model_manager::maybe_fold_sub_svalue in
the individual characters from string constant section for a field region.

Finally, the prior only works for the case where struct s2 was a single 
field
struct. The more general case is:

struct s2
{
  char arr1[2];
  char arr2[2];
};

struct s2 x = {{'A', 'B'}, {'C', 'D'}};

Here the array will never be found in the get_any_binding routines. A new
routine is added to class binding_cluster that checks if the region being
searched is a field_region of a cast_region, and if so, tries to find a 
field
of the original region that contains the region under examination. This new
function is named binding_cluster::get_parent_cast_binding. It is called 
from
get_binding_recursive.

gcc/analyzer/ChangeLog:
PR analyzer/98797
* region-model-manager.cc region_model_manager::get_offset_region: 
Add
check for a PARENT array whose type matches TYPE, and have this skip
returning get_cast_region and rather conclude the function normally.
* region-model-manager.cc 
region_model_manager::maybe_fold_sub_svalue
Update the get character from string_cst section to include cases 
for
an offset_region and a field_region.
* store.cc binding_cluster::get_binding_recursive: Add case for call
to new function get_parent_cast_binding.
* store.cc binding_cluster::get_parent_cast_binding: New function.
* store.h class binding_cluster: Add declaration for new member
function get_parent_class_binding and macros for bit to byte
conversion.

gcc/testsuite/ChangeLog:
PR analyzer/98797
* gcc.dg/analyzer/casts-1.c: Update file to no longer expect 
failures
and add test cases for additional bugs solved.

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index dfd2413e914..9aee4c498c6 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type,
   /* Handle getting individual chars from a STRING_CST.  */
   if (tree cst = parent_svalue->maybe_get_constant ())
 if (TREE_CODE (cst) == STRING_CST)
-  if (const element_region *element_reg
+  {
+   if (const element_region *element_reg
= subregion->dyn_cast_elem

Re: Patch for PR analyzer/98797

2021-02-10 Thread brian.sobulefsky via Gcc-patches
I apologize. I left a dangling space in the patch. I am guessing
that this is against whitespace policy. I had actually corrected
it but forgot to add the file to the commit. Please see the
attached.commit a9b60f36b499463b49095f28ce8053f6387c506d
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
recorded by the XFAIL, and some simpler and more complex versions found during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function
region_model_manager::get_offset_region was failing to make the needed offset
region at all. This was due to the test for a constant 0 pointer that was then
returning get_cast_region. A special case is needed for when PARENT is of type
array_type whose type matches TYPE. In this case, get_offset_region is allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)&x;
__analyzer_eval (p->a == 'A');
//etc

This requires a case added to region_model_manager::maybe_fold_sub_svalue in
the individual characters from string constant section for a field region.

Finally, the prior only works for the case where struct s2 was a single field
struct. The more general case is:

struct s2
{
  char arr1[2];
  char arr2[2];
};

struct s2 x = {{'A', 'B'}, {'C', 'D'}};

Here the array will never be found in the get_any_binding routines. A new
routine is added to class binding_cluster that checks if the region being
searched is a field_region of a cast_region, and if so, tries to find a field
of the original region that contains the region under examination. This new
function is named binding_cluster::get_parent_cast_binding. It is called from
get_binding_recursive.

gcc/analyzer/ChangeLog:
PR analyzer/98797
* region-model-manager.cc region_model_manager::get_offset_region: Add
check for a PARENT array whose type matches TYPE, and have this skip
returning get_cast_region and rather conclude the function normally.
* region-model-manager.cc region_model_manager::maybe_fold_sub_svalue
Update the get character from string_cst section to include cases for
an offset_region and a field_region.
* store.cc binding_cluster::get_binding_recursive: Add case for call
to new function get_parent_cast_binding.
* store.cc binding_cluster::get_parent_cast_binding: New function.
* store.h class binding_cluster: Add declaration for new member
function get_parent_class_binding and macros for bit to byte
conversion.

gcc/testsuite/ChangeLog:
PR analyzer/98797
* gcc.dg/analyzer/casts-1.c: Update file to no longer expect failures
and add test cases for additional bugs solved.

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index dfd2413e914..1fd6b94f20a 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type,
   /* Handle getting individual chars from a STRING_CST.  */
   if (tree cst = parent_svalue->maybe_get_constant ())
 if (TREE_CODE (cst) == STRING_CST)
-  if (const element_region *element_reg
+  {
+	if (const element_region *element_reg
 	= subregion->dyn_cast_element_region ())
-	{
-	  const svalue *idx_sval = element_reg->get_index ();
-	  if (tree cst_idx = idx_sval->maybe_get_constant ())
-	if (const svalue *char_sval
-		= maybe_get_char_from_string_cst (cst, cst_idx))
-	  return get_or_create_cast (type, char_sval);
-	}
-
+	  {
+	const svalue *idx_sval = element_reg->get_index ();
+	if (tree cst

Re: Patch for PR analyzer/98797

2021-02-12 Thread David Malcolm via Gcc-patches
On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:

Hi Brian

Thanks for the patch.

The patch is large enough to count as legally significant; I've sent
you information on copyright assignment separately; the patch can't be
committed until that paperwork is in place.

In the meantime, I've added some review notes inline below throughout:
 
> Address the bug found in test file 
> gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
> recorded by the XFAIL, and some simpler and more complex versions found 
> during
> the investigation of it. PR analyzer/98797 was opened for these bugs.
> 
> The simplest manifestation of this bug can be replicated with:
> 
> char arr[] = {'A', 'B', 'C', 'D'};
> char *pt = ar;
> __analyzer_eval(*(pt + 0) == 'A');
> __analyzer_eval(*(pt + 1) == 'B');
> //etc
> 
> The result of the eval call is "UNKNOWN" because the analyzer is unable to
> determine the value at the pointer. Note that array access (such as 
> arr[0]) is
> correctly handled. The code responsible for this is in file
> region-model-manager.cc, function 
> region_model_manager::maybe_fold_sub_svalue.
> The relevant section is commented /* Handle getting individual chars from
> STRING_CST */. This section only had a case for an element_region. A case
> needed to be added for an offset_region.
> 
> Additionally, when the offset was 0, such as in *pt or *(pt + 0), the 
> function
> region_model_manager::get_offset_region was failing to make the needed 
> offset
> region at all. This was due to the test for a constant 0 pointer that was 
> then
> returning get_cast_region. A special case is needed for when PARENT is of 
> type
> array_type whose type matches TYPE. In this case, get_offset_region is 
> allowed
> to continue to normal conclusion.
> 
> The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for 
> the
> case:
> 
> struct s1
> {
>   char a;
>   char b;
>   char c;
>   char d;
> };
> 
> struct s2
> {
>   char arr[4];
> };
> 
> struct s2 x = {{'A', 'B', 'C', 'D'}};
> struct s1 *p = (struct s1 *)&x;
> __analyzer_eval (p->a == 'A');
> //etc
> 
> This requires a case added to region_model_manager::maybe_fold_sub_svalue 
> in
> the individual characters from string constant section for a field region.
> 
> Finally, the prior only works for the case where struct s2 was a single 
> field
> struct. The more general case is:
> 
> struct s2
> {
>   char arr1[2];
>   char arr2[2];
> };
> 
> struct s2 x = {{'A', 'B'}, {'C', 'D'}};

This is s3 in the testcase, rather than s2; looks like this commit
message should be updated accordingly to match your change to the
testcase.

BTW, did this case arise in practice?  The existing cases are rather
artificial; IIRC I created them whilst experimenting with various casts
whilst prototypeing the code, in the hope of generating coverage, but
without any specific real-world examples in mind.  (kind of "kicking
the tires", if you will).  Am I right in thinking that this new one
arose in a similar way?


> Here the array will never be found in the get_any_binding routines. A new
> routine is added to class binding_cluster that checks if the region being
> searched is a field_region of a cast_region, and if so, tries to find a 
> field
> of the original region that contains the region under examination. This 
> new
> function is named binding_cluster::get_parent_cast_binding. It is called 
> from
> get_binding_recursive.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/98797
> * region-model-manager.cc 
> region_model_manager::get_offset_region: Add
> check for a PARENT array whose type matches TYPE, and have this 
> skip
> returning get_cast_region and rather conclude the function 
> normally.
> * region-model-manager.cc 
> region_model_manager::maybe_fold_sub_svalue
> Update the get character from string_cst section to include cases 
> for
> an offset_region and a field_region.
> * store.cc binding_cluster::get_binding_recursive: Add case for 
> call
> to new function get_parent_cast_binding.
> * store.cc binding_cluster::get_parent_cast_binding: New function.
> * store.h class binding_cluster: Add declaration for new member
> function get_parent_class_binding and macros for bit to byte
> conversion.

Formatting nit:  the items in the ChangeLog within each file should be
enclosed by parentheses.  We have a git commit hook that verifies the
format.  In theory there's a way to run it ahead of time, but I don't
know of the top of my head where it is.


> gcc/testsuite/ChangeLog:
> PR analyzer/98797
> * gcc.dg/analyzer/casts-1.c

Re: Patch for PR analyzer/98797

2021-02-13 Thread brian.sobulefsky via Gcc-patches
Hi, answers below. Note, do you want my resubmitted patch to be with
commit --amend, and therefore relative to "real" commits in the history,
or do you want me to chain it onto the last submission?

I have already sent to assign at gnu for the form.

I will update the commit message to call it s3 now.

The case did not arise "in practice." After solving the xfail from the
test suite, I tried dividing the array to have more than one field and
found that this still did not work, so I tracked down why. I would
argue that the case of a struct made of only a single array as its one
field is a lot less likely to arise in practice. For what its worth,
the single field example basically works "for the wrong
reason."
When get_binding_recursive goes up the parent chain, it finds
a binding
in the parent cast region because that cast is a struct made of
only one
field, and so its binding_key is the same as that field. The
"right"
logic would check for a field covering the desired region.

I'll fix the ChangeLog formatting parentheses. I found the formatting
script, but I did not have one of the imported python files so I just copied 
others.

I can add the helper routine you have requested. Where would you like it,
I am thinking analyzer.h so it is visible basically everywhere?

I can add the check for size matching, which helper routine is easiest.

Yes, in the event of *p == 'A', *p, having a constant zerop offset
would get sent to get_cast_region, where it would become a cast region.
This defeats the new logic in maybe_fold_sub_svalue. I think in the case
where we are pointing to an array of the requested type, it is much more
correct to return the offset region with a zero offset, as arr[0] should not
be different than arr[1]. Sometimes the 0 tree would be of pointer_type,
and I was not sure if this would defeat it or not, so I made sure it was
of integer_type. This may just be a matter of my being new and not
knowing for sure how everything works and so erring on the side of safety.

As of now, the routine immediately rejects any case other than where
reg is a field_region and parent is a cast_region. I will think if there
is a C like syntax for the function, really it is checking if the original
form of parent had a field covering the requested field.

I will remove the first guard, leave the second, and try to reformat
the latters into a similar rejection style.

Currently, get_field_at_bit_offset is not an externally visible function.
I am taking your instruction to reuse it to mean a prototype should
be added to the relevant header (it would be region.h I think, as long as
both region-model-manager.cc and store.cc include those).

As you know, I am very new to gcc and so was happy when I could hack my
way to this working at all. I already assumed my direct access was not
correct. Most of what I did is based on "RTFS" since the manual does not
really cover the right way to do these things. I will try the routine
or otherwise macro you say.

I will change the testfile to leave pre exisiting lines in place.


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, February 12, 2021 4:16 PM, David Malcolm  wrote:

> On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:
>
> Hi Brian
>
> Thanks for the patch.
>
> The patch is large enough to count as legally significant; I've sent
> you information on copyright assignment separately; the patch can't be
> committed until that paperwork is in place.
>
> In the meantime, I've added some review notes inline below throughout:
>
> > Address the bug found in test file 
> > gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
> > recorded by the XFAIL, and some simpler and more complex versions found 
> > during
> > the investigation of it. PR analyzer/98797 was opened for these bugs.
> >
> > The simplest manifestation of this bug can be replicated with:
> >
> > char arr[] = {'A', 'B', 'C', 'D'};
> > char *pt = ar;
> > __analyzer_eval(*(pt + 0) == 'A');
> > __analyzer_eval(*(pt + 1) == 'B');
> > //etc
> >
> > The result of the eval call is "UNKNOWN" because the analyzer is unable 
> > to
> > determine the value at the pointer. Note that array access (such as 
> > arr[0]) is
> > correctly handled. The code responsible for this is in file
> > region-model-manager.cc, function 
> > region_model_manager::maybe_fold_sub_svalue.
> > The relevant section is commented /* Handle getting individual chars 
> > from
> > STRING_CST */. This section only had a case for an element_region. A 
> > case
> > needed to be added for an offset_region.
> >
> > Additionally, when the offset was 0, such as in *pt or *(pt + 0), the 
> > function
> > region_model_manager::get_offset_region was failing to make the needed 
> > offset
> > region at all. This was due to the test for a constant 0 pointer that 
> > was then
> > returning get_cast_region. A special case is needed for when PARENT is 
> > of t

Re: Patch for PR analyzer/98797

2021-02-16 Thread brian.sobulefsky via Gcc-patches
David,

Two items I needs answered to get you the patch.

1) The get_field_at_bit_offset function is static in region.cc, so I cannot use 
it outside of the file (you want me to use it in store.cc) without updating 
that definition to extern. I am assuming you want this.

2) I am updating my direct access of the tree fields. I am using 
int_bit_position for the position of a field in a struct as you requested. What 
do I use for the size of the struct. I had been using 
->decl_common.size->int_cst[0].


Brian


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Saturday, February 13, 2021 9:53 AM, brian.sobulefsky 
 wrote:

> Hi, answers below. Note, do you want my resubmitted patch to be with
> commit --amend, and therefore relative to "real" commits in the history,
> or do you want me to chain it onto the last submission?
>
> I have already sent to assign at gnu for the form.
>
> I will update the commit message to call it s3 now.
>
> The case did not arise "in practice." After solving the xfail from the
> test suite, I tried dividing the array to have more than one field and
> found that this still did not work, so I tracked down why. I would
> argue that the case of a struct made of only a single array as its one
> field is a lot less likely to arise in practice. For what its worth,
> the single field example basically works "for the wrong
> reason."
> When get_binding_recursive goes up the parent chain, it finds
> a binding
> in the parent cast region because that cast is a struct made of
> only one
> field, and so its binding_key is the same as that field. The
> "right"
> logic would check for a field covering the desired region.
>
> I'll fix the ChangeLog formatting parentheses. I found the formatting
> script, but I did not have one of the imported python files so I just 
copied others.
>
> I can add the helper routine you have requested. Where would you like it,
> I am thinking analyzer.h so it is visible basically everywhere?
>
> I can add the check for size matching, which helper routine is easiest.
>
> Yes, in the event of *p == 'A', *p, having a constant zerop offset
> would get sent to get_cast_region, where it would become a cast region.
> This defeats the new logic in maybe_fold_sub_svalue. I think in the case
> where we are pointing to an array of the requested type, it is much more
> correct to return the offset region with a zero offset, as arr[0] should 
not
> be different than arr[1]. Sometimes the 0 tree would be of pointer_type,
> and I was not sure if this would defeat it or not, so I made sure it was
> of integer_type. This may just be a matter of my being new and not
> knowing for sure how everything works and so erring on the side of safety.
>
> As of now, the routine immediately rejects any case other than where
> reg is a field_region and parent is a cast_region. I will think if there
> is a C like syntax for the function, really it is checking if the original
> form of parent had a field covering the requested field.
>
> I will remove the first guard, leave the second, and try to reformat
> the latters into a similar rejection style.
>
> Currently, get_field_at_bit_offset is not an externally visible function.
> I am taking your instruction to reuse it to mean a prototype should
> be added to the relevant header (it would be region.h I think, as long as
> both region-model-manager.cc and store.cc include those).
>
> As you know, I am very new to gcc and so was happy when I could hack my
> way to this working at all. I already assumed my direct access was not
> correct. Most of what I did is based on "RTFS" since the manual does not
> really cover the right way to do these things. I will try the routine
> or otherwise macro you say.
>
> I will change the testfile to leave pre exisiting lines in place.
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, February 12, 2021 4:16 PM, David Malcolm dmalc...@redhat.com 
wrote:
>
> > On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:
> > Hi Brian
> > Thanks for the patch.
> > The patch is large enough to count as legally significant; I've sent
> > you information on copyright assignment separately; the patch can't be
> > committed until that paperwork is in place.
> > In the meantime, I've added some review notes inline below throughout:
> >
> > > Address the bug found in test file 
gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
> > > recorded by the XFAIL, and some simpler and more complex 
versions found during
> > > the investigation of it. PR analyzer/98797 was opened for 
these bugs.
> > >
> > > The simplest manifestation of this bug can be replicated 
with:
> > >
> > > char arr[] = {'A', 'B', 'C', 'D'};
> > > char *pt = ar;
> > > __analyzer_eval(*(pt + 0) == 'A');
> > > __analyzer_eval(*(pt + 1) == 'B');
> > > //etc
> > >
> > > The result of the eval call is "UNKNOWN" because the 
analyzer is unable to
> > > determine the va

Re: Patch for PR analyzer/98797

2021-02-17 Thread brian.sobulefsky via Gcc-patches
Hi David,

I'm sorry but I wanted to get this out to you. To clarify, I had a statement of 
the form:

end_offset <= covering_field->field_decl.bit_offset->int_cst.val[0]
covering_field->decl_common.size->int_cst.val[0] - 1;

(Sorry if my email is clobbering the angle brackets). I have replaced the first
expression with int_bit_position (covering_field), I am not sure where to 
properly
access the size of the field. FWIW, I found your region::get_byte_size, which
calls int_size_in_bytes, but this crashes gcc for a field tree, it wants a
type tree.

Additionally, is there some proper way to access a bit_offset_t other than
bit_off.get_val ()[0]? That is what I am using now, but I can swap it out.

Sorry for the newbie questions, but these things aren't really documented
in one place, at least that I am aware of.


Brian


Re: Patch for PR analyzer/98797

2021-02-17 Thread David Malcolm via Gcc-patches
On Wed, 2021-02-17 at 23:18 +, brian.sobulefsky wrote:
> Hi David,
> 
> I'm sorry but I wanted to get this out to you. To clarify, I had a
> statement of the form:
> 
> end_offset <= covering_field->field_decl.bit_offset->int_cst.val[0]
>     covering_field->decl_common.size->int_cst.val[0] - 1;
> 
> (Sorry if my email is clobbering the angle brackets). I have replaced
> the first
> expression with int_bit_position (covering_field), I am not sure
> where to properly
> access the size of the field. FWIW, I found your
> region::get_byte_size, which
> calls int_size_in_bytes, but this crashes gcc for a field tree, it
> wants a
> type tree.

BTW, gcc/stor-layout.{c|h} holds the compiler's logic for laying out
the fields of structs and unions, determining bit-offsets and sizes. 
place_field is called repeatedly per field, and handles things like
sizes, alignments, etc.

I think you want DECL_SIZE() on the FIELD_DECL, which is typically a
INTEGER_CST (but won't necessarily be, consider VLAs, trailing empty
arrays, etc).  Quoting tree.h:

/* Holds the size of the datum, in bits, as a tree expression.
   Need not be constant and may be null.  May be less than TYPE_SIZE
   for a C++ FIELD_DECL representing a base class subobject with its
   own virtual base classes (which are laid out separately).  */
#define DECL_SIZE(NODE) (DECL_COMMON_CHECK (NODE)->decl_common.size)


> Additionally, is there some proper way to access a bit_offset_t other
> than
> bit_off.get_val ()[0]? That is what I am using now, but I can swap it
> out.

bit_offset_t is an offset_int, which is defined in gcc/wide-int.h
though I confess I find that header file difficult to understand. 
There are some overloaded operators and member functions for working
with them; see that header.  That said, looking at the analyzer
sources, I've typically been converting them to HOST_WIDE_INT via
to_shwi ().  I suspect the way I'm doing that is buggy when dealing
with very large types.


> Sorry for the newbie questions, but these things aren't really
> documented
> in one place, at least that I am aware of.

Fair enough, and don't worry, we were all new once.  There is some
documentation in
  https://gcc.gnu.org/onlinedocs/gccint/index.html
e.g.:
  https://gcc.gnu.org/onlinedocs/gccint/GENERIC.html
but it's a lot to wade through and is mostly just as an API reference
rather than a tutorial.

Hope this is helpful
Dave



Re: Patch for PR analyzer/98797

2021-02-18 Thread brian.sobulefsky via Gcc-patches
Hi David,

Please find my revised patch attached. You should find all items addressed. I
have bootstrapped and rerun the testsuite (gcc and g++ only).  Also, you
should have seen my legal paperwork go in.

1) The commit message has been updated to reflect the nomenclature in the
testsuite, as well as include the required parentheses.

2) The commit message has been updated to reflect the other changes you
requested below.

3) My macros and inline code for "bit_byte_conversion" have been replaced with
the subroutine you requested. It works by modulo and division rather than
bitwise-and and bitshift, as requested. Without any specific instructions, I
put the routine in analyzer.cc and the prototype in analyzer.h, as it provides
a generic helper service. You can move it anywhere else you might want it.

4) I added the check for the field to be sizeof (char). This is actually a
cause to reject getting a char from a string constant for any case, so I
added the check near the outside of the nested ifs. Also, as a side note,
it looks to me like your example:

void test_4 ()
{
  const char *s = "ABCD";
  __analyzer_eval (*(short *)s == 'A');
}

is unrelated to my patch, as I remember a char pointer to a string constant is
handled as a special case elsewhere and does not end up in
maybe_fold_sub_svalue.

5) I updated the modification to get_offset_region so that my new code is only
run in the case of a zerop. This was really the failure issue anyway. I still
have it building a new constant svalue of zero, because I do not know for sure
if the zero pointer type tree and zero integer type tree are different enough to
cause confusion down the line when getting the character.

6) My new subroutine get_parent_cast_region takes any svalues as an argument
and does its own checks for svalue_kind correctness, returning NULL otherwise.
My reason for this is so get_binding_recursive would remain absolutely clean,
the way its recursive call is. Basically, get_binding_recursive can just call
get_parent_cast_region as an assignment within a conditional, just like
the recursion step is, and the returned NULLs, whether by incorrect type or by
not finding a binding for the correct type, causes it to bypass returning there.

7) The if guards were updated as requested.

8) I used your get_field_at_bit_offset routine as requested. To stress again,
this was a static function to region.cc. I had to add a forward declaration
(I again used analyzer.h because it is a "generic helper" function, but that
is easy for you to change if you do not like it there) *and* I had to update
your definition to extern rather than static. Otherwise, it seems to work.

9) The compound conditionals should all be to your liking now.

10) After all the back and forth, I think I am using "approved" helper
functions and macros rather than accessing the tree fields directly. It seems
they all work correctly.

11) All preexisting deja-gnu tests are on the original lines they were on
before. All new tests are appended. Blank lines are left where the dg-bogus
calls were.


Thank you,
Briancommit 7f0e3685900b527b64b81c3b44af36fd9e99bea3
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
recorded by the XFAIL, and some simpler and more complex versions found during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function
region_model_manager::get_offset_region was failing to make the needed offset
region at all. This was due to the test for a constant 0 pointer that was then
returning get_cast_region. A special case is needed for when PARENT is of type
array_type whose type matches TYPE. In this case, get_offset_region is allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)&x;
__analyzer_eval (p->a == 'A');
//etc