Re: GSoC Timeline Review

2024-04-03 Thread Eric Feng via Gcc
Hi Nada,

Apologies for not being able to reply earlier as well. I’m glad to hear
you’re interested in continuing this project! There is still a lot of work
to be done — my work from last summer is in a very prototype stage. As
David mentioned, familiarizing myself with the analyzer took some time, but
I'm confident you’ll be able to make quicker progress if you're more
experienced. If you’re interested in continuing to work on the reference
count checking aspect of the project in a similar direction to what I was
doing, this bugzilla post might be helpful:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646. Specifically, David
and I discussed the challenge of scaling the implementation effectively,
particularly when tracking the behavior of every API entrypoint through
subclassing of known functions, and how we might reduce some of this effort
with function attributes. Moreover, you might also be interested in
exploring the work presented by Xutong Ma et al. at ASE 2023 late last
summer, titled “Detecting Memory Errors in Python Native Code by Tracking
Object Lifecycle with Reference Count,” as well as the works it was
compared against, if you’re interested in exploring alternative approaches.
Overall, I hope you find the experience as enriching as I did should you
work on this project! Please feel free to reach out with any questions and
I’ll be more than happy to help where I can.

Best,
Eric


On Tue, Apr 2, 2024 at 10:35 AM Martin Jambor  wrote:

> Hello,
>
> On Sat, Mar 30 2024, Nada Elsayed via Gcc wrote:
> > I think that I didn't fully understand the project, so I read more and
> > updated the Timeline Suggestion.
>
> Sorry that we were for not being able to respond sooner, Easter got into
> way in an unfortunate way.  I do not know much about Cython or static
> analysis (so I would not be able to give much advice about improvements
> to the time-line), but can confirm we have received your application.
>
> Thanks a lot.
>
> Martin
>
> >
> > Suggested Timeline:
> >
> >-
> >
> >May 1-26:
> >-
> >
> >   Explore Cython modules and try more realistic codes to see how it
> >   translates Python to c/c++.
> >   -
> >
> >   Know more about entry-points that Cython uses.
> >   -
> >
> >   Understand common bugs that happen when converting Python to c/c++.
> >
> >
> >
> >-
> >
> >Explore static analysis tool for CPython Extension code -which is
> >written in Python- and try this analyzer to understand the bugs in
> >translated Python code fully.
> >-
> >
> >Know how we can emit warnings and errors.
> >
> >
> >
> >-
> >
> >Debug the codebase to grasp its structure and potential areas for
> >improvement.
> >
> >
> >-
> >
> >Weeks 1-2:
> >-
> >
> >   Understand more about reference counting verification.
> >   -
> >
> >   Develop verifying reference counts for PyObjects passed as
> parameters.
> >   -
> >
> >Weeks 3-4:
> >-
> >
> >   Begin to investigate Error Handling Checking.
> >   -
> >
> >   Understand how the Static Analysis tool does Error Handling
> checking.
> >   -
> >
> >   Implement these checks in the plugin.
> >   -
> >
> >Weeks 5-7:
> >-
> >
> >   Begin to investigate Exception Handling Checking.
> >   -
> >
> >   Understand how the Static Analysis tool does Exception Handling
> >   checking.
> >   -
> >
> >   Implement these checks in the plugin.
> >   -
> >
> >Weeks 8-11
> >-
> >
> >   Begin to investigate Format String Checking.
> >   -
> >
> >   Understand how the Static Analysis tool does Format String
> Checking.
> >   -
> >
> >   Implement these checks in the plugin.
> >   -
> >
> >Week 12
> >-
> >
> >   Writing the GSoC wrapping-up document.
> >
> >
> > ‫في الأربعاء، 27 مارس 2024 في 2:31 ص تمت كتابة ما يلي بواسطة ‪Nada
> > Elsayed‬‏ <‪nadaelsayed...@gmail.com‬‏>:‬
> >
> >> Greetings All,
> >> Hope this email finds you well.
> >> I am interested in "Extend the plugin to add checking for usage of the
> >> CPython API" project. First of all, I built the library, and now I am
> >> trying to debug it. Then, I also used Cpython in 3 demos to understand
> how
> >> it works. Finally, I read the uploaded patch comments to understand the
> >> codebase and file structure.
> >>
> >> I was wondering if you could review my suggested timeline?
> >> suggested Timeline:
> >>
> >>-
> >>
> >>May 1-26:
> >>-
> >>
> >>   Explore Cython modules, emphasizing entry-points and bug
> >>   identification.
> >>   -
> >>
> >>   Study analyzers, particularly cpy-analyzer, to enhance
> >>   understanding.
> >>   -
> >>
> >>   Debug the codebase to grasp its structure and potential areas for
> >>   improvement.
> >>   -
> >>
> >>   Focus investigation on "errors in GIL handling" and "tp_traverse
> >>   errors".
> >>   -
> >>
> >>Weeks 1-6:
> >>   

Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

2023-09-10 Thread Eric Feng via Gcc
On Thu, Sep 7, 2023 at 1:28 PM David Malcolm  wrote:

> On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:
>
> > Hi Dave,
>
> Hi Eric, thanks for the patch.
>
> >
> > Recently I've been working on symbolic value support for the reference
> > count checker. I've attached a patch for it below; let me know it looks
> > OK for trunk. Thanks!
> >
> > Best,
> > Eric
> >
> > ---
> >
> > This patch enhances the reference count checker in the CPython plugin by
> > adding support for symbolic values. Whereas previously we were only able
> > to check the reference count of PyObject* objects created in the scope
> > of the function; we are now able to emit diagnostics on reference count
> > mismatch of objects that were, for example, passed in as a function
> > parameter.
> >
> > rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but
> ob_refcnt field is N + ‘2’
> > 6 |   return obj;
> >   |  ^~~
>
> [...snip...]
>
> >  create mode 100644
> gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index bf1982e79c3..d7ecd7fce09 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > @@ -314,17 +314,20 @@ public:
> >{
> >  diagnostic_metadata m;
> >  bool warned;
> > -// just assuming constants for now
> > -auto actual_refcnt
> > - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> > -auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue
> ()->get_constant ();
> > -warned = warning_meta (rich_loc, m, get_controlling_option (),
> > -"expected %qE to have "
> > -"reference count: %qE but ob_refcnt field is:
> %qE",
> > -m_reg_tree, actual_refcnt, ob_refcnt);
> > -
> > -// location_t loc = rich_loc->get_loc ();
> > -// foo (loc);
> > +
> > +const auto *actual_refcnt_constant
> > + = m_actual_refcnt->dyn_cast_constant_svalue ();
> > +const auto *ob_refcnt_constant =
> m_ob_refcnt->dyn_cast_constant_svalue ();
> > +if (!actual_refcnt_constant || !ob_refcnt_constant)
> > +  return false;
> > +
> > +auto actual_refcnt = actual_refcnt_constant->get_constant ();
> > +auto ob_refcnt = ob_refcnt_constant->get_constant ();
> > +warned = warning_meta (
> > + rich_loc, m, get_controlling_option (),
> > + "expected %qE to have "
> > + "reference count: N + %qE but ob_refcnt field is N + %qE",
> > + m_reg_tree, actual_refcnt, ob_refcnt);
> >  return warned;
>
> I know you're emulating the old behavior I implemented way back in
> cpychecker, but I don't like that behavior :(
>
> Specifically, although the patch improves the behavior for symbolic
> values, it regresses the precision of wording for the concrete values
> case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
> 1 pointer, then it's more readable to say:
>
>   warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt
> field is ‘2’
>
> than:
>
>   warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt
> field is N + ‘2’
>
> ...and we shouldn't quote concrete numbers, the message should be:
>
>   warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field
> is 2


> or better:
>
>   warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
>
>
> Can you move the unwrapping of the svalue from the tests below into the
> emit vfunc?  That way the m_actual_refcnt doesn't have to be a
> constant_svalue; you could have logic in the emit vfunc to print
> readable messages based on what kind of svalue it is.
>
> Rather than 'N', it might be better to say 'initial'; how about:
>
>   warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt'
> field has increased by 1
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field has increased by 2
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field is unchanged
>   warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt'
> field has decreased by 1
>   warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field
> is unchanged
>
> and similar?
>

That makes sense to me as well (indeed I was just emulating the old
behavior)! Will experiment and keep you posted on a revised patch with this
in mind.  This is somewhat of a minor detail but can we emit ‘*obj’ as
bolded text in the diagnostic message? Currently, I can emit this
(including the asterisk) like so: '*%E'. But unlike using %qE, it doesn't
bold the body of the single quotations. Is this possible?

>
> Maybe have a flag that tracks whether we're talking about a concrete
> value that's absolute versus a concrete value that's relative to the
> initial value?
>
>
> [...snip...]
>
>
> > @@ -369,6 +368,19 @@ 

Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

2023-09-10 Thread Eric Feng via Gcc-patches
On Thu, Sep 7, 2023 at 1:28 PM David Malcolm  wrote:

> On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:
>
> > Hi Dave,
>
> Hi Eric, thanks for the patch.
>
> >
> > Recently I've been working on symbolic value support for the reference
> > count checker. I've attached a patch for it below; let me know it looks
> > OK for trunk. Thanks!
> >
> > Best,
> > Eric
> >
> > ---
> >
> > This patch enhances the reference count checker in the CPython plugin by
> > adding support for symbolic values. Whereas previously we were only able
> > to check the reference count of PyObject* objects created in the scope
> > of the function; we are now able to emit diagnostics on reference count
> > mismatch of objects that were, for example, passed in as a function
> > parameter.
> >
> > rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but
> ob_refcnt field is N + ‘2’
> > 6 |   return obj;
> >   |  ^~~
>
> [...snip...]
>
> >  create mode 100644
> gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index bf1982e79c3..d7ecd7fce09 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > @@ -314,17 +314,20 @@ public:
> >{
> >  diagnostic_metadata m;
> >  bool warned;
> > -// just assuming constants for now
> > -auto actual_refcnt
> > - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> > -auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue
> ()->get_constant ();
> > -warned = warning_meta (rich_loc, m, get_controlling_option (),
> > -"expected %qE to have "
> > -"reference count: %qE but ob_refcnt field is:
> %qE",
> > -m_reg_tree, actual_refcnt, ob_refcnt);
> > -
> > -// location_t loc = rich_loc->get_loc ();
> > -// foo (loc);
> > +
> > +const auto *actual_refcnt_constant
> > + = m_actual_refcnt->dyn_cast_constant_svalue ();
> > +const auto *ob_refcnt_constant =
> m_ob_refcnt->dyn_cast_constant_svalue ();
> > +if (!actual_refcnt_constant || !ob_refcnt_constant)
> > +  return false;
> > +
> > +auto actual_refcnt = actual_refcnt_constant->get_constant ();
> > +auto ob_refcnt = ob_refcnt_constant->get_constant ();
> > +warned = warning_meta (
> > + rich_loc, m, get_controlling_option (),
> > + "expected %qE to have "
> > + "reference count: N + %qE but ob_refcnt field is N + %qE",
> > + m_reg_tree, actual_refcnt, ob_refcnt);
> >  return warned;
>
> I know you're emulating the old behavior I implemented way back in
> cpychecker, but I don't like that behavior :(
>
> Specifically, although the patch improves the behavior for symbolic
> values, it regresses the precision of wording for the concrete values
> case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
> 1 pointer, then it's more readable to say:
>
>   warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt
> field is ‘2’
>
> than:
>
>   warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt
> field is N + ‘2’
>
> ...and we shouldn't quote concrete numbers, the message should be:
>
>   warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field
> is 2


> or better:
>
>   warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
>
>
> Can you move the unwrapping of the svalue from the tests below into the
> emit vfunc?  That way the m_actual_refcnt doesn't have to be a
> constant_svalue; you could have logic in the emit vfunc to print
> readable messages based on what kind of svalue it is.
>
> Rather than 'N', it might be better to say 'initial'; how about:
>
>   warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt'
> field has increased by 1
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field has increased by 2
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field is unchanged
>   warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt'
> field has decreased by 1
>   warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field
> is unchanged
>
> and similar?
>

That makes sense to me as well (indeed I was just emulating the old
behavior)! Will experiment and keep you posted on a revised patch with this
in mind.  This is somewhat of a minor detail but can we emit ‘*obj’ as
bolded text in the diagnostic message? Currently, I can emit this
(including the asterisk) like so: '*%E'. But unlike using %qE, it doesn't
bold the body of the single quotations. Is this possible?

>
> Maybe have a flag that tracks whether we're talking about a concrete
> value that's absolute versus a concrete value that's relative to the
> initial value?
>
>
> [...snip...]
>
>
> > @@ -369,6 +368,19 @@ 

[PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

2023-09-04 Thread Eric Feng via Gcc
Hi Dave,

Recently I've been working on symbolic value support for the reference
count checker. I've attached a patch for it below; let me know it looks
OK for trunk. Thanks!

Best,
Eric

---

This patch enhances the reference count checker in the CPython plugin by
adding support for symbolic values. Whereas previously we were only able
to check the reference count of PyObject* objects created in the scope
of the function; we are now able to emit diagnostics on reference count
mismatch of objects that were, for example, passed in as a function
parameter.

rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but 
ob_refcnt field is N + ‘2’
6 |   return obj;
  |  ^~~
  ‘create_py_object2’: event 1
|
|6 |   return obj;
|  |  ^~~
|  |  |
|  |  (1) here
|


gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Support reference count 
checking
of symbolic values.
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: New test.
* gcc.dg/plugin/plugin.exp: New test.
* gcc.dg/plugin/cpython-plugin-test-refcnt.c: New test.

Signed-off-by: Eric Feng 

---
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 133 +++---
 .../cpython-plugin-test-PyList_Append.c   |  21 ++-
 .../plugin/cpython-plugin-test-refcnt.c   |  18 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   1 +
 4 files changed, 118 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index bf1982e79c3..d7ecd7fce09 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -314,17 +314,20 @@ public:
   {
 diagnostic_metadata m;
 bool warned;
-// just assuming constants for now
-auto actual_refcnt
-   = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-warned = warning_meta (rich_loc, m, get_controlling_option (),
-  "expected %qE to have "
-  "reference count: %qE but ob_refcnt field is: %qE",
-  m_reg_tree, actual_refcnt, ob_refcnt);
-
-// location_t loc = rich_loc->get_loc ();
-// foo (loc);
+
+const auto *actual_refcnt_constant
+   = m_actual_refcnt->dyn_cast_constant_svalue ();
+const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue ();
+if (!actual_refcnt_constant || !ob_refcnt_constant)
+  return false;
+
+auto actual_refcnt = actual_refcnt_constant->get_constant ();
+auto ob_refcnt = ob_refcnt_constant->get_constant ();
+warned = warning_meta (
+   rich_loc, m, get_controlling_option (),
+   "expected %qE to have "
+   "reference count: N + %qE but ob_refcnt field is N + %qE",
+   m_reg_tree, actual_refcnt, ob_refcnt);
 return warned;
   }
 
@@ -336,10 +339,6 @@ public:
 
 private:
 
-  void foo(location_t loc) const 
-  {
-inform(loc, "something is up right here");
-  }
   const region *m_base_region;
   const svalue *m_ob_refcnt;
   const svalue *m_actual_refcnt;
@@ -369,6 +368,19 @@ increment_region_refcnt (hash_map 
, const region *key)
   refcnt = existed ? refcnt + 1 : 1;
 }
 
+static const region *
+get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
+{
+  const auto *region_sval = sval->dyn_cast_region_svalue ();
+  if (region_sval)
+return region_sval->get_pointee ();
+
+  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
+  if (initial_sval)
+return mgr->get_symbolic_region (initial_sval);
+
+  return nullptr;
+}
 
 /* Recursively fills in region_to_refcnt with the references owned by
pyobj_ptr_sval.  */
@@ -381,20 +393,9 @@ count_pyobj_references (const region_model *model,
   if (!pyobj_ptr_sval)
 return;
 
-  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
-  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
-  if (!pyobj_region_sval && !pyobj_initial_sval)
-return;
-
-  // todo: support initial sval (e.g passed in as parameter)
-  if (pyobj_initial_sval)
-{
-  // increment_region_refcnt (region_to_refcnt,
-  //  pyobj_initial_sval->get_region ());
-  return;
-}
+  region_model_manager *mgr = model->get_manager ();
 
-  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  const region *pyobj_region = get_region_from_svalue (pyobj_ptr_sval, mgr);
   if (!pyobj_region || seen.contains (pyobj_region))
 return;
 
@@ -409,49 +410,75 @@ count_pyobj_references (const region_model *model,
 return;
 
   const auto _binding_map = retval_cluster->get_map ();
-
   for (const 

[PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

2023-09-04 Thread Eric Feng via Gcc-patches
Hi Dave,

Recently I've been working on symbolic value support for the reference
count checker. I've attached a patch for it below; let me know it looks
OK for trunk. Thanks!

Best,
Eric

---

This patch enhances the reference count checker in the CPython plugin by
adding support for symbolic values. Whereas previously we were only able
to check the reference count of PyObject* objects created in the scope
of the function; we are now able to emit diagnostics on reference count
mismatch of objects that were, for example, passed in as a function
parameter.

rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but 
ob_refcnt field is N + ‘2’
6 |   return obj;
  |  ^~~
  ‘create_py_object2’: event 1
|
|6 |   return obj;
|  |  ^~~
|  |  |
|  |  (1) here
|


gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Support reference count 
checking
of symbolic values.
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: New test.
* gcc.dg/plugin/plugin.exp: New test.
* gcc.dg/plugin/cpython-plugin-test-refcnt.c: New test.

Signed-off-by: Eric Feng 

---
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 133 +++---
 .../cpython-plugin-test-PyList_Append.c   |  21 ++-
 .../plugin/cpython-plugin-test-refcnt.c   |  18 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   1 +
 4 files changed, 118 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index bf1982e79c3..d7ecd7fce09 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -314,17 +314,20 @@ public:
   {
 diagnostic_metadata m;
 bool warned;
-// just assuming constants for now
-auto actual_refcnt
-   = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-warned = warning_meta (rich_loc, m, get_controlling_option (),
-  "expected %qE to have "
-  "reference count: %qE but ob_refcnt field is: %qE",
-  m_reg_tree, actual_refcnt, ob_refcnt);
-
-// location_t loc = rich_loc->get_loc ();
-// foo (loc);
+
+const auto *actual_refcnt_constant
+   = m_actual_refcnt->dyn_cast_constant_svalue ();
+const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue ();
+if (!actual_refcnt_constant || !ob_refcnt_constant)
+  return false;
+
+auto actual_refcnt = actual_refcnt_constant->get_constant ();
+auto ob_refcnt = ob_refcnt_constant->get_constant ();
+warned = warning_meta (
+   rich_loc, m, get_controlling_option (),
+   "expected %qE to have "
+   "reference count: N + %qE but ob_refcnt field is N + %qE",
+   m_reg_tree, actual_refcnt, ob_refcnt);
 return warned;
   }
 
@@ -336,10 +339,6 @@ public:
 
 private:
 
-  void foo(location_t loc) const 
-  {
-inform(loc, "something is up right here");
-  }
   const region *m_base_region;
   const svalue *m_ob_refcnt;
   const svalue *m_actual_refcnt;
@@ -369,6 +368,19 @@ increment_region_refcnt (hash_map 
, const region *key)
   refcnt = existed ? refcnt + 1 : 1;
 }
 
+static const region *
+get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
+{
+  const auto *region_sval = sval->dyn_cast_region_svalue ();
+  if (region_sval)
+return region_sval->get_pointee ();
+
+  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
+  if (initial_sval)
+return mgr->get_symbolic_region (initial_sval);
+
+  return nullptr;
+}
 
 /* Recursively fills in region_to_refcnt with the references owned by
pyobj_ptr_sval.  */
@@ -381,20 +393,9 @@ count_pyobj_references (const region_model *model,
   if (!pyobj_ptr_sval)
 return;
 
-  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
-  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
-  if (!pyobj_region_sval && !pyobj_initial_sval)
-return;
-
-  // todo: support initial sval (e.g passed in as parameter)
-  if (pyobj_initial_sval)
-{
-  // increment_region_refcnt (region_to_refcnt,
-  //  pyobj_initial_sval->get_region ());
-  return;
-}
+  region_model_manager *mgr = model->get_manager ();
 
-  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  const region *pyobj_region = get_region_from_svalue (pyobj_ptr_sval, mgr);
   if (!pyobj_region || seen.contains (pyobj_region))
 return;
 
@@ -409,49 +410,75 @@ count_pyobj_references (const region_model *model,
 return;
 
   const auto _binding_map = retval_cluster->get_map ();
-
   for (const 

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-09-01 Thread Eric Feng via Gcc-patches
Thank you for the patch!

On Fri, Sep 1, 2023 at 10:51 AM David Malcolm  wrote:
>
> On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> > (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> >
> > > Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng
> > > 
> > > From: Eric Feng via Gcc 
> >
> > > gcc/testsuite/ChangeLog:
> > >   PR analyzer/107646
> > > * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements
> > > reference count
> > >   * checking for PyObjects.
> > > * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> > > * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c:
> > > ...here (and
> > >   * added more tests).
> > > * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> > > * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here
> > > (and added
> > >   * more tests).
> > > * gcc.dg/plugin/plugin.exp: New tests.
> > > * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> > > * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New
> > > test.
> > > * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New
> > > test.
> >
> > It seems this was more or less a rewrite, but that said,
> > it's generally preferable to always *add* tests, never *modify* them.
> >
> > >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376
> > > +-
> >
> > ^^^ Ouch!  Was it not within reason to keep that test as it
> > was, and just add another test?
Thanks for the feedback. To clarify, 'analyzer_cpython_plugin.c' is
not a test itself but rather a plugin that currently lives within the
testsuite. The core of the test cases were also not modified, rather I
renamed certain filenames containing them for clarity (unless this is
what you meant in terms of modification, in which case noted) and
added to them. However, I understand the preference and will keep that
in mind.
> >
> > Anyway, the test after rewrite fails, and for some targets
> > like cris-elf and apparently m68k-linux, yields an error.
> > I see a PR was already opened.
> >
> > Also, mostly for future reference, several files in the
> > patch miss a final newline, as seen by a "\ No newline at
> > end of file"-marker.
Noted.
> >
> > I think I found the problem; a mismatch between default C++
> > language standard between host-gcc and target-gcc.
> >
> > (It's actually *not* as simple as "auto var = typeofvar()"
> > not being recognized in C++11 --or else there'd be an error
> > for the hash_set declaration too, which I just changed for
> > consistency-- but it's close enough for me.)
> >
> > With this, retesting plugin.exp for cris-elf works.
Sounds good, thanks again! I was also curious about why hash_map had
an issue here with that syntax whilst hash_set did not, so I tried to
investigate a bit further. I believe the issue was due to the compiler
having trouble disambiguating between the hash_map constructors in
C++11.

>From the error message we received:

test/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c:480:58:
error: no matching function for call to 'hash_map::hash_map(hash_map)'
   auto region_to_refcnt = hash_map ();

I think the compiler is mistakenly interpreting the call here as we
would like to create a new hash_map object using its copy constructor,
but we "forgot" to provide the object to be copied, rather than our
intention of using the default constructor.

Looking at hash_map.h and hash_set.h seems to support this hypothesis,
as hash_map has two constructors, one of which resembles a copy
constructor with additional arguments:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-map.h#L147.
Perhaps the default arguments here further complicated the ambiguity
as to which constructor to use in the presence of the empty
parenthesis.

On the other hand, hash_set has only the default constructor with
default parameters, and thus there is no ambiguity:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-set.h#L40.

I assume this ambiguity was cleared up by later versions, and thus we
observed no problems in C++17. However, I am certainly still a
relative newbie of C++, so please anyone feel free to correct my
reasoning and chime in!
> >
> > Ok to commit?
>
> Sorry about the failing tests.
>
> Thanks for the patch; please go ahead and commit.
>
> Dave
>
> >
> > -- >8 --
> > From: Hans-Peter Nilsson 
> > Date: Fri, 1 Sep 2023 04:36:03 +0200
> > Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c
> > declarations, PR t

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-09-01 Thread Eric Feng via Gcc
Thank you for the patch!

On Fri, Sep 1, 2023 at 10:51 AM David Malcolm  wrote:
>
> On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> > (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> >
> > > Cc: gcc@gcc.gnu.org, gcc-patc...@gcc.gnu.org, Eric Feng
> > > 
> > > From: Eric Feng via Gcc 
> >
> > > gcc/testsuite/ChangeLog:
> > >   PR analyzer/107646
> > > * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements
> > > reference count
> > >   * checking for PyObjects.
> > > * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> > > * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c:
> > > ...here (and
> > >   * added more tests).
> > > * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> > > * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here
> > > (and added
> > >   * more tests).
> > > * gcc.dg/plugin/plugin.exp: New tests.
> > > * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> > > * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New
> > > test.
> > > * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New
> > > test.
> >
> > It seems this was more or less a rewrite, but that said,
> > it's generally preferable to always *add* tests, never *modify* them.
> >
> > >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376
> > > +-
> >
> > ^^^ Ouch!  Was it not within reason to keep that test as it
> > was, and just add another test?
Thanks for the feedback. To clarify, 'analyzer_cpython_plugin.c' is
not a test itself but rather a plugin that currently lives within the
testsuite. The core of the test cases were also not modified, rather I
renamed certain filenames containing them for clarity (unless this is
what you meant in terms of modification, in which case noted) and
added to them. However, I understand the preference and will keep that
in mind.
> >
> > Anyway, the test after rewrite fails, and for some targets
> > like cris-elf and apparently m68k-linux, yields an error.
> > I see a PR was already opened.
> >
> > Also, mostly for future reference, several files in the
> > patch miss a final newline, as seen by a "\ No newline at
> > end of file"-marker.
Noted.
> >
> > I think I found the problem; a mismatch between default C++
> > language standard between host-gcc and target-gcc.
> >
> > (It's actually *not* as simple as "auto var = typeofvar()"
> > not being recognized in C++11 --or else there'd be an error
> > for the hash_set declaration too, which I just changed for
> > consistency-- but it's close enough for me.)
> >
> > With this, retesting plugin.exp for cris-elf works.
Sounds good, thanks again! I was also curious about why hash_map had
an issue here with that syntax whilst hash_set did not, so I tried to
investigate a bit further. I believe the issue was due to the compiler
having trouble disambiguating between the hash_map constructors in
C++11.

>From the error message we received:

test/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c:480:58:
error: no matching function for call to 'hash_map::hash_map(hash_map)'
   auto region_to_refcnt = hash_map ();

I think the compiler is mistakenly interpreting the call here as we
would like to create a new hash_map object using its copy constructor,
but we "forgot" to provide the object to be copied, rather than our
intention of using the default constructor.

Looking at hash_map.h and hash_set.h seems to support this hypothesis,
as hash_map has two constructors, one of which resembles a copy
constructor with additional arguments:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-map.h#L147.
Perhaps the default arguments here further complicated the ambiguity
as to which constructor to use in the presence of the empty
parenthesis.

On the other hand, hash_set has only the default constructor with
default parameters, and thus there is no ambiguity:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-set.h#L40.

I assume this ambiguity was cleared up by later versions, and thus we
observed no problems in C++17. However, I am certainly still a
relative newbie of C++, so please anyone feel free to correct my
reasoning and chime in!
> >
> > Ok to commit?
>
> Sorry about the failing tests.
>
> Thanks for the patch; please go ahead and commit.
>
> Dave
>
> >
> > -- >8 --
> > From: Hans-Peter Nilsson 
> > Date: Fri, 1 Sep 2023 04:36:03 +0200
> > Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c
> > declarations, PR t

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Eric Feng via Gcc
On Thu, Aug 31, 2023 at 4:19 PM David Malcolm  wrote:
>
> On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> > On Thu, Aug 31, 2023 at 1:01 PM David Malcolm 
> > wrote:
> > >
> > > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
>
> [...]
>
> > > >
> > > > Thanks; pushed to trunk with nits fixed:
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> > > > .
> > >
> > > Thanks; looks good.
> > >
> > > Do you want to add this to the GCC 14 part of the "History" section
> > > on
> > > the wiki page:
> > >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > > or should I?
> > Happy to add it myself, but I'm not finding an option to edit the
> > page
> > (created an account under ef...@gcc.gnu.org). Do I need to be added
> > to
> > the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?
>
> I can do this.  What's your account's WikiName ?
Thank you — it is EricFeng.
>
> Thanks
> Dave
>


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Eric Feng via Gcc
On Thu, Aug 31, 2023 at 1:01 PM David Malcolm  wrote:
>
> On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> > On Tue, Aug 29, 2023 at 5:14 PM David Malcolm 
> > wrote:
> > >
> > > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > > Additionally, by using the old model and the pointer per your
> > > > suggestion,
> > > > we are able to find the representative tree and emit a more
> > > > accurate
> > > > diagnostic!
> > > >
> > > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > > ‘1’
> > > > but ob_refcnt field is: ‘2’
> > > >23 |   return list;
> > > >   |  ^~~~
> > > >   ‘create_py_object’: events 1-4
> > > > |
> > > > |4 |   PyObject* item = PyLong_FromLong(3);
> > > > |  |^~
> > > > |  ||
> > > > |  |(1) when ‘PyLong_FromLong’
> > > > succeeds
> > > > |5 |   PyObject* list = PyList_New(1);
> > > > |  |~
> > > > |  ||
> > > > |  |(2) when ‘PyList_New’ succeeds
> > > > |..
> > > > |   14 |   PyList_Append(list, item);
> > > > |  |   ~
> > > > |  |   |
> > > > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > > > |..
> > > > |   23 |   return list;
> > > > |  |  
> > > > |  |  |
> > > > |  |  (4) here
> > > > |
> > >
> > > Excellent, that's a big improvement.
> > >
> > > >
> > > > If a representative tree is not found, I decided we should just
> > > > bail
> > > > out
> > > > of emitting a diagnostic for now, to avoid confusing the user on
> > > > what
> > > > the problem is.
> > >
> > > Fair enough.
> > >
> > > >
> > > > I've attached the patch for this (on top of the previous one)
> > > > below.
> > > > If
> > > > it also looks good, I can merge it with the last patch and push
> > > > it in
> > > > at
> > > > the same time.
> > >
> > > I don't mind either way, but please can you update the tests so
> > > that we
> > > have some automated test coverage that the correct name is being
> > > printed in the warning.
> > >
> > > Thanks
> > > Dave
> > >
> >
> > Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> > to
> > preserve our chain on the list:
> >
> > ---
> >
> > Thanks; pushed to trunk with nits fixed:
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> > .
>
> Thanks; looks good.
>
> Do you want to add this to the GCC 14 part of the "History" section on
> the wiki page:
>   https://gcc.gnu.org/wiki/StaticAnalyzer
> or should I?
Happy to add it myself, but I'm not finding an option to edit the page
(created an account under ef...@gcc.gnu.org). Do I need to be added to
the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?

>
> >
> > Incidentally, I updated my formatting settings in VSCode, which I've
> > previously mentioned in passing. In case anyone is interested:
> >
> > "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> > TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> > AlignAfterOpenBracket: Align,
> > AllowAllParametersOfDeclarationOnNextLine: true }",
> >
> > This fixes some issues with the indent width and also ensures
> > function
> > parameters of appropriate length are aligned properly and on a new
> > line each (like the rest of the analyzer code).
>
> Thanks
> Dave
>
>


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-30 Thread Eric Feng via Gcc
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm  wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >23 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > |..
> > |   23 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-30 Thread Eric Feng via Gcc-patches
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm  wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >23 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > |..
> > |   23 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).


[PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-29 Thread Eric Feng via Gcc
Additionally, by using the old model and the pointer per your suggestion,
we are able to find the representative tree and emit a more accurate diagnostic!

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but 
ob_refcnt field is: ‘2’
   23 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   23 |   return list;
|  |  
|  |  |
|  |  (4) here
|

If a representative tree is not found, I decided we should just bail out
of emitting a diagnostic for now, to avoid confusing the user on what
the problem is.

I've attached the patch for this (on top of the previous one) below. If
it also looks good, I can merge it with the last patch and push it in at
the same time.

Best,
Eric

---
 gcc/analyzer/region-model.cc  |  3 +-
 gcc/analyzer/region-model.h   |  7 ++--
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 35 +++
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb4f976b83a..c1d266d351b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5391,6 +5391,7 @@ region_model::pop_frame (tree result_lvalue,
 {
   gcc_assert (m_current_frame);
 
+  const region_model pre_popped_model = *this;
   const frame_region *frame_reg = m_current_frame;
 
   /* Notify state machines.  */
@@ -5424,7 +5425,7 @@ region_model::pop_frame (tree result_lvalue,
 }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
-  notify_on_pop_frame (this, retval, ctxt);
+  notify_on_pop_frame (this, _popped_model, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 440ea6d828d..b89c6f6c649 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -237,6 +237,7 @@ public:
 struct append_regions_cb_data;
 
 typedef void (*pop_frame_callback) (const region_model *model,
+   const region_model *prev_model,
const svalue *retval,
region_model_context *ctxt);
 
@@ -543,11 +544,13 @@ class region_model
   }
 
   static void
-  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  notify_on_pop_frame (const region_model *model,
+  const region_model *prev_model,
+  const svalue *retval,
   region_model_context *ctxt)
   {
 for (auto  : pop_frame_callbacks)
-   callback (model, retval, ctxt);
+   callback (model, prev_model, retval, ctxt);
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index b2caed8fc1b..6f0a355fe30 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -318,11 +318,10 @@ public:
 auto actual_refcnt
= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
 auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-warned = warning_meta (
-   rich_loc, m, get_controlling_option (),
-   "expected  to have "
-   "reference count: %qE but ob_refcnt field is: %qE",
-   actual_refcnt, ob_refcnt);
+warned = warning_meta (rich_loc, m, get_controlling_option (),
+  "expected %qE to have "
+  "reference count: %qE but ob_refcnt field is: %qE",
+  m_reg_tree, actual_refcnt, ob_refcnt);
 
 // location_t loc = rich_loc->get_loc ();
 // foo (loc);
@@ -425,7 +424,8 @@ count_expected_pyobj_references (const region_model *model,
 
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
-check_refcnt (const region_model *model, region_model_context *ctxt,
+check_refcnt (const region_model *model, const region_model *old_model,
+ region_model_context *ctxt,
  const hash_map::iterator::reference_pair region_refcnt)
 {
@@ -438,8 +438,11 @@ check_refcnt (const region_model *model, 
region_model_context *ctxt,
 
   if (ob_refcnt_sval != actual_refcnt_sval)
   {
-// todo: fix this
-tree reg_tree = model->get_representative_tree 

[PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-29 Thread Eric Feng via Gcc-patches
Additionally, by using the old model and the pointer per your suggestion,
we are able to find the representative tree and emit a more accurate diagnostic!

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but 
ob_refcnt field is: ‘2’
   23 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   23 |   return list;
|  |  
|  |  |
|  |  (4) here
|

If a representative tree is not found, I decided we should just bail out
of emitting a diagnostic for now, to avoid confusing the user on what
the problem is.

I've attached the patch for this (on top of the previous one) below. If
it also looks good, I can merge it with the last patch and push it in at
the same time.

Best,
Eric

---
 gcc/analyzer/region-model.cc  |  3 +-
 gcc/analyzer/region-model.h   |  7 ++--
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 35 +++
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb4f976b83a..c1d266d351b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5391,6 +5391,7 @@ region_model::pop_frame (tree result_lvalue,
 {
   gcc_assert (m_current_frame);
 
+  const region_model pre_popped_model = *this;
   const frame_region *frame_reg = m_current_frame;
 
   /* Notify state machines.  */
@@ -5424,7 +5425,7 @@ region_model::pop_frame (tree result_lvalue,
 }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
-  notify_on_pop_frame (this, retval, ctxt);
+  notify_on_pop_frame (this, _popped_model, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 440ea6d828d..b89c6f6c649 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -237,6 +237,7 @@ public:
 struct append_regions_cb_data;
 
 typedef void (*pop_frame_callback) (const region_model *model,
+   const region_model *prev_model,
const svalue *retval,
region_model_context *ctxt);
 
@@ -543,11 +544,13 @@ class region_model
   }
 
   static void
-  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  notify_on_pop_frame (const region_model *model,
+  const region_model *prev_model,
+  const svalue *retval,
   region_model_context *ctxt)
   {
 for (auto  : pop_frame_callbacks)
-   callback (model, retval, ctxt);
+   callback (model, prev_model, retval, ctxt);
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index b2caed8fc1b..6f0a355fe30 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -318,11 +318,10 @@ public:
 auto actual_refcnt
= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
 auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-warned = warning_meta (
-   rich_loc, m, get_controlling_option (),
-   "expected  to have "
-   "reference count: %qE but ob_refcnt field is: %qE",
-   actual_refcnt, ob_refcnt);
+warned = warning_meta (rich_loc, m, get_controlling_option (),
+  "expected %qE to have "
+  "reference count: %qE but ob_refcnt field is: %qE",
+  m_reg_tree, actual_refcnt, ob_refcnt);
 
 // location_t loc = rich_loc->get_loc ();
 // foo (loc);
@@ -425,7 +424,8 @@ count_expected_pyobj_references (const region_model *model,
 
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
-check_refcnt (const region_model *model, region_model_context *ctxt,
+check_refcnt (const region_model *model, const region_model *old_model,
+ region_model_context *ctxt,
  const hash_map::iterator::reference_pair region_refcnt)
 {
@@ -438,8 +438,11 @@ check_refcnt (const region_model *model, 
region_model_context *ctxt,
 
   if (ob_refcnt_sval != actual_refcnt_sval)
   {
-// todo: fix this
-tree reg_tree = model->get_representative_tree 

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc-patches
On Tue, Aug 29, 2023 at 12:32 AM Eric Feng  wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned in
> addition to adding more test cases. I've also taken this chance to
> split the test files according to known function subclasses, as you previously
> suggested. Since there were also some changes to the core analyzer, I've done 
> a
> bootstrap and regtested the patch as well. Does it look OK for trunk?
Apologies — I forgot to mention that bootstrap and regtest was done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch introduces initial support for reference count checking of
> PyObjects in relation to the Python/C API for the CPython plugin.
> Additionally, the core analyzer underwent several modifications to
> accommodate this feature. These include:
>
> - Introducing support for callbacks at the end of
>   region_model::pop_frame. This is our current point of validation for
>   the reference count of PyObjects.
> - An added optional custom stmt_finder parameter to
>   region_model_context::warn. This aids in emitting a diagnostic
>   concerning the reference count, especially when the stmt_finder is
>   NULL, which is currently the case during region_model::pop_frame.
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected  to 
> have reference count: ‘1’ but ob_refcnt field is: ‘2’
>23 |   return list;
>   |  ^~~~
>   ‘create_py_object’: events 1-4
> |
> |4 |   PyObject* item = PyLong_FromLong(3);
> |  |^~
> |  ||
> |  |(1) when ‘PyLong_FromLong’ succeeds
> |5 |   PyObject* list = PyList_New(1);
> |  |~
> |  ||
> |  |(2) when ‘PyList_New’ succeeds
> |..
> |   14 |   PyList_Append(list, item);
> |  |   ~
> |  |   |
> |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> |..
> |   23 |   return list;
> |  |  
> |  |  |
> |  |  (4) here
> |
>
> This is a WIP in several ways:
> - Enhancing the diagnostic for better clarity. For instance, users should
>   expect to see the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Currently, functions returning PyObject * are assumed to always produce
>   a new reference.
> - The validation of reference count is only for PyObjects created within a
>   function body. Verifying reference counts for PyObjects passed as
>   parameters is not supported in this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
> * engine.cc (impl_region_model_context::warn): New optional parameter.
> * exploded-graph.h (class impl_region_model_context): Likewise.
> * region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
> * region-model.h (struct append_regions_cb_data): Likewise.
> (class region_model): Likewise.
> (class region_model_context): New optional parameter.
> (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> * gcc.dg/plugin/plugin.exp: New tests.
> * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng 
>
> ---
>  gcc/analyzer/engine.cc|   8 +-
>  gcc/analyzer/exploded-graph.h |   4 +-
>  gcc/analyzer/region-model.cc  |   3 +
>  gcc/analyzer/region-model.h   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
>  c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c |  78 
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
> cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 
> 

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc
On Tue, Aug 29, 2023 at 12:32 AM Eric Feng  wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned in
> addition to adding more test cases. I've also taken this chance to
> split the test files according to known function subclasses, as you previously
> suggested. Since there were also some changes to the core analyzer, I've done 
> a
> bootstrap and regtested the patch as well. Does it look OK for trunk?
Apologies — I forgot to mention that bootstrap and regtest was done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch introduces initial support for reference count checking of
> PyObjects in relation to the Python/C API for the CPython plugin.
> Additionally, the core analyzer underwent several modifications to
> accommodate this feature. These include:
>
> - Introducing support for callbacks at the end of
>   region_model::pop_frame. This is our current point of validation for
>   the reference count of PyObjects.
> - An added optional custom stmt_finder parameter to
>   region_model_context::warn. This aids in emitting a diagnostic
>   concerning the reference count, especially when the stmt_finder is
>   NULL, which is currently the case during region_model::pop_frame.
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected  to 
> have reference count: ‘1’ but ob_refcnt field is: ‘2’
>23 |   return list;
>   |  ^~~~
>   ‘create_py_object’: events 1-4
> |
> |4 |   PyObject* item = PyLong_FromLong(3);
> |  |^~
> |  ||
> |  |(1) when ‘PyLong_FromLong’ succeeds
> |5 |   PyObject* list = PyList_New(1);
> |  |~
> |  ||
> |  |(2) when ‘PyList_New’ succeeds
> |..
> |   14 |   PyList_Append(list, item);
> |  |   ~
> |  |   |
> |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> |..
> |   23 |   return list;
> |  |  
> |  |  |
> |  |  (4) here
> |
>
> This is a WIP in several ways:
> - Enhancing the diagnostic for better clarity. For instance, users should
>   expect to see the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Currently, functions returning PyObject * are assumed to always produce
>   a new reference.
> - The validation of reference count is only for PyObjects created within a
>   function body. Verifying reference counts for PyObjects passed as
>   parameters is not supported in this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
> * engine.cc (impl_region_model_context::warn): New optional parameter.
> * exploded-graph.h (class impl_region_model_context): Likewise.
> * region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
> * region-model.h (struct append_regions_cb_data): Likewise.
> (class region_model): Likewise.
> (class region_model_context): New optional parameter.
> (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> * gcc.dg/plugin/plugin.exp: New tests.
> * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng 
>
> ---
>  gcc/analyzer/engine.cc|   8 +-
>  gcc/analyzer/exploded-graph.h |   4 +-
>  gcc/analyzer/region-model.cc  |   3 +
>  gcc/analyzer/region-model.h   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
>  c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c |  78 
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
> cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 
> 

[PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc-patches
Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned in
addition to adding more test cases. I've also taken this chance to 
split the test files according to known function subclasses, as you previously 
suggested. Since there were also some changes to the core analyzer, I've done a
bootstrap and regtested the patch as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected  to 
have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   23 |   return list;
|  |  
|  |  |
|  |  (4) here
|

This is a WIP in several ways:
- Enhancing the diagnostic for better clarity. For instance, users should
  expect to see the variable name 'item' instead of the placeholder in the
  diagnostic above.
- Currently, functions returning PyObject * are assumed to always produce
  a new reference.
- The validation of reference count is only for PyObjects created within a
  function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.

gcc/analyzer/ChangeLog:
  PR analyzer/107646
* engine.cc (impl_region_model_context::warn): New optional parameter.
* exploded-graph.h (class impl_region_model_context): Likewise.
* region-model.cc (region_model::pop_frame): New callback feature for
  * region_model::pop_frame.
* region-model.h (struct append_regions_cb_data): Likewise.
(class region_model): Likewise.
(class region_model_context): New optional parameter.
(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
  * checking for PyObjects.
* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
  * added more tests).
* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
  * more tests).
* gcc.dg/plugin/plugin.exp: New tests.
* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng 

---
 gcc/analyzer/engine.cc|   8 +-
 gcc/analyzer/exploded-graph.h |   4 +-
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
 c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c |  78 
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => 
cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc

[PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc
Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned in
addition to adding more test cases. I've also taken this chance to 
split the test files according to known function subclasses, as you previously 
suggested. Since there were also some changes to the core analyzer, I've done a
bootstrap and regtested the patch as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected  to 
have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   23 |   return list;
|  |  
|  |  |
|  |  (4) here
|

This is a WIP in several ways:
- Enhancing the diagnostic for better clarity. For instance, users should
  expect to see the variable name 'item' instead of the placeholder in the
  diagnostic above.
- Currently, functions returning PyObject * are assumed to always produce
  a new reference.
- The validation of reference count is only for PyObjects created within a
  function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.

gcc/analyzer/ChangeLog:
  PR analyzer/107646
* engine.cc (impl_region_model_context::warn): New optional parameter.
* exploded-graph.h (class impl_region_model_context): Likewise.
* region-model.cc (region_model::pop_frame): New callback feature for
  * region_model::pop_frame.
* region-model.h (struct append_regions_cb_data): Likewise.
(class region_model): Likewise.
(class region_model_context): New optional parameter.
(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
  * checking for PyObjects.
* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
  * added more tests).
* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
  * more tests).
* gcc.dg/plugin/plugin.exp: New tests.
* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng 

---
 gcc/analyzer/engine.cc|   8 +-
 gcc/analyzer/exploded-graph.h |   4 +-
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
 c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c |  78 
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => 
cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc

Update on CPython Extension Module -fanalyzer plugin development

2023-08-25 Thread Eric Feng via Gcc
Hi Dave,

Please find an updated WIP patch on reference count checking below. Some
parts aren't properly formatted yet; I apologize for that.

Since the last WIP patch, the major updates include:
- Updated certain areas of the core analyzer to support custom stmt_finder.
- A significant revamp of the reference count checking logic.
- __analyzer_cpython_dump_refcounts: This dumps reference count related 
information.

Here's an updated look at the current WIP diagnostic we issue:
rc3.c:25:10: warning: Expected  to 
have reference count: ‘1’ but ob_refcnt field is: ‘2’
   25 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   16 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   25 |   return list;
|  |  
|  |  |
|  |  (4) here
|

The reference count checking logic for v1 should be almost complete.
Currently, it supports situations where the returned object is newly created.
It doesn't yet support the other case (i.e., incremented by 1 from
what it was previously, if not newly created). 

This week, I've focused primarily on the reference count checking logic. I plan
to shift my focus to refining the diagnostic next. As seen in the placeholder
diagnostic message above, I believe we should at least inform the user about the
variable name associated with the region that has an unexpected reference count
issue (in this case, 'item'). Initially, I suspect the issue might be that:

tree reg_tree = model->get_representative_tree (curr_region);

returns NULL since curr_region is heap allocated and thus the path_var returned 
would be:

path_var (NULL_TREE, 0);

Which means that:

refcnt_stmt_finder finder (*eg, reg_tree);

always receives a NULL_TREE, causing it to always default to the
not_found case. A workaround might be necessary, but I haven't delved too 
deeply into this yet,
so my suspicion could be off. Additionally, I think it would be helpful to show 
users what the
ob_refcnt looks like in each event as well. I'll keep you updated on both these
points and welcome any feedback.

Best,
Eric
---
 gcc/analyzer/engine.cc|   8 +-
 gcc/analyzer/exploded-graph.h |   4 +-
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  38 +-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
 5 files changed, 421 insertions(+), 8 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 61685f43fba..f9e239128a0 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr d)
+impl_region_model_context::warn (std::unique_ptr d,
+const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
 {
   if (get_logger ())
get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn 
(std::unique_ptr d)
   bool terminate_path = d->terminate_path_p ();
   if (m_eg->get_diagnostic_manager ().add_diagnostic
  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-  m_stmt, m_stmt_finder, std::move (d)))
+  m_stmt, curr_stmt_finder, std::move (d)))
{
  if (m_path_ctxt
  && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 4a4ef9d12b4..633f8c263fc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 uncertainty_t *uncertainty,
 logger *logger = NULL);
 
-  bool warn (std::unique_ptr d) final override;
+  bool warn (std::unique_ptr d,
+const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr pn) final override;
   void on_svalue_leak (const svalue *) override;
   void on_liveness_change (const svalue_set _svalues,
@@ -106,6 +107,7 @@ class impl_region_model_context : public 
region_model_context
 std::unique_ptr *out_sm_context) override;
 
   const gimple *get_stmt () 

Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-24 Thread Eric Feng via Gcc
On Wed, Aug 23, 2023 at 7:16 PM David Malcolm  wrote:
>
> On Wed, 2023-08-23 at 17:15 -0400, Eric Feng wrote:
> > On Mon, Aug 21, 2023 at 11:04 AM David Malcolm 
> > wrote:
> > >
> > > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > > > Hi Dave,
> > > >
> > > > Just wanted to give you and everyone else a short update on how
> > > > reference count checking is going — we can now observe the refcnt
> > > > diagnostic being emitted:
> > > >
> > > > rc3.c:22:10: warning: REF COUNT PROBLEM
> > > >22 |   return list;
> > > >   |  ^~~~
> > > >   ‘create_py_object’: events 1-4
> > > > |
> > > > |4 |   PyObject* item = PyLong_FromLong(3);
> > > > |  |^~
> > > > |  ||
> > > > |  |(1) when ‘PyLong_FromLong’
> > > > succeeds
> > > > |5 |   PyObject* list = PyList_New(1);
> > > > |  |~
> > > > |  ||
> > > > |  |(2) when ‘PyList_New’ succeeds
> > > > |..
> > > > |   14 |   PyList_Append(list, item);
> > > > |  |   ~
> > > > |  |   |
> > > > |  |   (3) when ‘PyList_Append’ fails
> > > > |..
> > > > |   22 |   return list;
> > > > |  |  
> > > > |  |  |
> > > > |  |  (4) here
> > > > |
> > > >
> > > > I will fix up and refactor the logic for counting the actual ref
> > > > count
> > > > before coming back and refining the diagnostic to give much more
> > > > detailed information.
> > >
> > > Excellent!  Thanks for the update.
> > >
> > > Dave
> > >
> >
> > Hi Dave,
> >
> > I've since fixed up the logic to count the actual reference counts of
> > the PyObject* instances.
>
> Sounds promising.
>
> > Now, I'm contemplating the specific
> > diagnostics we'd want to issue and the appropriate conditions for
> > emitting them. With this in mind, I wanted to check in with you on
> > the
> > appropriate approach:
> >
> > To start, I'm adopting the same assumptions as cpychecker for
> > functions returning a PyObject*. That is, I'm operating under the
> > premise that by default such functions return a new reference upon
> > success rather than, for example, a borrowed reference (which we can
> > tackle later on). Given this, it's my understanding that the
> > reference
> > count of the returned object should be 1 if the object is newly
> > created within the function body and incremented by 1 from what it
> > was
> > previously if not newly created (e.g passed in as an argument).
> > Furthermore, the reference count for any PyObject* instances created
> > within the function should be 0, barring situations where we're
> > returning a collection, like a list, that includes references to
> > these
> > objects.
> >
> > Let me know what you think; thanks!
>
> This sounds like a good approach for v1 of the implementation.
>
> It's probably best to focus on getting a simple version of the patch
> into trunk, and leave any polish of it to followups.
>
> In terms of deciding what the reference count of a returned PyObject *
> ought to be, cpychecker had logic to try to detect callbacks used by
> PyMethodDef tables, so that e.g. in:
>
> static PyMethodDef widget_methods[] = {
> {"display",
>  (PyCFunction)widget_display,
>  (METH_VARARGS | METH_KEYWORDS), /* ml_flags */
>  NULL},
>
> {NULL, NULL, 0, NULL} /* terminator */
> };
>
> ...we'd know that the callback function "widget_display" follows the
> standard rules for a PyCFunction (e.g. returns a new reference).
>
> But that's for later; don't bother trying to implement that until we
> have the basics working.
I see; sounds good!
>
> Is it worth posting a work-in-progress patch of what you have so far?
> (you don't need to bother with a ChangeLog for that)
Will post a WIP soon. Thanks!
>
> Dave
>


Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-23 Thread Eric Feng via Gcc
On Mon, Aug 21, 2023 at 11:04 AM David Malcolm  wrote:
>
> On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Just wanted to give you and everyone else a short update on how
> > reference count checking is going — we can now observe the refcnt
> > diagnostic being emitted:
> >
> > rc3.c:22:10: warning: REF COUNT PROBLEM
> >22 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ fails
> > |..
> > |   22 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
> >
> > I will fix up and refactor the logic for counting the actual ref
> > count
> > before coming back and refining the diagnostic to give much more
> > detailed information.
>
> Excellent!  Thanks for the update.
>
> Dave
>

Hi Dave,

I've since fixed up the logic to count the actual reference counts of
the PyObject* instances. Now, I'm contemplating the specific
diagnostics we'd want to issue and the appropriate conditions for
emitting them. With this in mind, I wanted to check in with you on the
appropriate approach:

To start, I'm adopting the same assumptions as cpychecker for
functions returning a PyObject*. That is, I'm operating under the
premise that by default such functions return a new reference upon
success rather than, for example, a borrowed reference (which we can
tackle later on). Given this, it's my understanding that the reference
count of the returned object should be 1 if the object is newly
created within the function body and incremented by 1 from what it was
previously if not newly created (e.g passed in as an argument).
Furthermore, the reference count for any PyObject* instances created
within the function should be 0, barring situations where we're
returning a collection, like a list, that includes references to these
objects.

Let me know what you think; thanks!

Best,
Eric


Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-21 Thread Eric Feng via Gcc
Hi Dave,

Just wanted to give you and everyone else a short update on how
reference count checking is going — we can now observe the refcnt
diagnostic being emitted:

rc3.c:22:10: warning: REF COUNT PROBLEM
   22 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ fails
|..
|   22 |   return list;
|  |  
|  |  |
|  |  (4) here
|

I will fix up and refactor the logic for counting the actual ref count
before coming back and refining the diagnostic to give much more
detailed information.

Best,
Eric


On Wed, Aug 16, 2023 at 9:47 PM Eric Feng  wrote:
>
> Hi Dave,
>
> Thanks for the feedback!
>
>
> On Wed, Aug 16, 2023 at 5:29 PM David Malcolm  wrote:
> >
> > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > > Hi everyone,
> >
> > [fixing typo in my email address]
> >
> > Hi Eric, thanks for the update, and the WIP patch.
> >
> > >
> > > After pushing the code that supports various known function classes last 
> > > week,
> > > I've turned my attention back to the core reference count checking
> > > functionality. This functionality used to reside in region_model, which
> > > wasn't ideal. To address this, I've introduced a hook to register 
> > > callbacks
> > > to pop_frame. Specifically, this allows the code that checks the reference
> > > count and emits diagnostics to be housed within the plugin, rather than 
> > > the
> > > core analyzer.
> > >
> > > As of now, the parameters of pop_frame_callback are tailored specifically 
> > > to
> > > our needs. If the use of callbacks at the end of pop_frame becomes more
> > > prevalent, we can revisit the setup to potentially make it more general.
> > >
> > > Moreover, the core reference count checking logic was previously somewhat
> > > bloated, contained in one extensive function. I've since refactored it,
> > > breaking it down into several helper functions to simplify and reduce
> > > complexity. There are still some aspects that need refinement, especially
> > > since the plugin has seen changes since I last worked on this logic. 
> > > However,
> > > I believe that there aren't any significant problems.
> >
> > Suggestion: introduce some more decls into analyzer-decls.h and
> > known_functions for them into the plugin so that you can run/test/debug
> > the helper functions independently (similar to the existing ones in kf-
> > analyzer.cc).
> >
> > e.g.
> >   extern void __analyzer_cpython_dump_real_refcounts (void);
> >   extern void __analyzer_cpython_dump_ob_refcnt (void);
> >
> > >
> Thanks for the suggestion. This will be even more helpful now that we
> have split the logic into helper functions. I will look into these
> when I come back to the "is there a refcount problem" side of the
> equation.
> > > Currently, I've started working a custom stmt_finder similar to 
> > > leak_stmt_finder
> > > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > > region_model::pop_frame. This approach was discussed as a viable solution 
> > > in
> > > a previous email, and I'll keep everyone posted on my progress. 
> > > Afterwards, I
> > > will go back to address the refinements necessary mentioned above.
> >
> > You might want to experiment with splitting out
> > (a) "is there a refcount problem" from
> > (b) "emit a refcount problem".
> >
> > For example, you could hardcode (a) to true, so we always complain with
> > (b) on every heap-allocated object, just to debug the stmt_finder
> > workaround.
> >
> >
> > [...snip...]
> >
> > BTW, you don't need to bother to write ChangeLog entries if you're just
> > sending a work-in-progress for me.
> >
> > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
> > > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > index 7cd72e8a886..918bb5a5587

Re: [PATCH] testsuite: Improve test in dg-require-python-h

2023-08-18 Thread Eric Feng via Gcc-patches
Thanks for the patch, Thiago. I've pushed it to trunk:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6785917c9103e18bba0d718ac3b65a386d9a14f7.

Best,
Eric

On Fri, Aug 18, 2023 at 2:11 PM David Malcolm  wrote:
>
> On Thu, 2023-08-17 at 23:30 -0300, Thiago Jung Bauermann wrote:
> > If GCC is tested with a sysroot which doesn't contain a Python
> > installation (e.g., with a command such as
> > "make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but
> > there's
> > a python3-config in $PATH, then the testsuite will pick up the host's
> > Python.h which can't actually be used:
> >
> > Executing on host: python3-config --includes(timeout = 300)
> > spawn -ignore SIGHUP python3-config --includes
> > -I/usr/include/python3.10 -I/usr/include/python3.10
> > Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --
> > sysroot=/some/sysroot/libc -Wl,-dynamic-
> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> > rpath=/some/sysroot/libc/lib
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> > 2.c-fdiagnostics-plain-output  -
> > fplugin=./analyzer_cpython_plugin.so -fanalyzer -
> > I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-
> > plugin-test-2.s(timeout = 600)
> > spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc
> > --sysroot=/some/sysroot/libc -Wl,-dynamic-
> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> > rpath=/some/sysroot/libc/lib
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> > -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -
> > fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o
> > cpython-plugin-test-2.s
> > In file included from /usr/include/python3.10/Python.h:8,
> >  from
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> > 2.c:8:
> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> > gnu/python3.10/pyconfig.h: No such file or directory
> > compilation terminated.
> > compiler exited with status 1
> >
> > This problem causes these testsuite failures:
> >
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so (test for excess errors)
> > Excess errors:
> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> > gnu/python3.10/pyconfig.h: No such file or directory
> > compilation terminated.
> >
> > So try to compile a test file so that the testcase can be marked as
> > unsupported instead.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc/testsuite/lib/target-supports.exp (dg-require-python-
> > h): Test
> > whether Python.h can really be used.
> > ---
> >  gcc/testsuite/lib/target-supports.exp | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/lib/target-supports.exp
> > b/gcc/testsuite/lib/target-supports.exp
> > index 92b6f69730e9..5b5f86551844 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -12570,11 +12570,21 @@ proc dg-require-python-h { args } {
> >
> >  verbose "ENTER dg-require-python-h" 2
> >
> > +set supported 0
> >  set result [remote_exec host "python3-config --includes"]
> >  set status [lindex $result 0]
> >  if { $status == 0 } {
> > -set python_flags [lindex $result 1]
> > -} else {
> > +   # Remove trailing newline from python3-config output.
> > +   set python_flags [string trim [lindex $result 1]]
> 

Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-16 Thread Eric Feng via Gcc
Hi Dave,

Thanks for the feedback!


On Wed, Aug 16, 2023 at 5:29 PM David Malcolm  wrote:
>
> On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > Hi everyone,
>
> [fixing typo in my email address]
>
> Hi Eric, thanks for the update, and the WIP patch.
>
> >
> > After pushing the code that supports various known function classes last 
> > week,
> > I've turned my attention back to the core reference count checking
> > functionality. This functionality used to reside in region_model, which
> > wasn't ideal. To address this, I've introduced a hook to register callbacks
> > to pop_frame. Specifically, this allows the code that checks the reference
> > count and emits diagnostics to be housed within the plugin, rather than the
> > core analyzer.
> >
> > As of now, the parameters of pop_frame_callback are tailored specifically to
> > our needs. If the use of callbacks at the end of pop_frame becomes more
> > prevalent, we can revisit the setup to potentially make it more general.
> >
> > Moreover, the core reference count checking logic was previously somewhat
> > bloated, contained in one extensive function. I've since refactored it,
> > breaking it down into several helper functions to simplify and reduce
> > complexity. There are still some aspects that need refinement, especially
> > since the plugin has seen changes since I last worked on this logic. 
> > However,
> > I believe that there aren't any significant problems.
>
> Suggestion: introduce some more decls into analyzer-decls.h and
> known_functions for them into the plugin so that you can run/test/debug
> the helper functions independently (similar to the existing ones in kf-
> analyzer.cc).
>
> e.g.
>   extern void __analyzer_cpython_dump_real_refcounts (void);
>   extern void __analyzer_cpython_dump_ob_refcnt (void);
>
> >
Thanks for the suggestion. This will be even more helpful now that we
have split the logic into helper functions. I will look into these
when I come back to the "is there a refcount problem" side of the
equation.
> > Currently, I've started working a custom stmt_finder similar to 
> > leak_stmt_finder
> > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > region_model::pop_frame. This approach was discussed as a viable solution in
> > a previous email, and I'll keep everyone posted on my progress. Afterwards, 
> > I
> > will go back to address the refinements necessary mentioned above.
>
> You might want to experiment with splitting out
> (a) "is there a refcount problem" from
> (b) "emit a refcount problem".
>
> For example, you could hardcode (a) to true, so we always complain with
> (b) on every heap-allocated object, just to debug the stmt_finder
> workaround.
>
>
> [...snip...]
>
> BTW, you don't need to bother to write ChangeLog entries if you're just
> sending a work-in-progress for me.
>
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
> > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index 7cd72e8a886..918bb5a5587 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
>
> [...]
>
> > +/* For PyListObjects: processes the ob_item field within the current 
> > region and
> > + * increments the reference count if conditions are met. */
> > +void
> > +process_ob_item_region (const region_model *model, region_model_manager 
> > *mgr,
> > +   region_model_context *ctxt, const region 
> > *curr_region,
> > +   const svalue *pylist_type_ptr, const region 
> > *base_reg,
> > +   int _refcnt)
>
> You seem to be special-casing PyListObject here; why?  That seems like
> it's not going to be scalable tothe general case.
>
> Am I right in thinking the intent of this code is to count the actual
> number of pointers in memory that point to a particular region?
>
> Doesn't the ob_item buffer show up in the store as another cluster?
> Can't you just look at the bindings in the clusters and tally up the
> pointers for each heap_allocated_region? (accumulating a result map
> from region to int of the actual reference counts).
Yes, you're correct, I hadn't thought of that ... This simplifies a
lot of things. Thanks for the great suggestion!
>
> Or am I missing something?
>
> What does
>   model->debug ();
> show in your examples?
>
>
> > +{
> > +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, 
> > "ob_item");
> &

Update on CPython Extension Module -fanalyzer plugin development

2023-08-16 Thread Eric Feng via Gcc
Hi everyone,

After pushing the code that supports various known function classes last week,
I've turned my attention back to the core reference count checking 
functionality. This functionality used to reside in region_model, which 
wasn't ideal. To address this, I've introduced a hook to register callbacks 
to pop_frame. Specifically, this allows the code that checks the reference 
count and emits diagnostics to be housed within the plugin, rather than the 
core analyzer.

As of now, the parameters of pop_frame_callback are tailored specifically to 
our needs. If the use of callbacks at the end of pop_frame becomes more 
prevalent, we can revisit the setup to potentially make it more general.

Moreover, the core reference count checking logic was previously somewhat 
bloated, contained in one extensive function. I've since refactored it, 
breaking it down into several helper functions to simplify and reduce 
complexity. There are still some aspects that need refinement, especially 
since the plugin has seen changes since I last worked on this logic. However, 
I believe that there aren't any significant problems.

Currently, I've started working a custom stmt_finder similar to 
leak_stmt_finder 
to address the issue of m_stmt and m_stmt_finder being NULL at the time of 
region_model::pop_frame. This approach was discussed as a viable solution in 
a previous email, and I'll keep everyone posted on my progress. Afterwards, I 
will go back to address the refinements necessary mentioned above.

For those interested, I've attached a WIP patch that highlights the specific 
changes mentioned above.

Best,
Eric

gcc/analyzer/ChangeLog:

* region-model.cc (region_model::pop_frame): New callback
* mechanism.
* region-model.h (struct append_regions_cb_data): New variables.
(class region_model): New functions and variables.

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/analyzer_cpython_plugin.c: New functions on
* reference count checking.

---
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  19 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 234 +-
 3 files changed, 254 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 494a9cdf149..18cea279e53 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
purposes.  */
 
@@ -4813,6 +4815,7 @@ region_model::pop_frame (tree result_lvalue,
 }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4f09f2e585a..2fe6a60f7ba 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+   const svalue *retval,
+   region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
a tree of regions, along with their associated values.
The representation is graph-like because values can be pointers to
@@ -505,6 +509,20 @@ class region_model
   void check_for_null_terminated_string_arg (const call_details ,
 unsigned idx);
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback )
+  {
+pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  region_model_context *ctxt)
+  {
+for (auto  : pop_frame_callbacks)
+   callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -592,6 +610,7 @@ private:
tree callee_fndecl,
region_model_context *ctxt) 
const;
 
+  static auto_vec pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..918bb5a5587 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"

Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc
I've noticed there were still some strange indentations in the last
patch ... however, I think I've finally figured out a sane formatting
solution for me (fingers crossed). I will address them in the
follow-up patch at the same time as adding more test coverage.

---

In case, anyone else using VSCode has been having issues with
formatting according to GNU/GCC conventions, these are the relevant
formatting settings that I've found work for me. Assuming the C/C++
extension is installed, then in settings.json:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 8 }"

Just setting the base style to GNU formats everything correctly except
for the fact that indentation defaults to spaces (which is what I've
been struggling with fixing manually in the last few patches). The
rest of the settings are for replacing blocks of 8 spaces with tabs
(which is a requirement in check_GNU_style). In combination, this
works for everything except for header files for some reason, but I'll
defer that battle to another day.

On Fri, Aug 11, 2023 at 1:47 PM Eric Feng  wrote:
>
> Thanks for the feedback! I've incorporated the changes (aside from
> expanding test coverage, which I plan on releasing in a follow-up),
> rebased, and performed a bootstrap and regtest on
> aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
> with nits fixed and no problems after rebase, the patch has now been pushed.
>
> Best,
> Eric
>
> ---
>
> This patch adds known function subclasses for Python/C API functions
> PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
> optional parameters for
> region_model::get_or_create_region_for_heap_alloc, allowing for the
> newly allocated region to immediately transition from the start state to
> the assumed non-null state in the malloc state machine if desired.
> Finally, it adds a new procedure, dg-require-python-h, intended as a
> directive in Python-related analyzer tests, to append necessary Python
> flags during the tests' build process.
>
> The main warnings we gain in this patch with respect to the known function
> subclasses mentioned are leak related. For example:
>
> rc3.c: In function ‘create_py_object’:
> │
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
> │
>21 |   return list;
>   │
>   |  ^~~~
> │
>   ‘create_py_object’: events 1-4
> │
> |
> │
> |4 |   PyObject* item = PyLong_FromLong(10);
> │
> |  |^~~
> │
> |  ||
> │
> |  |(1) allocated here
> │
> |  |(2) when ‘PyLong_FromLong’ succeeds
> │
> |5 |   PyObject* list = PyList_New(2);
> │
> |  |~
> │
> |  ||
> │
> |  |(3) when ‘PyList_New’ fails
> │
> |..
> │
> |   21 |   return list;
> │
> |  |  
> │
> |  |  |
> │
> |  |  (4) ‘item’ leaks here; was allocated at (1)
> │
>
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual 
> implementation
> follows.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
> * call-details.h: New function.
> * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
> New optional parameters.
> * region-model.h (class region_model): New optional parameters.
> * sm-malloc.cc (on_realloc_with_move): New function.
> (region_model::transition_ptr_sval_non_null): New function.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
> PyList_New, PyList_Append, PyLong_FromLong
> * gcc.dg/plugin/plugin.exp: New test.
> * lib/target-supports.exp: New procedure.
> * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
>
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/call-details.h   |   4 +
>  gcc/analyzer/region-model.cc  |  17 +-
>  gcc/analyzer/region-model.h   |  14 +-
>  gcc/analyzer/sm-malloc.cc |  42 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
>  gcc/testsuite/lib/target-supports.exp |  25 +
>  8 files changed, 899 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>
> 

Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc-patches
I've noticed there were still some strange indentations in the last
patch ... however, I think I've finally figured out a sane formatting
solution for me (fingers crossed). I will address them in the
follow-up patch at the same time as adding more test coverage.

---

In case, anyone else using VSCode has been having issues with
formatting according to GNU/GCC conventions, these are the relevant
formatting settings that I've found work for me. Assuming the C/C++
extension is installed, then in settings.json:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 8 }"

Just setting the base style to GNU formats everything correctly except
for the fact that indentation defaults to spaces (which is what I've
been struggling with fixing manually in the last few patches). The
rest of the settings are for replacing blocks of 8 spaces with tabs
(which is a requirement in check_GNU_style). In combination, this
works for everything except for header files for some reason, but I'll
defer that battle to another day.

On Fri, Aug 11, 2023 at 1:47 PM Eric Feng  wrote:
>
> Thanks for the feedback! I've incorporated the changes (aside from
> expanding test coverage, which I plan on releasing in a follow-up),
> rebased, and performed a bootstrap and regtest on
> aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
> with nits fixed and no problems after rebase, the patch has now been pushed.
>
> Best,
> Eric
>
> ---
>
> This patch adds known function subclasses for Python/C API functions
> PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
> optional parameters for
> region_model::get_or_create_region_for_heap_alloc, allowing for the
> newly allocated region to immediately transition from the start state to
> the assumed non-null state in the malloc state machine if desired.
> Finally, it adds a new procedure, dg-require-python-h, intended as a
> directive in Python-related analyzer tests, to append necessary Python
> flags during the tests' build process.
>
> The main warnings we gain in this patch with respect to the known function
> subclasses mentioned are leak related. For example:
>
> rc3.c: In function ‘create_py_object’:
> │
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
> │
>21 |   return list;
>   │
>   |  ^~~~
> │
>   ‘create_py_object’: events 1-4
> │
> |
> │
> |4 |   PyObject* item = PyLong_FromLong(10);
> │
> |  |^~~
> │
> |  ||
> │
> |  |(1) allocated here
> │
> |  |(2) when ‘PyLong_FromLong’ succeeds
> │
> |5 |   PyObject* list = PyList_New(2);
> │
> |  |~
> │
> |  ||
> │
> |  |(3) when ‘PyList_New’ fails
> │
> |..
> │
> |   21 |   return list;
> │
> |  |  
> │
> |  |  |
> │
> |  |  (4) ‘item’ leaks here; was allocated at (1)
> │
>
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual 
> implementation
> follows.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
> * call-details.h: New function.
> * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
> New optional parameters.
> * region-model.h (class region_model): New optional parameters.
> * sm-malloc.cc (on_realloc_with_move): New function.
> (region_model::transition_ptr_sval_non_null): New function.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
> PyList_New, PyList_Append, PyLong_FromLong
> * gcc.dg/plugin/plugin.exp: New test.
> * lib/target-supports.exp: New procedure.
> * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
>
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/call-details.h   |   4 +
>  gcc/analyzer/region-model.cc  |  17 +-
>  gcc/analyzer/region-model.h   |  14 +-
>  gcc/analyzer/sm-malloc.cc |  42 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
>  gcc/testsuite/lib/target-supports.exp |  25 +
>  8 files changed, 899 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>
> 

[COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc
Thanks for the feedback! I've incorporated the changes (aside from
expanding test coverage, which I plan on releasing in a follow-up),
rebased, and performed a bootstrap and regtest on
aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
with nits fixed and no problems after rebase, the patch has now been pushed. 

Best,
Eric

---

This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* call-details.h: New function.
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/call-details.h   |   4 +
 gcc/analyzer/region-model.cc  |  17 +-
 gcc/analyzer/region-model.h   |  14 +-
 gcc/analyzer/sm-malloc.cc |  42 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 8 files changed, 899 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 24be2247e63..bf2601151ea 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -49,6 +49,10 @@ public:
 return POINTER_TYPE_P (get_arg_type (idx));
   }
   bool arg_is_size_p (unsigned idx) const;
+  bool arg_is_integral_p (unsigned idx) const
+  {
+return INTEGRAL_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 094b7af3dbc..aa9fe008b9d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-  

[COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc-patches
Thanks for the feedback! I've incorporated the changes (aside from
expanding test coverage, which I plan on releasing in a follow-up),
rebased, and performed a bootstrap and regtest on
aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
with nits fixed and no problems after rebase, the patch has now been pushed. 

Best,
Eric

---

This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* call-details.h: New function.
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/call-details.h   |   4 +
 gcc/analyzer/region-model.cc  |  17 +-
 gcc/analyzer/region-model.h   |  14 +-
 gcc/analyzer/sm-malloc.cc |  42 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 8 files changed, 899 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 24be2247e63..bf2601151ea 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -49,6 +49,10 @@ public:
 return POINTER_TYPE_P (get_arg_type (idx));
   }
   bool arg_is_size_p (unsigned idx) const;
+  bool arg_is_integral_p (unsigned idx) const
+  {
+return INTEGRAL_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 094b7af3dbc..aa9fe008b9d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-  

[COMMITTED] MAINTAINERS: Add myself to write after approval

2023-08-11 Thread Eric Feng via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself.

Signed-off-by: Eric Feng 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e54844c905..7a3ad68bc42 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -411,6 +411,7 @@ Chris Fairles   

 Alessandro Fanfarillo  
 Changpeng Fang 
 Sam Feifer 
+Eric Feng  
 Li Feng
 Thomas Fitzsimmons 
 Alexander Fomin

-- 
2.30.2



[PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-09 Thread Eric Feng via Gcc-patches
Thank you for your help in getting dg-require-python-h working! I can
confirm that the FAILs are related to differences between the --cflags
affecting the gimple seen by the analyzer. For this reason, I have
changed it to --includes for now. To be sure, I tested on Python 3.8 as
well and it works as expected. I have also addressed the following
comments on the WIP patch as you described.

-- Update Changelog entry to list new functions being simulated.
-- Update region_model::get_or_create_region_for_heap_alloc leading
comment.
-- Change register_alloc to update_state_machine.
-- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
-- Static helper functions for:
-- Initializing ob_refcnt field.
-- Setting ob_type field.
-- Getting ob_base field.
-- Initializing heap allocated region for PyObjects.
-- Incrementing a field by one.
-- Change arg_is_long_p to arg_is_integral_p.
-- Extract common failure scenario for reusability.

The initial WIP patch using 

/* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */

have been bootstrapped and regtested on aarch64-unknown-linux-gnu. Since
we did not change any core logic in the revision and the only changes
within the analyzer core are changing variable names, is it OK for
trunk. In the mean time, the revised patch is currently going through
bootstrap and regtest process.

Best,
Eric

---
This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/region-model.cc  |  20 +-
 gcc/analyzer/region-model.h   |  10 +-
 gcc/analyzer/sm-malloc.cc |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 711 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 7 files changed, 881 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c338f045d92 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing 

[PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-09 Thread Eric Feng via Gcc
Thank you for your help in getting dg-require-python-h working! I can
confirm that the FAILs are related to differences between the --cflags
affecting the gimple seen by the analyzer. For this reason, I have
changed it to --includes for now. To be sure, I tested on Python 3.8 as
well and it works as expected. I have also addressed the following
comments on the WIP patch as you described.

-- Update Changelog entry to list new functions being simulated.
-- Update region_model::get_or_create_region_for_heap_alloc leading
comment.
-- Change register_alloc to update_state_machine.
-- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
-- Static helper functions for:
-- Initializing ob_refcnt field.
-- Setting ob_type field.
-- Getting ob_base field.
-- Initializing heap allocated region for PyObjects.
-- Incrementing a field by one.
-- Change arg_is_long_p to arg_is_integral_p.
-- Extract common failure scenario for reusability.

The initial WIP patch using 

/* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */

have been bootstrapped and regtested on aarch64-unknown-linux-gnu. Since
we did not change any core logic in the revision and the only changes
within the analyzer core are changing variable names, is it OK for
trunk. In the mean time, the revised patch is currently going through
bootstrap and regtest process.

Best,
Eric

---
This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/region-model.cc  |  20 +-
 gcc/analyzer/region-model.h   |  10 +-
 gcc/analyzer/sm-malloc.cc |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 711 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 7 files changed, 881 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c338f045d92 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing 

[PATCH] WIP for dg-require-python-h [PR107646]

2023-08-08 Thread Eric Feng via Gcc
Unfortunately, there doesn’t seem to be any ERRORs in the .log nor any of the 
debug print statements which I’ve scattered within proc dg-require-python-h 
when run. I’ve attached the WIP below; thank you! Please note that in this 
version of the patch, I’ve removed the other (non Python) test cases in 
plugin.exp for convenience. 

Aside from issues with dg-require-python-h, everything works as expected (when 
using /* { dg-options "-fanalyzer -I/usr/include/python3.9" }. The patch 
includes support for PyList_New, PyLong_FromLong, PyList_Append and also the 
optional parameters for get_or_create_region_for_heap_alloc as we previously 
discussed. I will submit the version of the patch sans dg-require-python-h to 
gcc-patches for review as soon as I confirm regtests pass as expected; perhaps 
we can first push these changes to trunk and later push a separate patch for 
dg-require-python-h. 

Best,
Eric
>
> > > >
> > > > Maybe someone else on the list can see a less hackish way to get
> > > > this
> > > > to work?
> > > >
> > > > Let me know if any of the above is unclear.
> > > > Dave
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Eric
> > > > > > >
> > > > > > > On Tue, Aug 1, 2023 at 1:06 PM David Malcolm
> > > > > > > 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > > > > > > > > >
> > > > > > > > > > My guess is that you were trying to do it from the
> > > > > > > > > > PLUGIN_ANALYZER_INIT
> > > > > > > > > > hook rather than from the plugin_init function, but
> > > > > > > > > > it's
> > > > > > > > > > hard
> > > > > > > > > > to be
> > > > > > > > > > sure without seeing the code.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks Dave, you are entirely right — I made the
> > > > > > > > > mistake of
> > > > > > > > > trying to
> > > > > > > > > do it from PLUGIN_ANALYZER_INIT hook and not from the
> > > > > > > > > plugin_init
> > > > > > > > > function. After following your suggestion, the
> > > > > > > > > callbacks
> > > > > > > > > are
> > > > > > > > > getting
> > > > > > > > > registered as expected.
> > > > > > > >
> > > > > > > > Ah, good.
> > > > > > > >
> > > > > > > > > I submitted a patch to review for this feature
> > > > > > > > > on gcc-patches; please let me know if it looks OK.
> > > > > > > >
> > > > > > > > Thanks Eric; I've posted a reply to your email there, so
> > > > > > > > let's
> > > > > > > > discuss
> > > > > > > > the details there.
> > > > > > > >
> > > > > > > > Dave
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
---
This patch adds known function subclasses for the following Python/C
API: PyList_New, PyLong_FromLong, PyList_Append. It also adds new
optional parameters to region_model::get_or_create_region_for_heap_alloc
so that the newly allocated region may transition from the start state
to the assumed non null state on the malloc state machine immediately if
desired.

The main warnings we gain in this patch with respect to the known function 
subclasses
mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
  PR analyzer/107646
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
  New optional parameters.
* region-model.h (class region_model): Likewise.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::move_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: New features for plugin.
* gcc.dg/plugin/plugin.exp: New test.
* 

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-07 Thread Eric Feng via Gcc
On Fri, Aug 4, 2023 at 6:46 PM David Malcolm  wrote:
>
> On Fri, 2023-08-04 at 18:42 -0400, David Malcolm wrote:
> > On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> > > On Fri, Aug 4, 2023 at 11:39 AM David Malcolm 
> > > wrote:
> > > >
> > > > On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > > > > Hi Dave,
> > > > >
> > > > > Tests related to our plugin which depend on Python-specific
> > > > > definitions have been run by including /* { dg-options "-
> > > > > fanalyzer
> > > > > -I/usr/include/python3.9" } */. This is undoubtedly not ideal;
> > > > > is
> > > > > it
> > > > > best to approach this problem by adapting a subset of relevant
> > > > > definitions like in gil.h?
> > > >
> > > > That might be acceptable in the very short-term, but to create a
> > > > plugin
> > > > that's useful to end-user (authors of CPython extension modules)
> > > > we
> > > > want to be testing against real Python headers.
> > > >
> > > > As I understand it, https://peps.python.org/pep-0394/ allows for
> > > > distributors of Python to symlink "python3-config" in the PATH to
> > > > a
> > > > python3.X-config script (for some X).
> > > >
> > > > So on such systems running:
> > > >   python3-config --includes
> > > > should emit the correct -I option.  On my box it emits:
> > > >
> > > > -I/usr/include/python3.8 -I/usr/include/python3.8
> > > >
> > > >
> > > > It's more complicated, but I believe:
> > > >   python3-config --cflags
> > > > should emit the build flags that C/C++ extensions ought to use
> > > > when
> > > > building.  On my box this emits:
> > > >
> > > > -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-
> > > > result -
> > > > Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security -Wp,-
> > > > D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -
> > > > fstack-
> > > > protector-strong -grecord-gcc-switches   -m64 -mtune=generic -
> > > > fasynchronous-unwind-tables -fstack-clash-protection -fcf-
> > > > protection -
> > > > D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -
> > > > DNDEBUG  -
> > > > O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> > > > -
> > > > Wp,-
> > > > D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -
> > > > grecord-
> > > > gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables -
> > > > fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -
> > > > fwrapv
> > > >
> > > > and it's likely going to vary from distribution to distribution.
> > > > Some
> > > > of those options *are* going to affect the gimple that -fanalyzer
> > > > "sees".
> > > >
> > > > Does your installation of Python have such a script?
> > > >
> > > > So in the short term you could hack in a minimal subset of the
> > > > decls/defns from Python.h, but I'd prefer it if target-
> > > > supports.exp
> > > > gained a DejaGnu directive that invokes python3-config, captures
> > > > the
> > > > result (or fails with UNSUPPORTED for systems without python3
> > > > development headers), and then adds the result to the build flags
> > > > of
> > > > the file being tested.  The .exp files are implemented in Tcl,
> > > > alas;
> > > > let me know if you want help with that.
> > > >
> > > > Dave
> > > Sounds good; thanks! Following existing examples in
> > > target-supports.exp, the following works as expected in terms of
> > > extracting the build flags we are interested in.
> > >
> > > In target-supports.exp:
> > > proc check_python_flags { } {
> > > set result [remote_exec host "python3-config --cflags"]
> > > set status [lindex $result 0]
> > > if { $status == 0 } {
> > > return [lindex $result 1]
> > > } else {
> > > return "UNSUPPORTED"
> > > }
> > > }
> > >
> > > However, I'm having some trouble figuring out the specifics as to
> > > how
> > > we may add the build flags to our test cases. My intuition looks
> > > like
> > > something like the following:
> > >
> > > In plugin.exp:
> > > foreach plugin_test $plugin_test_list {
> > > if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
> > > set python_flags [check_python_flags]
> > > if { $python_flags ne "UNSUPPORTED" } {
> > >// append $python_flags to build flags here
> > > }
> > > }
> > > 
> > > }
> > >
> > > How might we do so?
> >
> > Good question.
> >
> > Looking at plugin.exp I see it uses plugin-test-execute, which is
> > defined in gcc/testsuite/lib/plugin-support.exp.
> >
> > Looking there, I see it attempts to build the plugin, and then if it
> > succeeds, it calls
> >   dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
> > where $plugin_tests is the list of source files to be compiled using
> > the plugin.  So one way to do this would be to modify that code from
> > plugin.exp to pass in a different value, rather than $default_flags.
> > Though it seems hackish to special-case this.
>
> Sorry, I think I misspoke here; that line that uses $default_flags is

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-04 Thread Eric Feng via Gcc
On Fri, Aug 4, 2023 at 11:39 AM David Malcolm  wrote:
>
> On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Tests related to our plugin which depend on Python-specific
> > definitions have been run by including /* { dg-options "-fanalyzer
> > -I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
> > best to approach this problem by adapting a subset of relevant
> > definitions like in gil.h?
>
> That might be acceptable in the very short-term, but to create a plugin
> that's useful to end-user (authors of CPython extension modules) we
> want to be testing against real Python headers.
>
> As I understand it, https://peps.python.org/pep-0394/ allows for
> distributors of Python to symlink "python3-config" in the PATH to a
> python3.X-config script (for some X).
>
> So on such systems running:
>   python3-config --includes
> should emit the correct -I option.  On my box it emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8
>
>
> It's more complicated, but I believe:
>   python3-config --cflags
> should emit the build flags that C/C++ extensions ought to use when
> building.  On my box this emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-result -
> Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security -Wp,-
> D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-
> protector-strong -grecord-gcc-switches   -m64 -mtune=generic -
> fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -
> D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG  -
> O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-
> D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-
> gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables -
> fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv
>
> and it's likely going to vary from distribution to distribution.  Some
> of those options *are* going to affect the gimple that -fanalyzer
> "sees".
>
> Does your installation of Python have such a script?
>
> So in the short term you could hack in a minimal subset of the
> decls/defns from Python.h, but I'd prefer it if target-supports.exp
> gained a DejaGnu directive that invokes python3-config, captures the
> result (or fails with UNSUPPORTED for systems without python3
> development headers), and then adds the result to the build flags of
> the file being tested.  The .exp files are implemented in Tcl, alas;
> let me know if you want help with that.
>
> Dave
Sounds good; thanks! Following existing examples in
target-supports.exp, the following works as expected in terms of
extracting the build flags we are interested in.

In target-supports.exp:
proc check_python_flags { } {
set result [remote_exec host "python3-config --cflags"]
set status [lindex $result 0]
if { $status == 0 } {
return [lindex $result 1]
} else {
return "UNSUPPORTED"
}
}

However, I'm having some trouble figuring out the specifics as to how
we may add the build flags to our test cases. My intuition looks like
something like the following:

In plugin.exp:
foreach plugin_test $plugin_test_list {
if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
set python_flags [check_python_flags]
if { $python_flags ne "UNSUPPORTED" } {
   // append $python_flags to build flags here
}
}

}

How might we do so?
>
>
> >
> > Best,
> > Eric
> >
> > On Tue, Aug 1, 2023 at 1:06 PM David Malcolm 
> > wrote:
> > >
> > > On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > > > >
> > > > > My guess is that you were trying to do it from the
> > > > > PLUGIN_ANALYZER_INIT
> > > > > hook rather than from the plugin_init function, but it's hard
> > > > > to be
> > > > > sure without seeing the code.
> > > > >
> > > >
> > > > Thanks Dave, you are entirely right — I made the mistake of
> > > > trying to
> > > > do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
> > > > function. After following your suggestion, the callbacks are
> > > > getting
> > > > registered as expected.
> > >
> > > Ah, good.
> > >
> > > > I submitted a patch to review for this feature
> > > > on gcc-patches; please let me know if it looks OK.
> > >
> > > Thanks Eric; I've posted a reply to your email there, so let's
> > > discuss
> > > the details there.
> > >
> > > Dave
> > >
> >
>


Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-04 Thread Eric Feng via Gcc
Hi Dave,

Tests related to our plugin which depend on Python-specific
definitions have been run by including /* { dg-options "-fanalyzer
-I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
best to approach this problem by adapting a subset of relevant
definitions like in gil.h?

Best,
Eric

On Tue, Aug 1, 2023 at 1:06 PM David Malcolm  wrote:
>
> On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > >
> > > My guess is that you were trying to do it from the
> > > PLUGIN_ANALYZER_INIT
> > > hook rather than from the plugin_init function, but it's hard to be
> > > sure without seeing the code.
> > >
> >
> > Thanks Dave, you are entirely right — I made the mistake of trying to
> > do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
> > function. After following your suggestion, the callbacks are getting
> > registered as expected.
>
> Ah, good.
>
> > I submitted a patch to review for this feature
> > on gcc-patches; please let me know if it looks OK.
>
> Thanks Eric; I've posted a reply to your email there, so let's discuss
> the details there.
>
> Dave
>


Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-03 Thread Eric Feng via Gcc-patches
On Wed, Aug 2, 2023 at 5:09 PM David Malcolm  wrote:
>
> On Wed, 2023-08-02 at 14:46 -0400, Eric Feng wrote:
> > On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek 
> > wrote:
> > >
> > > On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> > > >
>
> [Dropping Joseph and Marek from the CC]
>
> [...snip...]
>
> >
> >
> > Thank you, everyone. I've submitted a new patch with the described
> > changes.
>
> Thanks.
>
> > As I do not yet have write access, could someone please help
> > me commit it?
>
> I've pushed the v3 trunk to patch, as r14-2933-gfafe2d18f791c6; you can
> see it at [1], so you're now officially a GCC contributor,
> congratulation!
>
> FWIW I had to do a little whitespace fixing on the ChangeLog entries
> before the server-side hooks.commit-extra-checker would pass, as they
> were indented with spaces, rather than tabs, so it complained thusly:
>
> remote: *** The following commit was rejected by your 
> hooks.commit-extra-checker script (status: 1)
> remote: *** commit: 0a4a2dc7dad1dfe22be0b48fe0d8c50d216c8349
> remote: *** ChangeLog format failed:
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* 
> analyzer-language.cc (run_callbacks): New function."
> remote: *** ERR: line should start with a tab: "
> (on_finish_translation_unit): New function."
> remote: *** ERR: line should start with a tab: "* analyzer-language.h 
> (GCC_ANALYZER_LANGUAGE_H): New include."
> remote: *** ERR: line should start with a tab: "(class 
> translation_unit): New vfuncs."
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* c-parser.cc: New 
> functions on stashing values for the"
> remote: *** ERR: line should start with a tab: "  analyzer."
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/plugin.exp: Add new plugin and test."
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin."
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/cpython-plugin-test-1.c: New test."
> remote: *** ERR: PR 107646 in subject but not in changelog: "analyzer: stash 
> values for CPython plugin [PR107646]"
> remote: ***
> remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
> remote: ***
> remote: error: hook declined to update refs/heads/master
> To git+ssh://gcc.gnu.org/git/gcc.git
>  ! [remote rejected] master -> master (hook declined)
> error: failed to push some refs to 
> 'git+ssh://dmalc...@gcc.gnu.org/git/gcc.git'
>
> ...but this was a trivial fix.  You can test that patches are properly
> formatted by running:
>
>   ./contrib/gcc-changelog/git_check_commit.py HEAD
>
> locally.
Sorry about that — will do. Thanks!
>
>
> >  Otherwise, please let me know if I should request write
> > access first (the GettingStarted page suggested requesting someone
> > commit the patch for the first few patches before requesting write
> > access).
>
> Please go ahead and request write access now; we should have done this
> in the "community bonding" phase of GSoC; sorry for not catching this.
Sounds good.
>
> Thanks again for the patch.  How's the followup work?  Are you close to
> being able to post one or more of the simpler known_function
> subclasses?
Yes, I will submit another patch for review very soon. Thank you for
helping me push this one!

Best,
Eric
>
> Dave
>
> [1]
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fafe2d18f791c6b97b49af7c84b1b5703681c3af
>


Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek  wrote:
>
> On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> >
> > Hi Eric, thanks for the updated patch.
> >
> > Overall, looks good to me, although I'd drop the "Exited." from the
> > "sorry" message (and thus from the dg-message directive), since the
> > compiler is not exiting, it's just the particular plugin that's giving
> > up (but let's not hold up the patch with a "bikeshed" discussion on the
> > precise wording).
> >
> > If Joseph or Marek approves the C parts of the patch, this will be OK
> > to push to trunk.
>
Sounds good. Revised.
>
> > > index cf82b0306d1..617111b0f0a 100644
> > > --- a/gcc/c/c-parser.cc
> > > +++ b/gcc/c/c-parser.cc
> > > @@ -1695,6 +1695,32 @@ public:
> > >  return NULL_TREE;
> > >}
> > >
> > > +  tree
> > > +  lookup_type_by_id (tree id) const final override
> > > +  {
> > > +if (tree type_decl = lookup_name (id))
> > > +  {
> > > +   if (TREE_CODE (type_decl) == TYPE_DECL)
> > > + {
> > > +   tree record_type = TREE_TYPE (type_decl);
> > > +   if (TREE_CODE (record_type) == RECORD_TYPE)
> > > + return record_type;
> > > + }
> > > +  }
>
> I'd drop this set of { }, like below.  OK with that adjusted, thanks.
Sounds good — fixed.
>
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > > +  tree
> > > +  lookup_global_var_by_id (tree id) const final override
> > > +  {
> > > +if (tree var_decl = lookup_name (id))
> > > +  if (TREE_CODE (var_decl) == VAR_DECL)
> > > +   return var_decl;
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > >  private:
> > >/* Attempt to get an INTEGER_CST from MACRO.
> > >   Only handle the simplest cases: where MACRO's definition is a
>
> Marek
>

Thank you, everyone. I've submitted a new patch with the described
changes. As I do not yet have write access, could someone please help
me commit it? Otherwise, please let me know if I should request write
access first (the GettingStarted page suggested requesting someone
commit the patch for the first few patches before requesting write
access).

Best,
Eric


[PATCH v3] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Remove superfluous { }
-- Reword diagnostic

---

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  24 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 295 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map  
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit )
+{
+  for (auto const  : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit )
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit );
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..a3f216d90f8 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,30 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   return var_decl;
+
+return NULL_TREE;
+  }
+
 private:
   /* Attempt to get an INTEGER_CST from MACRO.
  Only handle the simplest cases: where MACRO's definition is a single
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 

Re: [PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Hi Dave,

Thank you for the feedback! I've incorporated the changes and sent a
revised version of the patch.

On Tue, Aug 1, 2023 at 1:02 PM David Malcolm  wrote:
>
> On Tue, 2023-08-01 at 09:52 -0400, Eric Feng wrote:
> > Hi all,
> >
> > This patch adds a hook to the end of ana::on_finish_translation_unit
> > which calls relevant stashing-related callbacks registered during
> > plugin
> > initialization. This feature is used to stash named types and global
> > variables for a CPython analyzer plugin [PR107646].
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look
> > okay?
>
> Hi Eric, thanks for the patch.
>
> The patch touches the C frontend, so those parts would need approval
> from the C FE maintainers/reviewers; I've CCed them.
>
> Overall, I like the patch, but it's not ready for trunk yet; various
> comments inline below...
>
> >
> > ---
> >
> > gcc/analyzer/ChangeLog:
>
> You could add: PR analyzer/107646 to these ChangeLog entries; have a
> look at how other ChangeLog entries refer to such bugzilla entries.
>
> >
> > * analyzer-language.cc (run_callbacks): New function.
> > (on_finish_translation_unit): New function.
> > * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
> > (class translation_unit): New vfuncs.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc: New functions.
>
> I think this ChangeLog entry needs more detail.
Added in revised version of the patch.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/plugin/analyzer_cpython_plugin.c: New test.
> >
> > Signed-off-by: Eric Feng 
> > ---
> >  gcc/analyzer/analyzer-language.cc |  22 ++
> >  gcc/analyzer/analyzer-language.h  |   9 +
> >  gcc/c/c-parser.cc |  26 ++
> >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224
> > ++
> >  4 files changed, 281 insertions(+)
> >  create mode 100644
> > gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > diff --git a/gcc/analyzer/analyzer-language.cc
> > b/gcc/analyzer/analyzer-language.cc
> > index 2c8910906ee..fc41b9c17b8 100644
> > --- a/gcc/analyzer/analyzer-language.cc
> > +++ b/gcc/analyzer/analyzer-language.cc
> > @@ -35,6 +35,26 @@ static GTY (()) hash_map 
> > *analyzer_stashed_constants;
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > +static vec
> > +*finish_translation_unit_callbacks;
> > +
> > +void
> > +register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback)
> > +{
> > +  if (!finish_translation_unit_callbacks)
> > +vec_alloc (finish_translation_unit_callbacks, 1);
> > +  finish_translation_unit_callbacks->safe_push (callback);
> > +}
> > +
> > +void
> > +run_callbacks (logger *logger, const translation_unit )
>
> This function could be "static" since it's not needed outside of
> analyzer-language.cc
>
> > +{
> > +  for (auto const  : finish_translation_unit_callbacks)
> > +{
> > +  cb (logger, tu);
> > +}
> > +}
> >
> >  /* Call into TU to try to find a value for NAME.
> > If found, stash its value within analyzer_stashed_constants.  */
> > @@ -102,6 +122,8 @@ on_finish_translation_unit (const
> > translation_unit )
> >  the_logger.set_logger (new logger (logfile, 0, 0,
> >  *global_dc->printer));
> >stash_named_constants (the_logger.get_logger (), tu);
> > +
> > +  run_callbacks (the_logger.get_logger (), tu);
> >  }
> >
> >  /* Lookup NAME in the named constants stashed when the frontend TU
> > finished.
> > diff --git a/gcc/analyzer/analyzer-language.h
> > b/gcc/analyzer/analyzer-language.h
> > index 00f85aba041..8deea52d627 100644
> > --- a/gcc/analyzer/analyzer-language.h
> > +++ b/gcc/analyzer/analyzer-language.h
> > @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #ifndef GCC_ANALYZER_LANGUAGE_H
> >  #define GCC_ANALYZER_LANGUAGE_H
> >
> > +#include "analyzer/analyzer-logging.h"
> > +
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > @@ -35,8 +37,15 @@ class translation_unit
> >   have been seen).  If it is defined and an integer (e.g. either
> > as a
> >   macro or enum), return the INTEGER_CST value, otherwise return
> > NULL.  */
> >virtual tree lookup_constant_by_id (tree id) const = 0;
> > +  virtual tree lookup_type_by_id (tree id) const = 0;
> > +  virtual tree lookup_global_var_by_id (tree id) const = 0;
> >  };
> >
> > +typedef void (*finish_translation_unit_callback)
> > +   (logger *, const translation_unit &);
> > +void register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback);
> > +
> >  /* Analyzer hook for frontends to call at the end of the TU.  */
> >
> >  void on_finish_translation_unit (const translation_unit );
> > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > index 80920b31f83..f0ee55e416b 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1695,6 +1695,32 @@ public:
> >  return NULL_TREE;
> >}
> >
> > 

[PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Fix indentation problems
-- Add more detail to Changelog
-- Add new test on handling non-CPython code case
-- Turn off debugging inform by default
-- Make on_finish_translation_unit() static
-- Remove superfluous null checks in init_py_structs()

Changes have been bootstrapped and tested against trunk on 
aarch64-unknown-linux-gnu.

---
This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  26 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 297 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map  
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit )
+{
+  for (auto const  : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit )
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit );
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..617111b0f0a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,32 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+  {
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+  }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   return 

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-01 Thread Eric Feng via Gcc
>
> My guess is that you were trying to do it from the PLUGIN_ANALYZER_INIT
> hook rather than from the plugin_init function, but it's hard to be
> sure without seeing the code.
>

Thanks Dave, you are entirely right — I made the mistake of trying to
do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
function. After following your suggestion, the callbacks are getting
registered as expected. I submitted a patch to review for this feature
on gcc-patches; please let me know if it looks OK.

Best,
Eric


[PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-01 Thread Eric Feng via Gcc-patches
Hi all,

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].

Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look okay?

---

gcc/analyzer/ChangeLog:

* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:

* c-parser.cc: New functions.

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/analyzer_cpython_plugin.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  26 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224 ++
 4 files changed, 281 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

diff --git a/gcc/analyzer/analyzer-language.cc
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..fc41b9c17b8 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map 
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER

 namespace ana {
+static vec
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+void
+run_callbacks (logger *logger, const translation_unit )
+{
+  for (auto const  : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}

 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit )
 the_logger.set_logger (new logger (logfile, 0, 0,
 *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }

 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H

+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER

 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };

+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */

 void on_finish_translation_unit (const translation_unit );
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 80920b31f83..f0ee55e416b 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,32 @@ public:
 return NULL_TREE;
   }

+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+  {
+ if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+ tree record_type = TREE_TYPE (type_decl);
+ if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+  }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+ return var_decl;
+
+return NULL_TREE;
+  }
+
 private:
   /* Attempt to get an INTEGER_CST from MACRO.
  Only handle the simplest cases: where MACRO's definition is a single
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
new file mode 100644
index 000..285da102edb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -0,0 +1,224 @@
+/* -fanalyzer plugin for CPython extension modules  */
+/* { dg-options "-g" } */
+
+#define INCLUDE_MEMORY
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "diagnostic-core.h"
+#include 

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-07-30 Thread Eric Feng via Gcc
[...]
> As noted in our chat earlier, I don't think we can easily make these
> work.  Looking at CPython's implementation: PyList_Type's initializer
> here:
> https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> initializes tp_flags with the flags, but:
> (a) we don't see that code when compiling a user's extension module
> (b) even if we did, PyList_Type is non-const, so the analyzer has to
> assume that tp_flags could have been written to since it was
> initialized
>
> In theory we could specialcase such lookups, so that, say, a plugin
> could register assumptions into the analyzer about the value of bits
> within (PyList_Type.tp_flags).
>
> However, this seems like a future feature.

I agree that it is more appropriate as a future feature.

Recently, in preparation for a patch, I have been focusing on
migrating as much of our plugin-specific functionality as possible,
which is currently scattered across core analyzer files for
convenience, into the plugin itself. Specifically, I am currently
trying to transfer the code related to stashing Python-specific types
and global variables into analyzer_cpython_plugin.c. This approach has
three main benefits, among which some I believe we have previously
discussed:

1) We only need to search for these values when initializing our
plugin, instead of every time the analyzer is enabled.
2) We can extend the values that we stash by modifying only our
plugin, avoiding changes to core analyzer files such as
analyzer-language.cc, which seems a safer and more resilient approach.
3) Future analyzer plugins will have an easier time stashing values
relevant to their respective projects.

Let me know if my concerns or reasons appear unfounded.

My initial approach involved adding a hook to the end of
ana::on_finish_translation_unit which calls the relevant
stashing-related callbacks registered during plugin initialization.
Here's a rough sketch:

void
on_finish_translation_unit (const translation_unit )
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  do_finish_translation_unit_callbacks(the_logger.get_logger (), tu);
}

Inside do_finish_translation_unit_callbacks we have a loop like so:

for (auto& callback : finish_translation_unit_callbacks)
{
callback(logger, tu);
}

Where finish_translation_unit_callbacks is a vector defined as follows:
typedef void (*finish_translation_unit_callback) (logger *, const
translation_unit &);
vec *finish_translation_unit_callbacks;

To register a callback, we use:

void
register_finish_translation_unit_callback (
finish_translation_unit_callback callback)
{
  if (!finish_translation_unit_callbacks)
vec_alloc (finish_translation_unit_callbacks, 1);
  finish_translation_unit_callbacks->safe_push (callback);
}

And finally, from our plugin (or any other plugin), we can register
callbacks like so:
ana::register_finish_translation_unit_callback (_named_types);
ana::register_finish_translation_unit_callback (_global_vars);

However, on_finish_translation_unit runs before plugin initialization
occurs, so, unfortunately, we would be registering our callbacks after
on_finish_translation_unit with this method. As a workaround, I tried
saving the translation unit like this:

void
on_finish_translation_unit (const translation_unit )
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  saved_tu = 
}

Then in our plugin:
ana::register_finish_translation_unit_callback (_named_types);
ana::register_finish_translation_unit_callback (_global_vars);
ana:: do_finish_translation_unit_callbacks();

With do_finish_translation_units passing the stored_tu to the callbacks.

Unfortunately, with this method, it seems like we encounter a
segmentation fault when trying to call the lookup functions within
translation_unit at the time of plugin initialization, even though the
translation unit is stored correctly. So it seems like the solution
may not be quite so simple.

I'm currently investigating this issue, but if there's an obvious
solution that I might be missing or any general suggestions, please
let me know!

Thanks as always,
Eric


Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-07-27 Thread Eric Feng via Gcc
Hi Dave,

Thanks for the comments!

[...]
> Do you have any DejaGnu tests for this functionality?  For example,
> given PyList_New
>   https://docs.python.org/3/c-api/list.html#c.PyList_New
> there could be a test like:
>
> /* { dg-require-effective-target python_h } */
>
> #define PY_SSIZE_T_CLEAN
> #include 
> #include "analyzer-decls.h"
>
> PyObject *
> test_PyList_New (Py_ssize_t len)
> {
>   PyObject *obj = PyList_New (len);
>   if (obj)
> {
>  __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
>  __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
>  __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> }
>   else
> __analyzer_dump_path (); /* { dg-warning "path" } */
>   return obj;
> }
>
> ...or similar, to verify that we simulate that the call can both
> succeed and fail, and to verify properties of the store along the
> "success" path.  Caveat: I didn't look at exactly what properties
> you're simulating, so the above tests might need adjusting.
>

I am currently in the process of developing more tests. Specific to
the test you provided as an example, we are passing all cases except
for PyList_Check. PyList_Check does not pass because I have not yet
added support for the various definitions of tp_flags. I also
encountered a minor hiccup where PyList_CheckExact appeared to give
"UNKNOWN" rather than "TRUE", but this has since been fixed. The
problem was caused by accidentally using the tree representation of
struct PyList_Type as opposed to struct PyList_Type * when creating a
pointer sval to the region for Pylist_Type.

[...]
>
> > Let's consider the following example which lacks error checking:
> >
> > PyObject* foo() {
> > PyObject item = PyLong_FromLong(10);
> > PyObject list = PyList_New(5);
> > return list;
> > }
> >
> > The states for when PyLong_FromLong fails and when PyLong_FromLong
> > succeeds are merged before the call to PyObject* list =
> > PyList_New(5).
>
> Ideally we would emit a leak warning about the "success" case of
> PyLong_FromLong here.  I think you're running into the problem of the
> "store" part of the program_state being separate from the "malloc"
> state machine part of program_state - I'm guessing that you're creating
> a heap_allocated_region for the new python object, but the "malloc"
> state machine isn't transitioning the pointer from "start" to "assumed-
> non-null".  Such state machine states inhibit state-merging, and so
> this might solve your state-merging problem.
>
> I think we need a way to call malloc_state_machine::on_allocator_call
> from outside of sm-malloc.cc.  See region_model::on_realloc_with_move
> for an example of how to do something similar.
>

Thank you for the suggestion — this worked great and has solved the issue!

Best,
Eric


Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-07-24 Thread Eric Feng via Gcc
Hi all,

I would like to update everyone on the progress of the static analyzer
plugin for CPython extension module code. Since the last update, I
have implemented known function subclasses for PyList_New and
PyList_Append. The existing known function subclasses have also been
enhanced to provide more information. For instance, we are now
simulating object type specific fields in addition to just ob_refcnt
and ob_type, which are shared by all PyObjects.

Regarding reference count checking, I have implemented a naive
traversal of the store to count the actual reference count of
PyObjects, allowing us to compare it against the ob_refcnt fields of
the same PyObjects. Although we can compare the actual reference count
and the ob_refcnt field, I am still working on implementing a
diagnostic to alert about this issue.

In addition to the progress update, I have some implementation related
questions and would appreciate any input. The current moment at which
we run the algorithm for reference count checking, and thereby also
the moment at which we may want to issue
impl_region_model_context::warn, is within region_model::pop_frame.
However, it appears that m_stmt and m_stmt_finder are NULL at the time
of region_model::pop_frame, which results in the diagnostic for the
reference count getting rejected. I am having trouble finding a
workaround for this issue, so any ideas would be welcome.

I am also currently examining some issues related to state merging.
Let's consider the following example which lacks error checking:

PyObject* foo() {
PyObject item = PyLong_FromLong(10);
PyObject list = PyList_New(5);
return list;
}

The states for when PyLong_FromLong fails and when PyLong_FromLong
succeeds are merged before the call to PyObject* list = PyList_New(5).
I suspect this may be related to me not correctly handling behavior
that arises due to the analyzer deterministically selecting the IDs
for heap allocations. Since there is a heap allocation for PyList_New
following PyLong_FromLong, the success and fail cases for
PyLong_FromLong are merged. I believe this is so that in the scenario
where PyLong_FromLong fails and PyList_New succeeds, the ID for the
region allocated for PyList_New wouldn't be the same as the
PyLong_FromLong success case. Whatever the cause, due to this state
merge, the heap allocated region representing PyObject *item has all
its fields set to UNKNOWN, making it impossible to perform the
reference count checking functionality. I attempted to fix this by
wrapping the svalue representing PyLongObject with
get_or_create_unmergeable, but it didn't seem to help. However, this
issue doesn't occur in all situations. For instance:

PyObject* foo() {
PyObject item = PyLong_FromLong(10);
PyObject list = PyList_New(5);
PyList_Append(list, item);
return list;
}

The above scenario is simulated as expected. I will continue to search
for a solution, but any suggestions would be highly appreciated. Thank
you!

Best,
Eric


Re: Query status of GSoC project: CPyChecker

2023-06-28 Thread Eric Feng via Gcc
Hi Steven,

Thanks for reaching out. The project is still in very early stages. So
far we have taught the analyzer the basic behavior for
PyLong_FromLong, PyList_New, and Py_DECREF via known function
subclassing. Additionally, Py_INCREF is supported out of the box.
Reference count checking functionality remains the priority, but it is
not yet fully implemented.

Regarding CPython versions, the goal is to just get things working on
one version first. I arbitrarily picked 3.9, but happy to consider
another version as an initial goal if it’s more helpful to the CPython
community.

Feel free to check out the repo at
https://github.com/efric/gcc-cpython-analyzer for details on the
project and to follow along and help out where you are interested.

Best,
Eric


On Tue, Jun 27, 2023 at 6:03 AM Steven Sun  wrote:
>
> Hi Eric, I am Steven (now) from the CPython team.
>
> How is the project going? Do you have any prototypes
> or ideas that can be discussed? Which part will you start at?
>
>
> I recently debugged dozens of Python bugs, some involving
> C APIs. I can provide some test cases for you.
>
>
> For the ref count part:
>
> A major change (immortal objects) is introduced in Python 3.12.
> Basically, immortal objects will have the ref count fixed at
> a very large number (depending on `sizeof(void*)` ). But I
> don't think it is necessary to implement this in the early
> stages.
>
> Some stable API steals reference conditionally (on success),
> thus its behavior cannot be simply described by one attribute.
>
>
> For CPython versions:
>
> Some stable CPython API behavior varied across the minor
> release. (eg. 3.10 -> 3.11) For instance, some API accepted
> NULL as args for <3.8, but not >=3.8.
>
> Considering both "GCC" and "CPython" are hard for users to
> upgrade, we might want to consider how to live with these
> behavioral differences in the first place.
>
> Versions older than 3 minor releases cannot be touched. (3.13
> now in active development, 3.12, 3.11 for bug fixes, 3.10, 3.9
> security fixes only) So, versions <= 3.10 can be treated as frozen.


Re: On inform diagnostics in plugins, support scripts for gdb and modeling creation of PyObjects for static analysis

2023-06-07 Thread Eric Feng via Gcc
Hi Dave,

> If that's the code, does it work if you get rid of the "if (0)"
> conditional, or change it to "if (1)"?  As written, that guard is
> false, so that call to "inform" will never be executed.

Woops! Somehow I missed that but yes, it works now. Thanks!

>  Are you invoking gcc from an installed copy, or from the build
> directory?  I think my instructions assume the latter.

Ah gotcha, thanks! It loads as expected when invoking gcc from the
build directory.

> Don't attempt to build the struct by hand; we want to look up the
> struct from the user's headers.  There are at least two ABIs for
> PyObject, so we want to be sure we're using the correct one.
>
> IIRC, to look things up by name, that's generally a frontend thing,
> since every language has its own concept of scopes/namespaces/etc.
>
> It sounds like you want to look for a type in the global scope of the
> C/C++ FE with the name "PyObject".
>
> We currently have some hooks in the analyzer for getting constants from
> the frontends; see analyzer-language.cc, where the frontend calls
> on_finish_translation_unit, where the analyzer queries the FE for the
> named constants that will be of interest during analysis.  Maybe we can
> extend this so that we have a way to look up named types there, and
> stash the tree for later use, and thus your plugin could ask the
> frontend which tree is the PyObject RECORD_TYPE before the frontend is
> cleaned up (in on_finish_translation_unit).

Sounds good, I will look into that. Thanks for the suggestion!

Best,
Eric


On Wed, Jun 7, 2023 at 5:55 PM David Malcolm  wrote:
>
> On Wed, 2023-06-07 at 16:21 -0400, Eric Feng wrote:
> > Hi everyone,
> >
> > I am one of the GSoC participants this year — in particular, I am
> > working on a static analyzer plugin for CPython extension module
> > code.
> > I'm encountering a few challenges and would appreciate any guidance
> > on
> > the following issues:
> >
> > 1) Issue with "inform" diagnostics in the plugin:
> > I am currently unable to see any "inform" messages from my plugin
> > when
> > compiling test programs with the plugin enabled. As per the structure
> > of existing analyzer plugins, I have included the following code in
> > the plugin_init function:
> >
> > #if ENABLE_ANALYZER
> > const char *plugin_name = plugin_info->base_name;
> > if (0)
> > inform(input_location, "got here; %qs", plugin_name);
>
> If that's the code, does it work if you get rid of the "if (0)"
> conditional, or change it to "if (1)"?  As written, that guard is
> false, so that call to "inform" will never be executed.
>
> > register_callback(plugin_info->base_name,
> >   PLUGIN_ANALYZER_INIT,
> >   ana::cpython_analyzer_init_cb,
> >   NULL);
> > #else
> > sorry_no_analyzer();
> > #endif
> > return 0;
> >
> > I expected to see the "got here" message (among others in other areas
> > of the plugin) when compiling test programs but haven't observed any
> > output. I also did not observe the "sorry" diagnostic. I am compiling
> > a simple CPython extension module with the plugin loaded like so:
> >
> > gcc-dev -S -fanalyzer -fplugin=/path/to/cpython_plugin.so
> > -I/usr/include/python3.9 -lpython3.9 -x c refcount6.c
>
> Looks reasonable.
>
> >
> > Additionally, I compiled the plugin following the steps outlined in
> > the GCC documentation for plugin building
> > (https://gcc.gnu.org/onlinedocs/gccint/Plugins-building.html):
> >
> > g++-dev -shared -I/home/flappy/gcc_/gcc/gcc
> > -I/usr/local/lib/gcc/aarch64-unknown-linux-gnu/14.0.0/plugin/include
> > -fPIC -fno-rtti -O2 analyzer_cpython_plugin.c -o cpython_plugin.so
> >
> > Please let me know if I missed any steps or if there is something
> > else
> > I should consider. I have no trouble seeing inform calls when they
> > are
> > added to the core GCC.
> >
> > 2) gdb not detecting .gdbinit in build/gcc:
> > Following Dave's GCC newbies guide, I ran gcc/configure within the
> > gcc
> > subdirectory of the build directory to generate a .gdbinit file.
> > Dave's guide suggested that this file would be automatically detected
> > and run by gdb. However, it appears that GDB is not detecting this
> > .gdbinit file, even after I added the following line to my ~/.gdbinit
> > file:
> >
> > add-auto-load-safe-path /absolute/path/to/build/gcc
>
> Are you invoking gcc from an installed copy, or from the build
> directory?  I think my instructions assume the latter.
>
> >
> > 3) Modeling creation of a new PyObject:
> > Many CPython API calls involve the creation of a new PyObject. To
> > model the creation of a simple PyObject, we can allocate a new heap
> > region using get_or_create_region_for_heap_alloc. We can then create
> > field_regions using get_field_region to associate the newly allocated
> > region to represent fields such as ob_refcnt and ob_type in the
> > PyObject struct. However, one of the parameters to get _field_region
> > is a tree 

On inform diagnostics in plugins, support scripts for gdb and modeling creation of PyObjects for static analysis

2023-06-07 Thread Eric Feng via Gcc
Hi everyone,

I am one of the GSoC participants this year — in particular, I am
working on a static analyzer plugin for CPython extension module code.
I'm encountering a few challenges and would appreciate any guidance on
the following issues:

1) Issue with "inform" diagnostics in the plugin:
I am currently unable to see any "inform" messages from my plugin when
compiling test programs with the plugin enabled. As per the structure
of existing analyzer plugins, I have included the following code in
the plugin_init function:

#if ENABLE_ANALYZER
const char *plugin_name = plugin_info->base_name;
if (0)
inform(input_location, "got here; %qs", plugin_name);
register_callback(plugin_info->base_name,
  PLUGIN_ANALYZER_INIT,
  ana::cpython_analyzer_init_cb,
  NULL);
#else
sorry_no_analyzer();
#endif
return 0;

I expected to see the "got here" message (among others in other areas
of the plugin) when compiling test programs but haven't observed any
output. I also did not observe the "sorry" diagnostic. I am compiling
a simple CPython extension module with the plugin loaded like so:

gcc-dev -S -fanalyzer -fplugin=/path/to/cpython_plugin.so
-I/usr/include/python3.9 -lpython3.9 -x c refcount6.c

Additionally, I compiled the plugin following the steps outlined in
the GCC documentation for plugin building
(https://gcc.gnu.org/onlinedocs/gccint/Plugins-building.html):

g++-dev -shared -I/home/flappy/gcc_/gcc/gcc
-I/usr/local/lib/gcc/aarch64-unknown-linux-gnu/14.0.0/plugin/include
-fPIC -fno-rtti -O2 analyzer_cpython_plugin.c -o cpython_plugin.so

Please let me know if I missed any steps or if there is something else
I should consider. I have no trouble seeing inform calls when they are
added to the core GCC.

2) gdb not detecting .gdbinit in build/gcc:
Following Dave's GCC newbies guide, I ran gcc/configure within the gcc
subdirectory of the build directory to generate a .gdbinit file.
Dave's guide suggested that this file would be automatically detected
and run by gdb. However, it appears that GDB is not detecting this
.gdbinit file, even after I added the following line to my ~/.gdbinit
file:

add-auto-load-safe-path /absolute/path/to/build/gcc

3) Modeling creation of a new PyObject:
Many CPython API calls involve the creation of a new PyObject. To
model the creation of a simple PyObject, we can allocate a new heap
region using get_or_create_region_for_heap_alloc. We can then create
field_regions using get_field_region to associate the newly allocated
region to represent fields such as ob_refcnt and ob_type in the
PyObject struct. However, one of the parameters to get _field_region
is a tree representing the field (e.g ob_refcnt). I'm currently
wondering how we may retrieve this information. My intuition is that
it would be fairly easy if we can first get a tree representation of
the PyObject struct. Since we include the relevant headers when
compiling CPython extension modules (e.g., -I/usr/include/python3.9),
I wonder if there is a way to "look up" the tree representation of
PyObject from the included headers. This information may also be
important for obtaining a svalue representing the size of the PyObject
in get_or_create_region_for_heap_alloc. If there is no way to "look
up" a tree representation of PyObject as described in the included
Python header files, does it make sense for us to just create a tree
representation manually for this task? Please let me know if this
approach makes sense and if so where I could look into to get the
required information.

Thanks all.

Best,
Eric


Re: Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`

2023-04-04 Thread Eric Feng via Gcc
Thanks Martin! Sounds good; I submitted my proposal unchanged for now
(i.e assuming an independent project), but as Dave mentioned in
another thread, it may be divided into several logical components,
perhaps with certain features extended, to be suited for
collaboration.

Best,
Eric





On Mon, Apr 3, 2023 at 11:29 AM Martin Jambor  wrote:
>
> Hello,
>
> On Mon, Apr 03 2023, Eric Feng via Gcc wrote:
> > Hi Steven,
> >
> > I’m happy to collaborate on this project together — it would be great
> > to have your experience with CPython internals on the team.
> >
>
> While I normally welcome collaboration, please note that GSoC rules and
> reasonable caution dictate that the two project must not be dependent on
> one another.  I.e. if one of you, for any reason, could not finish your
> bits in time, it must not have severe adverse effects on the other.
>
> If mostly independent collaboration is possible (and David agrees),
> then sure, go ahead.
>
> Thanks for understanding,
>
> Martin


Re: Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`

2023-04-03 Thread Eric Feng via Gcc
Hi Steven,

I’m happy to collaborate on this project together — it would be great
to have your experience with CPython internals on the team.

> And by the way, I can get to work long before the start-coding time point of 
> GSoC timeline.


I can be involved in some capacity before the start-coding period as
well (I originally planned to spend the time getting well acquainted
so as to hit the ground running) but I would prefer if we leave the
more involved tasks (e.g reference count checking, format string
checking) to the start-coding time point as I can’t work in a full
time manner until late May due to commitments in school before then.
Perhaps we can begin with the more low hanging fruits such as
Error-handling checking, errors in exception handling, and
verification of PyMethodDef tables in the time before the start-coding
period? It might be good for us to start with these smaller tasks
first to be more efficient in tackling the more involved tasks
anyways. It would also be easy to divvy these tasks up as well.

Best,
Eric


Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin

2023-04-03 Thread Eric Feng via Gcc
Thanks for bringing this to my attention Dave! I’m happy to
collaborate on this project with Steven. I will reply in more detail
in the other thread.

Best,
Eric


On Sun, Apr 2, 2023 at 7:28 PM David Malcolm  wrote:
>
> On Sat, 2023-04-01 at 19:49 -0400, Eric Feng wrote:
> > > For the task above, I think it's almost all there, it's "just" a
> > > case
> > > of implementing the special-case knowledge about the CPython API,
> > > mostly via known_function subclasses.
> >
> > Sounds good.
> >
> >
> > > In cpychecker I added some custom function attributes:
> > >
> > > https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> > > which were:
> > >   __attribute__((cpychecker_returns_borrowed_ref))
> > >   __attribute__((cpychecker_steals_reference_to_arg(n)))
> > >
> > [...]
> > >
> > > But exactly what these macros would look like would be a decision
> > > for
> > > the CPython community (hence do it via PEP, based on a sample
> > > implementation).
> >
> > Ok, I see what you mean now. Thanks for clarifying!
> >
> >
> > > Yeah, this sounds like a big project.  Fortunately there are a lot
> > > of
> > > possible subtasks in this one, and the project has benefits to GCC
> > > and
> > > to CPython even if you only get a subset of the ideas done in the
> > > time
> > > available (refcount checking being probably the highest-value
> > > subtask).
> >
> > Sounds good.
> >
> > I refactored the project description and timeline sections of the
> > proposal according to our conversation. Notably, I moved format
> > string
> > checking to task #2 in the timeline since its subtasks are
> > particularly beneficial. I also suggest in the timeline section to
> > reach out to the CPython community via PEP about the specifics of new
> > attributes in week 9/10 since I think we should have a somewhat
> > mature
> > prototype by that point. Let me know if you think it should be done
> > earlier/later. Please find the changed sections below (I omitted
> > unchanged sections for brevity)
> > ___
> >
> > Describe the project and clearly define its goals:
> > One pertinent use case of the gcc-python plugin was as a static
> > analysis tool for CPython extension modules. The main goal of the
> > plugin was to help programmers writing extensions identify common
> > coding errors. The gcc-python-plugin has bitrotted over the years
> > and,
> > in particular, cpychecker stopped working some GCC releases ago.
> > Broadly, the goal of this project is to port the functionalities of
> > cpychecker to a -fanalyzer plugin.
> >
> > Below is a brief description of the functionalities of the static
> > analysis tool for which I will work on porting over to a -fanalyzer
> > plugin. The structure of the objectives is based on the
> > gcc-python-plugin documentation:
> >
> > Reference count checking: 
> >
> > Format string checking: Some CPython APIs such as PyArgs_ParseTuple,
> > PyArg_ParseTupleAndKeywords, etc take format strings as arguments.
> > This check involves verifying that the format strings taken in by
> > these APIs are correct with respect to the number and types of
> > arguments passed in. In particular, I will work on integrating the
> > analyzer with -Wformat
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017) and adding
> > plugin support for -Wformat
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100121) . We should
> > then
> > be able to specify our own archetype which reflects the format string
> > syntax for the relevant CPython APIs and take advantage of the
> > integrated analyzer to check them.
> >
> > Associating PyTypeObject instances with compile-time-types:
> >  > from original proposal>
> >
> > Error-handling checking (including errors in exception handling):
> > Common errors such as dereferencing a NULL value are already checked
> > by the analyzer. I will extend this functionality by implementing
> > special-case knowledge about the CPython API.
> >
> > Verification of PyMethodDef tables:  > proposal>
> >
> > Provide an expected timeline:
> > Please find a rough estimate of the weekly progress in relation to
> > the
> > features described below. Tasks that I expect to take longer than one
> > week are broken down in more detail. In addition to what’s described,
> > each task also involves adding test coverage pertaining its specific
> > feature to a regression test suite.
> >
> > Week 1 - 7: Reference counting checking
> > Week 1: Set up the overall infrastructure of the plugin and begin
> > building core functionality
> > Week 1 - 6: Core reference counting functionality
> > Week 7: Refine prototype
> > Week 8 - 10.5: Format string checking (including associating
> > PyTypeObject instances with compile-time-types)
> > Week 8 - ~9: RFE: support printf-style formatted functions in -
> > fanalyzer
> > Week ~9 - 10.5: RFE: plugin support for -Wformat via
> > __attribute__((format()))
> > Additionally, begin conversing with CPython community via PEP
> > about the 

Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin

2023-04-01 Thread Eric Feng via Gcc
f so, this part of the project would involve
> > > working
> > > with the CPython development community, perhaps writing a PEP:
> > >   https://peps.python.org/pep-0001/
> > > Again, this would be an ambitious goal, probably better done after
> > > there's a working prototype.
> >
> >
> > That would be very exciting. However, I'm not sure if I fully
> > understand what you mean. Can you clarify by giving an example of
> > what
> > the new attributes you had in mind might look like and how they would
> > help (for example with respect to reference counting semantics)?
>
> In cpychecker I added some custom function attributes:
>   https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> which were:
>   __attribute__((cpychecker_returns_borrowed_ref))
>   __attribute__((cpychecker_steals_reference_to_arg(n)))
>
> I imagine that the plugin could provide similar attributes, and that
> CPython could be patched so that the autoconf checks detect the
> attributes, and the headers have some macros that optionally use them.
> For example, consider e.g. list objects, where the API declarations
> here:
> https://github.com/python/cpython/blob/main/Include/listobject.h
>
> are simply:
>
>   PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *);
>   PyAPI_FUNC(PyObject *) PyList_GetItem(PyObject *, Py_ssize_t);
>
> etc, as documented here:
> https://docs.python.org/3/c-api/list.html
>
> and implemented in here:
> https://github.com/python/cpython/blob/main/Objects/listobject.c
>
> Given that the implementation of both PyList_Size and PyList_GetItem
> begin with:
>
> if (!PyList_Check(op)) {
>
> and PyList_Check is a macro that unconditionally dereferences its
> pointer operand:
>
> #define PyList_Check(op) \
> PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)
>
> these declarations require non-NULL args, and thus perhaps we could add
> some new macros making the above decls look like:
>
>   PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *)
> PyAPI_NonNullArg(1);
>
>   PyAPI_FUNC(PyObject *) PyList_GetItem(PyObject *, Py_ssize_t)
> PyAPI_NonNullArg(1))
> PyAPI_ReturnsBorrowedRef();
>
> or somesuch, documenting the required non-nullness of the args, and
> that if PyList_GetItem returns successfully, the caller is borrowing a
> reference to the object.
>
> But exactly what these macros would look like would be a decision for
> the CPython community (hence do it via PEP, based on a sample
> implementation).
>
> >
> > Incidentally, I forgot to mention in my previous email but I believe
> > the 350-hour option is the one that is more appropriate for this
> > project. Please let me know otherwise.
>
> Yeah, this sounds like a big project.  Fortunately there are a lot of
> possible subtasks in this one, and the project has benefits to GCC and
> to CPython even if you only get a subset of the ideas done in the time
> available (refcount checking being probably the highest-value subtask).
>
> Hope this all makes sense
> Dave
>
> >
> > Best,
> > Eric
> >
> > On Sun, Mar 26, 2023 at 11:58 AM David Malcolm 
> > wrote:
> > >
> > > On Sat, 2023-03-25 at 15:38 -0400, Eric Feng via Gcc wrote:
> > > > Hi GCC community,
> > > >
> > > > For GSoC, I am extremely interested in working on the selected
> > > > project
> > > > idea with respect to extending the static analysis pass. In
> > > > particular, porting gcc-python-plugin's cpychecker to a plugin
> > > > for
> > > > GCC
> > > > -fanalyzer as described in
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646.
> > >
> > > Hi Eric, welcome to the GCC commmunity.
> > >
> > > I'm the author/maintainer of GCC's static analysis pass.  I'm also
> > > the
> > > author of gcc-python-plugin and its erstwhile "cpychecker" code, so
> > > I'm
> > > pleased that you're interested in the project.
> > >
> > > I wrote gcc-python-plugin and cpychecker over a decade ago when I
> > > was
> > > focused on CPython development (before I switched to GCC
> > > development),
> > > but it's heavily bitrotted over the years, as I didn't have enough
> > > cycles to keep it compatible with changes in both GCC and CPython
> > > whilst working on GCC itself.  In particular, the cpychecker code
> > > stopped working a number of GCC releases ago.  However, the
> > > cpychecker
> > > code inspired much of my work on GCC's static analysis pa

Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin

2023-03-28 Thread Eric Feng via Gcc
My apologies. Forgot to CC the mailing list in my previous e-mail.
Original reply below:

___


Hi David,

Thank you for your feedback!

> Also, the Python community would continue to find
> static analysis of CPython extension modules useful, so it would be
> good to have the idea live on as a GCC plugin on top of -fanalyzer.


I hope so!

> You should try building GCC from source, and hack in a trivial warning
> that emits "hello world, I'm compiling function 'foo'".  I wrote a
> guide to GCC for new contributors here that should get you started:
>   https://gcc-newbies-guide.readthedocs.io/en/latest/
> This will help you get familiar with GCC's internals, and although the
> plan is to write a plugin, I expect that you'll run into places where a
> patch to GCC itself is more appropriate (bugs and missing functionality
> ), so having your own debug build of GCC is a good idea.
>
> You should become familiar with CPython's extension and embedding API.
> See the excellent documentation here:
>   https://docs.python.org/3/extending/extending.html
> It's probably a good exercise to write your own trivial CPython
> extension module.
>
> You can read the old cpychecker code inside the gcc-python-plugin
> repository, and I gave a couple of talks on it as PyCon a decade ago:


Sounds good.

> > Error-handling checking: Various checks for common errors such as
> > dereferencing a NULL value.
>
> Yes.  This is already done by -fanalyzer, but we need some way for it
> to know the outcomes of specific functions: e.g. one common pattern is
> that API function "PyFoo_Bar" could either:
> (a) succeed, returning a PyObject * that the caller "owns" a reference
> to, or
> (b) fail, returning NULL, and setting an exception on the thread-local
> interpreter state object


Sounds good. In other words, the infrastructure of this check is
already there and our job is to retrofit CPython API-specific
knowledge for this task. Please correct me if my understanding here is
wrong.

> > Errors in exception-handling: Checks for situations in which
> > functions
> > returning PyObject* that is NULL are not gracefully handled.
>
> Yes; detection of this would automatically happen if we implemented
> known_function subclasses e.g. for the pattern above.


Sounds good. I will merge this task with the previous one in the next
iteration of this proposal since it will be handled as a side effect
of implementing the previous task.


> You don't mention testing.  I'd expect part of the project to be the
> creation of a regression test suite, with each step adding test
> coverage for the features it adds.  There are lots of test cases in the
> existing cpychecker test suite that you could reuse  - though beware,
> the test harness there is very bad - I made multiple mistakes:
> - expecting "gold" outputs from test cases - specific stderr strings,
> which make the tests very brittle
> - external scripts associated with .c files, to tell it how to invoke
> the compiler, which make the tests a pain to maintain and extend.
>
> GCC's own test suite handles this much better using DejaGnu where:
> - we test for specific properties of interest in the behavior of each
> test (rather than rigidly specifying everything about the behavior of
> each test)
> - the tests are expressed as .c files with "magic" comments containing
> directives for the test harness
>
> That said DejaGnu is implemented in Tcl, which is a pain to deal with;
> you could reuse DejaGnu, or maybe Python might be a better choice; I'm
> not sure.


You're right, I forgot to mention that in the initial draft. Thank you
for pointing that out. I agree with the bottom-up approach with
respect to building a comprehensive regression test suite. In terms of
specifically what to implement the suite in, I'll explore DejaGnu/Tcl
in more detail before making a more informed decision.

> It might be good to add new attributes to CPython's headers so that the
> function declarations become self-descriptive about e.g. refererence-
> counting semantics (in a way readable both to humans and to static
> analysis tools).  If so, this part of the project would involve working
> with the CPython development community, perhaps writing a PEP:
>   https://peps.python.org/pep-0001/
> Again, this would be an ambitious goal, probably better done after
> there's a working prototype.


That would be very exciting. However, I'm not sure if I fully
understand what you mean. Can you clarify by giving an example of what
the new attributes you had in mind might look like and how they would
help (for example with respect to reference counting semantics)?

Incidentally, I forgot to mention in my previous email but I believe
the 350-hour option is the one th

[GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin

2023-03-25 Thread Eric Feng via Gcc
Hi GCC community,

For GSoC, I am extremely interested in working on the selected project
idea with respect to extending the static analysis pass. In
particular, porting gcc-python-plugin's cpychecker to a plugin for GCC
-fanalyzer as described in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646. Please find an
initial draft of my proposal below and let me know if it is a
reasonable starting point. Please also correct me if I am
misunderstanding any particular tasks and let me know what areas I
should add more information for or what else I may do in preparation.

___

Describe the project and clearly define its goals:
One pertinent use case of the gcc-python plugin is as a static
analysis tool for CPython extension modules. The main goal is to help
programmers writing extensions identify common coding errors. Broadly,
the goal of this project is to port the functionalities of cpychecker
to a -fanalyzer plugin.

Below is a brief description of the functionalities of the static
analysis tool for which I will work on porting over to a -fanalyzer
plugin. The structure of the objectives is taken from the
gcc-python-plugin documentation:

Reference count checking: Manipulation of PyObjects is done via the
CPython API and in particular with respect to the objects' reference
count. When the reference count belonging to an object drops to zero,
we should free all resources associated with it. This check helps
ensure programmers identify problems with the reference count
associated with an object. For example, memory leaks with respect to
forgetting to decrement the reference count of an object (analogous to
malloc() without corresponding free()) or perhaps object access after
the object's reference count is zero (analogous to access after
free()).

Error-handling checking: Various checks for common errors such as
dereferencing a NULL value.

Errors in exception-handling: Checks for situations in which functions
returning PyObject* that is NULL are not gracefully handled.

Format string checking: Verify that arguments to various CPython APIs
which take format strings are correct.

Associating PyTypeObject instances with compile-time-types: Verify
that the run-time type of a PyTypeObject matches its corresponding
compile-time type for inputs where both are provided.

Verification of PyMethodDef tables: Verify that the function in
PyMethodDef tables matches the calling convention of the ml_flags set.

I suspect a good starting point would be existing proof-of-concept
-fanalyzer plugins such as the CPython GIL example
(analyzer_gil_plugin). Please let me know of any additional pointers.
If there is time to spare, I think it is reasonable to extend the
capabilities of the original checker as well (more details in the
expected timeline below).

Provide an expected timeline:
I suspect the first task to take the longest since it is relatively
involved and it also includes getting the initial infrastructure of
the plugin up. From the experience of the first task, I hope the rest
of the tasks would be implemented faster. Additionally, I understand
that the current timeline outline below is too vague. I wished to
check in with the community for some feedback on whether I am in the
right ballpark before committing to more details.

Week 1 - 7: Reference counting checking
Week 8: Error-handling checking
Week 9: Errors in exception handling
Week 10: Format string checking
Week 11: Verification of PyMethodDef tables
Week 12: I am planning the last week to be safety in case any of the
above tasks take longer than initially expected. In case everything
goes smoothly and there is time to spare, I think it is reasonable to
spend the time extending the capabilities of the original pass. Some
ideas include extending the subset of CPython API that cpychecker
currently support, matching up similar traces to solve the issue of
duplicate error reports, and/or addressing any of the other caveats
currently mentioned in the cpychecker documentation. Additional ideas
are welcome of course.

Briefly introduce yourself and your skills and/or accomplishments:
I am a current Masters in Computer Science student at Columbia
University. I did my undergraduates at Bates College (B.A Math) and
Columbia University (B.S Computer Science) respectively. My interests
are primarily in systems, programming languages, and compilers.

At Columbia, I work in the group led by Professor Stephen Edwards on
the SSLANG programming language: a language built atop the Sparse
Synchronous Model. For more formal information on the Sparse
Synchronous Model, please take a look at "The Sparse Synchronous Model
on Real Hardware" (2022). Please find our repo, along with my
contributions, here: https://github.com/ssm-lang/sslang (my GitHub
handle is @efric). My main contribution to the compiler so far
involved adding a static inlining optimization pass with another
SSLANG team member. Our implementation is mostly based on the work
shown in "Secrets of the Glasgow