Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

David Malcolm via Gcc Tue, 25 Jul 2023 07:41:41 -0700

On Tue, 2023-07-25 at 00:49 -0400, Eric Feng wrote:
> Hi all,

Hi Eric, thanks for the update.


Various comments inline below...

> 
> I would like to update everyone on the progress of the static
> analyzer
> plugin for CPython extension module code.
>  Since the last update, I
> have implemented known function subclasses for PyList_New and
> PyList_Append. The existing known function subclasses have also been
> enhanced to provide more information. For instance, we are now
> simulating object type specific fields in addition to just ob_refcnt
> and ob_type, which are shared by all PyObjects.

Do you have any DejaGnu tests for this functionality?  For example,
given PyList_New
  https://docs.python.org/3/c-api/list.html#c.PyList_New
there could be a test like:

/* { dg-require-effective-target python_h } */

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "analyzer-decls.h"

PyObject *
test_PyList_New (Py_ssize_t len)
{
  PyObject *obj = PyList_New (len);
  if (obj)
    {
     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
     __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
    }
  else
    __analyzer_dump_path (); /* { dg-warning "path" } */
  return obj;
}

...or similar, to verify that we simulate that the call can both
succeed and fail, and to verify properties of the store along the
"success" path.  Caveat: I didn't look at exactly what properties
you're simulating, so the above tests might need adjusting.

The:
  /* { dg-require-effective-target python_h } */
allows the testcase to bail out with "UNSUPPORTED" on hosts that don't
have a suitable Python.h header file installed, so that all this
Python-specific functionality is optional.  You could implement this
via a new function "check_effective_target_python_h" in
gcc/testsuite/lib/target-supports.exp, similar to the existing
check_effective_target_ functions in that Tcl file.


> 
> Regarding reference count checking, I have implemented a naive
> traversal of the store to count the actual reference count of
> PyObjects, allowing us to compare it against the ob_refcnt fields of
> the same PyObjects. Although we can compare the actual reference count
> and the ob_refcnt field, I am still working on implementing a
> diagnostic to alert about this issue.

Sounds promising.

> 
> In addition to the progress update, I have some implementation
> related
> questions and would appreciate any input. The current moment at which
> we run the algorithm for reference count checking, and thereby also
> the moment at which we may want to issue
> impl_region_model_context::warn, is within region_model::pop_frame.
> However, it appears that m_stmt and m_stmt_finder are NULL at the
> time
> of region_model::pop_frame, which results in the diagnostic for the
> reference count getting rejected. I am having trouble finding a
> workaround for this issue, so any ideas would be welcome.

FWIW I've run into a similar issue, and so did Tim last year.  The
whole stmt vs stmt_finder thing is rather ugly, and I have a local
branch with a part-written reworking of it that I hope will solve these
issues (adding a "class pending_location"), but it's very messy and far
from being ready.  Sorry about this.

In theory you can provide an instance of your own custom stmt_finder
subclass when you save the pending_diagnostic.  There's an example of
doing this in engine.cc: impl_region_model_context::on_state_leak,
where the leak is going to be reported at the end of a function
(similar to your diagnostic); it uses a leak_stmt_finder class, which
simply scans backwards once it has an exploded_path to find the last
stmt that had a useful location_t value.  This is a nasty hack, but
probably does what you need.

> 
> I am also currently examining some issues related to state merging.

Note that you can use -fno-analyzer-state-merge to turn off the state
merging code, which can be very useful when debugging this kind of
thing.

> Let's consider the following example which lacks error checking:
> 
> PyObject* foo() {
>     PyObject item = PyLong_FromLong(10);
>     PyObject list = PyList_New(5);
>     return list;
> }
> 
> The states for when PyLong_FromLong fails and when PyLong_FromLong
> succeeds are merged before the call to PyObject* list =
> PyList_New(5).

Ideally we would emit a leak warning about the "success" case of
PyLong_FromLong here.  I think you're running into the problem of the
"store" part of the program_state being separate from the "malloc"
state machine part of program_state - I'm guessing that you're creating
a heap_allocated_region for the new python object, but the "malloc"
state machine isn't transitioning the pointer from "start" to "assumed-
non-null".  Such state machine states inhibit state-merging, and so
this might solve your state-merging problem.

I think we need a way to call malloc_state_machine::on_allocator_call
from outside of sm-malloc.cc.  See region_model::on_realloc_with_move
for an example of how to do something similar.


> I suspect this may be related to me not correctly handling behavior
> that arises due to the analyzer deterministically selecting the IDs
> for heap allocations. Since there is a heap allocation for PyList_New
> following PyLong_FromLong, the success and fail cases for
> PyLong_FromLong are merged. I believe this is so that in the scenario
> where PyLong_FromLong fails and PyList_New succeeds, the ID for the
> region allocated for PyList_New wouldn't be the same as the
> PyLong_FromLong success case. Whatever the cause, due to this state
> merge, the heap allocated region representing PyObject *item has all
> its fields set to UNKNOWN, making it impossible to perform the
> reference count checking functionality. I attempted to fix this by
> wrapping the svalue representing PyLongObject with
> get_or_create_unmergeable, but it didn't seem to help. 

Strange; I would have thought that would have fixed it.  Can you post
the specific code you tried?

> However, this
> issue doesn't occur in all situations. For instance:
> 
> PyObject* foo() {
>     PyObject item = PyLong_FromLong(10);
>     PyObject list = PyList_New(5);
>     PyList_Append(list, item);
>     return list;
> }
> 
> The above scenario is simulated as expected. I will continue to
> search
> for a solution, but any suggestions would be highly appreciated.
> Thank
> you!

Thanks again for the update; hope the above is helpful
Dave

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

Reply via email to