On Tue, 2023-07-25 at 00:49 -0400, Eric Feng wrote: > Hi all, Hi Eric, thanks for the update.
Various comments inline below... > > I would like to update everyone on the progress of the static > analyzer > plugin for CPython extension module code. > Since the last update, I > have implemented known function subclasses for PyList_New and > PyList_Append. The existing known function subclasses have also been > enhanced to provide more information. For instance, we are now > simulating object type specific fields in addition to just ob_refcnt > and ob_type, which are shared by all PyObjects. Do you have any DejaGnu tests for this functionality? For example, given PyList_New https://docs.python.org/3/c-api/list.html#c.PyList_New there could be a test like: /* { dg-require-effective-target python_h } */ #define PY_SSIZE_T_CLEAN #include <Python.h> #include "analyzer-decls.h" PyObject * test_PyList_New (Py_ssize_t len) { PyObject *obj = PyList_New (len); if (obj) { __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */ __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ } else __analyzer_dump_path (); /* { dg-warning "path" } */ return obj; } ...or similar, to verify that we simulate that the call can both succeed and fail, and to verify properties of the store along the "success" path. Caveat: I didn't look at exactly what properties you're simulating, so the above tests might need adjusting. The: /* { dg-require-effective-target python_h } */ allows the testcase to bail out with "UNSUPPORTED" on hosts that don't have a suitable Python.h header file installed, so that all this Python-specific functionality is optional. You could implement this via a new function "check_effective_target_python_h" in gcc/testsuite/lib/target-supports.exp, similar to the existing check_effective_target_ functions in that Tcl file. > > Regarding reference count checking, I have implemented a naive > traversal of the store to count the actual reference count of > PyObjects, allowing us to compare it against the ob_refcnt fields of > the same PyObjects. Although we can compare the actual reference count > and the ob_refcnt field, I am still working on implementing a > diagnostic to alert about this issue. Sounds promising. > > In addition to the progress update, I have some implementation > related > questions and would appreciate any input. The current moment at which > we run the algorithm for reference count checking, and thereby also > the moment at which we may want to issue > impl_region_model_context::warn, is within region_model::pop_frame. > However, it appears that m_stmt and m_stmt_finder are NULL at the > time > of region_model::pop_frame, which results in the diagnostic for the > reference count getting rejected. I am having trouble finding a > workaround for this issue, so any ideas would be welcome. FWIW I've run into a similar issue, and so did Tim last year. The whole stmt vs stmt_finder thing is rather ugly, and I have a local branch with a part-written reworking of it that I hope will solve these issues (adding a "class pending_location"), but it's very messy and far from being ready. Sorry about this. In theory you can provide an instance of your own custom stmt_finder subclass when you save the pending_diagnostic. There's an example of doing this in engine.cc: impl_region_model_context::on_state_leak, where the leak is going to be reported at the end of a function (similar to your diagnostic); it uses a leak_stmt_finder class, which simply scans backwards once it has an exploded_path to find the last stmt that had a useful location_t value. This is a nasty hack, but probably does what you need. > > I am also currently examining some issues related to state merging. Note that you can use -fno-analyzer-state-merge to turn off the state merging code, which can be very useful when debugging this kind of thing. > Let's consider the following example which lacks error checking: > > PyObject* foo() { > PyObject item = PyLong_FromLong(10); > PyObject list = PyList_New(5); > return list; > } > > The states for when PyLong_FromLong fails and when PyLong_FromLong > succeeds are merged before the call to PyObject* list = > PyList_New(5). Ideally we would emit a leak warning about the "success" case of PyLong_FromLong here. I think you're running into the problem of the "store" part of the program_state being separate from the "malloc" state machine part of program_state - I'm guessing that you're creating a heap_allocated_region for the new python object, but the "malloc" state machine isn't transitioning the pointer from "start" to "assumed- non-null". Such state machine states inhibit state-merging, and so this might solve your state-merging problem. I think we need a way to call malloc_state_machine::on_allocator_call from outside of sm-malloc.cc. See region_model::on_realloc_with_move for an example of how to do something similar. > I suspect this may be related to me not correctly handling behavior > that arises due to the analyzer deterministically selecting the IDs > for heap allocations. Since there is a heap allocation for PyList_New > following PyLong_FromLong, the success and fail cases for > PyLong_FromLong are merged. I believe this is so that in the scenario > where PyLong_FromLong fails and PyList_New succeeds, the ID for the > region allocated for PyList_New wouldn't be the same as the > PyLong_FromLong success case. Whatever the cause, due to this state > merge, the heap allocated region representing PyObject *item has all > its fields set to UNKNOWN, making it impossible to perform the > reference count checking functionality. I attempted to fix this by > wrapping the svalue representing PyLongObject with > get_or_create_unmergeable, but it didn't seem to help. Strange; I would have thought that would have fixed it. Can you post the specific code you tried? > However, this > issue doesn't occur in all situations. For instance: > > PyObject* foo() { > PyObject item = PyLong_FromLong(10); > PyObject list = PyList_New(5); > PyList_Append(list, item); > return list; > } > > The above scenario is simulated as expected. I will continue to > search > for a solution, but any suggestions would be highly appreciated. > Thank > you! Thanks again for the update; hope the above is helpful Dave
