On Mon, 2026-03-16 at 07:17 +0530, Saksham Gupta wrote:
> Hi David,
>
> Thanks for the incredibly detailed review! Challenge absolutely
> accepted. :)
>
> You were totally right about the state_machine approach. Given how the
> Py_INCREF/DECREF macros expand, tracking pointers directly in the
> region_model and validating against ob_refcnt during stack frame pops
> makes way more sense. I completely rewrote Phase 2 to reflect this.
>
> I also folded all of your other notes into the attached V2 draft:
>
> - Added the context on Eric's 2023 work and clarified the scope around
>   legacy C extensions and Python 3.14.
> - Dropped the brittle ascii-art tests in favor of strictly using
>   dg-warning and dg-message.
> - Moved the .editorconfig setup into the community bonding period.
> - Updated the timeline to focus on categorizing the API early and using
>   helper attributes instead of trying to hardcode everything.
> - Added the real-world integration testing (psycopg2, etc.) and the
>   stretch goals from the wiki.
>
> I've attached the updated PDF. Let me know if this new region_model
> architecture aligns better with what you have in mind!

Thanks; this is a big improvement.

Note that the proposal of 350 hours over 12 weeks is almost 30 hours a
week, which is a big chunk of time. Is that going to be achievable?
e.g. do you have any exams or other things scheduled over the summer?

Dave

> Best regards,
> Saksham
>
> On Mon, 16 Mar 2026 at 06:24, David Malcolm wrote:
>
> > On Sun, 2026-03-15 at 12:23 +0530, Saksham Gupta wrote:
> > > Hi David,
> > >
> > > I’ve attached the draft of my GSoC proposal for the CPython API
> > > checker. I haven't submitted it to the official portal yet; I
> > > wanted to run it by you first to catch any mistakes and make sure
> > > the technical direction actually makes sense.
> > >
> > > I made sure to include your recent advice. The scope now
> > > explicitly targets Python 3.11+ to handle the PEP 683 changes. My
> > > Compile Farm account (am-saksham) is also fully set up, so I added
> > > that to the testing strategy, along with a quick example of
> > > handling CFG bifurcation for PyList_New failures.
> > >
> >
> > Hi Saksham
> >
> > > If you have a few minutes next week, I’d love your brutal honesty
> > > on this.
> >
> > Challenge accepted :)
> >
> > One thing that might not be mentioned yet on the wiki page is that
> > the existing plugin is the result of a previous GSoC project (by
> > Eric Feng, in 2023):
> > https://summerofcode.withgoogle.com/archive/2023/projects/EzIUWs5x
> > https://gist.github.com/efric/9faa9cb19fe829b97a54d5c7eabf5e72
> >
> > (I've added a link to the wiki)
> >
> > You should update the wording of your proposal to mention this (and
> > e.g. how 3.11 broke the old code).
> >
> > Re: 1. Abstract; probably worth noting that there are multiple ways
> > to interface CPython with C: using libffi, using a binding generator
> > (such as Cython), or writing C by hand.
> > This project is focusing on the "writing C by hand" case, but we
> > don't recommend people use this approach; this is more about
> > supporting legacy code.
> >
> > Re 2. Motivation & Background:
> >
> > "Crucially, the analysis will explicitly target CPython 3.11+
> > headers as a baseline. This ensures accurate struct layouts,": a
> > nitpick: note that we don't want to have to care about precise
> > in-memory layouts, GCC's C frontend does this for us; what we care
> > about is what fields there are and what their types are. The
> > region_model/store.cc code does track things in terms of bit
> > offsets, so we'll see those when debugging, but the plugin should be
> > written in terms of types and fields.
> >
> > "this project will integrate Python-specific domain knowledge
> > directly into the analyzer core." Really? I was thinking that it's
> > best to keep this as a plugin, albeit an in-tree plugin.
> >
> > "Crucially, the analysis will explicitly target CPython 3.11+
> > headers as a baseline." note that there have been other recent
> > changes beyond PEP 683 as CPython developers have tried to optimize
> > more aggressively than in the past (e.g. for JIT compilation). The
> > most recent release is 3.14, and that might well have other changes
> > that the plugin needs to be aware of. The ideal would be to support
> > a wide range of 3.* headers, but it's good to pick one and get that
> > working first, to avoid getting swamped by compatibility concerns.
> >
> > "Illustrative Example: The Silent Leak": looks good.
> >
> > Re 3.2. Phase 2: Implementing the Reference Count State Machine:
> > Your implementation plan is rather different to what we tried
> > before, in that you're proposing using a state_machine subclass to
> > associate state with a pointer.
> > What we tried in 2023 is to count the number of pointers being
> > stored pointing at each PyObject, and then compare against the
> > ob_refcnt, and complain at certain points when they got out-of-sync
> > (e.g. when the stack frame is popped). This was working purely with
> > the region_model/state code and didn't need a new state_machine.
> > That approach did seem to work with the pre-PEP-683 implementation,
> > but IIRC Eric got stuck spending a lot of his time on PyList_Append,
> > and thus we only got a tiny subset of the API covered - but it did
> > work. Py_INCREF and Py_DECREF are typically macros, and so by the
> > time the analyzer "sees" the user's code, all we see are reference
> > count increments, decrements, and conditionals, and this is captured
> > for us in the store by the region_model code; I think it would be
> > hard to implement using a state_machine (though maybe I'm wrong).
> >
> > Note that there's huge amounts of repetition in the API (e.g.
> > "succeeds, returning a new reference, or fails, returning null" is a
> > very common pattern). So please make plenty of use of helper
> > subroutines, or the attributes idea described on the project wiki
> > page.
> >
> > re "DejaGnu Regression Suite": re "the ascii-art execution paths"
> > note that these tests tend to be "brittle" so we don't want many
> > tests expressed this way, if any at all - dg-warning and dg-message
> > tend to be much more robust.
> >
> > re "5. Timeline & Milestones (350 Hours)": I suggest dropping the
> > mentions of the state_machine approach, and this suggests a rewrite
> > of this section. I like the idea of building up a suite of buggy
> > extensions. You'll want most of them to be as simple as possible,
> > along with some larger examples for "integration testing".
> > I recommend early on categorizing the API into the various patterns
> > of ownership/borrowing/stealing etc, and identifying examples of
> > each, and trying a simple example of each early on, to verify that
> > the overall approach will work on all the cases.
> >
> > I don't like "strict formatting to GNU coding standard" being done
> > at the end. Better to set up your editor early on to adhere to
> > these, and then have this happen throughout. IIRC we have a
> > .editorconfig file, so this should be trivial. So this should be in
> > the "community bonding" phase.
> >
> > The other thing you might like to try is some of the other
> > subprojects within https://gcc.gnu.org/wiki/StaticAnalyzer/CPython ;
> > some of these are relatively easy compared to reference count
> > checking, e.g. "Verification of PyMethodDef tables" and "Checking
> > arguments of "call" calls" (though note the word "relatively" here).
> >
> > Hope this makes sense; let me know if you have questions. I need to
> > move on, but note I may have missed some things, so consider running
> > an update past me.
> >
> > Dave
> >
> > > I really want to make sure my plan for the state machine over
> > > GIMPLE aligns with the new class api. If my approach is off base
> > > anywhere, please let me know so I can rewrite it before the
> > > deadline.
> > >
> > > Working on this project is my absolute top priority right now, so
> > > I'm ready to iterate on this draft as much as needed to get it
> > > right.
> > >
> > > Thanks again for the atoi patch review earlier this week!
> > >
> > > Best,
> > > Saksham Gupta
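
The check Dave describes for the 2023 plugin (count the stored pointers to each PyObject, compare against ob_refcnt, complain when they are out of sync at a frame pop) can be sketched as a toy model in plain C. This is only an illustration under stated assumptions, not the plugin's code and not the analyzer's API: toy_object, tracked, refcnt_status, and check_at_frame_pop are all hypothetical names, and reading "too high" as a possible leak and "too low" as a possible dangling pointer is an assumption made here. Borrowed and stolen references are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only the field the check reads. */
typedef struct toy_object {
    long ob_refcnt;
} toy_object;

/* Hypothetical record of what a checker might know at a frame pop:
   the object, and how many stored pointers to it are still live. */
typedef struct tracked {
    const toy_object *obj;
    long live_pointers;
} tracked;

typedef enum {
    REFCNT_OK,        /* ob_refcnt matches the number of live pointers */
    REFCNT_TOO_HIGH,  /* assumption: more counts than pointers, possible leak */
    REFCNT_TOO_LOW    /* assumption: fewer counts than pointers, possible dangling pointer */
} refcnt_status;

/* Compare the recorded reference count against the number of pointers
   that still refer to the object when the stack frame is popped. */
refcnt_status check_at_frame_pop(const tracked *t)
{
    assert(t != NULL && t->obj != NULL);
    if (t->obj->ob_refcnt > t->live_pointers)
        return REFCNT_TOO_HIGH;
    if (t->obj->ob_refcnt < t->live_pointers)
        return REFCNT_TOO_LOW;
    return REFCNT_OK;
}
```

For example, an object with ob_refcnt 1 and no live pointers left at the frame pop (the shape of a dropped new reference) yields REFCNT_TOO_HIGH in this model; the same object with one live pointer yields REFCNT_OK.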
