Hi David,

Glad to hear the V2 is looking better!

Regarding the time: 30 hours a week is definitely doable. My summer break
starts on May 1st, so I’ll be completely free of classes and exams for the
entire coding period. I’ve specifically kept my schedule clear so I can
treat this as my main focus for the summer.

Also, just a heads up that I’ve officially submitted the proposal to the
GSoC portal. You should be able to find it under the username
am.sakshamgupta (from India).

Let me know if you need any other details from my side.

Best regards,
Saksham

On Wed, 18 Mar 2026 at 02:52, David Malcolm <[email protected]> wrote:

> On Mon, 2026-03-16 at 07:17 +0530, Saksham Gupta wrote:
> > Hi David,
> >
> > Thanks for the incredibly detailed review! Challenge absolutely
> > accepted. :)
> >
> > You were totally right about the state_machine approach. Given how
> > the Py_INCREF/DECREF macros expand, tracking pointers directly in the
> > region_model and validating against ob_refcnt during stack frame pops
> > makes way more sense. I completely rewrote Phase 2 to reflect this.
> >
> > I also folded all of your other notes into the attached V2 draft:
> >
> >    - Added the context on Eric's 2023 work and clarified the scope
> >    around legacy C extensions and Python 3.14.
> >    - Dropped the brittle ascii-art tests in favor of strictly using
> >    dg-warning and dg-message.
> >    - Moved the .editorconfig setup into the community bonding period.
> >    - Updated the timeline to focus on categorizing the API early and
> >    using helper attributes instead of trying to hardcode everything.
> >    - Added the real-world integration testing (psycopg2, etc.) and the
> >    stretch goals from the wiki.
> >
> > I've attached the updated PDF. Let me know if this new region_model
> > architecture aligns better with what you have in mind!
>
> Thanks; this is a big improvement.
>
> Note that the proposal of 350 hours over 12 weeks is almost 30 hours a
> week, which is a big chunk of time.  Is that going to be achievable?
> e.g. do you have any exams or other things scheduled over the summer?
>
> Dave
>
> >
> > Best regards,
> > Saksham
> >
> >
> > On Mon, 16 Mar 2026 at 06:24, David Malcolm <[email protected]>
> > wrote:
> >
> > > On Sun, 2026-03-15 at 12:23 +0530, Saksham Gupta wrote:
> > > > Hi David,
> > > >
> > > > I’ve attached the draft of my GSoC proposal for the CPython API
> > > > checker. I haven't submitted it to the official portal yet; I
> > > > wanted to run it by you first to catch any mistakes and make sure
> > > > the technical direction actually makes sense.
> > > >
> > > > I made sure to include your recent advice. The scope now
> > > > explicitly targets Python 3.11+ to handle the PEP 683 changes. My
> > > > Compile Farm account (am-saksham) is also fully set up, so I added
> > > > that to the testing strategy, along with a quick example of
> > > > handling CFG bifurcation for PyList_New failures.
> > > >
> > >
> > > Hi Saksham
> > >
> > > > If you have a few minutes next week, I’d love your brutal honesty
> > > > on this.
> > >
> > > Challenge accepted :)
> > >
> > > One thing that might not be mentioned yet on the wiki page is that
> > > the existing plugin is the result of a previous GSoC project (by
> > > Eric Feng, in 2023):
> > > https://summerofcode.withgoogle.com/archive/2023/projects/EzIUWs5x
> > > https://gist.github.com/efric/9faa9cb19fe829b97a54d5c7eabf5e72
> > >
> > > (I've added a link to the wiki)
> > >
> > > You should update the wording of your proposal to mention this (and
> > > e.g. how 3.11 broke the old code).
> > >
> > > Re: 1. Abstract; probably worth noting that there are multiple ways
> > > to interface CPython with C: using libffi, using a binding generator
> > > (such as Cython), or writing C by hand.  This project is focusing on
> > > the "writing C by hand" case, but we don't recommend people use this
> > > approach; this is more about supporting legacy code.
> > >
> > > Re 2. Motivation & Background:
> > >
> > > "Crucially, the analysis will explicitly target CPython 3.11+
> > > headers as a baseline. This ensures accurate struct layouts,":  a
> > > nitpick: note that we don't want to have to care about precise
> > > in-memory layouts, GCC's C frontend does this for us; what we care
> > > about is what fields there are and what their types are.  The
> > > region_model/store.cc code does track things in terms of bit
> > > offsets, so we'll see those when debugging, but the plugin should be
> > > written in terms of types and fields.
> > >
> > > "this project will integrate Python-specific domain knowledge
> > > directly into the analyzer core."  Really?  I was thinking that it's
> > > best to keep this as a plugin, albeit an in-tree plugin.
> > >
> > > "Crucially, the analysis will explicitly target CPython 3.11+
> > > headers as a baseline."  note that there have been other recent
> > > changes beyond PEP 683 as CPython developers have tried to optimize
> > > more aggressively than in the past (e.g. for JIT compilation).  The
> > > most recent release is 3.14, and that might well have other changes
> > > that the plugin needs to be aware of.  The ideal would be to support
> > > a wide range of 3.* headers, but it's good to pick one and get that
> > > working first, to avoid getting swamped by compatibility concerns.
> > >
> > > "Illustrative Example: The Silent Leak": looks good.
> > >
> > > Re 3.2. Phase 2: Implementing the Reference Count State Machine:
> > > Your implementation plan is rather different to what we tried
> > > before, in that you're proposing using a state_machine subclass to
> > > associate state with a pointer.  What we tried in 2023 is to count
> > > the number of pointers being stored pointing at each PyObject, and
> > > then compare against the ob_refcnt, and complain at certain points
> > > when they got out-of-sync (e.g. when the stack frame is popped).
> > > This was working purely with the region_model/state code and didn't
> > > need a new state_machine.  That approach did seem to work with the
> > > pre-PEP-683 implementation, but IIRC Eric got stuck spending a lot
> > > of his time on PyList_Append, and thus we only got a tiny subset of
> > > the API covered - but it did work.  Py_INCREF and Py_DECREF are
> > > typically macros, and so by the time the analyzer "sees" the user's
> > > code, all we see are reference count increments, decrements, and
> > > conditionals, and this is captured for us in the store by the
> > > region_model code; I think it would be hard to implement using a
> > > state_machine (though maybe I'm wrong).
> > >
> > > Note that there's huge amounts of repetition in the API (e.g.
> > > "succeeds, returning a new reference, or fails, returning null" is a
> > > very common pattern).  So please make plenty of use of helper
> > > subroutines, or the attributes idea described on the project wiki
> > > page.
> > >
> > > re "DejaGnu Regression Suite": re "the ascii-art execution paths"
> > > note that these tests tend to be "brittle" so we don't want many
> > > tests expressed this way, if any at all - dg-warning and dg-message
> > > tend to be much more robust.
> > >
> > > re "5. Timeline & Milestones (350 Hours)": I suggest dropping the
> > > mentions of the state_machine approach, and this suggests a rewrite
> > > of this section.  I like the idea of building up a suite of buggy
> > > extensions.  You'll want most of them to be as simple as possible,
> > > along with some larger examples for "integration testing".  I
> > > recommend early on categorizing the API into the various patterns of
> > > ownership/borrowing/stealing etc, and identifying examples of each,
> > > and trying a simple example of each early on, to verify that the
> > > overall approach will work on all the cases.
> > >
> > > I don't like "strict formatting to GNU coding standard" being done
> > > at the end.  Better to set up your editor early on to adhere to
> > > these, and then have this happen throughout.  IIRC we have a
> > > .editorconfig file, so this should be trivial.  So this should be in
> > > the "community bonding" phase.
> > >
> > > The other thing you might like to try is some of the other
> > > subprojects within https://gcc.gnu.org/wiki/StaticAnalyzer/CPython ;
> > > some of these are relatively easy compared to reference count
> > > checking, e.g. "Verification of PyMethodDef tables" and "Checking
> > > arguments of "call" calls" (though note the word "relatively" here).
> > >
> > > Hope this makes sense; let me know if you have questions.  I need to
> > > move on, but note I may have missed some things, so consider running
> > > an update past me.
> > >
> > > Dave
> > >
> > >
> > > > I really want to make sure my plan for the state machine over
> > > > GIMPLE aligns with the new class API. If my approach is off base
> > > > anywhere, please let me know so I can rewrite it before the
> > > > deadline.
> > > >
> > > > Working on this project is my absolute top priority right now, so
> > > > I'm ready to iterate on this draft as much as needed to get it
> > > > right.
> > > >
> > > > Thanks again for the atoi patch review earlier this week!
> > > >
> > > > Best,
> > > > Saksham Gupta
> > >
> > >
>
>
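
The check described in the quoted review (count the pointers stored to each
PyObject, compare against ob_refcnt, and complain when they are out of sync
at a stack frame pop) can be illustrated with a small toy model. This is only
a sketch of the idea in plain Python, not the plugin's code and not the GCC
analyzer API: ToyObject, count_known_pointers, check_on_frame_pop and
simulate are hypothetical names invented for this example, and it ignores
PEP 683 immortal objects entirely.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyObject:
    """Hypothetical stand-in for a PyObject: a refcount plus stored pointers."""
    name: str
    ob_refcnt: int = 1                          # a new object starts with one reference
    slots: list = field(default_factory=list)   # pointers stored inside this object


def count_known_pointers(obj, frames, heap, return_value=None):
    """Count every pointer to obj that the toy model can still see."""
    in_frames = sum(1 for frame in frames for v in frame.values() if v is obj)
    in_heap = sum(1 for other in heap for v in other.slots if v is obj)
    returned = 1 if return_value is obj else 0
    return in_frames + in_heap + returned


def check_on_frame_pop(frames, heap, return_value=None):
    """Pop the innermost frame, then report objects whose ob_refcnt
    disagrees with the number of pointers that remain visible."""
    frames.pop()
    mismatches = []
    for obj in heap:
        known = count_known_pointers(obj, frames, heap, return_value)
        if obj.ob_refcnt != known:
            mismatches.append((obj.name, obj.ob_refcnt, known))
    return mismatches


def simulate(forget_decref):
    """Model a function that builds a list, appends an item, returns the list."""
    lst = ToyObject("list")
    item = ToyObject("item")
    frames = [{"lst": lst, "item": item}]   # one frame holding two locals
    heap = [lst, item]

    # Models an append: the list stores the pointer and takes its own reference.
    lst.slots.append(item)
    item.ob_refcnt += 1

    # Correct code releases the local reference to item before returning.
    if not forget_decref:
        item.ob_refcnt -= 1

    return check_on_frame_pop(frames, heap, return_value=lst)


if __name__ == "__main__":
    print(simulate(forget_decref=True))
    print(simulate(forget_decref=False))
```

With the decrement omitted, the item's count stays one higher than the single
pointer held by the list, which is the kind of mismatch the review suggests
reporting at frame pop; with the decrement in place the two agree and nothing
is reported.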
