Re: GSoC 2026: Extend the static analysis pass

David Malcolm via Gcc Wed, 18 Mar 2026 08:00:25 -0700

On Wed, 2026-03-18 at 19:55 +0530, Ridham Khurana wrote:
> Hi Dave,
> 
> Thanks again for your guidance and confirmation regarding the project
> size.
> 
> As you suggested, I have submitted a patch to
> [email protected]. This
> patch adds a new *kf_getenv* handler, which bifurcates on the return
> value
> into NULL and non-NULL paths, so that the analyzer can now warn on
> unchecked *getenv* dereferences, and also checks the argument for
> null-termination.
> 
> The new getenv-1.c test passes (9/9), and I also ran the full
> analyzer.exp
> suite to check for regressions, and there were no new failures.
> 
> This was a helpful exercise for getting more familiar with analyzer
> internals, the known-function modelling and the gcc test workflow, as
> you suggested in your last email.

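For readers following along, here is a minimal sketch of the kind of
code the getenv handler described above is meant to cover. It is only
an illustration: the helper names env_len_unchecked and env_len_checked
are hypothetical and are not taken from the patch or from getenv-1.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: uses the result of getenv without checking it.
   getenv returns NULL when NAME is not set in the environment, so the
   strlen call below can dereference a null pointer.  */
size_t
env_len_unchecked (const char *name)
{
  const char *value = getenv (name);
  return strlen (value);
}

/* Hypothetical helper: the checked variant, which handles the NULL
   path explicitly before the value is used.  */
size_t
env_len_checked (const char *name)
{
  assert (name != NULL);
  const char *value = getenv (name);
  if (value == NULL)
    return 0;
  return strlen (value);
}
```

Compiling a file like this with "gcc -fanalyzer -c" on a build that
includes the patch would be one way to see how the two paths are
handled. The exact diagnostic depends on the patch, so none is shown
here.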

Excellent; thanks

One other thing that's worth getting familiar with is doing a bootstrap
build of gcc.  Ideally each patch should be tested with a full
bootstrap build, and a full run of the testsuite on that bootstrapped
build.  Have you tried that yet?  We have a farm of fast machines we
can get you access to if you need one for that.

> 
> I have started drafting the official GSoC proposal in parallel, and I
> will be sharing it for review once it's ready. Also, I would
> appreciate your feedback on the patch.

(nods)

Thanks again
Dave

> 
> Best,
> Ridham Khurana
> 
> On Sat, Mar 14, 2026 at 5:24 AM David Malcolm <[email protected]>
> wrote:
> 
> > On Fri, 2026-03-13 at 12:23 +0530, Ridham Khurana wrote:
> > > Hi Dave,
> > > 
> > > Thanks for the confirmation about the expected type argument; I
> > > will add it in the shared layer.
> > > 
> > > While going through the current analyzer implementation, I noticed
> > > that arguments to function calls are retrieved through
> > > *call_details::get_arg_svalue()* and then handled as const svalue*,
> > > rather than as *tree* nodes like in the frontend and GIMPLE passes.
> > > From what I can understand, the behaviour of library calls is
> > > modelled through *known_function* handlers interacting with the
> > > *region_model* (for example through *impl_call_pre* in the kf_*
> > > handlers), and the existing checks for functions like printf are
> > > mostly driven by the format attribute and the validation of format
> > > string arguments (for example using
> > > *check_for_null_terminated_string_arg()*), instead of interpreting
> > > the individual directives.
> > 
> > That's correct.
> > 
> > > 
> > > But one thing that I am not sure about is where the shared string
> > > parser should be integrated on the analyzer side. Maybe it should
> > > be triggered through the attribute-based path, or it is better to
> > > use it inside the individual kf_* handlers for printf-style
> > > functions.
> > 
> > I'm not sure.  I think we want a subroutine inside the analyzer
> > that
> > can be called from either place, and then see how well each
> > approach
> > works.
> > 
> > On the subject of known_function handlers, some other GSoC
> > candidates
> > have had success in making patches that add new known_function
> > subclasses for specific POSIX/C stdlib entrypoints.  This is a
> > relatively easy and self-contained way to improve -fanalyzer, and
> > it's
> > a good way to demonstrate technical prowess, and to shake out any
> > problems that a candidate might run into building/debugging gcc on
> > their hardware.  It overlaps with the format-string support, so
> > would
> > be a useful learning experience - but you'd have to choose a
> > simpler
> > API entrypoint (obviously we don't have the format-string parsing
> > in
> > convenient modular form yet).
> > 
> > > 
> > > Also, before starting to draft the official proposal, I wanted to
> > > confirm
> > > the expected size of this project. From my current understanding,
> > > it
> > > would
> > > be 350 hours,
> > 
> > I think 350 hours is the better choice; this is a rather ambitious
> > project.
> > 
> > 
> > > dividing this project into 2 major phases, the first phase of
> > > the project to unify the parsing logic among all 3 subsystems
> > 
> > (it would be the *2* subsystems at this time, since the analyzer
> > doesn't yet support format strings)
> > 
> > > and the
> > > second phase to be the actual work on the analyzer part. Please
> > > let me know if this matches your expectations, or would you prefer
> > > a 175-hour scope?
> > 
> > FWIW I'm always a bit sceptical of timetables that rigidly divide
> > projects into phases - it feels too much like the "waterfall" model
> > of
> > development.  But yes, splitting out the parsing logic from the
> > other 2
> > subsystems is a prerequisite before using it in -fanalyzer (I
> > suppose
> > you could have a proof-of-concept that recognizes hardcoded strings
> > and
> > provides the analyzer with the (hardcoded) action list, but that's
> > probably wasted effort compared to simply doing the refactoring
> > work).
> > 
> > A useful exercise would be to get familiar with running gcc's full
> > test
> > suite, and verifying that a patch doesn't regress anything, since
> > that's very important during the refactoring of the existing code.
> > 
> > Hope this is helpful and makes sense; let me know if you have any
> > questions
> > Dave
> > 
> > 
> > 
> >
