On Wed, 2026-03-18 at 19:55 +0530, Ridham Khurana wrote:
> Hi Dave,
>
> Thanks again for your guidance and confirmation regarding the project
> size.
>
> As you suggested, I have submitted a patch to [email protected].
> This patch adds a new *kf_getenv* handler, which bifurcates on the
> return value into NULL and non-NULL paths, so that the analyzer can
> now warn on unchecked *getenv* dereferences; it also checks the
> argument for null-termination.
>
> The new getenv-1.c test passes (9/9), and I also ran the full
> analyzer.exp suite to check for regressions; there were no new
> failures.
>
> This was a helpful exercise for getting more familiar with the
> analyzer internals, the known-function modelling and the gcc test
> workflow, as you suggested in your last email.
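For readers following along, here is a minimal, hypothetical C sketch of the kind of user code a getenv handler like the one described targets. It is not the getenv-1.c test from the patch: the function names are made up for illustration, and the comments describe what one would expect -fanalyzer to report if getenv's return value is split into NULL and non-NULL outcomes, not verified output from the patch.

```c
#include <stdlib.h>

/* Hypothetical example functions; not taken from the patch.  */

/* Unchecked: getenv returns NULL when NAME is unset, so on that path
   the read of val[0] is a NULL dereference.  With getenv modelled as
   bifurcating into NULL and non-NULL outcomes, -fanalyzer would be
   expected to report the NULL path here.  */
char
first_char_unchecked (const char *name)
{
  const char *val = getenv (name);
  return val[0];
}

/* Checked: the NULL outcome is handled before the dereference, so
   neither path should be reported.  */
char
first_char_checked (const char *name, char fallback)
{
  const char *val = getenv (name);
  if (val == NULL)
    return fallback;
  return val[0];
}

/* Unterminated argument: NAME has no trailing NUL, so getenv would
   read past the end of the array.  This is the pattern a
   null-termination check on the argument is meant to catch.  Never
   call this function; it exists only to show the pattern.  */
void
unterminated_name (void)
{
  char name[4] = { 'H', 'O', 'M', 'E' };
  (void) getenv (name);
}
```

The checked variant is the shape the new warning nudges callers toward; the other two are the cases the handler is described as diagnosing.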
Excellent; thanks.

One other thing that's worth getting familiar with is doing a bootstrap
build of gcc. Ideally each patch should be tested with a full bootstrap
build, and a full run of the testsuite on that bootstrapped build. Have
you tried that yet? We have a farm of fast machines we can get you
access to if you need one for that.

>
> I have started drafting the official GSoC proposal in parallel, and I
> will share it for review once it's ready. Also, I would appreciate
> your feedback on the patch.

(nods)

Thanks again
Dave

>
> Best,
> Ridham Khurana
>
> On Sat, Mar 14, 2026 at 5:24 AM David Malcolm <[email protected]>
> wrote:
>
> > On Fri, 2026-03-13 at 12:23 +0530, Ridham Khurana wrote:
> > > Hi Dave,
> > >
> > > Thanks for the confirmation about the expected type argument; I
> > > will add it in the shared layer.
> > >
> > > While going through the current analyzer implementation, I
> > > noticed that arguments to function calls are retrieved through
> > > *call_details::get_arg_svalue()* and then handled as const
> > > svalue*, rather than as *tree* nodes as in the frontend and
> > > GIMPLE passes. From what I can understand, the behaviour of
> > > library calls is modelled through *known_function* handlers
> > > interacting with the *region_model* (for example, through
> > > *impl_call_pre* in the kf_* handlers), and the existing checks
> > > for functions like printf are mostly driven by the format
> > > attribute and the validation of format-string arguments (for
> > > example, using *check_for_null_terminated_string_arg()*), rather
> > > than by interpreting the individual directives.
> >
> > That's correct.
> >
> > >
> > > But one thing that I am not sure about is where the shared
> > > string parser should be integrated on the analyzer side. Should
> > > it be triggered through the attribute-based path, or is it better
> > > to use it inside the individual kf_* handlers for printf-style
> > > functions?
> >
> > I'm not sure. I think we want a subroutine inside the analyzer that
> > can be called from either place, and then see how well each approach
> > works.
> >
> > On the subject of known_function handlers, some other GSoC
> > candidates have had success in making patches that add new
> > known_function subclasses for specific POSIX/C stdlib entrypoints.
> > This is a relatively easy and self-contained way to improve
> > -fanalyzer, and it's a good way to demonstrate technical prowess,
> > and to shake out any problems that a candidate might run into
> > building/debugging gcc on their hardware. It overlaps with the
> > format-string support, so would be a useful learning experience,
> > but you'd have to choose a simpler API entrypoint (obviously we
> > don't have the format-string parsing in convenient modular form
> > yet).
> >
> > >
> > > Also, before starting to draft the official proposal, I wanted
> > > to confirm the expected size of this project. From my current
> > > understanding, it would be 350 hours,
> >
> > I think 350 hours is the better choice; this is a rather ambitious
> > project.
> >
> > > divided into 2 major phases: the first phase would unify the
> > > parsing logic among all 3 subsystems
> >
> > (it would be the *2* subsystems at this time, since the analyzer
> > doesn't yet support format strings)
> >
> > > and the second phase would be the actual work on the analyzer
> > > side. Please let me know whether this matches your expectations,
> > > or whether you would prefer the 175-hour scope.
> >
> > FWIW I'm always a bit sceptical of timetables that rigidly divide
> > projects into phases; it feels too much like the "waterfall" model
> > of development. But yes, splitting out the parsing logic from the
> > other 2 subsystems is a prerequisite before using it in -fanalyzer
> > (I suppose you could have a proof-of-concept that recognizes
> > hardcoded strings and provides the analyzer with the (hardcoded)
> > action list, but that's probably wasted effort compared to simply
> > doing the refactoring work).
> >
> > A useful exercise would be to get familiar with running gcc's full
> > test suite, and verifying that a patch doesn't regress anything,
> > since that's very important during the refactoring of the existing
> > code.
> >
> > Hope this is helpful and makes sense; let me know if you have any
> > questions
> > Dave
