On Mon, 2026-03-16 at 10:58 +0500, Islombek Ismoilov wrote:
> Hi Dave
> I’ve spent more time digging into the analyzer and
> c-family/c-format internals. I want to share my findings on the
> specific
> problems I’ve identified and my proposed approach to solving them for
> Option A.
> 
> Currently, we have two disparate worlds:
> 
>    1. Frontend-based (c-format.cc): Excellent at parsing complex ISO
>       C/POSIX format strings, but tied to the frontend tree structures
>       and location_t.
>    2. Middle-end based (gimple-ssa-sprintf.cc): Good at range-based
>       overflow estimation during optimization, but lacking the
>       path-sensitive depth of the analyzer.
> 
> My Proposed Approach:
> 
> 1. Unified Format Parser (The "Middle-end Library"): My primary goal
> is to extract the core parsing logic from c-format.cc into a new,
> frontend-independent component (e.g., gcc/format-parser.cc).

Sounds reasonable, but why "Middle-end Library"?  Presumably this code
would be used by c-format.cc, by gimple-ssa-sprintf.cc, and by the
analyzer.

> 
>    - I plan to define a shared format_string_spec structure that
>      represents the "intent" of a format specifier (type, width,
>      precision, flags) without relying on frontend-specific data.

You might want to also associate a direction with format string specs,
to handle things like *scanf which write back through pointer
arguments, rather than reading through them.
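As a rough sketch of what I mean by a direction (every name below is
made up for illustration, not an existing GCC type), the spec could
record whether the call reads the argument or writes back through it:

```cpp
#include <cassert>

// Hypothetical names throughout; this is only a sketch of the idea.
enum class fmt_family { printf_like, scanf_like };
enum class fmt_direction { read_from_arg, write_to_arg };

struct format_string_spec
{
  char conversion;      // e.g. 'd', 's', 'n'
  bool suppressed;      // scanf's "%*d": parsed, but consumes no argument
  fmt_direction dir;
};

// Decide how a call in the given family accesses the argument.
fmt_direction
direction_for (fmt_family family, char conversion)
{
  // Every scanf conversion stores its result through a pointer argument.
  if (family == fmt_family::scanf_like)
    return fmt_direction::write_to_arg;
  // printf's %n stores the count of characters written so far.
  if (conversion == 'n')
    return fmt_direction::write_to_arg;
  return fmt_direction::read_from_arg;
}

// Build a spec; assumes the caller has already detected '*' suppression.
format_string_spec
make_spec (fmt_family family, char conversion, bool suppressed)
{
  return {conversion, suppressed, direction_for (family, conversion)};
}

// True if a checker would need a writable destination for this spec.
bool
needs_writable_region (const format_string_spec &spec)
{
  return spec.dir == fmt_direction::write_to_arg && !spec.suppressed;
}
```

With something like this, a checker could ask needs_writable_region for
each spec before deciding which kind of access to model.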

>    - This will allow -fanalyzer to invoke the parser on GIMPLE strings
>      and receive a structured representation of what to expect.

Sounds good.

Ideally we should provide source location information when we complain
about a format string operation.  There's some awkward logic around
getting at location_t values within a string literal; see the substring
location code.  We'd probably want to capture the range of chars within
the string for each format-string-spec entry, and get a location_t for
that on demand when issuing diagnostics.
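To illustrate capturing the range of chars, here is a deliberately
simplified sketch.  It is not the c-format.cc logic: the names are
hypothetical, and it only understands a basic subset of printf syntax
(flags, width/precision, common length modifiers).  It records the
offsets of each directive so that a location could be computed later:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: where one directive sits inside the format string.
struct spec_range
{
  std::size_t start;   // offset of the '%'
  std::size_t end;     // offset one past the conversion character
  char conversion;
};

static bool
is_one_of (char c, const std::string &set)
{
  return set.find (c) != std::string::npos;
}

// Simplified scan; "%%" is treated as a literal and not recorded.
std::vector<spec_range>
find_spec_ranges (const std::string &fmt)
{
  std::vector<spec_range> result;
  std::size_t i = 0;
  while (i < fmt.size ())
    {
      if (fmt[i] != '%')
        {
          ++i;
          continue;
        }
      const std::size_t start = i++;
      while (i < fmt.size () && is_one_of (fmt[i], "-+ #0"))
        ++i;  // flags
      while (i < fmt.size ()
             && ((fmt[i] >= '0' && fmt[i] <= '9')
                 || fmt[i] == '*' || fmt[i] == '.'))
        ++i;  // width and precision
      while (i < fmt.size () && is_one_of (fmt[i], "hljztL"))
        ++i;  // length modifiers
      assert (i <= fmt.size ());
      if (i == fmt.size ())
        break;  // truncated directive: nothing to record
      if (fmt[i] != '%')
        result.push_back ({start, i + 1, fmt[i]});
      ++i;
    }
  return result;
}
```

For "n=%d, s=%-10.3s 100%%" this records two ranges, [2, 4) for the %d
and [8, 15) for the %-10.3s; a diagnostic could then turn one of those
ranges into a location only when it is actually needed.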

> 
> 2. Path-Sensitive Range Analysis: Integrating this with
> -Wanalyzer-out-of-bounds is where the real value lies.
> 
>    - By leveraging the region_model, I want to map the svalue of
>      arguments to the constraints defined by the format string.
>    - For example, if the analyzer knows a variable n is in the range
>      [1000, 9999] and it's being printed into a 4-byte buffer via
>      sprintf, we can emit a precise path-sensitive overflow warning
>      that the current middle-end might miss.

(nods)
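
To spell out the arithmetic in that example: every value in [1000, 9999]
prints as 4 digits, so sprintf writes 5 bytes including the terminating
NUL, which cannot fit in 4.  A minimal standalone sketch of that
calculation for a plain "%d" follows; the names are hypothetical and it
doesn't use any analyzer APIs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Characters produced by a plain "%d" (no flags, width or precision).
std::size_t
decimal_length (long long v)
{
  std::size_t len = (v < 0) ? 1 : 0;  // room for the leading '-'
  do
    {
      ++len;
      v /= 10;  // truncates toward zero, so no negation is needed
    }
  while (v != 0);
  return len;
}

struct byte_range
{
  std::size_t min;
  std::size_t max;
};

// Bytes written by sprintf (buf, "%d", n) for n in [lo, hi], counting
// the terminating NUL.
byte_range
sprintf_d_bytes (long long lo, long long hi)
{
  assert (lo <= hi);
  const std::size_t len_lo = decimal_length (lo);
  const std::size_t len_hi = decimal_length (hi);
  // The longest output is always at one of the endpoints.
  const std::size_t longest = std::max (len_lo, len_hi);
  // The shortest is "0" if the range contains zero, else an endpoint.
  const std::size_t shortest
    = (lo <= 0 && hi >= 0) ? 1 : std::min (len_lo, len_hi);
  return {shortest + 1, longest + 1};
}

enum class overflow_kind { none, possible, certain };

overflow_kind
classify (byte_range written, std::size_t buf_size)
{
  if (written.min > buf_size)
    return overflow_kind::certain;
  if (written.max > buf_size)
    return overflow_kind::possible;
  return overflow_kind::none;
}
```

Here classify (sprintf_d_bytes (1000, 9999), 4) gives
overflow_kind::certain, whereas a range like [0, 99999] would only give
overflow_kind::possible.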

> 
> 3. Implementation Strategy
> 
>    - Phase 1: Identify and isolate the "state-machine" part of the
>      existing format parser.

Would this essentially be a refactoring of c-format.cc to extract an
iterator class?

You'll want to get familiar with running the regression tests,
particularly those for -Wformat, to make sure that the refactorings
don't change behavior.

>    - Phase 2: Implement a format_string_checker class within the
>      analyzer that uses this isolated parser.
>    - Phase 3: Create a bridge between the parser's requirements and the
>      analyzer’s range_query and region_model.

I'm not quite sure about phase 3 - what is a range_query here?

Hope this is constructive
Dave

> On Tue, Mar 10, 2026 at 05:54, David Malcolm <[email protected]> wrote:
> 
> > On Mon, 2026-03-09 at 19:33 +0500, Islombek Ismoilov wrote:
> > > Hi Dave,
> > > 
> > > Thank you for your message.
> > > 
> > > Yes, I am able to make changes to GCC, rebuild it, and step
> > > through
> > > the
> > > modified code using a debugger. I have already tested this
> > > workflow
> > > and
> > > confirmed that everything works as expected.
> > 
> > Good.
> > 
> > > 
> > > Could you also briefly describe the project and how it will work
> > > technically?
> > 
> > Have a look at the relevant parts of the SummerOfCode wiki page,
> > and
> > have a look at
> > https://gcc.gnu.org/wiki/StaticAnalyzer
> > 
> > Dave
> > 
> > > 
> > > Best regards,
> > > Islom
> > > 
> > > > On Mon, Mar 9, 2026, 18:33 David Malcolm <[email protected]> wrote:
> > > 
> > > > On Sun, 2026-03-08 at 15:17 +0500, Islombek Ismoilov wrote:
> > > > > Hi Dave
> > > > > > Thanks for the advice. I've fixed the issue by performing a
> > > > > > clean build. I removed the old GCC source directory
> > > > > > entirely, re-downloaded the source, and reapplied my
> > > > > > changes. It is working correctly now.
> > > > 
> > > > Excellent.
> > > > 
> > > > Are you able to make changes to gcc, rebuild it, and step
> > > > through
> > > > the
> > > > changed code in a debugger?  That's a good prerequisite that we
> > > > want to
> > > > get applicants to achieve.
> > > > 
> > > > Dave
> > > > 
> > > > 
> > > > > 
> > > > > best regards,
> > > > > Islom
> > > > > 
> > > > > > On Sun, Mar 8, 2026 at 05:32, David Malcolm
> > > > > > <[email protected]> wrote:
> > > > > 
> > > > > > On Thu, 2026-03-05 at 00:10 +0100, Martin Jambor wrote:
> > > > > > > Hello Islombek,
> > > > > > > 
> > > > > > > On Tue, Mar 03 2026, Islombek Ismoilov via Gcc wrote:
> > > > > > > > Dear David Malcolm
> > > > > > > > 
> > > > > > > > I would like to share my progress on building and
> > > > > > > > modifying
> > > > > > > > the
> > > > > > > > GNU
> > > > > > > > compiler from source.
> > > > > > > > 
> > > > > > > > I successfully built GCC from the source code. During
> > > > > > > > the
> > > > > > > > process,
> > > > > > > > I
> > > > > > > > resolved dependency and configuration issues that
> > > > > > > > arose.
> > > > > > > > 
> > > > > > > > After the build was completed, I tested the compiled
> > > > > > > > compiler
> > > > > > > > using
> > > > > > > > a
> > > > > > > > simple test.c file.
> > > > > > > > 
> > > > > > > > int main(){
> > > > > > > > 
> > > > > > > > return 0;
> > > > > > > > 
> > > > > > > > }
> > > > > > > > 
> > > > > > > > The compilation and execution worked correctly,
> > > > > > > > confirming
> > > > > > > > that
> > > > > > > > the
> > > > > > > > build
> > > > > > > > was functioning as expected.
> > > > > > > > 
> > > > > > > > > Then I started experimenting with modifications in the
> > > > > > > > > source code. I edited the file c-parser.cc,
> > > > > > > > > specifically the function "c_parser_translation_unit",
> > > > > > > > > and added the following line:
> > > > > > > > 
> > > > > > > > warning (0, "Good Job");
> > > > > > > > 
> > > > > > > > My goal was to introduce a  warning that would appear
> > > > > > > > during
> > > > > > > > each
> > > > > > > > compilation.
> > > > > > > 
> > > > > > > > When I want to check, in the simplest way, that some code
> > > > > > > > gets executed, I just resort to fprintf.  The trick is to
> > > > > > > > direct the output to stderr.  Putting
> > > > > > > 
> > > > > > >   fprintf (stderr, "Good job!\n");
> > > > > > > 
> > > > > > > at the beginning of c_parser_translation_unit does what
> > > > > > > you'd
> > > > > > > expect
> > > > > > > it
> > > > > > > to do.
> > > > > > > 
> > > > > > > > 
> > > > > > > > > However, after making the changes and rebuilding, the
> > > > > > > > > cc1 binary was not generated. The build process
> > > > > > > > > completes the configuration stage but fails to produce
> > > > > > > > > the main compiler binary. I restored c-parser.cc to its
> > > > > > > > > original state, yet the issue persists: the build still
> > > > > > > > > finishes without generating cc1.
> > > > > > > 
> > > > > > > > This is of course strange.  What were the commands you
> > > > > > > > issued (in which directories) and what were the error
> > > > > > > > messages?  There should be no need to re-run
> > > > > > > > configuration after such a small change.  Did make exit
> > > > > > > > with exit code zero?
> > > > > > > 
> > > > > > > Did you disable bootstrap during the first configuration
> > > > > > > step?
> > > > > > 
> > > > > > This is very important when preparing to make changes to
> > > > > > GCC,
> > > > > > otherwise
> > > > > > making edits is very tedious.  Islombek, did you check
> > > > > > this?
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > what do you advise?
> > > > > > > 
> > > > > > > I'm afraid we need more details, after you restore the
> > > > > > > file,
> > > > > > > all
> > > > > > > should
> > > > > > > be as before, of course.
> > > > > > > 
> > > > > > > Good luck debugging this and with GSoC in general.
> > > > > > 
> > > > > > Islombek: did you get any further with this, or are you
> > > > > > stuck?
> > > > > > 
> > > > > > Hope this is constructive
> > > > > > Dave
> > > > > > 
> > > > > > 
> > > > 
> > > > 
> > 
> > 
