Hi Dave,
Thanks for the clarifications and the detailed pointers. Here is my
understanding of the design architecture and progress so far:

I have been going through both c-format.cc and gimple-ssa-sprintf.cc, and I
think I understand the architecture you are suggesting. The idea is to
introduce a format_parser class (which you mentioned in your last email)
that takes the raw format string as input and breaks it into a sequence of
components, for example literal text segments and directives like %d or
%f. I have been calling this sequence the "action-map", just to give it a
name. The main point of the shared approach is that the frontend, GIMPLE
and the (upcoming) analyzer support no longer each run their own logic to
decode the format string; instead they all start from the same parsed
action-map. The format_string_iterator will then give the different files
an easy and consistent way to walk through the action-map without exposing
its internals. Each subsystem that needs to walk the action-map can apply
its own logic to it, such as argument checking in the frontend, sprintf
folding in GIMPLE, and modelling memory effects in the analyzer, while all
the existing semantic algorithms stay in their respective files.
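
To check that we mean the same thing, here is a rough standalone sketch of
the split I have in mind. It is only an illustration, not a proposed patch:
format_segment, segment_kind, done/current/next and
count_conversion_directives are names I made up, it uses std::string and
std::vector rather than GCC's own containers, and the directive scanning is
heavily simplified (it just runs from '%' to the next conversion character,
with no validation of flags, width, precision or length modifiers).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical; none of this is existing GCC API.
enum class segment_kind { literal, directive };

struct format_segment
{
  segment_kind kind;
  std::string text;  // Literal text, or the whole directive, e.g. "%5.2f".
};

// Splits a raw format string into the sequential "action-map".
class format_parser
{
public:
  explicit format_parser (const std::string &fmt)
  {
    // Simplified: a directive runs from '%' to the next conversion char.
    static const std::string conversions = "diouxXeEfFgGaAcspn%";
    std::size_t i = 0;
    while (i < fmt.size ())
      {
        std::size_t end;
        segment_kind kind;
        if (fmt[i] != '%')
          {
            kind = segment_kind::literal;
            end = fmt.find ('%', i);
            if (end == std::string::npos)
              end = fmt.size ();
          }
        else
          {
            kind = segment_kind::directive;
            end = fmt.find_first_of (conversions, i + 1);
            // An unterminated directive keeps the rest of the string.
            end = (end == std::string::npos) ? fmt.size () : end + 1;
          }
        m_segments.push_back ({kind, fmt.substr (i, end - i)});
        i = end;
      }
  }

private:
  friend class format_string_iterator;
  std::vector<format_segment> m_segments;  // Not visible to clients.
};

// Read-only walk over the action-map; clients never see the container.
class format_string_iterator
{
public:
  explicit format_string_iterator (const format_parser &parser)
    : m_parser (parser), m_idx (0) {}

  bool done () const { return m_idx >= m_parser.m_segments.size (); }
  const format_segment &current () const
  {
    assert (!done ());
    return m_parser.m_segments[m_idx];
  }
  void next () { ++m_idx; }

private:
  const format_parser &m_parser;
  std::size_t m_idx;
};

// Example client logic: count the directives other than "%%".
inline std::size_t
count_conversion_directives (const std::string &fmt)
{
  format_parser parser (fmt);
  std::size_t n = 0;
  for (format_string_iterator it (parser); !it.done (); it.next ())
    if (it.current ().kind == segment_kind::directive
        && it.current ().text != "%%")
      ++n;
  return n;
}
```

In this sketch the counting function stands in for a client such as the
frontend's argument checking: it only ever sees format_segment values
through the iterator, so the storage behind the action-map could change
without touching any of the three clients.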

One more thing I was wondering about: where should the shared
format_parser and format_string_iterator classes live in the tree? Right
now, c-format.cc is under gcc/c-family/, gimple-ssa-sprintf.cc is directly
under gcc/, and the analyzer is under gcc/analyzer/. Since the shared code
needs to be accessible to all three, I am guessing it should go directly
under gcc/, but I wanted to confirm whether you have another approach in
mind before I start thinking about the layout.

On the region_model vs sm_state_map distinction, I think I follow what you
mean. I am looking at how both appear in the .dot dumps (e.g. the rmodel:
part showing memory state, and the malloc: part showing checker states like
unchecked → freed). I am still going through the callbacks to understand
how the two sides connect, and I will keep this distinction in mind as I
continue studying the analyzer internals.

My next step is to continue my deep dive into both implementations and
start planning out how the shared parser API could look, depending on where
it will live.

Best,
Ridham Khurana
