Hi Dave,

Thanks for the clarifications and the detailed pointers. Here is my understanding of the design and my progress so far.

I have been going through both c-format.cc and gimple-ssa-sprintf.cc, and I think I understand the architecture you are suggesting. The idea is to introduce a format_parser class (the one you mentioned in your last email) that takes the raw format string as input and breaks it into a sequence of components: literal text segments and directives such as %d or %f. I have been calling this sequence the "action-map", just to have a name for it.

The main point of the shared approach is that the frontend, the GIMPLE pass and (in future) the analyzer no longer each run their own logic to decode the format string; instead they all start from the same parsed action-map. The format_string_iterator then gives each of them an easy and consistent way to walk through the action-map without exposing its internals. Every subsystem that needs to walk the action-map can apply its own logic to it: argument checking in the frontend, sprintf folding in GIMPLE, and, mainly, modelling memory effects in the analyzer. All the existing semantic algorithms stay in their respective files.

One more thing I was wondering: where should the shared format_parser and format_string_iterator classes live in the tree? Right now c-format.cc is under gcc/c-family/, gimple-ssa-sprintf.cc is directly under gcc/, and the analyzer is under gcc/analyzer/. Since the shared code needs to be accessible to all three, I am guessing it should go directly under gcc/, but I wanted to confirm whether you have another approach in mind before I start thinking about the layout.

On the region_model vs sm_state_map distinction, I think I follow what you mean. I am looking at how both appear in the .dot dumps (e.g. the rmodel: part showing memory state, and the malloc: part showing checker states like unchecked → freed). I am still going through the callbacks to understand how the two sides connect, and I will keep this in mind as I continue studying the analyzer internals.

My next step is to continue my deep dive into both implementations and start planning what the shared parser API could look like, depending on where it will live.

Best,
Ridham Khurana
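
P.S. To make the action-map idea a bit more concrete, here is a rough standalone sketch of how I currently picture the two classes. Please treat it as an illustration only: apart from the names format_parser and format_string_iterator, everything here (format_item, item_kind, the member names, the count_arg_directives helper) is a placeholder I made up, it uses std::string and std::vector rather than GCC's own types, and the directive parsing is deliberately simplified (no positional arguments, no per-language flag tables, no location tracking).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry in the hypothetical "action-map": literal text or a directive.
enum class item_kind { literal, directive };

struct format_item {
  item_kind kind;
  std::string text;  // exact source text, e.g. "x=" or "%-5.2f"
  char conversion;   // 'd', 'f', ... for directives; '\0' otherwise
};

// Read-only cursor over the parsed items; clients never see the storage.
class format_string_iterator {
public:
  format_string_iterator(const std::vector<format_item> &items, std::size_t pos)
    : m_items(&items), m_pos(pos) {}
  const format_item &operator*() const { return (*m_items)[m_pos]; }
  const format_item *operator->() const { return &(*m_items)[m_pos]; }
  format_string_iterator &operator++() { ++m_pos; return *this; }
  bool operator!=(const format_string_iterator &other) const {
    return m_pos != other.m_pos;
  }
private:
  const std::vector<format_item> *m_items;
  std::size_t m_pos;
};

class format_parser {
public:
  explicit format_parser(const std::string &fmt) { parse(fmt); }
  format_string_iterator begin() const {
    return format_string_iterator(m_items, 0);
  }
  format_string_iterator end() const {
    return format_string_iterator(m_items, m_items.size());
  }
private:
  void parse(const std::string &fmt) {
    // Simplified set of flag, width, precision and length-modifier characters.
    static const std::string modifiers = "-+ #0123456789.*hlLjzt";
    std::size_t i = 0;
    const std::size_t n = fmt.size();
    while (i < n) {
      const std::size_t start = i;
      if (fmt[i] != '%') {
        // Literal run: everything up to the next '%'.
        while (i < n && fmt[i] != '%')
          ++i;
        m_items.push_back({item_kind::literal, fmt.substr(start, i - start), '\0'});
        continue;
      }
      ++i;  // skip the '%'
      while (i < n && modifiers.find(fmt[i]) != std::string::npos)
        ++i;
      // "%%" ends up as a directive with conversion '%';
      // '\0' marks a directive cut off by the end of the string.
      const char conv = (i < n) ? fmt[i++] : '\0';
      m_items.push_back({item_kind::directive, fmt.substr(start, i - start), conv});
    }
  }
  std::vector<format_item> m_items;
};

// Example client: count directives that have a real conversion character.
// This ignores '*' widths, which would also consume an argument.
std::size_t count_arg_directives(const format_parser &parser) {
  std::size_t count = 0;
  for (const format_item &item : parser)
    if (item.kind == item_kind::directive
        && item.conversion != '%' && item.conversion != '\0')
      ++count;
  return count;
}
```

The point of the sketch is only the split of responsibilities: format_parser owns the decoding, and a client such as count_arg_directives sees nothing but the iterator and the items it yields. In this sketch the frontend, the sprintf pass and the analyzer would each be a different client loop over the same parsed sequence.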
