On Sun, 2023-07-30 at 13:52 -0400, Eric Feng wrote: > [...] > > As noted in our chat earlier, I don't think we can easily make > > these > > work. Looking at CPython's implementation: PyList_Type's > > initializer > > here: > > https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101 > > initializes tp_flags with the flags, but: > > (a) we don't see that code when compiling a user's extension module > > (b) even if we did, PyList_Type is non-const, so the analyzer has > > to > > assume that tp_flags could have been written to since it was > > initialized > > > > In theory we could specialcase such lookups, so that, say, a plugin > > could register assumptions into the analyzer about the value of > > bits > > within (PyList_Type.tp_flags). > > > > However, this seems like a future feature. > > I agree that it is more appropriate as a future feature. > > Recently, in preparation for a patch, I have been focusing on > migrating as much of our plugin-specific functionality as possible, > which is currently scattered across core analyzer files for > convenience, into the plugin itself. Specifically, I am currently > trying to transfer the code related to stashing Python-specific types > and global variables into analyzer_cpython_plugin.c. This approach > has > three main benefits, among which some I believe we have previously > discussed: > > 1) We only need to search for these values when initializing our > plugin, instead of every time the analyzer is enabled. > 2) We can extend the values that we stash by modifying only our > plugin, avoiding changes to core analyzer files such as > analyzer-language.cc, which seems a safer and more resilient > approach. > 3) Future analyzer plugins will have an easier time stashing values > relevant to their respective projects.
Sounds good, though I don't mind if the initial version of your patch adds CPython-specific stuff to the core, if there are unexpected hurdles in converting things to be more purely plugin based. > > Let me know if my concerns or reasons appear unfounded. > > My initial approach involved adding a hook to the end of > ana::on_finish_translation_unit which calls the relevant > stashing-related callbacks registered during plugin initialization. > Here's a rough sketch: > > void > on_finish_translation_unit (const translation_unit &tu) > { > // ... existing code > stash_named_constants (the_logger.get_logger (), tu); > > do_finish_translation_unit_callbacks(the_logger.get_logger (), tu); > } > > Inside do_finish_translation_unit_callbacks we have a loop like so: > > for (auto& callback : finish_translation_unit_callbacks) > { > callback(logger, tu); > } > > Where finish_translation_unit_callbacks is a vector defined as > follows: > typedef void (*finish_translation_unit_callback) (logger *, const > translation_unit &); > vec<finish_translation_unit_callback> > *finish_translation_unit_callbacks; Seems reasonable. > > To register a callback, we use: > > void > register_finish_translation_unit_callback ( > finish_translation_unit_callback callback) > { > if (!finish_translation_unit_callbacks) > vec_alloc (finish_translation_unit_callbacks, 1); > finish_translation_unit_callbacks->safe_push (callback); > } > > And finally, from our plugin (or any other plugin), we can register > callbacks like so: > ana::register_finish_translation_unit_callback (&stash_named_types); > ana::register_finish_translation_unit_callback (&stash_global_vars); > > However, on_finish_translation_unit runs before plugin initialization > occurs, so, unfortunately, we would be registering our callbacks > after > on_finish_translation_unit with this method. Really? I thought the plugin_init callback is called from initialize_plugins, which is called from toplev::main fairly early on; I though on_finish_translation_unit is called from deep within do_compile, which is called later on from toplev::main. What happens if you put breakpoints on both the plugin_init hook and on on_finish_translation_unit, and have a look at the backtrace at each? Note that this is the "plugin_init" code, not the PLUGIN_ANALYZER_INIT callback. The latter *is* called after on_finish_translation_unit, when the analyzer runs. You'll need to put your code in the former. > As a workaround, I tried > saving the translation unit like this: > > void > on_finish_translation_unit (const translation_unit &tu) > { > // ... existing code > stash_named_constants (the_logger.get_logger (), tu); > > saved_tu = &tu; > } That's not going to work; the "tu" is a reference to an on-stack object, i.e. essentially a pointer to a temporary on the stack. If saved_tu is a pointer, then it's going to be pointing at garbage when the function returns; if it's an object, then it's going to take a copy of just the base class, which isn't going to be usable either ("object slicing"). > > Then in our plugin: > ana::register_finish_translation_unit_callback (&stash_named_types); > ana::register_finish_translation_unit_callback (&stash_global_vars); > ana:: do_finish_translation_unit_callbacks(); > > With do_finish_translation_units passing the stored_tu to the > callbacks. > > Unfortunately, with this method, it seems like we encounter a > segmentation fault when trying to call the lookup functions within > translation_unit at the time of plugin initialization, even though > the > translation unit is stored correctly. I don't think the tu is getting stored correctly, due to the reasons described above. > So it seems like the solution > may not be quite so simple. > > I'm currently investigating this issue, but if there's an obvious > solution that I might be missing or any general suggestions, please > let me know! My guess is that you were trying to do it from the PLUGIN_ANALYZER_INIT hook rather than from the plugin_init function, but it's hard to be sure without seeing the code. Hope this is helpful Dave