On Sun, 2023-07-30 at 13:52 -0400, Eric Feng wrote:
> [...]
> > As noted in our chat earlier, I don't think we can easily make
> > these
> > work.  Looking at CPython's implementation: PyList_Type's
> > initializer
> > here:
> > https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> > initializes tp_flags with the flags, but:
> > (a) we don't see that code when compiling a user's extension module
> > (b) even if we did, PyList_Type is non-const, so the analyzer has
> > to
> > assume that tp_flags could have been written to since it was
> > initialized
> > 
> > In theory we could specialcase such lookups, so that, say, a plugin
> > could register assumptions into the analyzer about the value of
> > bits
> > within (PyList_Type.tp_flags).
> > 
> > However, this seems like a future feature.
> 
> I agree that it is more appropriate as a future feature.
> 
> Recently, in preparation for a patch, I have been focusing on
> migrating as much of our plugin-specific functionality as possible,
> which is currently scattered across core analyzer files for
> convenience, into the plugin itself. Specifically, I am currently
> trying to transfer the code related to stashing Python-specific types
> and global variables into analyzer_cpython_plugin.c. This approach
> has
> three main benefits, among which some I believe we have previously
> discussed:
> 
> 1) We only need to search for these values when initializing our
> plugin, instead of every time the analyzer is enabled.
> 2) We can extend the values that we stash by modifying only our
> plugin, avoiding changes to core analyzer files such as
> analyzer-language.cc, which seems a safer and more resilient
> approach.
> 3) Future analyzer plugins will have an easier time stashing values
> relevant to their respective projects.

Sounds good, though I don't mind if the initial version of your patch
adds CPython-specific stuff to the core, if there are unexpected
hurdles in converting things to be more purely plugin based.

> 
> Let me know if my concerns or reasons appear unfounded.
> 
> My initial approach involved adding a hook to the end of
> ana::on_finish_translation_unit which calls the relevant
> stashing-related callbacks registered during plugin initialization.
> Here's a rough sketch:
> 
> void
> on_finish_translation_unit (const translation_unit &tu)
> {
>   // ... existing code
>   stash_named_constants (the_logger.get_logger (), tu);
> 
>   do_finish_translation_unit_callbacks(the_logger.get_logger (), tu);
> }
> 
> Inside do_finish_translation_unit_callbacks we have a loop like so:
> 
> for (auto& callback : finish_translation_unit_callbacks)
> {
>     callback(logger, tu);
> }
> 
> Where finish_translation_unit_callbacks is a vector defined as
> follows:
> typedef void (*finish_translation_unit_callback) (logger *, const
> translation_unit &);
> vec<finish_translation_unit_callback>
> *finish_translation_unit_callbacks;

Seems reasonable.

> 
> To register a callback, we use:
> 
> void
> register_finish_translation_unit_callback (
>     finish_translation_unit_callback callback)
> {
>   if (!finish_translation_unit_callbacks)
>     vec_alloc (finish_translation_unit_callbacks, 1);
>   finish_translation_unit_callbacks->safe_push (callback);
> }
> 
> And finally, from our plugin (or any other plugin), we can register
> callbacks like so:
> ana::register_finish_translation_unit_callback (&stash_named_types);
> ana::register_finish_translation_unit_callback (&stash_global_vars);
> 
> However, on_finish_translation_unit runs before plugin initialization
> occurs, so, unfortunately, we would be registering our callbacks
> after
> on_finish_translation_unit with this method.

Really?   I thought the plugin_init callback is called from
initialize_plugins, which is called from toplev::main fairly early on;
I though on_finish_translation_unit is called from deep within
do_compile, which is called later on from toplev::main.

What happens if you put breakpoints on both the plugin_init hook and on
on_finish_translation_unit, and have a look at the backtrace at each?

Note that this is the "plugin_init" code, not the PLUGIN_ANALYZER_INIT
callback.  The latter *is* called after on_finish_translation_unit,
when the analyzer runs.  You'll need to put your code in the former.


>  As a workaround, I tried
> saving the translation unit like this:
> 
> void
> on_finish_translation_unit (const translation_unit &tu)
> {
>   // ... existing code
>   stash_named_constants (the_logger.get_logger (), tu);
> 
>   saved_tu = &tu;
> }

That's not going to work; the "tu" is a reference to an on-stack
object, i.e. essentially a pointer to a temporary on the stack.  If
saved_tu is a pointer, then it's going to be pointing at garbage when
the function returns; if it's an object, then it's going to take a copy
of just the base class, which isn't going to be usable either ("object
slicing").

> 
> Then in our plugin:
> ana::register_finish_translation_unit_callback (&stash_named_types);
> ana::register_finish_translation_unit_callback (&stash_global_vars);
> ana:: do_finish_translation_unit_callbacks();
> 
> With do_finish_translation_units passing the stored_tu to the
> callbacks.
> 
> Unfortunately, with this method, it seems like we encounter a
> segmentation fault when trying to call the lookup functions within
> translation_unit at the time of plugin initialization, even though
> the
> translation unit is stored correctly. 

I don't think the tu is getting stored correctly, due to the reasons
described above.


> So it seems like the solution
> may not be quite so simple.
> 
> I'm currently investigating this issue, but if there's an obvious
> solution that I might be missing or any general suggestions, please
> let me know!

My guess is that you were trying to do it from the PLUGIN_ANALYZER_INIT
hook rather than from the plugin_init function, but it's hard to be
sure without seeing the code.

Hope this is helpful
Dave

Reply via email to