[...]
> As noted in our chat earlier, I don't think we can easily make these
> work.  Looking at CPython's implementation: PyList_Type's initializer
> here:
> https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> initializes tp_flags with the flags, but:
> (a) we don't see that code when compiling a user's extension module
> (b) even if we did, PyList_Type is non-const, so the analyzer has to
> assume that tp_flags could have been written to since it was
> initialized
>
> In theory we could special-case such lookups, so that, say, a plugin
> could register assumptions into the analyzer about the value of bits
> within (PyList_Type.tp_flags).
>
> However, this seems like a future feature.

I agree that it is more appropriate as a future feature.
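Purely to illustrate the idea, a plugin-registered assumption could be recorded as a known-bits mask/value pair for a named field. This is a hypothetical standalone C++ sketch: known_bits, register_known_bits and query_known_bit are made-up names, not GCC APIs. The one real value used is CPython's Py_TPFLAGS_LIST_SUBCLASS, which object.h defines as 1UL << 25.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: what a plugin asserts about an integer field.
// Bits set in MASK are known; their values are taken from VALUE.
struct known_bits
{
  unsigned long mask;
  unsigned long value;
};

// Keyed by an illustrative field path such as "PyList_Type.tp_flags".
static std::map<std::string, known_bits> assumptions;

// Record that the bits in MASK of FIELD have the values given by VALUE,
// merging with anything already known about FIELD.
void
register_known_bits (const std::string &field, unsigned long mask,
                     unsigned long value)
{
  known_bits &kb = assumptions[field]; // zero-initialized on first use
  kb.mask |= mask;
  kb.value = (kb.value & ~mask) | (value & mask);
}

// Return 1 if BIT of FIELD is known to be set, 0 if known to be clear,
// and -1 if nothing is known about it.
int
query_known_bit (const std::string &field, unsigned long bit)
{
  assert ((bit & (bit - 1)) == 0 && "expected a single bit");
  auto it = assumptions.find (field);
  if (it == assumptions.end () || !(it->second.mask & bit))
    return -1;
  return (it->second.value & bit) ? 1 : 0;
}
```

The intent would be that a read of a known bit folds to a constant while every other bit stays unknown, which matches the current conservative behavior.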

Recently, in preparation for a patch, I have been focusing on
migrating as much of our plugin-specific functionality as possible
into the plugin itself; for convenience, it is currently scattered
across core analyzer files. Specifically, I am trying to move the code
that stashes Python-specific types and global variables into
analyzer_cpython_plugin.c. This approach has three main benefits, some
of which I believe we have discussed before:

1) We only need to search for these values when initializing our
plugin, instead of every time the analyzer is enabled.
2) We can extend the values that we stash by modifying only our
plugin, avoiding changes to core analyzer files such as
analyzer-language.cc, which seems a safer and more resilient approach.
3) Future analyzer plugins will have an easier time stashing values
relevant to their respective projects.

Let me know if any of this reasoning seems unfounded.

My initial approach was to add a hook at the end of
ana::on_finish_translation_unit that calls the stashing callbacks
registered during plugin initialization. Here's a rough sketch:

void
on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  do_finish_translation_unit_callbacks (the_logger.get_logger (), tu);
}

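Inside do_finish_translation_unit_callbacks we have a loop like so
(the vector is only allocated once a callback has been registered, so
it may still be NULL here):

if (finish_translation_unit_callbacks)
  {
    unsigned i;
    finish_translation_unit_callback callback;
    FOR_EACH_VEC_ELT (*finish_translation_unit_callbacks, i, callback)
      callback (logger, tu);
  }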

Where finish_translation_unit_callbacks is a vector defined as follows:

typedef void (*finish_translation_unit_callback) (logger *,
                                                  const translation_unit &);
vec<finish_translation_unit_callback> *finish_translation_unit_callbacks;

To register a callback, we use:

void
register_finish_translation_unit_callback (
    finish_translation_unit_callback callback)
{
  if (!finish_translation_unit_callbacks)
    vec_alloc (finish_translation_unit_callbacks, 1);
  finish_translation_unit_callbacks->safe_push (callback);
}
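For experimenting with the dispatch order outside a GCC build, the same registry pattern can be mocked up in standalone C++. In this sketch std::vector stands in for GCC's vec, and logger and translation_unit are minimal hypothetical stand-ins rather than the analyzer's real classes. Callbacks run in the order they were registered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analyzer's logger and translation_unit.
struct logger
{
  std::vector<std::string> lines;
  void log (const std::string &msg) { lines.push_back (msg); }
};

struct translation_unit
{
  std::string name;
};

typedef void (*finish_translation_unit_callback) (logger *,
                                                  const translation_unit &);

// Registered callbacks, kept in registration order.
static std::vector<finish_translation_unit_callback>
  finish_translation_unit_callbacks;

void
register_finish_translation_unit_callback (
    finish_translation_unit_callback callback)
{
  assert (callback != nullptr);
  finish_translation_unit_callbacks.push_back (callback);
}

// What the core hook would call at the end of on_finish_translation_unit.
void
do_finish_translation_unit_callbacks (logger *l, const translation_unit &tu)
{
  for (finish_translation_unit_callback cb : finish_translation_unit_callbacks)
    cb (l, tu);
}
```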

And finally, from our plugin (or any other plugin), we can register
callbacks like so:
ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);
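As a rough picture of what one of those plugin-side callbacks might do, here is a standalone sketch of stash_named_types. The translation_unit is a hypothetical mock whose lookup_type_by_id returns 0 for an undeclared name (standing in for NULL_TREE), the int handles stand in for trees, and the list of names is illustrative rather than the plugin's actual list. The point is only that the stash lives in the plugin, not in analyzer-language.cc.

```cpp
#include <cassert>
#include <map>
#include <string>

struct logger {};

// Hypothetical mock: maps identifier names to type handles; 0 means
// "not declared" (standing in for NULL_TREE).
struct translation_unit
{
  std::map<std::string, int> types;

  int
  lookup_type_by_id (const std::string &name) const
  {
    auto it = types.find (name);
    return it == types.end () ? 0 : it->second;
  }
};

// Plugin-local stash, owned by the plugin rather than the core analyzer.
static std::map<std::string, int> stashed_types;

// Look up each type the plugin cares about and remember the ones
// declared in this TU; undeclared names are simply skipped.
static void
stash_named_types (logger *, const translation_unit &tu)
{
  static const char *const names[]
    = { "PyObject", "PyVarObject", "PyListObject" };
  for (const char *name : names)
    if (int type = tu.lookup_type_by_id (name))
      stashed_types[name] = type;
}
```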

However, on_finish_translation_unit runs before plugin initialization,
so with this approach our callbacks would only be registered after
on_finish_translation_unit had already run. As a workaround, I tried
saving the translation unit like this:

void
on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  saved_tu = &tu;
}

Then in our plugin:
ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);
ana::do_finish_translation_unit_callbacks ();

Here, do_finish_translation_unit_callbacks passes saved_tu to each
registered callback.
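The ordering problem is also easy to hit silently: a callback registered after the hook has run is simply never called. Below is a standalone sketch, with hypothetical names and mock types, of a registry that reports such late registrations instead of quietly dropping them. It does not solve the ordering issue; it only makes it visible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mock types.
struct logger
{
  int entries = 0;
};
struct translation_unit {};

typedef void (*finish_translation_unit_callback) (logger *,
                                                  const translation_unit &);

static std::vector<finish_translation_unit_callback> callbacks;
// Set once the core has run the finish-TU hook.
static bool tu_finished = false;

// Returns false if CB arrived too late to ever be invoked.
bool
register_finish_translation_unit_callback (finish_translation_unit_callback cb)
{
  assert (cb != nullptr);
  if (tu_finished)
    return false;
  callbacks.push_back (cb);
  return true;
}

// Mock of the core hook: run the registered callbacks, then mark the TU
// as finished so that later registrations can be detected.
void
on_finish_translation_unit (logger *l, const translation_unit &tu)
{
  for (finish_translation_unit_callback cb : callbacks)
    cb (l, tu);
  tu_finished = true;
}
```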

Unfortunately, with this method we hit a segmentation fault when
calling the lookup functions of translation_unit at plugin
initialization time, even though the translation unit is stored
correctly. So the solution may not be quite that simple.
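One possible cause, not yet confirmed: if the front end passes on_finish_translation_unit a translation_unit object that is local to its caller, saved_tu dangles as soon as that caller returns. The lookups may also rely on front-end scope state that has been torn down by then. The standalone sketch below, using hypothetical mock types, shows the first half of that. The pointer is recorded, but the object it names no longer exists. It counts live objects rather than dereferencing the pointer, since dereferencing it would be undefined behavior.

```cpp
#include <cassert>

// Number of translation_unit objects currently alive (illustration only).
static int live_tus = 0;

// Hypothetical stand-in for a front end's translation_unit subclass.
struct translation_unit
{
  translation_unit () { ++live_tus; }
  ~translation_unit () { --live_tus; }
};

// The workaround: remember the address of the TU for later use.
static const translation_unit *saved_tu = nullptr;

void
on_finish_translation_unit (const translation_unit &tu)
{
  saved_tu = &tu;
}

// Stand-in for the front end: the TU object is a local of this function
// and is destroyed when it returns.
void
finish_parsing ()
{
  translation_unit tu;
  on_finish_translation_unit (tu);
}
```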

I'm currently investigating this issue, but if there's an obvious
solution that I might be missing or any general suggestions, please
let me know!

Thanks as always,
Eric
