[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2021-04-26 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko added a comment.

I'm responding here after a ping on Bugzilla. The short version is I've stopped 
working on this patch. While I like the revised design that involves storing a 
list of tokens to make plugins even more powerful, this turned out to be a much 
bigger implementation effort than I initially anticipated. I was unable to find 
help or additional resources for this work at Microsoft; the internal project 
that might have benefited from this work turned out to not need it *that* bad 
after all. So, regrettably, that meant there was little incentive left for me 
to pursue this patch.

I'll keep monitoring this as well as Bugzilla for discussions, and I'll provide 
information/context on design in whichever way that I can. Thanks again Aaron 
for all your precious help throughout these discussions -- without your 
guidance I probably wouldn't have gotten anywhere at all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86169/new/

https://reviews.llvm.org/D86169

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2020-08-27 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko added a comment.

In D86169#2241830 , @aaron.ballman 
wrote:

>> - Upon encountering `[[ attr(X) ]]` where `attr` is a plugin-handled 
>> attribute: clang parses `X` as token soup, only caring about finding a 
>> matching parenthesis
>
> Just to be clear, what you mean is to lex all of the tokens comprising `X` 
> and store them off for later?

Correct.

>> Does this stand any chance of working?
>
> This is along the same lines of what I was thinking, but there is one 
> potential snag. Clang layers our library components and Sema should never 
> refer back to the Parser (it doesn't link the library in). Because you have 
> an odd requirement where you need to do further parsing during the semantics 
> stage, you may run into awkwardness from this. Also, you may have to worry 
> about needing this functionality recursively. e.g., `int x [[your_attr(int y 
> [[your_attr(int z;)]];)]];` which could be... interesting.

I guess I'd have to i) call the action to create, say, a Decl, then later ii) 
call the plugin and extend the Decl with fresh attributes, assuming the 
attribute vector on the Decl can be grown *after* the Decl is created -- 
hopefully this can be driven from the parser, without a back-edge in the 
dependency graph? Or maybe I'm missing something?

Do you have any example somewhere of how to store tokens and re-fill the lex 
buffer with them later on? This is much more involved than my current patch so 
may take me a while to figure out.

>> Bonus question: seeing `diagAppertainsToDecl`, the name of this function led 
>> me to believe that for now, plugins could only handle attributes attached to 
>> declarations. Is this the case?
>
> Correct, the plugin system does not yet handle type or statement attributes 
> (it was focusing on declaration attributes first, which are the lion's share 
> of attributes in Clang).

Ok, I was asking because if there's no plans to handle type or statement 
attributes via a plugin, then we're somewhat over-engineering this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86169/new/

https://reviews.llvm.org/D86169

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2020-08-26 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko added a comment.

> I'm hoping we can devise a way to not require the entity being attributed to 
> be passed when parsing the attribute

That sounds nice. My only concern with this proposal is to make sure this 
doesn't rule out the interesting use-cases I was mentioning, like the 
`managed_string` example I presented in the motivation section. These use-cases 
would need to have some contextual information to perform meaningful parsing. 
Note that I'm not particularly committed to struct field declarations, one 
could imagine interesting use-cases with function arguments, such as:

  void f (int *arr [[ alloc_size(arr) == 8 ]], size_t idx [[ idx <= 4 ]])

But I agree about the parsing problems you mention, and being unable to 
meaningfully pass a `Declarator` (or anything else) based on incomplete parsing 
information. Here's a solution that might makes us both happy :), but you'd 
have to tell me if this is workable.

- Upon encountering `[[ attr(X) ]]` where `attr` is a plugin-handled attribute: 
clang parses `X` as token soup, only caring about finding a matching parenthesis
- The complete construct (declaration, type, statement, or whatever the 
attribute appertains to) is parsed.
- Plugin's `diagAppertainsToDecl` gets called (as before). Plugin is free to 
store in its internal state whatever information from the `Decl` is relevant.
- Plugin's `parseAttributePayload` gets called in a state that consumes the 
token soup (not sure if that's feasible?). The `parseAttributePayload` method 
does not receive a `Declarator` or anything else. Plugins can rely on the fact 
that if `diagAppertainsToDecl` returns `true`, `parseAttributePayload` will be 
called next, and refer to its internal state to fetch whatever information from 
the `Decl` it needs.

Does this stand any chance of working?

Bonus question: seeing `diagAppertainsToDecl`, the name of this function led me 
to believe that for now, plugins could only handle attributes attached to 
declarations. Is this the case?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86169/new/

https://reviews.llvm.org/D86169

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2020-08-25 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko added a comment.

I also seem to have broken a bunch of tests, although I'm not sure why. If you 
see an obvious cause, I'm happy to get some pointers, otherwise, I'll just dig 
in an debug as I extend the test suite once the patch stabilizes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86169/new/

https://reviews.llvm.org/D86169

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2020-08-25 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko added a comment.

Thanks for the review!

Regarding Declarator: I briefly mentioned something related in the details 
section... actually maybe you can tell me what's the best approach here. I was 
thinking of just passing in a void* and expecting plugins to do

  if (const Declarator *D = isa_and_nonnull(Context))

This would allow a future-proof API where we can, in the future, pass in more 
classes for the "Context" argument (not just a Declarator), relying on clients 
doing dynamic casts. If I understood 
https://llvm.org/docs/ProgrammersManual.html#isa this `void*` cast would work?

My only hesitation is I'm not sure about the right technical device to use for 
this polymorphism pattern, as I've seen a couple related things in the clang 
code base:

- I've seen a pattern where there's a wrapper class that has a Kind() method 
and then allows downcasting based on the results of the Kind (seems overkill 
here)
- I've also seen references to `opaque_ptr` somewhere with a `get` method... 
(maybe these are helpers to do what I'm doing better?)

If you can confirm the void* is the right way to go (or point me to a suitable 
pattern I can take inspiration from), I'll submit a revised patch, along with 
the extra spelling argument (good point, thanks!).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86169/new/

https://reviews.llvm.org/D86169

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86169: Initial support for letting plugins perform custom parsing of attribute arguments.

2020-08-18 Thread Jonathan Protzenko via Phabricator via cfe-commits
jonathan.protzenko created this revision.
jonathan.protzenko added a reviewer: aaron.ballman.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.
jonathan.protzenko requested review of this revision.

This is a preliminary patch that I hope can serve as a basis for discussion. The
design summary below is copy/pasted from 
https://bugs.llvm.org/show_bug.cgi?id=46446#c7

(I'm intentionally posting a work-in-progress so that I can make sure that 
things
are roughly looking good; I'd like to defer details (clang-format'ing my
changes, naming conventions, tests) to later once the design is settled.)

The patch essentially adds the ability for a plugin to handle the parsing of a
CXX11 attribute's arguments. This means that with this patch, plugin-defined
CXX11 attributes are no longer limited to constant attributes that take no
arguments.

The patch intentionally gives the plugin a fair amount of freedom, to enable
interesting use-cases (see below).

The patch does not (yet) tackle the ability for out-of-tree plugins to extend
Attr.td. I'd be happy to work on that once the present patch makes progress.

Design
==

I chose to extend the ParsedAttrInfo class. This allows reusing the logic that
searches for an attribute plugin that has registered a matching spelling, and
reusing the existing registry of plugins. (I contemplated a separate registry,
but thought it would duplicate the spelling-search logic and would also lead to
code duplication in the plugin itself.)

By default, ParsedAttrInfo's (most of which do *not* come from plugins) are
happy to let the built-in clang parsing logic take effect, enforcing tweaking
knobs such as NumArgs and OptArgs. Plugins can, if they are so inclined, elect
to bypass clang's parsing logic (which, as we saw on Bugzilla, ignores CXX11
attribute arguments when they're not known to clang). In that case, plugins are
on their own; they are expected to handle all of the parsing, as well as pushing
the resulting ParsedAttr into the ParsedAttributes. They then report whether the
arguments are syntactically valid or not and the rest of the parsing code
handles error reporting.

(Side-note: this means that it's a three-state return value, and I'm currently
reusing the existing AttrHandling enum, but it could be improved with either
better names or a separate enum.)

The plugin receives, in addition to location information, two key parameters:

- the parser, obviously
- a possibly-null Declarator representing the partially parsed declaration that 
the attribute is attached to, if any.

The second parameter may be the most surprising part of this patch, so let me
try to provide some motivation and context.

My goal is to enable plugins to perform advanced static analysis and code
generation, relying on rich CXX11 attributes capturing deep semantic information
relating, say, a function's parameters, or a struct's fields.

You can easily imagine someone authoring a static analyzer that recognizes:

  typedef struct
  {
size_t len;
uint8_t *chars;
  [[ my_analyzer::invariant(strlen(chars) == len) ]];
  }
  managed_string;

The point here being to leverage a nice CXX11 attribute that benefits from
typo-checking, proper name and macro resolution, etc. rather than using some
unreliable syntax embedded in comments like so many other tools are doing, or
worse, a custom C++ parser.

In our case, we're auto-generating parsers and serializers from a struct type
definition (see https://www.usenix.org/system/files/sec19-ramananandro_0.pdf for
more info), so I'd like to author:

  typedef struct {
uint32_t min_version;
uint32_t max_version
  [[ everparse::constraint (max_version >= min_version && min_version >= 2 
&& max_version <= 4) ]];
  } my_network_format;
  `

Of course, our actual network formats are much more complicated, but this is
just for the sake of example. My point is that allowing plugins a substantial
degree of freedom is likely to enable a flurry of novel and exciting use-cases
based on clang.

Back to the design of the patch, receiving the Declarator allows me to do things
like this (in the plugin):

  Sema  = P->getActions();
  TypeSourceInfo *T = S.GetTypeForDeclarator(*D, P->getCurScope());
  QualType R = T->getType();
  VarDecl *VD = VarDecl::Create(S.getASTContext(), S.CurContext,
  D->getBeginLoc(), D->getIdentifierLoc(), D->getIdentifier(), R,
  T,
  SC_None);
  // Doing this makes sure Sema is aware of the new scope entry, meaning this 
name
  // will be in scope when parsing the expression. (Parsing and scope
  // resolution are intertwined.)
  VD->setImplicit(true);
  S.PushOnScopeChains(VD, P->getCurScope());

This allows my plugin to successfully parse and recognize complex arguments that
refer to the current declarator, in effect letting my plugin define its own
extended syntax and scoping rules.

I hope this helps, I'll keep working on the patch (this is the first version
that seems to do something useful) but