Re: [cfe-dev] Dumping AST information to other formats

2018-11-29 Thread George Karpenkov via cfe-commits
Hi Aaron,

You might find useful the recent work we have done on stable identifiers for 
AST:
now Stmt and Decl classes have a “getID” method,
which returns an identifier stable across different runs (at least on the same 
architecture, probably not the same for different ones).

George

> On Nov 27, 2018, at 6:22 AM, Aaron Ballman via cfe-dev 
>  wrote:
> 
> On Tue, Nov 27, 2018 at 4:50 AM Stephen Kelly via cfe-commits
> mailto:cfe-commits@lists.llvm.org>> wrote:
>> 
>> On 26/11/2018 19:20, Aaron Ballman via cfe-commits wrote:
>>> Once upon a time, there was -ast-print-xml. This -cc1 option was
>>> dropped because it was frequently out of sync with the AST data. It is
>>> right to ask: why would JSON, etc be any different? This is still an
>>> open question, but a goal of this implementation will be to ensure
>>> it's easier to maintain as the AST evolves. However, this feature is
>>> intended to output a safe subset of AST information, so I don't think
>>> this feature will require any more burden to support than -ast-dump
>>> already requires (which is extremely limited).
>> 
>>> I wanted to see if there were concerns or implementation ideas the
>>> community wanted to share before beginning the implementation phase of
>>> this feature.
>> 
>> Hi Aaron,
>> 
>> As you know, I've already done some work in this area.
>> 
>> I split the ASTDumper.cpp into multiple classes so that the traversal of
>> the AST is separate to the printing of it to the output stream. You can
>> see the proof of concept here:
>> 
>>  https://github.com/steveire/clang/commits/extract-AST-dumping
>> 
>> though it is not ready for a real review. I just extracted it to a
>> branch for the purpose of this mailing list reply.
>> 
>> In my branch there are two implementations of outputter - one detailed
>> and one 'simplified' AST. You can see the difference here:
>> 
>>  http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/JuAvs8
>> 
>> Because the traversal is in a separate class, it should be possible to
>> port ASTMatchFinder.cpp to use it, which will eliminate the class of
>> bugs that arise due to ASTMatchFinder.cpp currently using RAV. Here is
>> one such bug:
>> 
>>  https://bugs.llvm.org/show_bug.cgi?id=37629
>> 
>> but there are others for example relating to getting to a
>> CXXConstructorDecl from a CXXCtorInitializer, so it is a class of bug
>> rather than a single bug.
>> 
>> Because the traversal is in a separate class, you should be able to also
>> implement it for different output formats without a significant
>> maintenance burden.
>> 
>> Using this approach, the output formats will not get out of sync and
>> fall to the same fate as the XML output feature.
>> 
>> Let me know if you're interested.
> 
> Thank you for passing this along -- it's actually somewhat aligned
> with what I was envisioning. I very much like splitting out the
> traversal and the printing mechanisms.
> 
> Would you like to be included on the review thread when I submit a patch?
> 
> ~Aaron
> ___
> cfe-dev mailing list
> cfe-...@lists.llvm.org 
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev 
> 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: Dumping AST information to other formats

2018-11-28 Thread Stephen Kelly via cfe-commits

On 27/11/2018 09:49, Stephen Kelly via cfe-commits wrote:

On 26/11/2018 19:20, Aaron Ballman via cfe-commits wrote:

Once upon a time, there was -ast-print-xml. This -cc1 option was
dropped because it was frequently out of sync with the AST data. It is
right to ask: why would JSON, etc be any different? This is still an
open question, but a goal of this implementation will be to ensure
it's easier to maintain as the AST evolves. However, this feature is
intended to output a safe subset of AST information, so I don't think
this feature will require any more burden to support than -ast-dump
already requires (which is extremely limited). 



I wanted to see if there were concerns or implementation ideas the
community wanted to share before beginning the implementation phase of
this feature.


Hi Aaron,

As you know, I've already done some work in this area.

I split the ASTDumper.cpp into multiple classes so that the traversal of 
the AST is separate to the printing of it to the output stream. You can 
see the proof of concept here:


  https://github.com/steveire/clang/commits/extract-AST-dumping



For those following along at home, I implemented a Proof of Concept JSON 
dumper on that github branch in order to show the benefit of splitting 
the traversal logic which is central to my design.


You can play with it here:

 http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/JGBcRH

 http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/fIJjV1

This shows how simple it would be to maintain alternative output formats 
such as JSON and CBOR, which may interest clangd folks and others who 
want a binary AST dump.


Thanks,

Steve.

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: Dumping AST information to other formats

2018-11-27 Thread Aaron Ballman via cfe-commits
On Tue, Nov 27, 2018 at 4:50 AM Stephen Kelly via cfe-commits
 wrote:
>
> On 26/11/2018 19:20, Aaron Ballman via cfe-commits wrote:
> > Once upon a time, there was -ast-print-xml. This -cc1 option was
> > dropped because it was frequently out of sync with the AST data. It is
> > right to ask: why would JSON, etc be any different? This is still an
> > open question, but a goal of this implementation will be to ensure
> > it's easier to maintain as the AST evolves. However, this feature is
> > intended to output a safe subset of AST information, so I don't think
> > this feature will require any more burden to support than -ast-dump
> > already requires (which is extremely limited).
>
> > I wanted to see if there were concerns or implementation ideas the
> > community wanted to share before beginning the implementation phase of
> > this feature.
>
> Hi Aaron,
>
> As you know, I've already done some work in this area.
>
> I split the ASTDumper.cpp into multiple classes so that the traversal of
> the AST is separate to the printing of it to the output stream. You can
> see the proof of concept here:
>
>   https://github.com/steveire/clang/commits/extract-AST-dumping
>
> though it is not ready for a real review. I just extracted it to a
> branch for the purpose of this mailing list reply.
>
> In my branch there are two implementations of outputter - one detailed
> and one 'simplified' AST. You can see the difference here:
>
>   http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/JuAvs8
>
> Because the traversal is in a separate class, it should be possible to
> port ASTMatchFinder.cpp to use it, which will eliminate the class of
> bugs that arise due to ASTMatchFinder.cpp currently using RAV. Here is
> one such bug:
>
>   https://bugs.llvm.org/show_bug.cgi?id=37629
>
> but there are others for example relating to getting to a
> CXXConstructorDecl from a CXXCtorInitializer, so it is a class of bug
> rather than a single bug.
>
> Because the traversal is in a separate class, you should be able to also
> implement it for different output formats without a significant
> maintenance burden.
>
> Using this approach, the output formats will not get out of sync and
> fall to the same fate as the XML output feature.
>
> Let me know if you're interested.

Thank you for passing this along -- it's actually somewhat aligned
with what I was envisioning. I very much like splitting out the
traversal and the printing mechanisms.

Would you like to be included on the review thread when I submit a patch?

~Aaron
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: Dumping AST information to other formats

2018-11-27 Thread Stephen Kelly via cfe-commits

On 26/11/2018 19:20, Aaron Ballman via cfe-commits wrote:

Once upon a time, there was -ast-print-xml. This -cc1 option was
dropped because it was frequently out of sync with the AST data. It is
right to ask: why would JSON, etc be any different? This is still an
open question, but a goal of this implementation will be to ensure
it's easier to maintain as the AST evolves. However, this feature is
intended to output a safe subset of AST information, so I don't think
this feature will require any more burden to support than -ast-dump
already requires (which is extremely limited). 



I wanted to see if there were concerns or implementation ideas the
community wanted to share before beginning the implementation phase of
this feature.


Hi Aaron,

As you know, I've already done some work in this area.

I split the ASTDumper.cpp into multiple classes so that the traversal of 
the AST is separate to the printing of it to the output stream. You can 
see the proof of concept here:


 https://github.com/steveire/clang/commits/extract-AST-dumping

though it is not ready for a real review. I just extracted it to a 
branch for the purpose of this mailing list reply.


In my branch there are two implementations of outputter - one detailed 
and one 'simplified' AST. You can see the difference here:


 http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/JuAvs8

Because the traversal is in a separate class, it should be possible to 
port ASTMatchFinder.cpp to use it, which will eliminate the class of 
bugs that arise due to ASTMatchFinder.cpp currently using RAV. Here is 
one such bug:


 https://bugs.llvm.org/show_bug.cgi?id=37629

but there are others for example relating to getting to a 
CXXConstructorDecl from a CXXCtorInitializer, so it is a class of bug 
rather than a single bug.


Because the traversal is in a separate class, you should be able to also 
implement it for different output formats without a significant 
maintenance burden.


Using this approach, the output formats will not get out of sync and 
fall to the same fate as the XML output feature.


Let me know if you're interested.

Thanks,

Stephen.


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [cfe-dev] Dumping AST information to other formats

2018-11-26 Thread Aaron Ballman via cfe-commits
On Mon, Nov 26, 2018 at 2:27 PM Keane, Erich via cfe-commits
 wrote:
>
> I'd be tentative to commit to any stability guarantees, particularly as the 
> AST tends to change reasonably often.

Understood. This presupposes that there are stability guarantees to be
made surrounding the information provided by the AST, but I really
think those guarantees exist. We add new things to the AST, but we
rarely remove entire AST nodes or refactor AST nodes such that
critical information is no longer exposed. Most often, we reformulate
how information is exposed in the AST, modify enumerations, change
underlying data types, etc, but that should (ideally) not impact the
output of this feature.

>  That said, I can see the value of this.
>
> Additionally, it would be preferential I suspect if we could make the 
> standard ast-dump just another (albeit default) "format" in whatever plugin 
> architecture you plan.  I presume that would result in a lesser maintenance 
> burden (again, depending on the plugin architecture).

The syntax I was proposing is: -ast-dump, -ast-dump=format where
format can either be "default" or "json" (but maybe someday we decide
we need "xml" or "s-expr", etc). I wasn't envisioning a plugin for
this; I was proposing to modify the existing -ast-dump option to
optionally support specifying the format, then threading that format
information through to the action that dumps the AST out to a file in
order to pick the proper AST dumping interface.

~Aaron

>
> -Original Message-
> From: cfe-dev [mailto:cfe-dev-boun...@lists.llvm.org] On Behalf Of Aaron 
> Ballman via cfe-dev
> Sent: Monday, November 26, 2018 11:20 AM
> To: cfe-dev ; cfe-commits 
> Cc: Eric Schulte 
> Subject: [cfe-dev] Dumping AST information to other formats
>
> Clang currently supports various -cc1 options that allow displaying AST 
> information (-ast-dump, -ast-print, -ast-list, etc), but these options are 
> not convenient to consume by third-party tools. GrammaTech has ongoing 
> research efforts where we would like to output some information from the AST 
> to a more open machine-consumable format (such as JSON or s-expressions). We 
> propose adding an optional output format to the -ast-dump command allowing 
> the user to select from either the default or JSON formats. If the output 
> format is not explicitly specified, it will continue to default to the same 
> textual representation it uses today. e.g., clang -cc1 -ast-dump=json foo.c.
> This feature is intended to output a safe subset of AST information that is 
> considered crucial rather than an implementation detail (like the name of a 
> NamedDecl object and the SourceRange for the name), so the output is expected 
> to be mostly stable between releases.
>
> Once upon a time, there was -ast-print-xml. This -cc1 option was dropped 
> because it was frequently out of sync with the AST data. It is right to ask: 
> why would JSON, etc be any different? This is still an open question, but a 
> goal of this implementation will be to ensure it's easier to maintain as the 
> AST evolves. However, this feature is intended to output a safe subset of AST 
> information, so I don't think this feature will require any more burden to 
> support than -ast-dump already requires (which is extremely limited). If AST 
> information is found to be missing from the output, it seems reasonable to 
> have a discussion as to whether it is stable information or an implementation 
> detail, so missing information is to be expected rather than concerned by. 
> That said, GrammaTech is able to commit to maintaining this code for at least 
> the next 1-2 years and possibly beyond as it useful functionality for our 
> research efforts.
>
> I wanted to see if there were concerns or implementation ideas the community 
> wanted to share before beginning the implementation phase of this feature.
>
> ~Aaron
> ___
> cfe-dev mailing list
> cfe-...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> ___
> cfe-commits mailing list
> cfe-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


RE: [cfe-dev] Dumping AST information to other formats

2018-11-26 Thread Keane, Erich via cfe-commits
I'd be tentative to commit to any stability guarantees, particularly as the AST 
tends to change reasonably often.  That said, I can see the value of this.

Additionally, it would be preferential I suspect if we could make the standard 
ast-dump just another (albeit default) "format" in whatever plugin architecture 
you plan.  I presume that would result in a lesser maintenance burden (again, 
depending on the plugin architecture).  

-Original Message-
From: cfe-dev [mailto:cfe-dev-boun...@lists.llvm.org] On Behalf Of Aaron 
Ballman via cfe-dev
Sent: Monday, November 26, 2018 11:20 AM
To: cfe-dev ; cfe-commits 
Cc: Eric Schulte 
Subject: [cfe-dev] Dumping AST information to other formats

Clang currently supports various -cc1 options that allow displaying AST 
information (-ast-dump, -ast-print, -ast-list, etc), but these options are not 
convenient to consume by third-party tools. GrammaTech has ongoing research 
efforts where we would like to output some information from the AST to a more 
open machine-consumable format (such as JSON or s-expressions). We propose 
adding an optional output format to the -ast-dump command allowing the user to 
select from either the default or JSON formats. If the output format is not 
explicitly specified, it will continue to default to the same textual 
representation it uses today. e.g., clang -cc1 -ast-dump=json foo.c.
This feature is intended to output a safe subset of AST information that is 
considered crucial rather than an implementation detail (like the name of a 
NamedDecl object and the SourceRange for the name), so the output is expected 
to be mostly stable between releases.

Once upon a time, there was -ast-print-xml. This -cc1 option was dropped 
because it was frequently out of sync with the AST data. It is right to ask: 
why would JSON, etc be any different? This is still an open question, but a 
goal of this implementation will be to ensure it's easier to maintain as the 
AST evolves. However, this feature is intended to output a safe subset of AST 
information, so I don't think this feature will require any more burden to 
support than -ast-dump already requires (which is extremely limited). If AST 
information is found to be missing from the output, it seems reasonable to have 
a discussion as to whether it is stable information or an implementation 
detail, so missing information is to be expected rather than concerned by. That 
said, GrammaTech is able to commit to maintaining this code for at least the 
next 1-2 years and possibly beyond as it useful functionality for our research 
efforts.

I wanted to see if there were concerns or implementation ideas the community 
wanted to share before beginning the implementation phase of this feature.

~Aaron
___
cfe-dev mailing list
cfe-...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Dumping AST information to other formats

2018-11-26 Thread Aaron Ballman via cfe-commits
Clang currently supports various -cc1 options that allow displaying
AST information (-ast-dump, -ast-print, -ast-list, etc), but these
options are not convenient to consume by third-party tools. GrammaTech
has ongoing research efforts where we would like to output some
information from the AST to a more open machine-consumable format
(such as JSON or s-expressions). We propose adding an optional output
format to the -ast-dump command allowing the user to select from
either the default or JSON formats. If the output format is not
explicitly specified, it will continue to default to the same textual
representation it uses today. e.g., clang -cc1 -ast-dump=json foo.c.
This feature is intended to output a safe subset of AST information
that is considered crucial rather than an implementation detail (like
the name of a NamedDecl object and the SourceRange for the name), so
the output is expected to be mostly stable between releases.

Once upon a time, there was -ast-print-xml. This -cc1 option was
dropped because it was frequently out of sync with the AST data. It is
right to ask: why would JSON, etc be any different? This is still an
open question, but a goal of this implementation will be to ensure
it's easier to maintain as the AST evolves. However, this feature is
intended to output a safe subset of AST information, so I don't think
this feature will require any more burden to support than -ast-dump
already requires (which is extremely limited). If AST information is
found to be missing from the output, it seems reasonable to have a
discussion as to whether it is stable information or an implementation
detail, so missing information is to be expected rather than concerned
by. That said, GrammaTech is able to commit to maintaining this code
for at least the next 1-2 years and possibly beyond as it useful
functionality for our research efforts.

I wanted to see if there were concerns or implementation ideas the
community wanted to share before beginning the implementation phase of
this feature.

~Aaron
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits