Re: progress update after initial GSoC virtual meetup

2021-06-13 Thread David Malcolm via Gcc
On Sun, 2021-06-13 at 19:11 +0530, Ankur Saini wrote:
> 
> 
> > On 08-Jun-2021, at 11:24 PM, David Malcolm 
> > wrote:
> > 
> > Is there a URL for your branch?
> 
> no, currently it is only a local branch on my machine. Should I upload
> it to a hosting site (like GitHub)? Or can I create a branch on the
> remote as well?

At some point we want you to be able to push patches to trunk, so as a
step towards that I think it would be good for you to have a personal
branch on the gcc git repository.

A guide to getting access is here:
  https://gcc.gnu.org/gitwrite.html

I will sponsor you.

> 
> > The issue is that the analyzer currently divides calls into
> > (a) calls where GCC's middle-end "knows" which function is called,
> > and
> > thus the call site has a cgraph_node.
> > (b) calls where GCC's middle-end doesn't "know" which function is
> > called.
> > 
> > The analyzer handles
> >  (a) by building call and return edges in the supergraph, and
> > processing them, and
> >  (b) with an "unknown call" handler, which conservatively sets lots
> > of
> > state to "unknown" to handle the effects of an arbitrary call, and
> > where the call doesn't get its own exploded_edge.
> 
> > 
> > In this bug we have a variant of (b), let's call it (c): GCC's
> > middle-
> > end doesn't know which function is called, but the analyzer's
> > region_model *does* know at a particular exploded_node.
> 
> but how will we know this at the time the supergraph is created?
> Aren't the exploded graph and region model created after the supergraph?

You are correct.

What I'm thinking is that when we create the supergraph we should split
the nodes at more calls, not just at those calls that have a
cgraph_edge, but also at those that are calls to an unknown function
pointer (or maybe even split them at *all* calls).

Then, later, when engine.cc is building the exploded_graph, the
supergraph will have a superedge for those calls, and we can create an
exploded_edge representing the call.  That way if we discover the
function pointer then (rather than having it from a cgraph_edge), we
can build exploded nodes and exploded edges that are similar to the "we
had a cgraph_edge" case.  You may need to generalize some of the event-
handling code to do this.

Does that make sense?
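
To make the splitting idea concrete, here is a toy sketch in standalone
C++.  It is not the real supergraph.cc code: stmt_kind and
split_into_nodes are hypothetical names invented for illustration, and a
"node" is modelled as nothing more than a count of statements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical statement categories; not GCC's real gimple kinds.
enum class stmt_kind
{
  plain,
  call_with_cgraph_edge,    // middle-end knows the callee
  call_via_unknown_pointer  // middle-end does not know the callee
};

// Walk the statements of one toy function and return the size of each
// resulting node.  A node ends right after every call that counts as a
// split point.
std::vector<std::size_t>
split_into_nodes (const std::vector<stmt_kind> &stmts,
                  bool split_at_all_calls)
{
  std::vector<std::size_t> sizes;
  std::size_t current = 0;
  for (stmt_kind k : stmts)
    {
      ++current;
      const bool is_split_point
        = (k == stmt_kind::call_with_cgraph_edge
           || (split_at_all_calls
               && k == stmt_kind::call_via_unknown_pointer));
      if (is_split_point)
        {
          sizes.push_back (current);
          current = 0;
        }
    }
  // Any trailing statements form the final node.
  if (current > 0)
    sizes.push_back (current);
  return sizes;
}
```

For the sequence {plain, call via unknown pointer, plain, call with
cgraph edge, plain}, passing false gives node sizes {4, 1}, while passing
true gives {2, 2, 1}: the function-pointer call now ends a node of its
own, which is the place an edge for that call could be attached.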

You might want to try building some really simple examples of this, to
make it as easy as possible to see what's happening, and to debug.
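
Something along these lines might do as a starting point.  This is a
hypothetical example, not the reproducer from the bug report, and the
names are made up.  Whether GCC's middle-end sees through the pointer
here is something to check rather than assume.

```cpp
// Hypothetical minimal example of a call through a function pointer.

static int double_it (int x) { return 2 * x; }
static int negate_it (int x) { return -x; }

// A non-const global, so the callee at the call site below depends on
// whatever was last stored here.
int (*fn_ptr) (int) = double_it;

int call_via_pointer (int x)
{
  // Indirect call: the callee is only known from the pointer's value.
  return fn_ptr (x);
}
```

Calling call_via_pointer from a small main, before and after assigning
negate_it to fn_ptr, gives two paths that differ only in the pointer's
value.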

> 
> >  I expect this kind of thing will also arise for virtual function
> > calls.
> 
> yes, it would be a similar case: if the call is not devirtualised,
> GCC’s middle-end would not know which function is being called, but our
> region model would.

Yes.

> 
> >  So I think you should look at supergraph.cc at where it handles
> > calls; I think we
> > need to update how it handles (b), so that it can handle the (c)
> > cases,
> > probably by splitting supernodes at all call sites, rather than
> > just
> > those with cgraph_edges, and then creating exploded_edges (with
> > custom
> > edge info) for calls where the analyzer "figured out" what the
> > function
> > pointer was in the region_model, even if there wasn't a
> > cgraph_node.
> 
> > 
> > Does that make sense?
> 
> ok, so we leave the decision of how to handle case (b) to the exploded
> graph, using the additional info from the region model, and create call
> and return supernodes for all types of function calls, whether or not
> the middle-end knows which function is called. Makes sense. (ok, so
> this answers my previous question)
> 
> I went through supergraph.cc and can see the splitting happening in
> the constructor (supergraph::supergraph ()) at the end of the first
> pass.

It sounds to me like you are on the right track.

> 
> > 
> > Or you could attack the problem from the other direction, by
> > looking at
> > what GCC generates for a vfunc call, and seeing if you can get the
> > region_model to "figure out" what the function pointer is at a
> > particular exploded_node.
> 
> I will also look at this after fixing the above problem; my current
> plan is to see how GCC's devirtualiser does it.

OK.

> 
> > 
> > > 
> > > also, should I discuss this bug here (on the gcc mailing list) or
> > > on the bugzilla itself?
> > 
> > Either way works for me.  Maybe on this list?  (given that this
> > feels
> > like a design question)
> 
> ok
> 
> > 
> > Hope this is helpful
> > Dave
> 
> Thanks
> 
> - Ankur

Great.

Let me know how you get on.

As I understand it, Google recommends that we exchange emails about our
GSoC project at least twice a week, so please do continue to report in,
whether you're making progress or feel you're stuck on something.

Hope this is constructive.
Dave




Re: progress update after initial GSoC virtual meetup

2021-06-13 Thread Ankur Saini via Gcc



> On 08-Jun-2021, at 11:24 PM, David Malcolm  wrote:
> 
> Is there a URL for your branch?

no, currently it is only a local branch on my machine. Should I upload it to a 
hosting site (like GitHub)? Or can I create a branch on the remote as well?

> The issue is that the analyzer currently divides calls into
> (a) calls where GCC's middle-end "knows" which function is called, and
> thus the call site has a cgraph_node.
> (b) calls where GCC's middle-end doesn't "know" which function is
> called.
> 
> The analyzer handles
>  (a) by building call and return edges in the supergraph, and
> processing them, and
>  (b) with an "unknown call" handler, which conservatively sets lots of
> state to "unknown" to handle the effects of an arbitrary call, and
> where the call doesn't get its own exploded_edge.

> 
> In this bug we have a variant of (b), let's call it (c): GCC's middle-
> end doesn't know which function is called, but the analyzer's
> region_model *does* know at a particular exploded_node.

but how will we know this at the time the supergraph is created? Aren't the 
exploded graph and region model created after the supergraph?

>  I expect this kind of thing will also arise for virtual function calls.

yes, it would be a similar case: if the call is not devirtualised, GCC’s 
middle-end would not know which function is being called, but our region model 
would.

>  So I think you should look at supergraph.cc at where it handles calls; I 
> think we
> need to update how it handles (b), so that it can handle the (c) cases,
> probably by splitting supernodes at all call sites, rather than just
> those with cgraph_edges, and then creating exploded_edges (with custom
> edge info) for calls where the analyzer "figured out" what the function
> pointer was in the region_model, even if there wasn't a cgraph_node.

> 
> Does that make sense?

ok, so we leave the decision of how to handle case (b) to the exploded graph, 
using the additional info from the region model, and create call and return 
supernodes for all types of function calls, whether or not the middle-end knows 
which function is called. Makes sense. (ok, so this answers my previous 
question)

I went through supergraph.cc and can see the splitting happening in the 
constructor (supergraph::supergraph ()) at the end of the first pass.

> 
> Or you could attack the problem from the other direction, by looking at
> what GCC generates for a vfunc call, and seeing if you can get the
> region_model to "figure out" what the function pointer is at a
> particular exploded_node.

I will also look at this after fixing the above problem; my current plan is to 
see how GCC's devirtualiser does it.

> 
>> 
>> also, should I discuss this bug here (on the gcc mailing list) or
>> on the bugzilla itself?
> 
> Either way works for me.  Maybe on this list?  (given that this feels
> like a design question)

ok

> 
> Hope this is helpful
> Dave

Thanks

- Ankur

Re: progress update after initial GSoC virtual meetup

2021-06-08 Thread David Malcolm via Gcc
On Tue, 2021-06-08 at 21:20 +0530, Ankur Saini wrote:
> 
> 
> > On 01-Jun-2021, at 6:38 PM, David Malcolm 
> > wrote:
> > 

[...snip...]

> > Maybe it's best to have an
> > account on the GCC compile farm for this:
> >  https://gcc.gnu.org/wiki/CompileFarm
> > IIRC you already have such an account.  It might be worth trying
> > out a
> > full bootstrap and testsuite run on one of the powerful machines in
> > the
> > farm.  I tend to use "screen" in case my ssh connection drops
> > partway through a build, so that losing the ssh connection doesn't
> > kill the build.
> 
> I tried this, and it’s awesome :D  I was able to complete the entire
> bootstrap build on one of the powerful machines there in about the
> same time my laptop takes to build with bootstrap disabled.

Great.

[...snip...]

> 
> > - it might be good to create a personal branch on the gcc git
> > repository that you can push your work to.  I'm in two minds about
> > this, in that ideally you'd just commit your work to trunk once
> > each
> > patch is approved, but maybe it's good to have a public place as a
> > backup of the "under development" stuff?  Also, at some point we
> > want
> > you to be pushing changes to the trunk, so we'll want your account
> > to
> > be able to do that.
> 
> I already did that when I was fiddling around with the source code
> and tracked my changes separately

Is there a URL for your branch?

> > 
> > If you're *really* eager to start, you might want to look at 
> >  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
> > This is a case where the analyzer "sees" a call through a function
> > pointer, and, despite figuring out what the function pointer
> > actually
> > points to, entirely fails to properly handle the call, since the
> > supergraph and engine.cc code is looking at the static callgraph,
> > and
> > needs work to handle such calls through function pointers.  I
> > started
> > debugging this a couple of weeks ago, and realized it has a *lot*
> > of
> > similarities to the vtable case, so thought I might leave it so you
> > can
> > have a go at it once the project starts properly.
> 
> yes, looking at the exploded graph, the analyzer is not able to
> understand the call to function “noReturn()” at all when it is called
> via a function pointer ( noReturnPtr.0_1 ("”); ). I will look into it
> and report back as soon as I find something useful.

The issue is that the analyzer currently divides calls into
(a) calls where GCC's middle-end "knows" which function is called, and
thus the call site has a cgraph_node.
(b) calls where GCC's middle-end doesn't "know" which function is
called.

The analyzer handles
  (a) by building call and return edges in the supergraph, and
processing them, and
  (b) with an "unknown call" handler, which conservatively sets lots of
state to "unknown" to handle the effects of an arbitrary call, and
where the call doesn't get its own exploded_edge.

In this bug we have a variant of (b), let's call it (c): GCC's middle-
end doesn't know which function is called, but the analyzer's
region_model *does* know at a particular exploded_node.  I expect this
kind of thing will also arise for virtual function calls.  So I think
you should look at supergraph.cc at where it handles calls; I think we
need to update how it handles (b), so that it can handle the (c) cases,
probably by splitting supernodes at all call sites, rather than just
those with cgraph_edges, and then creating exploded_edges (with custom
edge info) for calls where the analyzer "figured out" what the function
pointer was in the region_model, even if there wasn't a cgraph_node.

Does that make sense?
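
As a toy restatement of the three cases, here is a sketch in standalone
C++.  None of these names exist in the analyzer; call_kind, toy_call_site
and classify are hypothetical, and the two booleans stand in for "is
there a cgraph_edge?" and "did the region_model work out the function
pointer?".

```cpp
// Hypothetical labels for cases (a), (b) and (c) above.
enum class call_kind
{
  known_to_middle_end,      // (a)
  unknown,                  // (b)
  resolved_by_region_model  // (c)
};

// Hypothetical summary of what is known about one call site at one
// point in the exploration.
struct toy_call_site
{
  bool has_cgraph_edge;
  bool region_model_knows_fn;
};

call_kind classify (const toy_call_site &site)
{
  if (site.has_cgraph_edge)
    return call_kind::known_to_middle_end;
  if (site.region_model_knows_fn)
    return call_kind::resolved_by_region_model;
  // Fall back to the conservative "unknown call" treatment.
  return call_kind::unknown;
}
```

The point of the sketch is only that (c) is decided per call site during
exploration, not once up front from the static callgraph.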

Or you could attack the problem from the other direction, by looking at
what GCC generates for a vfunc call, and seeing if you can get the
region_model to "figure out" what the function pointer is at a
particular exploded_node.
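
A small vfunc call to experiment with could look like the following.
This is a hypothetical example (base, derived and get_value are made-up
names), intended only as something to compile and inspect, not as an
existing testcase.

```cpp
// Hypothetical minimal example of a virtual function call.

struct base
{
  virtual ~base () = default;
  virtual int value () const { return 1; }
};

struct derived : base
{
  int value () const override { return 2; }
};

int get_value (const base &b)
{
  // Dispatched through the vtable of whatever object b refers to.
  return b.value ();
}
```

Passing a derived object to get_value selects derived::value at run time,
so the callee depends on the dynamic type of the argument.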

> 
> also, should I discuss this bug here (on the gcc mailing list) or
> on the bugzilla itself?

Either way works for me.  Maybe on this list?  (given that this feels
like a design question)

Hope this is helpful
Dave





Re: progress update after initial GSoC virtual meetup

2021-06-08 Thread Ankur Saini via Gcc



> On 01-Jun-2021, at 6:38 PM, David Malcolm  wrote:
> 
> - able to build the analyzer from source *quickly*, for hacking on the
> code.  i.e. with --disable-bootstrap.  We want to minimize the time it
> takes to, say, hack in a print statement into a single .cc file in the
> analyzer subdirectory, rebuild, and rerun.   With bootstrapping
> disabled, if you run "make -jsome-number-of-cores" from the build
> directory's "gcc" subdirectory, it should merely rebuild the .o file
> for the .cc you touched, and do some relinking (and rerun the
> selftests); hopefully such an edit should take less than a minute
> before you're able to run the code and see the results.
> 
> It sounds like you're close to being able to do that.
> 
> (FWIW I tend to use gdb rather than putting in print statements; I tend
> to hack in gcc_unreachable into conditions that I hope are being hit,
> so that execution will stop at that point in gdb if my assumptions are
> correct, and then I can print things, inject calls, etc in gdb)
> 
> - able to build the analyzer with a full bootstrap (--enable-bootstrap
> is the default) and run the regression test suites ("make check
> -jnumber-of-cores").  On the fastest box I have (64 cores, 128 GB
> RAM) this takes about 45 minutes to do the build and about 45 minutes
> to do the testsuites; it used to take up to three hours total when I
> was running it on a laptop (and thus was a major pain as it's no fun to
> have a hot noisy laptop for several hours).  Maybe it's best to have an
> account on the GCC compile farm for this:
>  https://gcc.gnu.org/wiki/CompileFarm
> IIRC you already have such an account.  It might be worth trying out a
> full bootstrap and testsuite run on one of the powerful machines in the
> farm.  I tend to use "screen" in case my ssh connection drops partway
> through a build, so that losing the ssh connection doesn't kill the
> build.

I tried this, and it’s awesome :D  I was able to complete the entire bootstrap 
build on one of the powerful machines there in about the same time my laptop 
takes to build with bootstrap disabled.

> 
> - able to step through the code in the debugger.  IIRC you've already
> been doing that.

> 
> - copyright assignment paperwork to the FSF.  IIRC you've already done
> that.

done

> 
> - ability to run just a single test in the testsuite, rather than the
> whole lot (so that you can easily develop new tests without having to
> run everything each time you make an edit to a test).  As you say
> above, you've done that.

done

> 
> - the analyzer has testcases for C, C++ and Fortran, so you might want
> to figure out the argument you need for --enable-languages= when
> configuring GCC to enable those languages (but probably no others when
> hacking, to speed up rebuilding GCC).  Obviously you'll need C++, as
> C++ support is the point of your project.

Right now I only enable C and C++ when building.

> 
> - it might be good to create a personal branch on the gcc git
> repository that you can push your work to.  I'm in two minds about
> this, in that ideally you'd just commit your work to trunk once each
> patch is approved, but maybe it's good to have a public place as a
> backup of the "under development" stuff?  Also, at some point we want
> you to be pushing changes to the trunk, so we'll want your account to
> be able to do that.

I already did that when I was fiddling around with the source code and tracked 
my changes separately

> 
> If you're *really* eager to start, you might want to look at 
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
> This is a case where the analyzer "sees" a call through a function
> pointer, and, despite figuring out what the function pointer actually
> points to, entirely fails to properly handle the call, since the
> supergraph and engine.cc code is looking at the static callgraph, and
> needs work to handle such calls through function pointers.  I started
> debugging this a couple of weeks ago, and realized it has a *lot* of
> similarities to the vtable case, so thought I might leave it so you can
> have a go at it once the project starts properly.

yes, looking at the exploded graph, the analyzer is not able to understand the 
call to function “noReturn()” at all when it is called via a function pointer 
( noReturnPtr.0_1 ("”); ). I will look into it and report back as soon as I 
find something useful.

also, should I discuss this bug here (on the gcc mailing list) or on the 
bugzilla itself?

>  That said, before
> the 7th you're meant to be focusing on schoolwork, I think, so we
> really ought to be merely just sorting out accounts, ensuring your
> coding environment is set up, etc.

ok

> 
> Hope this is helpful

Thanks,

- Ankur

Re: progress update after initial GSoC virtual meetup

2021-06-01 Thread David Malcolm via Gcc
On Sun, 2021-05-30 at 20:38 +0530, Ankur Saini wrote:
> hello 

Hi Ankur, sorry about the delayed reply (it was a long weekend here in
the US)

> I was successfully able to build gcc with bootstrapping disabled and
> using xgcc directly from the build directory instead ( reducing the
> overall build time a lot, although it still takes about half an hour to
> build but it’s much faster than before ). 

Excellent.


> I was also able to run a single test on the built compiler.

Great.

> 
> Is there anything else I should know to aid in development, or
> should we start planning and preparing for the project so that we
> can have a head start during the coding phase?

I tried brainstorming "what does a new contributor to GCC's analyzer
need to be able to do"; here's what I came up with:

- able to build the analyzer from source *quickly*, for hacking on the
code.  i.e. with --disable-bootstrap.  We want to minimize the time it
takes to, say, hack in a print statement into a single .cc file in the
analyzer subdirectory, rebuild, and rerun.   With bootstrapping
disabled, if you run "make -jsome-number-of-cores" from the build
directory's "gcc" subdirectory, it should merely rebuild the .o file
for the .cc you touched, and do some relinking (and rerun the
selftests); hopefully such an edit should take less than a minute
before you're able to run the code and see the results.

It sounds like you're close to being able to do that.

(FWIW I tend to use gdb rather than putting in print statements; I tend
to hack in gcc_unreachable into conditions that I hope are being hit,
so that execution will stop at that point in gdb if my assumptions are
correct, and then I can print things, inject calls, etc in gdb)

- able to build the analyzer with a full bootstrap (--enable-bootstrap
is the default) and run the regression test suites ("make check
-jnumber-of-cores").  On the fastest box I have (64 cores, 128 GB
RAM) this takes about 45 minutes to do the build and about 45 minutes
to do the testsuites; it used to take up to three hours total when I
was running it on a laptop (and thus was a major pain as it's no fun to
have a hot noisy laptop for several hours).  Maybe it's best to have an
account on the GCC compile farm for this:
  https://gcc.gnu.org/wiki/CompileFarm
IIRC you already have such an account.  It might be worth trying out a
full bootstrap and testsuite run on one of the powerful machines in the
farm.  I tend to use "screen" in case my ssh connection drops partway
through a build, so that losing the ssh connection doesn't kill the
build.

- able to step through the code in the debugger.  IIRC you've already
been doing that.

- copyright assignment paperwork to the FSF.  IIRC you've already done
that.

- ability to run just a single test in the testsuite, rather than the
whole lot (so that you can easily develop new tests without having to
run everything each time you make an edit to a test).  As you say
above, you've done that.

- the analyzer has testcases for C, C++ and Fortran, so you might want
to figure out the argument you need for --enable-languages= when
configuring GCC to enable those languages (but probably no others when
hacking, to speed up rebuilding GCC).  Obviously you'll need C++, as
C++ support is the point of your project.

- it might be good to create a personal branch on the gcc git
repository that you can push your work to.  I'm in two minds about
this, in that ideally you'd just commit your work to trunk once each
patch is approved, but maybe it's good to have a public place as a
backup of the "under development" stuff?  Also, at some point we want
you to be pushing changes to the trunk, so we'll want your account to
be able to do that.

I hope all the above makes sense.  Don't hesitate to ask questions;
finding things out is the whole point of this part of the GSoC
schedule.

Can anyone think of something else that's worth sorting out in this
preliminary phase?

I don't think you're meant to be spending more than an hour or so a
week in this preliminary phase until the coding period officially
starts on Monday 7th.

If you're *really* eager to start, you might want to look at 
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
This is a case where the analyzer "sees" a call through a function
pointer, and, despite figuring out what the function pointer actually
points to, entirely fails to properly handle the call, since the
supergraph and engine.cc code is looking at the static callgraph, and
needs work to handle such calls through function pointers.  I started
debugging this a couple of weeks ago, and realized it has a *lot* of
similarities to the vtable case, so thought I might leave it so you can
have a go at it once the project starts properly.  That said, before
the 7th you're meant to be focusing on schoolwork, I think, so we
really ought to be merely just sorting out accounts, ensuring your
coding environment is set up, etc.

Hope this is helpful

Dave



progress update after initial GSoC virtual meetup

2021-05-30 Thread Ankur Saini via Gcc
hello 

I was successfully able to build gcc with bootstrapping disabled and using xgcc 
directly from the build directory instead ( reducing the overall build time a 
lot, although it still takes about half an hour to build but it’s much faster 
than before ). I was also able to run a single test on the built 
compiler. 

Is there anything else I should know to aid in development, or should we 
start planning and preparing for the project so that we can have a head 
start during the coding phase?

Thanks,

- Ankur