How to check if two symbols are from same source unit during WPA ?

2016-04-12 Thread Prathamesh Kulkarni
Hi,
How to check if two symbols are from same source file during WPA ?
Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
and symbol2 are from same
source files ? Would that be a sufficient condition or do I need to
check for something more ?

Thanks,
Prathamesh


Re: How to check if two symbols are from same source unit during WPA ?

2016-04-12 Thread Richard Biener
On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
 wrote:
>Hi,
>How to check if two symbols are from same source file during WPA ?
>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
>and symbol2 are from same
>source files ? Would that be a sufficient condition or do I need to
>check for something more ?

Why would you want to know?  The proposed check would verify it comes from the 
same TU.

Richard.

>Thanks,
>Prathamesh




Re: How to check if two symbols are from same source unit during WPA ?

2016-04-13 Thread Prathamesh Kulkarni
On 12 April 2016 at 22:41, Richard Biener  wrote:
> On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
>  wrote:
>>Hi,
>>How to check if two symbols are from same source file during WPA ?
>>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
>>and symbol2 are from same
>>source files ? Would that be a sufficient condition or do I need to
>>check for something more ?
>
> Why would you want to know?  The proposed check would verify it comes from 
> the same TU.
Hi,
I am trying to address issue with section anchors with LTO.
In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
Therefore we can take advantage of section anchors.
IIUC with balanced partitioning, variable is added in the first partition it is
referenced in. So it may get separated into other partition from functions
referencing it, thus causing it to be loaded as an external reference.

My point is that the "first" partition to which the variable is added may not
be the best choice for it.
For instance:
file1.c defines variables 'a' and 'b' and functions f1(), f2()
file2.c defines f3().
f1, f2, f3 use 'a' and 'b'.

It could happen that f3, a and b are mapped to one partition (assuming
f3 is processed first), and f1, f2 are mapped to another partition.
so f1 and f2 are forced to load a, b as external references.
It would be better instead to put  a and b in same partition as f1 and f2.

As first cut I tried to implement the following very simple approach:
Assume V(f) gives a set of all variables referenced by function f.
Let S = V(f1) intersect V(f2) ... intersect V(fn).
Add f1, f2, .. fn and S to same partition if
a) S is non-empty, that is, we add functions and "common" variables
referenced by these functions to same partition.
b) Function uses at least two global variables. If a function uses
only a single global variable, then AFAIU it wouldn't matter if it's loaded
as external reference.
However the above approach results in quite distorted partitioning
creating very large partitions and took 4x time to build chromium with LTO.

So I tried to restrain adding number of symbols, by allowing them to be added
only if they are from same TU.
This would (hopefully) ensure that we are not worse than non-LTO case
since we add functions and common set of variables referenced by these
functions to same partition provided all of them belong to same TU.
As a worst-case, it will add all symbols from the TU to which function
belongs, if all the functions in
that TU reference >=2 global variables and reference at least one global
variable in common from the same TU. We might want to put a higher upper
limit than 1 for number of shared referenced global vars to stop adding
too many symbols to same partition.

For eg:
file1.c  defines variables 'a', 'b' and functions f1_ab(), f2_ab()
file2.c: defines f3_ab()
where f1_ab, f2_ab, f3_ab both use a, b
If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
to the same partition but
not f3_ab() since it belongs to another TU.
(f3_ab() may get added to the same partition by another iteration if
there's enough space in
the partition but that's irrelevant).

Ideally I would like to put those functions and variables in the same
partition such that the use
of section anchors can be maximized, which would also involve adding
functions from other TU.
In above case it's desirable to add f3_ab to the same
partition as f1_ab, however I am not sure what constraints to check for since
we could potentially end up adding lots of symbols (as in approach 1).
For now, I only considered the check if they are from same TU.

I have attached a prototype patch that implements the above approach.
It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
to build chromium with LTO as without patch.
The patch should only affect targets with section anchor support (arm,
aarch64, ppc).
I have to yet perform benchmarking with the patch, I will get back
with some results
after I get that done.
I would be grateful for feedback especially regarding correctness.

Thanks,
Prathamesh
>
> Richard.
>
>>Thanks,
>>Prathamesh
>
>
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 9eb63c2..c7f17f1 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include 
+#include 
+#include "toplev.h" /* FIXME: for target_supports_section_anchor_p().  */
 
 vec ltrans_partitions;
 
@@ -407,6 +410,63 @@ add_sorted_nodes (vec &next_nodes, 
ltrans_partition partition)
   add_symbol_to_partition (partition, node);
 }
 
+/* Let V(f) be the set of variables referenced by function f and defined in
+   same TU as f.
+   Let S = V(f1) interset V(f2)  intersect V(fn).
+   Add f1, f2, .. fn and S to set if:
+   i) S is non-empty
+   ii) Function uses at-least 2 global variables. 
+   iii) f1, f2, ... fn are from same TU.
+
+

Re: How to check if two symbols are from same source unit during WPA ?

2016-04-14 Thread Richard Biener
On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:

> On 12 April 2016 at 22:41, Richard Biener  wrote:
> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
> >  wrote:
> >>Hi,
> >>How to check if two symbols are from same source file during WPA ?
> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
> >>and symbol2 are from same
> >>source files ? Would that be a sufficient condition or do I need to
> >>check for something more ?
> >
> > Why would you want to know?  The proposed check would verify it comes from 
> > the same TU.
> Hi,
> I am trying to address issue with section anchors with LTO.
> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
> Therefore we can take advantage of section anchors.
> IIUC with balanced partitioning, variable is added in the first partition it 
> is
> referenced in. So it may get separated into other partition from functions
> referencing it, thus causing it to be loaded as an external reference.
> 
> My point is that the "first" partition to which the variable is added may not
> be the best choice for it.
> For instance:
> file1.c defines variables 'a' and 'b' and functions f1(), f2()
> file2.c defines f3().
> f1, f2, f3 use 'a' and 'b'.
> 
> It could happen that f3, a and b are mapped to one partition (assuming
> f3 is processed first), and f1, f2 are mapped to another partition.
> so f1 and f2 are forced to load a, b as external references.
> It would be better instead to put  a and b in same partition as f1 and f2.
> 
> As first cut I tried to implement the following very simple approach:
> Assume V(f) gives a set of all variables referenced by function f.
> Let S = V(f1) intersect V(f2) ... intersect V(fn).
> Add f1, f2, .. fn and S to same partition if
> a) S is non-empty, that is, we add functions and "common" variables
> referenced by these functions to same partition.
> b) Function uses at least two global variables. If a function uses
> only a single global variable, then AFAIU it wouldn't matter if it's loaded
> as external reference.
> However the above approach results in quite distorted partitioning
> creating very large partitions and took 4x time to build chromium with LTO.
> 
> So I tried to restrain adding number of symbols, by allowing them to be added
> only if they are from same TU.
> This would (hopefully) ensure that we are not worse than non-LTO case
> since we add functions and common set of variables referenced by these
> functions to same partition provided all of them belong to same TU.
> As a worst-case, it will add all symbols from the TU to which function
> belongs, if all the functions in
> that TU reference >=2 global variables and reference at least one global
> variable in common from the same TU. We might want to put a higher upper
> limit than 1 for number of shared referenced global vars to stop adding
> too many symbols to same partition.
> 
> For eg:
> file1.c  defines variables 'a', 'b' and functions f1_ab(), f2_ab()
> file2.c: defines f3_ab()
> where f1_ab, f2_ab, f3_ab both use a, b
> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
> to the same partition but
> not f3_ab() since it belongs to another TU.
> (f3_ab() may get added to the same partition by another iteration if
> there's enough space in
> the partition but that's irrelevant).
> 
> Ideally I would like to put those functions and variables in the same
> partition such that the use
> of section anchors can be maximized, which would also involve adding
> functions from other TU.
> In above case it's desirable to add f3_ab to the same
> partition as f1_ab, however I am not sure what constraints to check for since
> we could potentially end up adding lots of symbols (as in approach 1).
> For now, I only considered the check if they are from same TU.
> 
> I have attached a prototype patch that implements the above approach.
> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
> to build chromium with LTO as without patch.
> The patch should only affect targets with section anchor support (arm,
> aarch64, ppc).
> I have to yet perform benchmarking with the patch, I will get back
> with some results
> after I get that done.
> I would be grateful for feedback especially regarding correctness.

Ok, I didn't try to fully understand what you did but why not simply
keep partitioning of functions as-is and simply put variables into
the partition which contains the most references to it?

Section anchor optimization should be a 2nd choice for partitioning
and really partitioning functions together like we do should have priority.

Richard.


Re: How to check if two symbols are from same source unit during WPA ?

2016-04-14 Thread Prathamesh Kulkarni
On 14 April 2016 at 13:56, Richard Biener  wrote:
> On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:
>
>> On 12 April 2016 at 22:41, Richard Biener  wrote:
>> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
>> >  wrote:
>> >>Hi,
>> >>How to check if two symbols are from same source file during WPA ?
>> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
>> >>and symbol2 are from same
>> >>source files ? Would that be a sufficient condition or do I need to
>> >>check for something more ?
>> >
>> > Why would you want to know?  The proposed check would verify it comes from 
>> > the same TU.
>> Hi,
>> I am trying to address issue with section anchors with LTO.
>> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
>> Therefore we can take advantage of section anchors.
>> IIUC with balanced partitioning, variable is added in the first partition it 
>> is
>> referenced in. So it may get separated into other partition from functions
>> referencing it, thus causing it to be loaded as an external reference.
>>
>> My point is that the "first" partition to which the variable is added may not
>> be the best choice for it.
>> For instance:
>> file1.c defines variables 'a' and 'b' and functions f1(), f2()
>> file2.c defines f3().
>> f1, f2, f3 use 'a' and 'b'.
>>
>> It could happen that f3, a and b are mapped to one partition (assuming
>> f3 is processed first), and f1, f2 are mapped to another partition.
>> so f1 and f2 are forced to load a, b as external references.
>> It would be better instead to put  a and b in same partition as f1 and f2.
>>
>> As first cut I tried to implement the following very simple approach:
>> Assume V(f) gives a set of all variables referenced by function f.
>> Let S = V(f1) intersect V(f2) ... intersect V(fn).
>> Add f1, f2, .. fn and S to same partition if
>> a) S is non-empty, that is, we add functions and "common" variables
>> referenced by these functions to same partition.
>> b) Function uses at least two global variables. If a function uses
>> only a single global variable, then AFAIU it wouldn't matter if it's loaded
>> as external reference.
>> However the above approach results in quite distorted partitioning
>> creating very large partitions and took 4x time to build chromium with LTO.
>>
>> So I tried to restrain adding number of symbols, by allowing them to be added
>> only if they are from same TU.
>> This would (hopefully) ensure that we are not worse than non-LTO case
>> since we add functions and common set of variables referenced by these
>> functions to same partition provided all of them belong to same TU.
>> As a worst-case, it will add all symbols from the TU to which function
>> belongs, if all the functions in
>> that TU reference >=2 global variables and reference at least one global
>> variable in common from the same TU. We might want to put a higher upper
>> limit than 1 for number of shared referenced global vars to stop adding
>> too many symbols to same partition.
>>
>> For eg:
>> file1.c  defines variables 'a', 'b' and functions f1_ab(), f2_ab()
>> file2.c: defines f3_ab()
>> where f1_ab, f2_ab, f3_ab both use a, b
>> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
>> to the same partition but
>> not f3_ab() since it belongs to another TU.
>> (f3_ab() may get added to the same partition by another iteration if
>> there's enough space in
>> the partition but that's irrelevant).
>>
>> Ideally I would like to put those functions and variables in the same
>> partition such that the use
>> of section anchors can be maximized, which would also involve adding
>> functions from other TU.
>> In above case it's desirable to add f3_ab to the same
>> partition as f1_ab, however I am not sure what constraints to check for since
>> we could potentially end up adding lots of symbols (as in approach 1).
>> For now, I only considered the check if they are from same TU.
>>
>> I have attached a prototype patch that implements the above approach.
>> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
>> to build chromium with LTO as without patch.
>> The patch should only affect targets with section anchor support (arm,
>> aarch64, ppc).
>> I have to yet perform benchmarking with the patch, I will get back
>> with some results
>> after I get that done.
>> I would be grateful for feedback especially regarding correctness.
>
> Ok, I didn't try to fully understand what you did but why not simply
> keep partitioning of functions as-is and simply put variables into
> the partition which contains the most references to it?
>
> Section anchor optimization should be a 2nd choice for partitioning
> and really partitioning functions together like we do should have priority.
Well balanced partitioning would reorder functions without considering section
anchors, so putting variables in partition containing max references may not
be optimal.
For eg:
file1.c defines a, b, and three functions f1, f2, f3 

Re: How to check if two symbols are from same source unit during WPA ?

2016-04-14 Thread Richard Biener
On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:

> On 14 April 2016 at 13:56, Richard Biener  wrote:
> > On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:
> >
> >> On 12 April 2016 at 22:41, Richard Biener  wrote:
> >> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
> >> >  wrote:
> >> >>Hi,
> >> >>How to check if two symbols are from same source file during WPA ?
> >> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
> >> >>and symbol2 are from same
> >> >>source files ? Would that be a sufficient condition or do I need to
> >> >>check for something more ?
> >> >
> >> > Why would you want to know?  The proposed check would verify it comes 
> >> > from the same TU.
> >> Hi,
> >> I am trying to address issue with section anchors with LTO.
> >> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
> >> Therefore we can take advantage of section anchors.
> >> IIUC with balanced partitioning, variable is added in the first partition 
> >> it is
> >> referenced in. So it may get separated into other partition from functions
> >> referencing it, thus causing it to be loaded as an external reference.
> >>
> >> My point is that the "first" partition to which the variable is added may 
> >> not
> >> be the best choice for it.
> >> For instance:
> >> file1.c defines variables 'a' and 'b' and functions f1(), f2()
> >> file2.c defines f3().
> >> f1, f2, f3 use 'a' and 'b'.
> >>
> >> It could happen that f3, a and b are mapped to one partition (assuming
> >> f3 is processed first), and f1, f2 are mapped to another partition.
> >> so f1 and f2 are forced to load a, b as external references.
> >> It would be better instead to put  a and b in same partition as f1 and f2.
> >>
> >> As first cut I tried to implement the following very simple approach:
> >> Assume V(f) gives a set of all variables referenced by function f.
> >> Let S = V(f1) intersect V(f2) ... intersect V(fn).
> >> Add f1, f2, .. fn and S to same partition if
> >> a) S is non-empty, that is, we add functions and "common" variables
> >> referenced by these functions to same partition.
> >> b) Function uses at least two global variables. If a function uses
> >> only a single global variable, then AFAIU it wouldn't matter if it's loaded
> >> as external reference.
> >> However the above approach results in quite distorted partitioning
> >> creating very large partitions and took 4x time to build chromium with LTO.
> >>
> >> So I tried to restrain adding number of symbols, by allowing them to be 
> >> added
> >> only if they are from same TU.
> >> This would (hopefully) ensure that we are not worse than non-LTO case
> >> since we add functions and common set of variables referenced by these
> >> functions to same partition provided all of them belong to same TU.
> >> As a worst-case, it will add all symbols from the TU to which function
> >> belongs, if all the functions in
> >> that TU reference >=2 global variables and reference at least one global
> >> variable in common from the same TU. We might want to put a higher upper
> >> limit than 1 for number of shared referenced global vars to stop adding
> >> too many symbols to same partition.
> >>
> >> For eg:
> >> file1.c  defines variables 'a', 'b' and functions f1_ab(), f2_ab()
> >> file2.c: defines f3_ab()
> >> where f1_ab, f2_ab, f3_ab both use a, b
> >> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
> >> to the same partition but
> >> not f3_ab() since it belongs to another TU.
> >> (f3_ab() may get added to the same partition by another iteration if
> >> there's enough space in
> >> the partition but that's irrelevant).
> >>
> >> Ideally I would like to put those functions and variables in the same
> >> partition such that the use
> >> of section anchors can be maximized, which would also involve adding
> >> functions from other TU.
> >> In above case it's desirable to add f3_ab to the same
> >> partition as f1_ab, however I am not sure what constraints to check for 
> >> since
> >> we could potentially end up adding lots of symbols (as in approach 1).
> >> For now, I only considered the check if they are from same TU.
> >>
> >> I have attached a prototype patch that implements the above approach.
> >> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
> >> to build chromium with LTO as without patch.
> >> The patch should only affect targets with section anchor support (arm,
> >> aarch64, ppc).
> >> I have to yet perform benchmarking with the patch, I will get back
> >> with some results
> >> after I get that done.
> >> I would be grateful for feedback especially regarding correctness.
> >
> > Ok, I didn't try to fully understand what you did but why not simply
> > keep partitioning of functions as-is and simply put variables into
> > the partition which contains the most references to it?
> >
> > Section anchor optimization should be a 2nd choice for partitioning
> > and really partitioning functions together lik

Re: How to check if two symbols are from same source unit during WPA ?

2016-04-14 Thread Ramana Radhakrishnan

> 
> What happens in practice?  GCC doesn't put functions in random
> partitions.
> 

The data goes into a separate partition AFAIU - it means that all data accesses 
are as though they are extern references which means there's not necessarily 
any CSE'ing ability that's available with section anchors.

>> If it's not desired by default could we gate it on an option ?
>> AFIAU, section anchors optimization is important for ARM and AArch64.
> 
> For code size or for performance?  I wonder why section anchors cannot
> be "implemented" using some special relocations and thus as linker
> optimization.

For performance (and probably code size too) as you can end up CSE'ing the 
anchor point. the difference in performance with -flto-partitions=1 is visible 
on quite a few of the spec2k benchmarks. I don't remember which ones 
immediately but yeah it makes a difference.

Ramana
> 
> Richard.
> 



Re: How to check if two symbols are from same source unit during WPA ?

2016-04-14 Thread Richard Biener
On Thu, 14 Apr 2016, Ramana Radhakrishnan wrote:

> 
> > 
> > What happens in practice?  GCC doesn't put functions in random
> > partitions.
> > 
> 
> The data goes into a separate partition AFAIU - it means that all data 
> accesses are as though they are extern references which means there's 
> not necessarily any CSE'ing ability that's available with section 
> anchors.

No, they are added to partitions referencing them.  Actually they
end up in the first partition that references them.

> >> If it's not desired by default could we gate it on an option ?
> >> AFIAU, section anchors optimization is important for ARM and AArch64.
> > 
> > For code size or for performance?  I wonder why section anchors cannot
> > be "implemented" using some special relocations and thus as linker
> > optimization.
> 
> For performance (and probably code size too) as you can end up CSE'ing 
> the anchor point. the difference in performance with -flto-partitions=1 
> is visible on quite a few of the spec2k benchmarks. I don't remember 
> which ones immediately but yeah it makes a difference.

Yeah, as said elsewhere for things like spec2k we should have been
less aggressive with the goal to parallelize and build larger
partitions for small programs.  Thus increase lto-min-partition
up to a point where all benchmarks end up in a single partition
(small benchmarks, so spec2k6 doesn't count at all).  After all
our inlining limits only start with large-unit-insns as well,
which is 10 times of lto-min-partition...

Richard.


Re: How to check if two symbols are from same source unit during WPA ?

2016-04-15 Thread Jan Hubicka
> On Thu, 14 Apr 2016, Ramana Radhakrishnan wrote:
> 
> > 
> > > 
> > > What happens in practice?  GCC doesn't put functions in random
> > > partitions.
> > > 
> > 
> > The data goes into a separate partition AFAIU - it means that all data 
> > accesses are as though they are extern references which means there's 
> > not necessarily any CSE'ing ability that's available with section 
> > anchors.
> 
> No, they are added to partitions referencing them.  Actually they
> end up in the first partition that references them.

Yeah, this can be easily tuned to distribute variables last and put them
to partitioning referncing them most often.

Balanced partitioning tries to preserve source order (unless FDO tells 
otehrwise)
and thus we could add anchros into the cost metrics it uses to split partitions.
> 
> > >> If it's not desired by default could we gate it on an option ?
> > >> AFIAU, section anchors optimization is important for ARM and AArch64.
> > > 
> > > For code size or for performance?  I wonder why section anchors cannot
> > > be "implemented" using some special relocations and thus as linker
> > > optimization.
> > 
> > For performance (and probably code size too) as you can end up CSE'ing 
> > the anchor point. the difference in performance with -flto-partitions=1 
> > is visible on quite a few of the spec2k benchmarks. I don't remember 
> > which ones immediately but yeah it makes a difference.
> 
> Yeah, as said elsewhere for things like spec2k we should have been
> less aggressive with the goal to parallelize and build larger
> partitions for small programs.  Thus increase lto-min-partition
> up to a point where all benchmarks end up in a single partition
> (small benchmarks, so spec2k6 doesn't count at all).  After all
> our inlining limits only start with large-unit-insns as well,
> which is 10 times of lto-min-partition...

Well, I don't think large-unit-insns is particularly related to
lto-min-partition. But lto-min-partition was not really tuned at all
(I remember I dumped the value for some smal spec2k benchmark and based
it on that).  We could do some experiments. But for larger programs we
want more stable solution. Read/write globals are not that common,
but readonly globals are common in modern and large codebases, too.

Posibility is to assign achors at WPA time disabling any chance for
variables to be optimized out later. 

Honza
> 
> Richard.