about the gsoc

2018-03-06 Thread Jagmeet Singh
Can anyone help me?

I would like to ask a question about the project ideas.

Please reply.


Re: GCC GSOC Participation

2018-03-06 Thread Martin Jambor
Hello Prateek,

On Sat, Mar 03 2018, Prateek Kalra wrote:
> Hello GCC Community,
> My name is Prateek Kalra. I am pursuing an integrated dual
> degree (B.Tech + M.Tech) in Computer Science Software Engineering at Gautam
> Buddha University, Greater Noida. I am currently in the 8th semester of the
> programme.
> I have experience in competitive programming with C++. Here is my LinkedIn
> profile:
> https://www.linkedin.com/in/prateek-kalra-6a40bab3/
> I am interested in the GSoC project "Implement a fuzzer leveraging GCC
> extensions".
> I took compiler design as one of my course subjects in the previous
> semester and secured an 'A' grade at the end of the semester.
> I have theoretical knowledge of fuzz testing and Csmith, that is, how random
> C programs are generated to find compiler bugs, and I am very keen to
> work on this project.
> I would appreciate your guidance through the process, and I would be
> grateful if you could mentor me in further research on this project
> idea.

I would suggest that you start with reading through Andi's email to
another student who expressed interest in that project which you can
find at: https://gcc.gnu.org/ml/gcc/2018-02/msg00216.html

Andi, do you have any further suggestions for what Prateek should check out,
perhaps build, examine and experiment with in order to come up with a
nice proposal?  Do you personally prefer starting with any particular
existing fuzzer, for example?

Good luck,

Martin


Re: Further for GSoC.

2018-03-06 Thread Martin Jambor
Hello, Tejas,

On Fri, Mar 02 2018, Joseph Myers wrote:
> On Fri, 2 Mar 2018, Tejas Joshi wrote:
>
>> I have some university-level experience programming in assembly
>> language for the Intel 80386DX architecture. I think it may help with
>> implementing support for other architectures. You
>> mentioned the roundeven function as a starting point. Where can I find the
>> .c files (e.g. real.c) so that I can study these functions in detail and
>> get a broader picture? I have GCC 7.2.0 installed and could not find them in
>> the library nor in libc/.
>
> You need to check out the GCC source code from version control and find 
> the files and functions referenced in there (locating pieces of GCC code 
> using find, grep, etc. on the GCC source tree is something you'll need to 
> do a lot), and make sure you can build GCC, run the testsuite, save 
> results from a testsuite run, build and run the testsuite and compare the 
> results of the two runs (this is something that would need doing very many 
> times in the course of any project working on GCC).
>

You might have figured this out already but just in case something is
not clear:

  1. How to check out our sources using svn and git is described at
https://gcc.gnu.org/svn.html and https://gcc.gnu.org/wiki/GitMirror
respectively, and

  2. perhaps more importantly, how to configure, build and test GCC is
described in steps linked from https://gcc.gnu.org/install/ (look
for --disable-bootstrap, among other things).

If you have any specific question regarding any of these steps, feel
free to ask on the mailing list or the IRC.

Good luck,

Martin


Re: GCC GSOC Participation

2018-03-06 Thread Prathamesh Kulkarni
CCing Andi Kleen, mentor of this project.

Regards,
Prathamesh

On 3 March 2018 at 16:22, Prateek Kalra  wrote:
> Hello GCC Community,
> My name is Prateek Kalra. I am pursuing an integrated dual
> degree (B.Tech + M.Tech) in Computer Science Software Engineering at Gautam
> Buddha University, Greater Noida. I am currently in the 8th semester of the
> programme.
> I have experience in competitive programming with C++. Here is my LinkedIn
> profile:
> https://www.linkedin.com/in/prateek-kalra-6a40bab3/
> I am interested in the GSoC project "Implement a fuzzer leveraging GCC
> extensions".
> I took compiler design as one of my course subjects in the previous
> semester and secured an 'A' grade at the end of the semester.
> I have theoretical knowledge of fuzz testing and Csmith, that is, how random
> C programs are generated to find compiler bugs, and I am very keen to
> work on this project.
> I would appreciate your guidance through the process, and I would be
> grateful if you could mentor me in further research on this project
> idea.
> Regards,
> Prateek


Re: Regarding Google summer of code.

2018-03-06 Thread Martin Jambor
Hello Chaitanya,

On Fri, Mar 02 2018, Sai Chaitanya wrote:
> Hello,
> I am Chaitanya. While looking through the organisations for GSoC I became very
> confused. I have skills in C, C++, Java and a little Python, but so far
> I have not taken part in any big projects.
> Please guide me as to which organisation and which project would suit me.
>

You have reached the developers of the GNU Compiler Collection (GCC).  I
am afraid that we are unable to help you with picking the most suitable
GSoC mentor organization, for many reasons.  If you are thinking of
applying to do a GSoC project with us, look at our dedicated wiki page
https://gcc.gnu.org/wiki/SummerOfCode

If you are still interested after reading through it, I suggest that you
check out our sources (https://gcc.gnu.org/svn.html,
https://gcc.gnu.org/wiki/GitMirror), build the compiler (look at steps
referenced from https://gcc.gnu.org/install/), look around the code a
bit and then come back to us with specific questions (and ideally at
least an idea for the project).

Good luck,

Martin


Re: Getting into C++ Downloading gcc.

2018-03-06 Thread Jonathan Wakely
On 4 March 2018 at 02:40, Ray McAllister  wrote:
> Hi, I'm totally blind. I do most of my programming in BASIC, but I use C++
> now and then, actually, for drawing fractals.  I code graphics.  I've been
> using Dev-C++ because it's the only thing I can find compatible with my
> screen reader.  I don't like how I can't set up a char array bigger than
> 1400 by 1400  as I might want to make a fractal bigger than that.

You should be able to create arrays bigger than that, limited only by
the memory available on your computer.

You might not be able to create such large arrays on the stack, or as
a global variable, but you could create it on the heap:

char* array = new char[1*1];

Or a better way to do that might be:

#include <memory>
#include <array>
// ROWS and COLS stand for the desired dimensions.
using array_type = std::array<char, ROWS * COLS>;
auto array = std::make_unique<array_type>();
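
As a fuller sketch of the heap-allocation advice, a `std::vector<char>` can hold a whole image in one heap block. The `Grid` class and the side length used in the usage note are hypothetical and not part of the original message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a square grid of one-byte colour values, stored
// row by row in a single heap allocation owned by std::vector.
class Grid {
public:
    explicit Grid(std::size_t side) : side_(side), cells_(side * side, '\0') {}

    // Access the pixel at (row, col); both indices must be in range.
    char& at(std::size_t row, std::size_t col) {
        assert(row < side_ && col < side_);
        return cells_[row * side_ + col];
    }

    std::size_t side() const { return side_; }
    std::size_t size_in_bytes() const { return cells_.size(); }

private:
    std::size_t side_;
    std::vector<char> cells_;  // storage lives on the heap, not the stack
};
```

For instance, `Grid g(4000);` reserves 16,000,000 bytes on the heap, well beyond 1400 by 1400, and `g.at(row, col)` reads or writes one pixel.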


> I have
> the computer fill an array with the fractal data for colors, and then it
> writes a bitmap file with the data and I can access that through BASIC or
> just show it to a friend.  All Dev-C lets me do for array size is 1400 by
> 1400 in an array, and that's using chars.  I'd use Booleans, but I need to
> include color data for each pixel.  I wonder if GCC would be better with
> that.

Dev-C++ is not a compiler, it's just an IDE, and it uses the Mingw
port of GCC for the compiler. That means you're already using GCC.

> I also need to know, please, how and where to download GCC, the
> latest version.  I'm not finding info on that.

As explained at https://gcc.gnu.org/install/binaries.html the GCC
project does not provide binaries to download, we only provide source
code. There are third-party binaries available, and the mingw and
mingw-w64 ports of GCC for MS Windows are available from their
respective projects. The https://gcc.gnu.org/install/binaries.html
page has links to them, and there are other builds of them available
like http://tdm-gcc.tdragon.net/

> In addition, when I run the
> dev-c++ programs from BASIC, a window comes up on the screen saying so. Is
> there a way, in GCC, to prevent that?

I have no idea, that sounds like a Windows feature, not something
caused by the compiler. Maybe somebody else can help there.


Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.

2018-03-06 Thread Renlin Li

Hi Richard,

On 06/03/18 16:04, Richard Biener wrote:

On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li  wrote:

Hi all,

The problem described here probably only affects targets whose ABI allow to
pass structured
arguments of certain size via registers.

If the mode of the parameter type is BLKmode, in the callee, during RTL
expanding,
a stack slot will be reserved for this parameter, and the incoming value
will be copied into
the stack slot.

However, the stack slot for the parameter will not be aligned if the
alignment of parameter type
exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
Chances are, unaligned memory access might cause run-time errors.

For local variable on the stack, the alignment of the data type is honored,
although the document states that it is not guaranteed.

For example:

#include <stdint.h>
union U {
 uint32_t M0;
 uint32_t M1;
 uint32_t M2;
 uint32_t M3;
} __attribute((aligned(16)));

void tmp (union U *);
void foo (union U P0)
{
   union U P1 = P0;
   tmp (&P1);
}

The code-gen from armv7-a is like this:

foo:
 @ args = 0, pretend = 0, frame = 48
 @ frame_needed = 0, uses_anonymous_args = 0
 str lr, [sp, #-4]!
 sub sp, sp, #52
 mov ip, sp
 stm ip, {r0, r1, r2, r3}  --> ip is not 128-bit aligned
 add lr, sp, #39
 bic lr, lr, #15
 ldm ip, {r0, r1, r2, r3}
 stm lr, {r0, r1, r2, r3} --> lr is 128-bit aligned
 mov r0, lr
 bl tmp
 add sp, sp, #52
 @ sp needed
 ldr pc, [sp], #4

There are other obvious missed optimizations in the code-generation above.
The stack slot for parameter P0 and local variable P1 could be merged.
So that some of the load/store instructions could be removed.
I think this is a known missed optimization case.

To summarize, there are two issues here:
1, (wrong code) unaligned stack slot allocated for parameters during
function expansion.
2, (missed optimization) stack slot for parameter sometimes is not
necessary.
In certain scenarios, the argument register could be used directly.
Currently, this is only possible when the parameter mode is not BLKmode.

For issue 1, we can do something similar to expand_used_vars:
dynamically align the stack slot address for parameters whose alignment
exceeds
PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the
aligned address and fp when possible.
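
The dynamic alignment proposed for issue 1 amounts to rounding an address up to a multiple of the required alignment, which is what the add/bic pair in the armv7-a output does for P1. A minimal sketch of that arithmetic follows; `align_up` is a hypothetical helper, not a GCC function.

```cpp
#include <cassert>
#include <cstdint>

// Round addr up to the next multiple of alignment.
// alignment must be a non-zero power of two.
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With a 16-byte alignment, `(sp + 39) & ~15` from the listing is the same value as `align_up(sp + 24, 16)`.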

For issue 2, I checked the behavior of LLVM, it seems the stack slot
allocation
for parameters are explicitly exposed by the alloca IR instruction at the
very beginning.
Later, there are optimization/transformation passes like mem2reg, reg2mem,
sroa etc. to remove
unnecessary alloca instructions.

In gcc, the stack allocation for parameters and local variables are done
during expand pass, implicitly.
And RTL passes are not able to remove the unnecessary stack allocation and
load/store operations.

For example:

uint32_t bar(union U P0)
{
   return P0.M0;
}

Currently, the code-gen is different on different targets.
There are various backend hooks which make the code-gen sub-optimal.
For example, aarch64 target could directly return with w0 while armv7-a
target generates unnecessary
store and load.

However, this optimization should be target independent, unrelated target
alignment configuration.
Both issue 1&2 could be resolved if gcc has a similar approach. But I assume
the change is big.

Is there any suggestions for solving issue 1 and improving issue 2 in a
generic way?
I can create a bugzilla ticket to record the issue.


What does the ABI say for passing such over-aligned data types?

For solving 1) you could copy the argument as passed by the ABI
to a properly aligned stack location in the callee.

Generally it sounds like either the ABI doesn't specify anything
or the ABI specifies something that violates user expectation.

For 2) again, it is the ABI which specifies whether an argument
is passed via the stack or via registers.  So - what does the ABI say?



The compiler is doing the right thing here in passing the argument via registers.
To be specific, the Arm PCS contains the following clauses:


B.5 If the argument is an alignment adjusted type its value is passed as a
copy of the actual value. The copy will have an alignment defined as follows.
...
For a Composite Type, the alignment of the copy will have 4-byte alignment if
its natural alignment is <= 4 and 8-byte alignment if its natural alignment
is >= 8



C.3 If the argument requires double-word alignment (8-byte), the NCRN is
rounded up to the next even register number.
C.4 If the size in words of the argument is not more than r4 minus NCRN, the
argument is copied into core registers, starting at the NCRN. The NCRN is
incremented by the number of registers used.
Successive registers hold the parts of the argument they would hold if its
value were loaded into those registers from memory using an LDM instruction.
The argument has now been allocated.
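
As an illustration only, the two quoted rules C.3 and C.4 can be modelled as below. The names `Allocation` and `allocate_core_registers` are hypothetical, the NCRN is represented as a plain integer from 0 to 4, and none of the other PCS rules (stack allocation, splitting between registers and stack, and so on) are modelled.

```cpp
#include <cassert>

// Outcome of trying to place one argument in the core registers r0-r3.
struct Allocation {
    bool in_registers;   // true if rule C.4 applied
    int first_register;  // register number the argument starts at, or -1
    int next_ncrn;       // NCRN after the attempt
};

// Hypothetical model of rules C.3 and C.4 as quoted above.
inline Allocation allocate_core_registers(int ncrn, int size_in_words,
                                          bool needs_doubleword_alignment) {
    assert(ncrn >= 0 && ncrn <= 4 && size_in_words > 0);
    // C.3: round the NCRN up to the next even register number.
    if (needs_doubleword_alignment && ncrn % 2 != 0)
        ++ncrn;
    // C.4: the argument fits if its size in words is not more than r4 minus NCRN.
    if (size_in_words <= 4 - ncrn)
        return {true, ncrn, ncrn + size_in_words};
    return {false, -1, ncrn};
}
```

For the 16-byte union U passed as the first argument, `allocate_core_registers(0, 4, true)` places it in r0-r3, which matches the `stm ip, {r0, r1, r2, r3}` in the generated code.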



This is quite similar for other RISC machines.
Here, the p

Re: How big (and fast) is going to be GCC 8? [part 2]

2018-03-06 Thread Martin Liška

On 03/06/2018 07:16 PM, Bin.Cheng wrote:
> On Tue, Mar 6, 2018 at 5:50 PM, Martin Liška  wrote:
>> Hi.
>>
>> This is a speed comparison of GCC 8 builds against my system GCC 7.3.0,
>> which is built with PGO bootstrap.
>>
>> I test an empty C and C++ source file and tramp3d; the rest are some big
>> beasts from the GCC sources. Feel free to suggest other test candidates.
>> Note that the first column gives how many times each test was run.
>
> First, thanks very much for collecting the data.
> Since we enabled several loop passes at O3 and above, some data
> for Ofast might be interesting?

Do you have a nice source file full of loop nests that would test that
properly?

Martin

> Thanks,
> bin
>>
>> Martin


Re: How big (and fast) is going to be GCC 8?

2018-03-06 Thread Martin Liška

On 03/06/2018 05:18 PM, Martin Liška wrote:
> Yes, in bytes. Would be nicer to have it in MB ;) It would be easily
> readable. I'll fix that.


Hi.

I'm sending updated binary size statistics for both cc1 and cc1plus
in MB.

Martin


gcc-8-build-stats-v2.pdf.bz2
Description: application/bzip


gcc-8-build-stats-v2.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: How big (and fast) is going to be GCC 8? [part 2]

2018-03-06 Thread Bin.Cheng
On Tue, Mar 6, 2018 at 5:50 PM, Martin Liška  wrote:
> Hi.
>
> This is a speed comparison of GCC 8 builds against my system GCC 7.3.0,
> which is built with PGO bootstrap.
>
> I test an empty C and C++ source file and tramp3d; the rest are some big
> beasts from the GCC sources. Feel free to suggest other test candidates.
> Note that the first column gives how many times each test was run.
First, thanks very much for collecting the data.
Since we enabled several loop passes at O3 and above, some data
for Ofast might be interesting?

Thanks,
bin
>
> Martin


Re: How big (and fast) is going to be GCC 8? [part 2]

2018-03-06 Thread Martin Liška

Hi.

This is a speed comparison of GCC 8 builds against my system GCC 7.3.0,
which is built with PGO bootstrap.

I test an empty C and C++ source file and tramp3d; the rest are some big
beasts from the GCC sources. Feel free to suggest other test candidates.
Note that the first column gives how many times each test was run.

Martin


gcc-8-perf-stats.pdf.bz2
Description: application/bzip


gcc-8-perf-stats.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: eliminate dead stores across functions

2018-03-06 Thread Bin.Cheng
On Tue, Mar 6, 2018 at 4:50 PM, Bin.Cheng  wrote:
> On Tue, Mar 6, 2018 at 4:44 PM, Martin Jambor  wrote:
>> Hi Bin,
>>
>> On Tue, Mar 06 2018, Bin Cheng wrote:
>>> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener
>>>>
>>>> Do you think the situation happens often enough to make this worthwhile?
>>> There is one probably more useful case.  Program may use global flags
>>> controlling
>>> how it does (heavy) computation.  Such flags are only set couple of
>>> times in execution
>>> time.  It would be useful if we can (IPA) propagate flags into computation 
>>> heavy
>>> functions by versioning (if necessary).  For example:
>>>
>>> int flag = 1;
>>> void foo ()
>>> {
>>>   // heavy computation with respect to flag
>>> }
>>> void main()
>>> {
>>>   flag = 2;
>>>   foo();
>>>   flag = 1;
>>>   foo();
>>> }
>>>
>>
>> So basically IPA-CP done on (not-addressable) static global variables.
>> Do you happen to know some real code which would benefit?  I'd like to
>> experiment with it but would like to have a real thing to look at, as
>> opposed to artificial test cases.
> As Richi pointed out, I think this is not rare in spec.  For this
> moment I only vaguely remember 544.nab_r for such issue, but I am sure
> there are other cases.
Sorry I forgot to mention it might not be static variables in file
scope, that's why I mentioned LTO previously.

Thanks,
bin
>
> Thanks,
> bin
>>
>> Thanks,
>>
>> Martin
>>
>>


Re: eliminate dead stores across functions

2018-03-06 Thread Bin.Cheng
On Tue, Mar 6, 2018 at 4:44 PM, Martin Jambor  wrote:
> Hi Bin,
>
> On Tue, Mar 06 2018, Bin Cheng wrote:
>> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener
>>>
>>> Do you think the situation happens often enough to make this worthwhile?
>> There is one probably more useful case.  Program may use global flags
>> controlling
>> how it does (heavy) computation.  Such flags are only set couple of
>> times in execution
>> time.  It would be useful if we can (IPA) propagate flags into computation 
>> heavy
>> functions by versioning (if necessary).  For example:
>>
>> int flag = 1;
>> void foo ()
>> {
>>   // heavy computation with respect to flag
>> }
>> void main()
>> {
>>   flag = 2;
>>   foo();
>>   flag = 1;
>>   foo();
>> }
>>
>
> So basically IPA-CP done on (not-addressable) static global variables.
> Do you happen to know some real code which would benefit?  I'd like to
> experiment with it but would like to have a real thing to look at, as
> opposed to artificial test cases.
As Richi pointed out, I think this is not rare in spec.  For this
moment I only vaguely remember 544.nab_r for such issue, but I am sure
there are other cases.

Thanks,
bin
>
> Thanks,
>
> Martin
>
>


Re: eliminate dead stores across functions

2018-03-06 Thread William Cohen
On 03/06/2018 09:28 AM, Richard Biener wrote:
> On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> For the following test-case,
>>
>> int a;
>>
>> __attribute__((noinline))
>> static void foo()
>> {
>>   a = 3;
>> }
>>
>> int main()
>> {
>>   a = 4;
>>   foo ();
>>   return a;
>> }
>>
>> I assume it's safe to remove "a = 4"  since 'a' would be overwritten
>> by call to foo ?
>> IIUC, ipa-reference pass does mod/ref analysis to compute side-effects
>> of function call,
>> so could we perhaps use ipa_reference_get_not_written_global() in dse
>> pass to check if a global variable will be killed on call to a
>> function ? If not, I suppose we could write a similar ipa pass that
>> computes the set of killed global variables per function but I am not
>> sure if that's the correct approach.
> 
> Do you think the situation happens often enough to make this worthwhile?
> 
> ipa-reference doesn't compute must-def, only may-def and may-use IIRC.
> 
> Richard.
> 
>> Thanks,
>> Prathamesh

This dead write optimization sounds similar to "DeadSpy: a tool to pinpoint 
program inefficiencies" by Milind Chabbi and John Mellor-Crummey of Rice 
University:

https://dl.acm.org/citation.cfm?id=2259033

The abstract says there were numerous dead writes in the SPEC 2006 gcc 
benchmark and eliminating those provided average 15% improvement in performance.
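
For illustration, a dead write in the sense discussed in this thread is a store whose value is overwritten before anything reads it. The sketch below, with hypothetical function names, contrasts a callee that always writes the global (a must-def) with one that only sometimes writes it (a may-def); only in the first case is the earlier store removable.

```cpp
#include <cassert>

int a;

// Always writes `a`: a must-def of `a`.
void always_writes() { a = 3; }

// Writes `a` only on some paths: a may-def, not a must-def.
void sometimes_writes(bool cond) { if (cond) a = 3; }

int with_must_def() {
    a = 4;               // dead: always overwritten before the read below
    always_writes();
    return a;
}

int with_may_def(bool cond) {
    a = 4;               // live: still visible when cond is false
    sometimes_writes(cond);
    return a;
}

// Small self-check of the behaviour described in the comments.
inline void check_dead_store_examples() {
    assert(with_must_def() == 3);
    assert(with_may_def(true) == 3);
    assert(with_may_def(false) == 4);
}
```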

-Will


Re: eliminate dead stores across functions

2018-03-06 Thread Martin Jambor
Hi Bin,

On Tue, Mar 06 2018, Bin Cheng wrote:
> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener
>>
>> Do you think the situation happens often enough to make this worthwhile?
> There is one probably more useful case.  Program may use global flags
> controlling
> how it does (heavy) computation.  Such flags are only set couple of
> times in execution
> time.  It would be useful if we can (IPA) propagate flags into computation 
> heavy
> functions by versioning (if necessary).  For example:
>
> int flag = 1;
> void foo ()
> {
>   // heavy computation with respect to flag
> }
> void main()
> {
>   flag = 2;
>   foo();
>   flag = 1;
>   foo();
> }
>

So basically IPA-CP done on (not-addressable) static global variables.
Do you happen to know some real code which would benefit?  I'd like to
experiment with it but would like to have a real thing to look at, as
opposed to artificial test cases.

Thanks,

Martin




Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-06 Thread Peter Bergner
On 3/5/18 9:33 AM, Segher Boessenkool wrote:
> On Mon, Mar 05, 2018 at 08:01:14AM +0100, Eric Botcazou wrote:
>> Apparently the authors of the SPARC psABI thought that the last part of your 
>> sentence is an interpolation and that the (historical) requirements were 
>> vague 
>> enough to allow their interpretation, IOW that the compiler can do the work.
> 
> Maybe we should have a target hook that says setjmp/longjmp are
> implemented by simple function calls (or as-if by function calls), so
> as not to penalize everyone who has an, erm, more conservative ABI?

Unless someone really wants to work on this, I'll have a look at
adding this once stage1 opens up.

Peter



Re: How big (and fast) is going to be GCC 8?

2018-03-06 Thread Martin Liška

On 03/06/2018 04:13 PM, David Malcolm wrote:
> On Tue, 2018-03-06 at 11:14 +0100, Martin Liška wrote:
>> Hello.
>>
>> Many significant changes have landed in mainline and will be released
>> as GCC 8.1.
>> I decided to use various GCC configs we have and test how these
>> configurations differ
>> in size and also binary size.
>>
>> This is the first part, where I measured binary size; a speed comparison
>> will follow.
>> Configuration names should be self-explanatory; the 'system-*' builds are
>> done
>> without bootstrap with my system compiler (GCC 7.3.0). All builds are
>> done
>> on my Intel Haswell machine.
>>
>> Feel free to reply if you need any explanation.
>> Martin
>
> Some possibly silly questions:

Hi David.

All of them are qualified!

> (a) was this done with:
>    --enable-checking=release ?

Yes.

> (b) is this measuring cc1 ?

cc1plus. Let me also add cc1 when I have run-time numbers.

> (c) are the units bytes?  (so ~183MB for the unstripped system-O2-
> native cc1, ~25MB after stripping?)

Yes, in bytes. Would be nicer to have it in MB ;) It would be easily
readable. I'll fix that.

> (d) do you have comparable data for gcc 7?

Will build corresponding builds for GCC 7 tonight.

Martin

> Thanks
> Dave



Re: eliminate dead stores across functions

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 4:50 PM, Bin.Cheng  wrote:
> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener
>  wrote:
>> On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni
>>  wrote:
>>> Hi,
>>> For the following test-case,
>>>
>>> int a;
>>>
>>> __attribute__((noinline))
>>> static void foo()
>>> {
>>>   a = 3;
>>> }
>>>
>>> int main()
>>> {
>>>   a = 4;
>>>   foo ();
>>>   return a;
>>> }
>>>
>>> I assume it's safe to remove "a = 4"  since 'a' would be overwritten
>>> by call to foo ?
>>> IIUC, ipa-reference pass does mod/ref analysis to compute side-effects
>>> of function call,
>>> so could we perhaps use ipa_reference_get_not_written_global() in dse
>>> pass to check if a global variable will be killed on call to a
>>> function ? If not, I suppose we could write a similar ipa pass that
>>> computes the set of killed global variables per function but I am not
>>> sure if that's the correct approach.
>>
>> Do you think the situation happens often enough to make this worthwhile?
> There is one probably more useful case.  Program may use global flags
> controlling
> how it does (heavy) computation.  Such flags are only set couple of
> times in execution
> time.  It would be useful if we can (IPA) propagate flags into computation 
> heavy
> functions by versioning (if necessary).  For example:
>
> int flag = 1;
> void foo ()
> {
>   // heavy computation with respect to flag
> }
> void main()
> {
>   flag = 2;
>   foo();
>   flag = 1;
>   foo();
> }

Yeah, libquantum does this.  There's also a related example
from some SPEC Fortran testcase:

void foo()
{
  static int initialized;
  static T data;
  if (!initialized)
    {
      data.x = 1;
      initialized = 1;
    }
  ...
}

where we want to constant propagate from data.x.  IIRC I tried
to work on this, not sure if I solved it yet...
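
A compilable sketch of this pattern, and of what constant propagation from `data.x` could reduce a read to, might look as follows. The struct `T` with a single `int` member and all function names are hypothetical stand-ins for the elided parts of the example.

```cpp
#include <cassert>

struct T { int x; };

// The pattern from the message: data.x is written exactly once, guarded
// by `initialized`, so every later read observes the value 1.
int scaled(int v) {
    static bool initialized = false;
    static T data;
    if (!initialized) {
        data.x = 1;
        initialized = true;
    }
    return v * data.x;
}

// What the function could look like once the constant 1 has been
// propagated into the read of data.x.
int scaled_propagated(int v) {
    return v * 1;
}

// Both forms agree for any input.
inline void check_equivalence(int v) {
    assert(scaled(v) == scaled_propagated(v));
}
```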

Richard.

> Of course this may only be useful for LTO.

> Thanks,
> bin
>>
>> ipa-reference doesn't compute must-def, only may-def and may-use IIRC.
>>
>> Richard.
>>
>>> Thanks,
>>> Prathamesh


Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li  wrote:
> Hi all,
>
> The problem described here probably only affects targets whose ABI allow to
> pass structured
> arguments of certain size via registers.
>
> If the mode of the parameter type is BLKmode, in the callee, during RTL
> expanding,
> a stack slot will be reserved for this parameter, and the incoming value
> will be copied into
> the stack slot.
>
> However, the stack slot for the parameter will not be aligned if the
> alignment of parameter type
> exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
> Chances are, unaligned memory access might cause run-time errors.
>
> For local variable on the stack, the alignment of the data type is honored,
> although the document states that it is not guaranteed.
>
> For example:
>
> #include <stdint.h>
> union U {
> uint32_t M0;
> uint32_t M1;
> uint32_t M2;
> uint32_t M3;
> } __attribute((aligned(16)));
>
> void tmp (union U *);
> void foo (union U P0)
> {
>   union U P1 = P0;
>   tmp (&P1);
> }
>
> The code-gen from armv7-a is like this:
>
> foo:
> @ args = 0, pretend = 0, frame = 48
> @ frame_needed = 0, uses_anonymous_args = 0
> str lr, [sp, #-4]!
> sub sp, sp, #52
> mov ip, sp
> stm ip, {r0, r1, r2, r3}  --> ip is not 128-bit aligned
> add lr, sp, #39
> bic lr, lr, #15
> ldm ip, {r0, r1, r2, r3}
> stm lr, {r0, r1, r2, r3} --> lr is 128-bit aligned
> mov r0, lr
> bl tmp
> add sp, sp, #52
> @ sp needed
> ldr pc, [sp], #4
>
> There are other obvious missed optimizations in the code-generation above.
> The stack slot for parameter P0 and local variable P1 could be merged.
> So that some of the load/store instructions could be removed.
> I think this is a known missed optimization case.
>
> To summarize, there are two issues here:
> 1, (wrong code) unaligned stack slot allocated for parameters during
> function expansion.
> 2, (missed optimization) stack slot for parameter sometimes is not
> necessary.
> In certain scenarios, the argument register could be used directly.
> Currently, this is only possible when the parameter mode is not BLKmode.
>
> For issue 1, we can do something similar to expand_used_vars:
> dynamically align the stack slot address for parameters whose alignment
> exceeds
> PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the
> aligned address and fp when possible.
>
> For issue 2, I checked the behavior of LLVM, it seems the stack slot
> allocation
> for parameters are explicitly exposed by the alloca IR instruction at the
> very beginning.
> Later, there are optimization/transformation passes like mem2reg, reg2mem,
> sroa etc. to remove
> unnecessary alloca instructions.
>
> In gcc, the stack allocation for parameters and local variables are done
> during expand pass, implicitly.
> And RTL passes are not able to remove the unnecessary stack allocation and
> load/store operations.
>
> For example:
>
> uint32_t bar(union U P0)
> {
>   return P0.M0;
> }
>
> Currently, the code-gen is different on different targets.
> There are various backend hooks which make the code-gen sub-optimal.
> For example, aarch64 target could directly return with w0 while armv7-a
> target generates unnecessary
> store and load.
>
> However, this optimization should be target independent, unrelated target
> alignment configuration.
> Both issue 1&2 could be resolved if gcc has a similar approach. But I assume
> the change is big.
>
> Is there any suggestions for solving issue 1 and improving issue 2 in a
> generic way?
> I can create a bugzilla ticket to record the issue.

What does the ABI say for passing such over-aligned data types?

For solving 1) you could copy the argument as passed by the ABI
to a properly aligned stack location in the callee.

Generally it sounds like either the ABI doesn't specify anything
or the ABI specifies something that violates user expectation.

For 2) again, it is the ABI which specifies whether an argument
is passed via the stack or via registers.  So - what does the ABI say?

Richard.

> Regards,
> Renlin


Re: eliminate dead stores across functions

2018-03-06 Thread Bin.Cheng
On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener
 wrote:
> On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> For the following test-case,
>>
>> int a;
>>
>> __attribute__((noinline))
>> static void foo()
>> {
>>   a = 3;
>> }
>>
>> int main()
>> {
>>   a = 4;
>>   foo ();
>>   return a;
>> }
>>
>> I assume it's safe to remove "a = 4"  since 'a' would be overwritten
>> by call to foo ?
>> IIUC, ipa-reference pass does mod/ref analysis to compute side-effects
>> of function call,
>> so could we perhaps use ipa_reference_get_not_written_global() in dse
>> pass to check if a global variable will be killed on call to a
>> function ? If not, I suppose we could write a similar ipa pass that
>> computes the set of killed global variables per function but I am not
>> sure if that's the correct approach.
>
> Do you think the situation happens often enough to make this worthwhile?
There is one probably more useful case.  A program may use global flags
controlling how it does (heavy) computation.  Such flags are set only a
couple of times during execution.  It would be useful if we could (IPA)
propagate flags into computation-heavy functions by versioning (if
necessary).  For example:

int flag = 1;
void foo ()
{
  // heavy computation with respect to flag
}
int main ()
{
  flag = 2;
  foo();
  flag = 1;
  foo();
}

Of course this may only be useful for LTO.
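
To make the versioning idea concrete, here is a hand-written sketch of what such clones could look like. It is an illustration only, not what GCC emits: the function names and the toy computation are hypothetical, and a template parameter stands in for the propagated flag value.

```cpp
#include <cassert>

int flag = 1;

// Generic body: the test of `flag` is evaluated on every call.
int heavy_generic(int x) {
    return flag == 2 ? x * 2 : x + 1;
}

// One "version" per known flag value; the test folds away at compile time.
template <int Flag>
int heavy_versioned(int x) {
    return Flag == 2 ? x * 2 : x + 1;
}

// Each call site knows the value just stored to `flag`, so it can call
// the matching version directly.
int run_versioned() {
    flag = 2;
    int a = heavy_versioned<2>(10);
    assert(a == heavy_generic(10));  // same result as the generic body
    flag = 1;
    int b = heavy_versioned<1>(10);
    assert(b == heavy_generic(10));
    return a + b;
}
```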

Thanks,
bin
>
> ipa-reference doesn't compute must-def, only may-def and may-use IIRC.
>
> Richard.
>
>> Thanks,
>> Prathamesh


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka  wrote:
> >> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
> >>  wrote:
> >> > Hi,
> >> >
> >> > Thank you Richard and Honza for the suggestions. If I understand 
> >> > correctly,
> >> > the issue is that LTO file format keeps changing per compiler versions, 
> >> > so
> >> > we need a more “stable” representation and the first step for that would 
> >> > be
> >> > to “stabilize” representations for lto-cgraph and symbol table ?
> >>
> >> Yes.  Note the issue is that the current format is a 1:1 representation of
> >> the internal representation -- which means it is the internal 
> >> representation
> >> that changes frequently across releases.  I'm not sure how Honza wants
> >> to deal with those changes in the context of a "stable" IL format.  Given
> >> we haven't been able to provide a stable API to plugins I think it's much
> >> harder to provide a stable streaming format for all the IL details
> >>
> >> > Could you
> >> > please elaborate on what initial steps need to be taken in this regard, 
> >> > and
> >> > if it’s feasible within GSoC timeframe ?
> >>
> >> I don't think it is feasible in the GSoC timeframe (nor do I think it's 
> >> feasible
> >> at all ...)
> >
> > I skipped this, with GSoC timeframe I fully agree.  With feasibility at all 
> > not so
> > much - LLVM documents its bitcode to a reasonable extent
> > https://llvm.org/docs/BitCodeFormat.html
> >
> > Reason why i mentioned it is that I would like to use this as an excuse to 
> > get
> > things incrementally cleaned up and it would be nice to keep it in mind 
> > while
> > working on this.
> 
> Ok.  It's probably close enough to what I recommended doing with respect
> to make the LTO bytecode "self-descriptive" -- thus start with making the
> structure documented and parseable without assigning semantics to
> every bit ;)  I think that can be achieved top-down in a very incremental
> way if you get the bottom implemented first (the data-streamer part).

OK :)
I did not mean to document every bit either, at least not for the fancy parts.
It would be nice to have cleaned up, for example, the section headers/footers so
they do not depend on endianness, and to slowly clean up similar nonsense at
higher levels.  So it may make sense to progress from both directions, lower
hanging fruits first.
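
As a rough illustration of headers that do not depend on endianness, here is a
minimal sketch that always stores a 32-bit field least significant byte first.
The helper names put_u32_le and get_u32_le are hypothetical and are not part of
GCC's streamer; the choice of little-endian is likewise only an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not GCC's LTO streamer API.  */

/* Store VALUE into OUT as four bytes, least significant byte first,
   regardless of the byte order of the host.  */
void
put_u32_le (unsigned char *out, uint32_t value)
{
  assert (out != NULL);
  out[0] = (unsigned char) (value & 0xff);
  out[1] = (unsigned char) ((value >> 8) & 0xff);
  out[2] = (unsigned char) ((value >> 16) & 0xff);
  out[3] = (unsigned char) ((value >> 24) & 0xff);
}

/* Read back a value written by put_u32_le.  */
uint32_t
get_u32_le (const unsigned char *in)
{
  assert (in != NULL);
  return (uint32_t) in[0]
         | ((uint32_t) in[1] << 8)
         | ((uint32_t) in[2] << 16)
         | ((uint32_t) in[3] << 24);
}
```

Because the bytes are produced with shifts rather than by copying the in-memory
representation, a file written on a big-endian host reads back identically on a
little-endian one.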

Honza


BLKmode parameters are stored in unaligned stack slot when passed via registers.

2018-03-06 Thread Renlin Li

Hi all,

The problem described here probably only affects targets whose ABI allows
structured arguments of certain sizes to be passed via registers.

If the mode of the parameter type is BLKmode, then in the callee, during RTL
expansion, a stack slot is reserved for the parameter and the incoming value
is copied into that stack slot.

However, the stack slot for the parameter will not be aligned if the alignment
of the parameter type exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
The resulting unaligned memory accesses might cause run-time errors.

For local variables on the stack, the alignment of the data type is honored,
although the documentation states that it is not guaranteed.

For example:

#include <stdint.h>
union U {
uint32_t M0;
uint32_t M1;
uint32_t M2;
uint32_t M3;
} __attribute((aligned(16)));

void tmp (union U *);
void foo (union U P0)
{
  union U P1 = P0;
  tmp (&P1);
}

The code-gen from armv7-a is like this:

foo:
        @ args = 0, pretend = 0, frame = 48
        @ frame_needed = 0, uses_anonymous_args = 0
        str     lr, [sp, #-4]!
        sub     sp, sp, #52
        mov     ip, sp
        stm     ip, {r0, r1, r2, r3}  --> ip is not 128-bit aligned
        add     lr, sp, #39
        bic     lr, lr, #15
        ldm     ip, {r0, r1, r2, r3}
        stm     lr, {r0, r1, r2, r3} --> lr is 128-bit aligned
        mov     r0, lr
        bl      tmp
        add     sp, sp, #52
        @ sp needed
        ldr     pc, [sp], #4

There are other obvious missed optimizations in the code generation above.
The stack slots for parameter P0 and local variable P1 could be merged,
so that some of the load/store instructions could be removed.
I think this is a known missed optimization case.

To summarize, there are two issues here:
1. (wrong code) An unaligned stack slot is allocated for parameters during
   function expansion.
2. (missed optimization) The stack slot for a parameter is sometimes not
   necessary.
   In certain scenarios, the argument register could be used directly.
   Currently, this is only possible when the parameter mode is not BLKmode.

For issue 1, we can do something similar to expand_used_vars:
dynamically align the stack slot address for parameters whose alignment exceeds
PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between
the aligned address and fp when possible.
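
The dynamic alignment idea is the same add-then-mask pattern that the add/bic
pair in the generated code above uses for the local variable. A minimal sketch
in C follows; align_up and is_aligned are hypothetical helper names for
illustration, not functions from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only.  */

/* Round ADDR up to the next multiple of ALIGN.
   ALIGN must be a non-zero power of two.  */
uintptr_t
align_up (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Return non-zero if ADDR is a multiple of ALIGN (a power of two).  */
int
is_aligned (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}
```

Over-allocating the slot by up to ALIGN - 1 bytes and then rounding the address
up guarantees an aligned location inside the reserved area, at the cost of some
padding that other parameters might be able to reuse.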

For issue 2, I checked the behavior of LLVM. It seems the stack slot allocation
for parameters is explicitly exposed by the alloca IR instruction at the very
beginning.
Later, optimization/transformation passes like mem2reg, reg2mem, sroa etc.
remove unnecessary alloca instructions.

In gcc, the stack allocation for parameters and local variables is done
implicitly during the expand pass.
RTL passes are not able to remove the unnecessary stack allocation and
load/store operations.

For example:

uint32_t bar(union U P0)
{
  return P0.M0;
}

Currently, the code-gen differs between targets.
There are various backend hooks which make the code-gen sub-optimal.
For example, the aarch64 target can return directly with w0, while the armv7-a
target generates an unnecessary store and load.

However, this optimization should be target independent and unrelated to the
target alignment configuration.
Both issues 1 and 2 could be resolved if gcc took a similar approach, but I
assume the change is big.

Are there any suggestions for solving issue 1 and improving issue 2 in a
generic way?
I can create a bugzilla ticket to record the issue.

Regards,
Renlin


Re: How big (and fast) is going to be GCC 8?

2018-03-06 Thread David Malcolm
On Tue, 2018-03-06 at 11:14 +0100, Martin Liška wrote:
> Hello.
> 
> Many significant changes has landed in mainline and will be released
> as GCC 8.1.
> I decided to use various GCC configs we have and test how there
> configuration differ
> in size and also binary size.
> 
> This is first part where I measured binary size, speed comparison
> will follow.
> Configuration names should be self-explaining, the 'system-*' is
> built done
> without bootstrap with my system compiler (GCC 7.3.0). All builds are
> done
> on my Intel Haswell machine.
> 
> Feel free to reply if you need any explanation.
> Martin

Some possibly silly questions:

(a) was this done with:
  --enable-checking=release ?

(b) is this measuring cc1 ?

(c) are the units bytes?  (so ~183MB for the unstripped system-O2-
native cc1, ~25MB after stripping?)

(d) do you have comparable data for gcc 7?

Thanks
Dave


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka  wrote:
>> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>>  wrote:
>> > Hi,
>> >
>> > Thank you Richard and Honza for the suggestions. If I understand correctly,
>> > the issue is that LTO file format keeps changing per compiler versions, so
>> > we need a more “stable” representation and the first step for that would be
>> > to “stabilize” representations for lto-cgraph and symbol table ?
>>
>> Yes.  Note the issue is that the current format is a 1:1 representation of
>> the internal representation -- which means it is the internal representation
>> that changes frequently across releases.  I'm not sure how Honza wants
>> to deal with those changes in the context of a "stable" IL format.  Given
>> we haven't been able to provide a stable API to plugins I think it's much
>> harder to provide a stable streaming format for all the IL details
>>
>> > Could you
>> > please elaborate on what initial steps need to be taken in this regard, and
>> > if it’s feasible within GSoC timeframe ?
>>
>> I don't think it is feasible in the GSoC timeframe (nor do I think it's 
>> feasible
>> at all ...)
>
> I skipped this, with GSoC timeframe I fully agree.  With feasibility at all 
> not so
> much - LLVM documents its bitcode to reasonable extend
> https://llvm.org/docs/BitCodeFormat.html
>
> Reason why i mentioned it is that I would like to use this as an excuse to get
> things incrementally cleaned up and it would be nice to keep it in mind while
> working on this.

Ok.  It's probably close enough to what I recommended doing with respect
to make the LTO bytecode "self-descriptive" -- thus start with making the
structure documented and parseable without assigning semantics to
every bit ;)  I think that can be achieved top-down in a very incremental
way if you get the bottom implemented first (the data-streamer part).
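
One way to picture "parseable without assigning semantics to every bit" is a
stream of records that each carry a tag and a length, so a reader can walk the
structure and skip payloads it does not understand. The sketch below assumes a
made-up layout (1-byte tag, 4-byte little-endian length, payload); it is not
the actual LTO bytecode layout, and count_records is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record layout, for illustration only:
     byte 0      tag
     bytes 1..4  payload length, least significant byte first
     bytes 5..   payload (opaque to this walker)  */

/* Walk all records in DATA (SIZE bytes) without interpreting payloads.
   Return the number of records, or -1 if the stream is truncated.  */
long
count_records (const unsigned char *data, size_t size)
{
  assert (data != NULL || size == 0);
  size_t pos = 0;
  long count = 0;

  while (pos < size)
    {
      /* A record header needs five bytes.  */
      if (size - pos < 5)
        return -1;
      uint32_t len = (uint32_t) data[pos + 1]
                     | ((uint32_t) data[pos + 2] << 8)
                     | ((uint32_t) data[pos + 3] << 16)
                     | ((uint32_t) data[pos + 4] << 24);
      pos += 5;
      /* The payload must fit in what is left of the buffer.  */
      if (len > size - pos)
        return -1;
      pos += len;
      count++;
    }
  return count;
}
```

A dump tool built this way can list every record even when it only knows how to
pretty-print a few tags, which fits the incremental approach described above.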

Richard.

> Honza
>>
>> > Thanks!
>> >
>> >
>> > I am trying to break down the project into milestones for the proposal. So
>> > far, I have identified the following objectives:
>> >
>> > 1] Creating a separate driver, that can read LTO object files. Following
>> > Richard’s estimate, I’d leave around first half of the period for this 
>> > task.
>> >
>> > Would that be OK ?
>>
>> Yes.
>>
>> > Coming to 2nd half:
>> >
>> > 2] Dumping pass summaries.
>> >
>> > 3] Stabilizing lto-cgraph and symbol table.
>>
>> So I'd instead do
>>
>>  3] Enhance the user-interface of the driver
>>
>> like providing a way to list all function bodies, a way to dump
>> the IL of a single function body, a way to create a dot graph file
>> for the cgraph in the file, etc.
>>
>> Basically while there's a lot of dumping infrastructure in GCC
>> it may not always fit the needs of a LTO IL dumping tool 1:1
>> and may need refactoring enhancement.
>>
>> Richard.
>>
>> >
>> > Thanks,
>> >
>> > Hrishikesh
>> >
>> >
>> >
>> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
>> >>
>> >> Hello,
>> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
>> >> >  wrote:
>> >> > > Hello everyone,
>> >> > >
>> >> > >
>> >> > > Thanks for your suggestions and engaging response.
>> >> > >
>> >> > > Based on the feedback I think that the scope of this project comprises
>> >> > > of
>> >> > > following three indicative actions:
>> >> > >
>> >> > >
>> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto
>> >> > > object API
>> >> > > for reading the lto file.
>> >> >
>> >> > Yes.  I expect this will take the whole first half of the project,
>> >> > after this you
>> >> > should be somewhat familiar with the infrastructure as well.  With the
>> >> > existing dumping infrastructure it should be possible to dump the
>> >> > callgraph and individual function bodies.
>> >> >
>> >> > >
>> >> > > 2. Extending LTO dump infrastructure:
>> >> > >
>> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
>> >> > > nodes, gimple statements etc. However I suppose we’d need to extend
>> >> > > that for
>> >> > > dumping pass summaries ? For instance, should we add a new hook say
>> >> > > “dump”
>> >> > > to ipa_opt_pass_d that’d dump the pass
>> >> > > summary ?
>> >> >
>> >> > That sounds like a good idea indeed.  I'm not sure if this is the most
>> >> > interesting
>> >> > missing part - I guess we'll find out once a dump tool is available.
>> >>
>> >> Concering the LTO file format my longer term aim is to make the symbol
>> >> table sections (symtab used by lto-plugin as well as the callgraph
>> >> section)
>> >> and hopefully also the Gimple streams) documented and well behaving
>> >> without changing the format in every revision.
>> >>
>> >> On the other hand the summaries used by individual passes are intended to
>> >> be
>> >> pass specific and envolving as individula passes become stronger/new
>> >> passes
>> >> are added.
>> >>
>> >> It is quite a lot of work to stabilize gimple representation to this
>> >> extend,
>> >> For callgraph&symbol table this 

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>  wrote:
> > Hi,
> >
> > Thank you Richard and Honza for the suggestions. If I understand correctly,
> > the issue is that LTO file format keeps changing per compiler versions, so
> > we need a more “stable” representation and the first step for that would be
> > to “stabilize” representations for lto-cgraph and symbol table ?
> 
> Yes.  Note the issue is that the current format is a 1:1 representation of
> the internal representation -- which means it is the internal representation
> that changes frequently across releases.  I'm not sure how Honza wants
> to deal with those changes in the context of a "stable" IL format.  Given
> we haven't been able to provide a stable API to plugins I think it's much
> harder to provide a stable streaming format for all the IL details
> 
> > Could you
> > please elaborate on what initial steps need to be taken in this regard, and
> > if it’s feasible within GSoC timeframe ?
> 
> I don't think it is feasible in the GSoC timeframe (nor do I think it's 
> feasible
> at all ...)

I skipped this; with the GSoC timeframe I fully agree.  With feasibility at all
not so much - LLVM documents its bitcode to a reasonable extent:
https://llvm.org/docs/BitCodeFormat.html

The reason I mentioned it is that I would like to use this as an excuse to get
things incrementally cleaned up and it would be nice to keep it in mind while
working on this.

Honza
> 
> > Thanks!
> >
> >
> > I am trying to break down the project into milestones for the proposal. So
> > far, I have identified the following objectives:
> >
> > 1] Creating a separate driver, that can read LTO object files. Following
> > Richard’s estimate, I’d leave around first half of the period for this task.
> >
> > Would that be OK ?
> 
> Yes.
> 
> > Coming to 2nd half:
> >
> > 2] Dumping pass summaries.
> >
> > 3] Stabilizing lto-cgraph and symbol table.
> 
> So I'd instead do
> 
>  3] Enhance the user-interface of the driver
> 
> like providing a way to list all function bodies, a way to dump
> the IL of a single function body, a way to create a dot graph file
> for the cgraph in the file, etc.
> 
> Basically while there's a lot of dumping infrastructure in GCC
> it may not always fit the needs of a LTO IL dumping tool 1:1
> and may need refactoring enhancement.
> 
> Richard.
> 
> >
> > Thanks,
> >
> > Hrishikesh
> >
> >
> >
> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
> >>
> >> Hello,
> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >> >  wrote:
> >> > > Hello everyone,
> >> > >
> >> > >
> >> > > Thanks for your suggestions and engaging response.
> >> > >
> >> > > Based on the feedback I think that the scope of this project comprises
> >> > > of
> >> > > following three indicative actions:
> >> > >
> >> > >
> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto
> >> > > object API
> >> > > for reading the lto file.
> >> >
> >> > Yes.  I expect this will take the whole first half of the project,
> >> > after this you
> >> > should be somewhat familiar with the infrastructure as well.  With the
> >> > existing dumping infrastructure it should be possible to dump the
> >> > callgraph and individual function bodies.
> >> >
> >> > >
> >> > > 2. Extending LTO dump infrastructure:
> >> > >
> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
> >> > > nodes, gimple statements etc. However I suppose we’d need to extend
> >> > > that for
> >> > > dumping pass summaries ? For instance, should we add a new hook say
> >> > > “dump”
> >> > > to ipa_opt_pass_d that’d dump the pass
> >> > > summary ?
> >> >
> >> > That sounds like a good idea indeed.  I'm not sure if this is the most
> >> > interesting
> >> > missing part - I guess we'll find out once a dump tool is available.
> >>
> >> Concering the LTO file format my longer term aim is to make the symbol
> >> table sections (symtab used by lto-plugin as well as the callgraph
> >> section)
> >> and hopefully also the Gimple streams) documented and well behaving
> >> without changing the format in every revision.
> >>
> >> On the other hand the summaries used by individual passes are intended to
> >> be
> >> pass specific and envolving as individula passes become stronger/new
> >> passes
> >> are added.
> >>
> >> It is quite a lot of work to stabilize gimple representation to this
> >> extend,
> >> For callgraph&symbol table this is however more realistic. That would mean
> >> to
> >> move some of existing random stuff streamed there into summaries and
> >> additionaly
> >> cleaning up/rewriting lto-cgraph so the on disk format actually makes
> >> sense.
> >>
> >> I will be happy to help with any steps in this direction as well.
> >>
> >> Honza
> >
> >


Re: How big (and fast) is going to be GCC 8?

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 11:12 AM, Martin Liška  wrote:
> > Hello.
> >
> > Many significant changes has landed in mainline and will be released as GCC 
> > 8.1.
> > I decided to use various GCC configs we have and test how there 
> > configuration differ
> > in size and also binary size.
> >
> > This is first part where I measured binary size, speed comparison will 
> > follow.
> > Configuration names should be self-explaining, the 'system-*' is built done
> > without bootstrap with my system compiler (GCC 7.3.0). All builds are done
> > on my Intel Haswell machine.
> 
> So from the numbers I see that bootstrap causes a 8% bigger binary compared
> to non-bootstrap using GCC 7.3 at -O2 when including debug info and 1.2%
> larger stripped.  That means trunk generates larger code.

It is a bit odd indeed because size stats from specs seem to imply otherwise.
It would be nice to work that out.  Also I am surprised that LTO increases text
size even for non-plugin build. It should not happen.
These issues are generally hard to debug though.  I will try to take a look.

I will send similar stats for my firefox experiments. If you have scripts to
collect them, they would be welcome.

Thanks for looking into this!
Honza
> 
> What is missing is a speed comparison of the various binaries -- you could
> try measuring this by doing a make all-gcc for a non-bootstrap config
> (so it uses -O2 -g and doesn't build target libs with the built compiler).
> 
> Richard.
> 
> > Feel free to reply if you need any explanation.
> > Martin


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>  wrote:
> > Hi,
> >
> > Thank you Richard and Honza for the suggestions. If I understand correctly,
> > the issue is that LTO file format keeps changing per compiler versions, so
> > we need a more “stable” representation and the first step for that would be
> > to “stabilize” representations for lto-cgraph and symbol table ?
> 
> Yes.  Note the issue is that the current format is a 1:1 representation of
> the internal representation -- which means it is the internal representation
> that changes frequently across releases.  I'm not sure how Honza wants
> to deal with those changes in the context of a "stable" IL format.  Given
> we haven't been able to provide a stable API to plugins I think it's much
> harder to provide a stable streaming format for all the IL details

Well, because I think it would be good for us to more formalize our IL -
document it properly, remove stuff which is not necessary and get an API+file
representation. Those things are connected to each other and will need work.

If you look at how much things change, it is not very frequent that we change
what is in our CFG (I changed profile this release), how gimple tuples are
represented and what gimple instructions we have.  I think those parts are
reasonably well defined. Even if I changed profile this release it is a
relatively localized type of change.  I am still more commonly changing the
symbol table as it needs to adapt for all LTO details, but I hope to be
basically done.

What is more on the go are trees that we will hopefully deal with by defining
gimple types now with early debug done.

What we can do realistically now is to first aim to stream those of better
defined parts in externally parseable sections which do have documentation.  So
far only externally parseable section is the plugin symbol table, but we should
be able to do so with reasonable effort for symbol tables, CFGs and gimple
instruction streams.

In parallel we can incrementally deal with trees, mostly, hopefully, by getting
rid of them (moving symbol names etc. to the symbol table so it can live w/o
declarations, having gimple types etc.)

> 
> > Could you
> > please elaborate on what initial steps need to be taken in this regard, and
> > if it’s feasible within GSoC timeframe ?
> 
> I don't think it is feasible in the GSoC timeframe (nor do I think it's 
> feasible
> at all ...)
> 
> > Thanks!
> >
> >
> > I am trying to break down the project into milestones for the proposal. So
> > far, I have identified the following objectives:
> >
> > 1] Creating a separate driver, that can read LTO object files. Following
> > Richard’s estimate, I’d leave around first half of the period for this task.
> >
> > Would that be OK ?
> 
> Yes.
Yes, it looks good to me too.
> 
> > Coming to 2nd half:
> >
> > 2] Dumping pass summaries.
> >
> > 3] Stabilizing lto-cgraph and symbol table.
> 
> So I'd instead do
> 
>  3] Enhance the user-interface of the driver
> 
> like providing a way to list all function bodies, a way to dump
> the IL of a single function body, a way to create a dot graph file
> for the cgraph in the file, etc.
> 
> Basically while there's a lot of dumping infrastructure in GCC
> it may not always fit the needs of a LTO IL dumping tool 1:1
> and may need refactoring enhancement.

I would agree here - dumping pass summaries would be nice but we already have
that more or less.  All IPA passes dump their summary at the beginning of their
dump file and I find that relatively sufficient to deal with, mostly because
summaries are quite simple.  It is much harder to deal with the global stream of
trees and function bodies themselves.

Honza
> 
> Richard.
> 
> >
> > Thanks,
> >
> > Hrishikesh
> >
> >
> >
> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
> >>
> >> Hello,
> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >> >  wrote:
> >> > > Hello everyone,
> >> > >
> >> > >
> >> > > Thanks for your suggestions and engaging response.
> >> > >
> >> > > Based on the feedback I think that the scope of this project comprises
> >> > > of
> >> > > following three indicative actions:
> >> > >
> >> > >
> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto
> >> > > object API
> >> > > for reading the lto file.
> >> >
> >> > Yes.  I expect this will take the whole first half of the project,
> >> > after this you
> >> > should be somewhat familiar with the infrastructure as well.  With the
> >> > existing dumping infrastructure it should be possible to dump the
> >> > callgraph and individual function bodies.
> >> >
> >> > >
> >> > > 2. Extending LTO dump infrastructure:
> >> > >
> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
> >> > > nodes, gimple statements etc. However I suppose we’d need to extend
> >> > > that for
> >> > > dumping pass summaries ? For instance, should we add a new hook say
> >> > > “dump”
> >> > > to ipa_opt_pass_d that’d dump the pass
> >> > > summ

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
 wrote:
> Hi,
>
> Thank you Richard and Honza for the suggestions. If I understand correctly,
> the issue is that LTO file format keeps changing per compiler versions, so
> we need a more “stable” representation and the first step for that would be
> to “stabilize” representations for lto-cgraph and symbol table ?

Yes.  Note the issue is that the current format is a 1:1 representation of
the internal representation -- which means it is the internal representation
that changes frequently across releases.  I'm not sure how Honza wants
to deal with those changes in the context of a "stable" IL format.  Given
we haven't been able to provide a stable API to plugins I think it's much
harder to provide a stable streaming format for all the IL details

> Could you
> please elaborate on what initial steps need to be taken in this regard, and
> if it’s feasible within GSoC timeframe ?

I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible
at all ...)

> Thanks!
>
>
> I am trying to break down the project into milestones for the proposal. So
> far, I have identified the following objectives:
>
> 1] Creating a separate driver, that can read LTO object files. Following
> Richard’s estimate, I’d leave around first half of the period for this task.
>
> Would that be OK ?

Yes.

> Coming to 2nd half:
>
> 2] Dumping pass summaries.
>
> 3] Stabilizing lto-cgraph and symbol table.

So I'd instead do

 3] Enhance the user-interface of the driver

like providing a way to list all function bodies, a way to dump
the IL of a single function body, a way to create a dot graph file
for the cgraph in the file, etc.

Basically while there's a lot of dumping infrastructure in GCC
it may not always fit the needs of a LTO IL dumping tool 1:1
and may need refactoring enhancement.
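
To give a feel for the "dot graph file for the cgraph" item, here is a minimal
sketch that formats a list of caller/callee pairs as a Graphviz digraph. The
struct call_edge type and format_callgraph_dot function are hypothetical; a
real tool would walk GCC's own callgraph data structures instead of a plain
array of names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical edge record, not GCC's cgraph_edge.  */
struct call_edge
{
  const char *caller;
  const char *callee;
};

/* Format N call edges as a Graphviz "dot" digraph into BUF (SIZE bytes).
   Return the number of characters written (excluding the terminating
   NUL), or -1 if BUF is too small.  */
int
format_callgraph_dot (char *buf, size_t size,
                      const struct call_edge *edges, size_t n)
{
  assert (buf != NULL && size > 0);
  assert (edges != NULL || n == 0);

  int len = snprintf (buf, size, "digraph cgraph {\n");
  if (len < 0 || (size_t) len >= size)
    return -1;
  size_t used = (size_t) len;

  for (size_t i = 0; i < n; i++)
    {
      len = snprintf (buf + used, size - used, "  \"%s\" -> \"%s\";\n",
                      edges[i].caller, edges[i].callee);
      if (len < 0 || (size_t) len >= size - used)
        return -1;
      used += (size_t) len;
    }

  len = snprintf (buf + used, size - used, "}\n");
  if (len < 0 || (size_t) len >= size - used)
    return -1;
  used += (size_t) len;
  return (int) used;
}
```

The resulting text can be written to a file and rendered with the usual
Graphviz tools.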

Richard.

>
> Thanks,
>
> Hrishikesh
>
>
>
> On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
>>
>> Hello,
>> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
>> >  wrote:
>> > > Hello everyone,
>> > >
>> > >
>> > > Thanks for your suggestions and engaging response.
>> > >
>> > > Based on the feedback I think that the scope of this project comprises
>> > > of
>> > > following three indicative actions:
>> > >
>> > >
>> > > 1. Creating separate driver i.e. separate dump tool that uses lto
>> > > object API
>> > > for reading the lto file.
>> >
>> > Yes.  I expect this will take the whole first half of the project,
>> > after this you
>> > should be somewhat familiar with the infrastructure as well.  With the
>> > existing dumping infrastructure it should be possible to dump the
>> > callgraph and individual function bodies.
>> >
>> > >
>> > > 2. Extending LTO dump infrastructure:
>> > >
>> > > GCC already seems to have dump infrastructure for pretty-printing tree
>> > > nodes, gimple statements etc. However I suppose we’d need to extend
>> > > that for
>> > > dumping pass summaries ? For instance, should we add a new hook say
>> > > “dump”
>> > > to ipa_opt_pass_d that’d dump the pass
>> > > summary ?
>> >
>> > That sounds like a good idea indeed.  I'm not sure if this is the most
>> > interesting
>> > missing part - I guess we'll find out once a dump tool is available.
>>
>> Concering the LTO file format my longer term aim is to make the symbol
>> table sections (symtab used by lto-plugin as well as the callgraph
>> section)
>> and hopefully also the Gimple streams) documented and well behaving
>> without changing the format in every revision.
>>
>> On the other hand the summaries used by individual passes are intended to
>> be
>> pass specific and envolving as individula passes become stronger/new
>> passes
>> are added.
>>
>> It is quite a lot of work to stabilize gimple representation to this
>> extend,
>> For callgraph&symbol table this is however more realistic. That would mean
>> to
>> move some of existing random stuff streamed there into summaries and
>> additionaly
>> cleaning up/rewriting lto-cgraph so the on disk format actually makes
>> sense.
>>
>> I will be happy to help with any steps in this direction as well.
>>
>> Honza
>
>


Re: eliminate dead stores across functions

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni
 wrote:
> Hi,
> For the following test-case,
>
> int a;
>
> __attribute__((noinline))
> static void foo()
> {
>   a = 3;
> }
>
> int main()
> {
>   a = 4;
>   foo ();
>   return a;
> }
>
> I assume it's safe to remove "a = 4"  since 'a' would be overwritten
> by call to foo ?
> IIUC, ipa-reference pass does mod/ref analysis to compute side-effects
> of function call,
> so could we perhaps use ipa_reference_get_not_written_global() in dse
> pass to check if a global variable will be killed on call to a
> function ? If not, I suppose we could write a similar ipa pass that
> computes the set of killed global variables per function but I am not
> sure if that's the correct approach.

Do you think the situation happens often enough to make this worthwhile?

ipa-reference doesn't compute must-def, only may-def and may-use IIRC.
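
To illustrate why may-def information is not enough, consider a variant of the
test case in which foo only conditionally writes 'a'. This is a sketch: the
flag variable and the run and self_check helpers are hypothetical additions,
not part of the original test case.

```c
#include <assert.h>

int a;
int flag;

/* foo MAY write 'a', but does not write it on every path.  */
__attribute__((noinline))
static void
foo (void)
{
  if (flag)
    a = 3;
}

/* Same shape as main in the test case, with the flag as a parameter.  */
int
run (int f)
{
  flag = f;
  a = 4;        /* Not dead: still visible when foo leaves 'a' alone.  */
  foo ();
  return a;
}

/* Both outcomes are observable, so "a = 4" must be kept.  */
void
self_check (void)
{
  assert (run (0) == 4);
  assert (run (1) == 3);
}
```

Here foo is in the may-def set for 'a', yet removing "a = 4" would change the
result of run (0). Only a must-def (kill) set would justify the removal.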

Richard.

> Thanks,
> Prathamesh


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Hrishikesh Kulkarni
Hi,

Thank you Richard and Honza for the suggestions. If I understand correctly,
the issue is that LTO file format keeps changing per compiler versions, so
we need a more “stable” representation and the first step for that would be
to “stabilize” representations for lto-cgraph and symbol table ? Could you
please elaborate on what initial steps need to be taken in this regard, and
if it’s feasible within GSoC timeframe ?

Thanks!

I am trying to break down the project into milestones for the proposal. So
far, I have identified the following objectives:

1] Creating a separate driver, that can read LTO object files. Following
Richard’s estimate, I’d leave around first half of the period for this task.

Would that be OK ?

Coming to 2nd half:

2] Dumping pass summaries.

3] Stabilizing lto-cgraph and symbol table.

Thanks,

Hrishikesh


On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:

> Hello,
> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >  wrote:
> > > Hello everyone,
> > >
> > >
> > > Thanks for your suggestions and engaging response.
> > >
> > > Based on the feedback I think that the scope of this project comprises
> of
> > > following three indicative actions:
> > >
> > >
> > > 1. Creating separate driver i.e. separate dump tool that uses lto
> object API
> > > for reading the lto file.
> >
> > Yes.  I expect this will take the whole first half of the project,
> > after this you
> > should be somewhat familiar with the infrastructure as well.  With the
> > existing dumping infrastructure it should be possible to dump the
> > callgraph and individual function bodies.
> >
> > >
> > > 2. Extending LTO dump infrastructure:
> > >
> > > GCC already seems to have dump infrastructure for pretty-printing tree
> > > nodes, gimple statements etc. However I suppose we’d need to extend
> that for
> > > dumping pass summaries ? For instance, should we add a new hook say
> “dump”
> > > to ipa_opt_pass_d that’d dump the pass
> > > summary ?
> >
> > That sounds like a good idea indeed.  I'm not sure if this is the most
> > interesting
> > missing part - I guess we'll find out once a dump tool is available.
>
> Concering the LTO file format my longer term aim is to make the symbol
> table sections (symtab used by lto-plugin as well as the callgraph section)
> and hopefully also the Gimple streams) documented and well behaving
> without changing the format in every revision.
>
> On the other hand the summaries used by individual passes are intended to
> be
> pass specific and envolving as individula passes become stronger/new passes
> are added.
>
> It is quite a lot of work to stabilize gimple representation to this
> extend,
> For callgraph&symbol table this is however more realistic. That would mean
> to
> move some of existing random stuff streamed there into summaries and
> additionaly
> cleaning up/rewriting lto-cgraph so the on disk format actually makes
> sense.
>
> I will be happy to help with any steps in this direction as well.
>
> Honza
>


eliminate dead stores across functions

2018-03-06 Thread Prathamesh Kulkarni
Hi,
For the following test-case,

int a;

__attribute__((noinline))
static void foo()
{
  a = 3;
}

int main()
{
  a = 4;
  foo ();
  return a;
}

I assume it's safe to remove "a = 4"  since 'a' would be overwritten
by call to foo ?
IIUC, ipa-reference pass does mod/ref analysis to compute side-effects
of function call,
so could we perhaps use ipa_reference_get_not_written_global() in dse
pass to check if a global variable will be killed on call to a
function ? If not, I suppose we could write a similar ipa pass that
computes the set of killed global variables per function but I am not
sure if that's the correct approach.

Thanks,
Prathamesh


Re: How big (and fast) is going to be GCC 8?

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 11:12 AM, Martin Liška  wrote:
> Hello.
>
> Many significant changes has landed in mainline and will be released as GCC 
> 8.1.
> I decided to use various GCC configs we have and test how there configuration 
> differ
> in size and also binary size.
>
> This is first part where I measured binary size, speed comparison will follow.
> Configuration names should be self-explaining, the 'system-*' is built done
> without bootstrap with my system compiler (GCC 7.3.0). All builds are done
> on my Intel Haswell machine.

So from the numbers I see that bootstrap causes a 8% bigger binary compared
to non-bootstrap using GCC 7.3 at -O2 when including debug info and 1.2%
larger stripped.  That means trunk generates larger code.

What is missing is a speed comparison of the various binaries -- you could
try measuring this by doing a make all-gcc for a non-bootstrap config
(so it uses -O2 -g and doesn't build target libs with the built compiler).

Richard.

> Feel free to reply if you need any explanation.
> Martin


How big (and fast) is going to be GCC 8?

2018-03-06 Thread Martin Liška
Hello.

Many significant changes have landed in mainline and will be released as GCC 8.1.
I decided to use the various GCC configs we have and test how the
configurations differ in size and also binary size.

This is the first part, where I measured binary size; a speed comparison will
follow.
Configuration names should be self-explanatory; the 'system-*' builds are done
without bootstrap with my system compiler (GCC 7.3.0). All builds are done
on my Intel Haswell machine.

Feel free to reply if you need any explanation.
Martin


gcc-8-build-stats.ods
Description: application/vnd.oasis.opendocument.spreadsheet


gcc-8-build-stats.pdf.bz2
Description: application/bzip