About GSoC
Can anyone help me? I want to ask a question about the project ideas; please reply.
Re: GCC GSOC Participation
Hello Prateek, On Sat, Mar 03 2018, Prateek Kalra wrote:
> Hello GCC Community,
> My name is Prateek Kalra. I am pursuing an integrated dual degree (B.Tech+M.Tech) in Computer Science Software Engineering from Gautam Buddha University, Greater Noida. I am currently in the 8th semester of the programme.
> I have experience in competitive programming with C++. Here's my LinkedIn profile: https://www.linkedin.com/in/prateek-kalra-6a40bab3/.
> I am interested in the GSoC project "Implement a fuzzer leveraging GCC extensions".
> I took compiler design as one of my course subjects in the previous semester and secured an 'A' grade at the end of the semester.
> I have theoretical knowledge of fuzz testing and Csmith, that is, how random C programs are generated to find compiler bugs, and I am very keen to work on this project.
> I request you to guide me through the process. I would really appreciate it if you could mentor me in the further research of this project idea.
I would suggest that you start with reading through Andi's email to another student who expressed interest in that project, which you can find at: https://gcc.gnu.org/ml/gcc/2018-02/msg00216.html Andi, do you have any further suggestions what Prateek should check out, perhaps build, examine and experiment with in order to come up with a nice proposal? Do you personally prefer starting with any particular existing fuzzer, for example? Good luck, Martin
Re: Further for GSoC.
Hello, Tejas, On Fri, Mar 02 2018, Joseph Myers wrote: > On Fri, 2 Mar 2018, Tejas Joshi wrote: > >> I have some university level experience of working and programming assembly >> language under Intel 80386DX architecture. I think it may help for >> implementing supports for other architectures. Just for start, you >> mentioned roundeven function as a guide for start. Where can I find these >> (e.g. real.c) .c files for detailed study of these functions so that I can >> have broader scenario? I have GCC 7.2.0 installed and could not find it in >> library nor in libc/. > > You need to check out the GCC source code from version control and find > the files and functions referenced in there (locating pieces of GCC code > using find, grep, etc. on the GCC source tree is something you'll need to > do a lot), and make sure you can build GCC, run the testsuite, save > results from a testsuite run, build and run the testsuite and compare the > results of the two runs (this is something that would need doing very many > times in the course of any project working on GCC). > You might have figured this out already but just in case something is not clear: 1. How to check out our sources using svn and git is described at https://gcc.gnu.org/svn.html and https://gcc.gnu.org/wiki/GitMirror respectively, and 2. perhaps more importantly, how to configure, build and test GCC is described in steps linked from https://gcc.gnu.org/install/ (look for --disable-bootstrap, among other things). If you have any specific question regarding any of these steps, feel free to ask on the mailing list or the IRC. Good luck, Martin
Re: GCC GSOC Participation
CCing Andi Kleen, mentor of this project. Regards, Prathamesh On 3 March 2018 at 16:22, Prateek Kalra wrote:
> Hello GCC Community,
> My name is Prateek Kalra. I am pursuing an integrated dual degree (B.Tech+M.Tech) in Computer Science Software Engineering from Gautam Buddha University, Greater Noida. I am currently in the 8th semester of the programme.
> I have experience in competitive programming with C++. Here's my LinkedIn profile: https://www.linkedin.com/in/prateek-kalra-6a40bab3/.
> I am interested in the GSoC project "Implement a fuzzer leveraging GCC extensions".
> I took compiler design as one of my course subjects in the previous semester and secured an 'A' grade at the end of the semester.
> I have theoretical knowledge of fuzz testing and Csmith, that is, how random C programs are generated to find compiler bugs, and I am very keen to work on this project.
> I request you to guide me through the process. I would really appreciate it if you could mentor me in the further research of this project idea.
> Regards,
> Prateek
Re: Regarding Google summer of code.
Hello Chaitanya, On Fri, Mar 02 2018, Sai Chaitanya wrote:
> Hello,
> I am Chaitanya. While checking the organisations for GSoC I am very confused. Sir, I have skills in C, C++, Java and a little bit of Python; till now I have not taken part in any big projects.
> Please guide me on which organisation and which project are suitable for me.
You have reached out to the developers of the GNU Compiler Collection (GCC). I am afraid that we are unable to help you with picking the most suitable GSoC mentor organization, for many reasons. If you are thinking of applying to do a GSoC project with us, look at our dedicated wiki page https://gcc.gnu.org/wiki/SummerOfCode If you are still interested after reading through it, I suggest that you check out our sources (https://gcc.gnu.org/svn.html, https://gcc.gnu.org/wiki/GitMirror), build the compiler (look at steps referenced from https://gcc.gnu.org/install/), look around the code a bit and then come back to us with specific questions (and ideally at least an idea for the project). Good luck, Martin
Re: Getting into C++ Downloading gcc.
On 4 March 2018 at 02:40, Ray McAllister wrote:
> Hi, I'm totally blind. I do most of my programming in BASIC, but I use C++ now and then, actually, for drawing fractals. I code graphics. I've been using Dev-C++ because it's the only thing I can find compatible with my screen reader. I don't like how I can't set up a char array bigger than 1400 by 1400 as I might want to make a fractal bigger than that.
You should be able to create arrays bigger than that, limited only by the memory available on your computer. You might not be able to create such large arrays on the stack, or as a global variable, but you could create it on the heap:

char* array = new char[1400*1400];

Or a better way to do that might be:

#include <array>
#include <memory>
using array_type = std::array<std::array<char, 1400>, 1400>;
auto array = std::make_unique<array_type>();

> I have the computer fill an array with the fractal data for colors, and then it writes a bitmap file with the data and I can access that through BASIC or just show it to a friend. All Dev-C++ lets me do for array size is 1400 by 1400 in an array, and that's using chars. I'd use Booleans, but I need to include color data for each pixel. I wonder if GCC would be better with that.
Dev-C++ is not a compiler, it's just an IDE, and it uses the MinGW port of GCC for the compiler. That means you're already using GCC.
> I also need to know, please, how and where to download GCC, the latest version. I'm not finding info on that.
As explained at https://gcc.gnu.org/install/binaries.html the GCC project does not provide binaries to download, we only provide source code. There are third-party binaries available, and the mingw and mingw-w64 ports of GCC for MS Windows are available from their respective projects. The https://gcc.gnu.org/install/binaries.html page has links to them, and there are other builds of them available like http://tdm-gcc.tdragon.net/
> In addition, when I run the Dev-C++ programs from BASIC, a window comes up on the screen saying so.
> Is there a way, in GCC, to prevent that?
I have no idea; that sounds like a Windows feature, not something caused by the compiler. Maybe somebody else can help there.
Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
Hi Richard, On 06/03/18 16:04, Richard Biener wrote: On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li wrote: Hi all, The problem described here probably only affects targets whose ABI allows passing structured arguments of certain sizes via registers. If the mode of the parameter type is BLKmode, in the callee, during RTL expansion, a stack slot will be reserved for this parameter, and the incoming value will be copied into the stack slot. However, the stack slot for the parameter will not be aligned if the alignment of the parameter type exceeds MAX_SUPPORTED_STACK_ALIGNMENT. Chances are, unaligned memory accesses might cause run-time errors. For a local variable on the stack, the alignment of the data type is honored, although the documentation states that it is not guaranteed. For example:

#include <stdint.h>
union U {
  uint32_t M0;
  uint32_t M1;
  uint32_t M2;
  uint32_t M3;
} __attribute((aligned(16)));

void tmp (union U *);
void foo (union U P0)
{
  union U P1 = P0;
  tmp (&P1);
}

The code-gen for armv7-a is like this:

foo:
        @ args = 0, pretend = 0, frame = 48
        @ frame_needed = 0, uses_anonymous_args = 0
        str     lr, [sp, #-4]!
        sub     sp, sp, #52
        mov     ip, sp
        stm     ip, {r0, r1, r2, r3}    --> ip is not 128-bit aligned
        add     lr, sp, #39
        bic     lr, lr, #15
        ldm     ip, {r0, r1, r2, r3}
        stm     lr, {r0, r1, r2, r3}    --> lr is 128-bit aligned
        mov     r0, lr
        bl      tmp
        add     sp, sp, #52
        @ sp needed
        ldr     pc, [sp], #4

There are other obvious missed optimizations in the code generation above. The stack slots for parameter P0 and local variable P1 could be merged, so that some of the load/store instructions could be removed. I think this is a known missed-optimization case. To summarise, there are two issues here: 1, (wrong code) an unaligned stack slot is allocated for parameters during function expansion. 2, (missed optimization) the stack slot for a parameter is sometimes not necessary. In certain scenarios, the argument register could be used directly. Currently, this is only possible when the parameter mode is not BLKmode. For issue 1, we can do similar things as expand_used_vars: dynamically align the stack slot address for parameters whose alignment exceeds PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the aligned address and fp when possible. For issue 2, I checked the behavior of LLVM; it seems the stack slot allocation for parameters is explicitly exposed by the alloca IR instruction at the very beginning. Later, optimization/transformation passes like mem2reg, reg2mem, sroa etc. remove unnecessary alloca instructions. In GCC, the stack allocation for parameters and local variables is done during the expand pass, implicitly, and RTL passes are not able to remove the unnecessary stack allocation and load/store operations. For example:

uint32_t bar (union U P0)
{
  return P0.M0;
}

Currently, the code-gen is different on different targets. There are various backend hooks which make the code-gen sub-optimal. For example, the aarch64 target can directly return with w0 while the armv7-a target generates an unnecessary store and load. However, this optimization should be target independent, unrelated to target alignment configuration. Both issues 1 and 2 could be resolved if GCC had a similar approach, but I assume the change is big. Are there any suggestions for solving issue 1 and improving issue 2 in a generic way? I can create a bugzilla ticket to record the issue. What does the ABI say for passing such over-aligned data types? For solving 1) you could copy the argument as passed by the ABI to a properly aligned stack location in the callee. Generally it sounds like either the ABI doesn't specify anything or the ABI specifies something that violates user expectation. For 2) again, it is the ABI which specifies whether an argument is passed via the stack or via registers. So - what does the ABI say? The compiler is doing the right thing here to pass the argument via registers.
To be specific, there is such a clause in the Arm PCS:

B.5 If the argument is an alignment adjusted type its value is passed as a copy of the actual value. The copy will have an alignment defined as follows. ... For a Composite Type, the alignment of the copy will have 4-byte alignment if its natural alignment is <= 4 and 8-byte alignment if its natural alignment is >= 8

C.3 If the argument requires double-word alignment (8-byte), the NCRN is rounded up to the next even register number.

C.4 If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into core registers, starting at the NCRN. The NCRN is incremented by the number of registers used. Successive registers hold the parts of the argument they would hold if its value were loaded into those registers from memory using an LDM instruction. The argument has now been allocated.

This is quite similar for other RISC machines. Here, the p
Re: How big (and fast) is going to be GCC 8? [part 2]
On 03/06/2018 07:16 PM, Bin.Cheng wrote: On Tue, Mar 6, 2018 at 5:50 PM, Martin Liška wrote: Hi. This is a speed comparison of GCC 8 builds compared to my system GCC 7.3.0, which is built with PGO bootstrap. I ran an empty C and an empty C++ source file, tramp3d, and the rest are some big beasts from the GCC source tree. Feel free to suggest other test candidates. Note that the first column defines how many times each test was run. First, thanks very much for collecting the data. Since we enabled several loop passes at the O3 and above levels, some data for Ofast might be interesting? Do you have a nice source file full of loop nests that would test that properly? Martin Thanks, bin Martin
Re: How big (and fast) is going to be GCC 8?
On 03/06/2018 05:18 PM, Martin Liška wrote: Yes, in bytes. It would be nicer to have it in MB ;) It would be more easily readable. I'll fix that. Hi. I'm sending updated binary size statistics for both cc1 and cc1plus, in MB. Martin gcc-8-build-stats-v2.pdf.bz2 Description: application/bzip gcc-8-build-stats-v2.ods Description: application/vnd.oasis.opendocument.spreadsheet
Re: How big (and fast) is going to be GCC 8? [part 2]
On Tue, Mar 6, 2018 at 5:50 PM, Martin Liška wrote:
> Hi.
> This is a speed comparison of GCC 8 builds compared to my system GCC 7.3.0, which is built with PGO bootstrap.
> I ran an empty C and an empty C++ source file, tramp3d, and the rest are some big beasts from the GCC source tree. Feel free to suggest other test candidates. Note that the first column defines how many times each test was run.
First, thanks very much for collecting the data. Since we enabled several loop passes at the O3 and above levels, some data for Ofast might be interesting? Thanks, bin
> Martin
Re: How big (and fast) is going to be GCC 8? [part 2]
Hi. This is a speed comparison of GCC 8 builds compared to my system GCC 7.3.0, which is built with PGO bootstrap. I ran an empty C and an empty C++ source file, tramp3d, and the rest are some big beasts from the GCC source tree. Feel free to suggest other test candidates. Note that the first column defines how many times each test was run. Martin gcc-8-perf-stats.pdf.bz2 Description: application/bzip gcc-8-perf-stats.ods Description: application/vnd.oasis.opendocument.spreadsheet
Re: eliminate dead stores across functions
On Tue, Mar 6, 2018 at 4:50 PM, Bin.Cheng wrote: > On Tue, Mar 6, 2018 at 4:44 PM, Martin Jambor wrote: >> Hi Bin, >> On Tue, Mar 06 2018, Bin Cheng wrote: >>> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener wrote: Do you think the situation happens often enough to make this worthwhile? >>> There is one probably more useful case. A program may use global flags controlling how it does (heavy) computation. Such flags are only set a couple of times during execution. It would be useful if we could (IPA) propagate flags into computation-heavy functions by versioning (if necessary). For example:

>>> int flag = 1;
>>> void foo ()
>>> {
>>>   // heavy computation wrt flag
>>> }
>>> void main ()
>>> {
>>>   flag = 2;
>>>   foo ();
>>>   flag = 1;
>>>   foo ();
>>> }

>> So basically IPA-CP done on (not-addressable) static global variables. Do you happen to know some real code which would benefit? I'd like to experiment with it but would like to have a real thing to look at, as opposed to artificial test cases. > As Richi pointed out, I think this is not rare in SPEC. At the moment I only vaguely remember 544.nab_r for such an issue, but I am sure there are other cases. Sorry, I forgot to mention it might not be static variables at file scope; that's why I mentioned LTO previously. Thanks, bin
Re: eliminate dead stores across functions
On Tue, Mar 6, 2018 at 4:44 PM, Martin Jambor wrote: > Hi Bin, > On Tue, Mar 06 2018, Bin Cheng wrote: >> On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener >>> Do you think the situation happens often enough to make this worthwhile? >> There is one probably more useful case. A program may use global flags controlling how it does (heavy) computation. Such flags are only set a couple of times during execution. It would be useful if we could (IPA) propagate flags into computation-heavy functions by versioning (if necessary). For example:

>> int flag = 1;
>> void foo ()
>> {
>>   // heavy computation wrt flag
>> }
>> void main ()
>> {
>>   flag = 2;
>>   foo ();
>>   flag = 1;
>>   foo ();
>> }

> So basically IPA-CP done on (not-addressable) static global variables. Do you happen to know some real code which would benefit? I'd like to experiment with it but would like to have a real thing to look at, as opposed to artificial test cases. As Richi pointed out, I think this is not rare in SPEC. At the moment I only vaguely remember 544.nab_r for such an issue, but I am sure there are other cases. Thanks, bin
Re: eliminate dead stores across functions
On 03/06/2018 09:28 AM, Richard Biener wrote: > On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni wrote: >> Hi, >> For the following test-case,

>> int a;
>>
>> __attribute__((noinline))
>> static void foo ()
>> {
>>   a = 3;
>> }
>>
>> int main ()
>> {
>>   a = 4;
>>   foo ();
>>   return a;
>> }

>> I assume it's safe to remove "a = 4" since 'a' would be overwritten by the call to foo? >> IIUC, the ipa-reference pass does mod/ref analysis to compute the side-effects of a function call, so could we perhaps use ipa_reference_get_not_written_global() in the dse pass to check if a global variable will be killed on a call to a function? If not, I suppose we could write a similar IPA pass that computes the set of killed global variables per function, but I am not sure if that's the correct approach. > Do you think the situation happens often enough to make this worthwhile? > ipa-reference doesn't compute must-def, only may-def and may-use IIRC. > Richard. This dead write optimization sounds similar to "DeadSpy: a tool to pinpoint program inefficiencies" by Milind Chabbi and John Mellor-Crummey of Rice University: https://dl.acm.org/citation.cfm?id=2259033 The abstract says there were numerous dead writes in the SPEC 2006 gcc benchmark and eliminating those provided an average 15% improvement in performance. -Will
Re: eliminate dead stores across functions
Hi Bin, On Tue, Mar 06 2018, Bin Cheng wrote: > On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener >> Do you think the situation happens often enough to make this worthwhile? > There is one probably more useful case. A program may use global flags controlling how it does (heavy) computation. Such flags are only set a couple of times during execution. It would be useful if we could (IPA) propagate flags into computation-heavy functions by versioning (if necessary). For example:

> int flag = 1;
> void foo ()
> {
>   // heavy computation wrt flag
> }
> void main ()
> {
>   flag = 2;
>   foo ();
>   flag = 1;
>   foo ();
> }

So basically IPA-CP done on (not-addressable) static global variables. Do you happen to know some real code which would benefit? I'd like to experiment with it but would like to have a real thing to look at, as opposed to artificial test cases. Thanks, Martin
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/5/18 9:33 AM, Segher Boessenkool wrote: > On Mon, Mar 05, 2018 at 08:01:14AM +0100, Eric Botcazou wrote: >> Apparently the authors of the SPARC psABI thought that the last part of your >> sentence is an interpolation and that the (historical) requirements were >> vague >> enough to allow their interpretation, IOW that the compiler can do the work. > > Maybe we should have a target hook that says setjmp/longjmp are > implemented by simple function calls (or as-if by function calls), so > as not to penalize everyone who has an, erm, more conservative ABI? Unless someone really wants to work on this, I'll have a look at adding this once stage1 opens up. Peter
Re: How big (and fast) is going to be GCC 8?
On 03/06/2018 04:13 PM, David Malcolm wrote: On Tue, 2018-03-06 at 11:14 +0100, Martin Liška wrote: Hello. Many significant changes have landed in mainline and will be released as GCC 8.1. I decided to use the various GCC configurations we have and test how they differ in speed and also binary size. This is the first part, where I measured binary size; a speed comparison will follow. The configuration names should be self-explanatory; the 'system-*' builds are done without bootstrap with my system compiler (GCC 7.3.0). All builds are done on my Intel Haswell machine. Feel free to reply if you need any explanation. Martin Some possibly silly questions: Hi David. All of them are qualified! (a) was this done with: --enable-checking=release ? Yes. (b) is this measuring cc1 ? cc1plus. Let me also add cc1 when I have run-time numbers. (c) are the units bytes? (so ~183MB for the unstripped system-O2-native cc1, ~25MB after stripping?) Yes, in bytes. It would be nicer to have it in MB ;) It would be more easily readable. I'll fix that. (d) do you have comparable data for gcc 7? I will build the corresponding builds for GCC 7 tonight. Martin Thanks Dave
Re: eliminate dead stores across functions
On Tue, Mar 6, 2018 at 4:50 PM, Bin.Cheng wrote: > On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener > wrote: >> On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni >> wrote: >>> Hi, >>> For the following test-case,

>>> int a;
>>>
>>> __attribute__((noinline))
>>> static void foo ()
>>> {
>>>   a = 3;
>>> }
>>>
>>> int main ()
>>> {
>>>   a = 4;
>>>   foo ();
>>>   return a;
>>> }

>>> I assume it's safe to remove "a = 4" since 'a' would be overwritten by the call to foo? >>> IIUC, the ipa-reference pass does mod/ref analysis to compute the side-effects of a function call, so could we perhaps use ipa_reference_get_not_written_global() in the dse pass to check if a global variable will be killed on a call to a function? If not, I suppose we could write a similar IPA pass that computes the set of killed global variables per function, but I am not sure if that's the correct approach. >> Do you think the situation happens often enough to make this worthwhile? > There is one probably more useful case. A program may use global flags controlling how it does (heavy) computation. Such flags are only set a couple of times during execution. It would be useful if we could (IPA) propagate flags into computation-heavy functions by versioning (if necessary). For example:

> int flag = 1;
> void foo ()
> {
>   // heavy computation wrt flag
> }
> void main ()
> {
>   flag = 2;
>   foo ();
>   flag = 1;
>   foo ();
> }

Yeah, libquantum does this. There's also a related example from some SPEC Fortran testcase:

void foo ()
{
  static int initialized;
  static T data;
  if (!initialized)
    {
      data.x = 1;
      initialized = 1;
    }
  ...
}

where we want to constant propagate from data.x. IIRC I tried to work on this, not sure if I solved it yet... Richard. > Of course this may only be useful for LTO. > Thanks, > bin >> ipa-reference doesn't compute must-def, only may-def and may-use IIRC. >> Richard. >>> Thanks, >>> Prathamesh
Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li wrote:
> Hi all,
> The problem described here probably only affects targets whose ABI allows passing structured arguments of certain sizes via registers.
> If the mode of the parameter type is BLKmode, in the callee, during RTL expansion, a stack slot will be reserved for this parameter, and the incoming value will be copied into the stack slot.
> However, the stack slot for the parameter will not be aligned if the alignment of the parameter type exceeds MAX_SUPPORTED_STACK_ALIGNMENT. Chances are, unaligned memory accesses might cause run-time errors.
> For a local variable on the stack, the alignment of the data type is honored, although the documentation states that it is not guaranteed.
> For example:
>
> #include <stdint.h>
> union U {
>   uint32_t M0;
>   uint32_t M1;
>   uint32_t M2;
>   uint32_t M3;
> } __attribute((aligned(16)));
>
> void tmp (union U *);
> void foo (union U P0)
> {
>   union U P1 = P0;
>   tmp (&P1);
> }
>
> The code-gen for armv7-a is like this:
>
> foo:
>         @ args = 0, pretend = 0, frame = 48
>         @ frame_needed = 0, uses_anonymous_args = 0
>         str     lr, [sp, #-4]!
>         sub     sp, sp, #52
>         mov     ip, sp
>         stm     ip, {r0, r1, r2, r3}    --> ip is not 128-bit aligned
>         add     lr, sp, #39
>         bic     lr, lr, #15
>         ldm     ip, {r0, r1, r2, r3}
>         stm     lr, {r0, r1, r2, r3}    --> lr is 128-bit aligned
>         mov     r0, lr
>         bl      tmp
>         add     sp, sp, #52
>         @ sp needed
>         ldr     pc, [sp], #4
>
> There are other obvious missed optimizations in the code generation above. The stack slots for parameter P0 and local variable P1 could be merged, so that some of the load/store instructions could be removed. I think this is a known missed-optimization case.
> To summarise, there are two issues here:
> 1, (wrong code) an unaligned stack slot is allocated for parameters during function expansion.
> 2, (missed optimization) the stack slot for a parameter is sometimes not necessary. In certain scenarios, the argument register could be used directly. Currently, this is only possible when the parameter mode is not BLKmode.
> For issue 1, we can do similar things as expand_used_vars: dynamically align the stack slot address for parameters whose alignment exceeds PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the aligned address and fp when possible.
> For issue 2, I checked the behavior of LLVM; it seems the stack slot allocation for parameters is explicitly exposed by the alloca IR instruction at the very beginning. Later, optimization/transformation passes like mem2reg, reg2mem, sroa etc. remove unnecessary alloca instructions.
> In GCC, the stack allocation for parameters and local variables is done during the expand pass, implicitly, and RTL passes are not able to remove the unnecessary stack allocation and load/store operations.
> For example:
>
> uint32_t bar (union U P0)
> {
>   return P0.M0;
> }
>
> Currently, the code-gen is different on different targets. There are various backend hooks which make the code-gen sub-optimal. For example, the aarch64 target can directly return with w0 while the armv7-a target generates an unnecessary store and load.
> However, this optimization should be target independent, unrelated to target alignment configuration.
> Both issues 1 and 2 could be resolved if GCC had a similar approach, but I assume the change is big.
> Are there any suggestions for solving issue 1 and improving issue 2 in a generic way?
> I can create a bugzilla ticket to record the issue.
What does the ABI say for passing such over-aligned data types? For solving 1) you could copy the argument as passed by the ABI to a properly aligned stack location in the callee. Generally it sounds like either the ABI doesn't specify anything or the ABI specifies something that violates user expectation. For 2) again, it is the ABI which specifies whether an argument is passed via the stack or via registers. So - what does the ABI say? Richard.
> Regards, > Renlin
Re: eliminate dead stores across functions
On Tue, Mar 6, 2018 at 2:28 PM, Richard Biener wrote: > On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni > wrote: >> Hi, >> For the following test-case,

>> int a;
>>
>> __attribute__((noinline))
>> static void foo ()
>> {
>>   a = 3;
>> }
>>
>> int main ()
>> {
>>   a = 4;
>>   foo ();
>>   return a;
>> }

>> I assume it's safe to remove "a = 4" since 'a' would be overwritten by the call to foo? >> IIUC, the ipa-reference pass does mod/ref analysis to compute the side-effects of a function call, so could we perhaps use ipa_reference_get_not_written_global() in the dse pass to check if a global variable will be killed on a call to a function? If not, I suppose we could write a similar IPA pass that computes the set of killed global variables per function, but I am not sure if that's the correct approach. > Do you think the situation happens often enough to make this worthwhile? There is one probably more useful case. A program may use global flags controlling how it does (heavy) computation. Such flags are only set a couple of times during execution. It would be useful if we could (IPA) propagate flags into computation-heavy functions by versioning (if necessary). For example:

int flag = 1;
void foo ()
{
  // heavy computation wrt flag
}
void main ()
{
  flag = 2;
  foo ();
  flag = 1;
  foo ();
}

Of course this may only be useful for LTO. Thanks, bin > ipa-reference doesn't compute must-def, only may-def and may-use IIRC. > Richard. >> Thanks, >> Prathamesh
Re: GSOC 2018 - Textual LTO dump tool project
> On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka wrote: > >> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni > >> wrote: > >> > Hi, > >> > > >> > Thank you Richard and Honza for the suggestions. If I understand > >> > correctly, > >> > the issue is that LTO file format keeps changing per compiler versions, > >> > so > >> > we need a more “stable” representation and the first step for that would > >> > be > >> > to “stabilize” representations for lto-cgraph and symbol table ? > >> > >> Yes. Note the issue is that the current format is a 1:1 representation of > >> the internal representation -- which means it is the internal > >> representation > >> that changes frequently across releases. I'm not sure how Honza wants > >> to deal with those changes in the context of a "stable" IL format. Given > >> we haven't been able to provide a stable API to plugins I think it's much > >> harder to provide a stable streaming format for all the IL details > >> > >> > Could you > >> > please elaborate on what initial steps need to be taken in this regard, > >> > and > >> > if it’s feasible within GSoC timeframe ? > >> > >> I don't think it is feasible in the GSoC timeframe (nor do I think it's > >> feasible > >> at all ...) > > > > I skipped this, with GSoC timeframe I fully agree. With feasibility at all > > not so > > much - LLVM documents its bitcode to reasonable extend > > https://llvm.org/docs/BitCodeFormat.html > > > > Reason why i mentioned it is that I would like to use this as an excuse to > > get > > things incrementally cleaned up and it would be nice to keep it in mind > > while > > working on this. > > Ok. It's probably close enough to what I recommended doing with respect > to make the LTO bytecode "self-descriptive" -- thus start with making the > structure documented and parseable without assigning semantics to > every bit ;) I think that can be achieved top-down in a very incremental > way if you get the bottom implemented first (the data-streamer part). 
OK :) I did not mean to document every bit either, at least not for the fancy parts. It would be nice to have cleaned up i.e. the section headers/footers so they do not depend on endianness, and to slowly clean up similar nonsense at higher levels. So it may make sense to progress from both directions, lower-hanging fruit first. Honza
BLKmode parameters are stored in unaligned stack slot when passed via registers.
Hi all, The problem described here probably only affects targets whose ABI allow to pass structured arguments of certain size via registers. If the mode of the parameter type is BLKmode, in the callee, during RTL expanding, a stack slot will be reserved for this parameter, and the incoming value will be copied into the stack slot. However, the stack slot for the parameter will not be aligned if the alignment of parameter type exceeds MAX_SUPPORTED_STACK_ALIGNMENT. Chances are, unaligned memory access might cause run-time errors. For local variable on the stack, the alignment of the data type is honored, although the document states that it is not guaranteed. For example: #include union U { uint32_t M0; uint32_t M1; uint32_t M2; uint32_t M3; } __attribute((aligned(16))); void tmp (union U *); void foo (union U P0) { union U P1 = P0; tmp (&P1); } The code-gen from armv7-a is like this: foo: @ args = 0, pretend = 0, frame = 48 @ frame_needed = 0, uses_anonymous_args = 0 strlr, [sp, #-4]! subsp, sp, #52 movip, sp stmip, {r0, r1, r2, r3} --> ip is not 128-bit aligned addlr, sp, #39 biclr, lr, #15 ldmip, {r0, r1, r2, r3} stmlr, {r0, r1, r2, r3} --> lr is 128-bit aligned movr0, lr bltmp addsp, sp, #52 @ sp needed ldrpc, [sp], #4 There are other obvious missed optimizations in the code-generation above. The stack slot for parameter P0 and local variable P1 could be merged. So that some of the load/store instructions could be removed. I think this is a known missed optimization case. To summaries, there are two issues here: 1, (wrong code) unaligned stack slot allocated for parameters during function expansion. 2, (missed optimization) stack slot for parameter sometimes is not necessary. In certain scenario, the argument register could directly be used. Currently, this is only possible when the parameter mode is not BLKmode. For issue 1, we can do similar things as expand_used_vars. 
Dynamically align the stack slot address for parameters whose alignment exceeds PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the aligned address and fp when possible. For issue 2, I checked the behavior of LLVM: there, the stack slot allocation for parameters is explicitly exposed by alloca IR instructions at the very beginning, and later optimization/transformation passes like mem2reg, reg2mem, sroa etc. remove the unnecessary alloca instructions. In GCC, the stack allocation for parameters and local variables is done implicitly during the expand pass, and RTL passes are not able to remove the unnecessary stack allocation and load/store operations. For example:

uint32_t bar (union U P0)
{
  return P0.M0;
}

Currently, the code-gen differs across targets, and various backend hooks make the code-gen sub-optimal. For example, the aarch64 target can return directly in w0 while the armv7-a target generates an unnecessary store and load. However, this optimization should be target independent and unrelated to the target alignment configuration. Both issues 1 and 2 could be resolved if GCC took a similar approach, but I assume the change would be big. Are there any suggestions for solving issue 1 and improving issue 2 in a generic way? I can create a bugzilla ticket to record the issue. Regards, Renlin
Re: How big (and fast) is going to be GCC 8?
On Tue, 2018-03-06 at 11:14 +0100, Martin Liška wrote: > Hello. > > Many significant changes have landed in mainline and will be released > as GCC 8.1. > I decided to use the various GCC configs we have and test how their > configurations differ > in size and also binary size. > > This is the first part, where I measured binary size; a speed comparison > will follow. > Configuration names should be self-explanatory; the 'system-*' builds are > done > without bootstrap with my system compiler (GCC 7.3.0). All builds are > done > on my Intel Haswell machine. > > Feel free to reply if you need any explanation. > Martin

Some possibly silly questions:
(a) was this done with --enable-checking=release ?
(b) is this measuring cc1 ?
(c) are the units bytes? (so ~183MB for the unstripped system-O2-native cc1, ~25MB after stripping?)
(d) do you have comparable data for gcc 7?
Thanks Dave
Re: GSOC 2018 - Textual LTO dump tool project
On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka wrote: >> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni >> wrote: >> > Hi, >> > >> > Thank you Richard and Honza for the suggestions. If I understand correctly, >> > the issue is that LTO file format keeps changing per compiler versions, so >> > we need a more “stable” representation and the first step for that would be >> > to “stabilize” representations for lto-cgraph and symbol table ? >> >> Yes. Note the issue is that the current format is a 1:1 representation of >> the internal representation -- which means it is the internal representation >> that changes frequently across releases. I'm not sure how Honza wants >> to deal with those changes in the context of a "stable" IL format. Given >> we haven't been able to provide a stable API to plugins I think it's much >> harder to provide a stable streaming format for all the IL details >> >> > Could you >> > please elaborate on what initial steps need to be taken in this regard, and >> > if it’s feasible within GSoC timeframe ? >> >> I don't think it is feasible in the GSoC timeframe (nor do I think it's >> feasible >> at all ...) > > I skipped this; with the GSoC timeframe I fully agree. With feasibility at all, > not so > much - LLVM documents its bitcode to a reasonable extent: > https://llvm.org/docs/BitCodeFormat.html > > The reason why I mentioned it is that I would like to use this as an excuse to get > things incrementally cleaned up and it would be nice to keep it in mind while > working on this. Ok. It's probably close enough to what I recommended doing with respect to making the LTO bytecode "self-descriptive" -- thus start with making the structure documented and parseable without assigning semantics to every bit ;) I think that can be achieved top-down in a very incremental way if you get the bottom implemented first (the data-streamer part). Richard. > Honza >> >> > Thanks! >> > >> > >> > I am trying to break down the project into milestones for the proposal.
So >> > far, I have identified the following objectives: >> > >> > 1] Creating a separate driver, that can read LTO object files. Following >> > Richard’s estimate, I’d leave around first half of the period for this >> > task. >> > >> > Would that be OK ? >> >> Yes. >> >> > Coming to 2nd half: >> > >> > 2] Dumping pass summaries. >> > >> > 3] Stabilizing lto-cgraph and symbol table. >> >> So I'd instead do >> >> 3] Enhance the user-interface of the driver >> >> like providing a way to list all function bodies, a way to dump >> the IL of a single function body, a way to create a dot graph file >> for the cgraph in the file, etc. >> >> Basically while there's a lot of dumping infrastructure in GCC >> it may not always fit the needs of a LTO IL dumping tool 1:1 >> and may need refactoring enhancement. >> >> Richard. >> >> > >> > Thanks, >> > >> > Hrishikesh >> > >> > >> > >> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka wrote: >> >> >> >> Hello, >> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni >> >> > wrote: >> >> > > Hello everyone, >> >> > > >> >> > > >> >> > > Thanks for your suggestions and engaging response. >> >> > > >> >> > > Based on the feedback I think that the scope of this project comprises >> >> > > of >> >> > > following three indicative actions: >> >> > > >> >> > > >> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto >> >> > > object API >> >> > > for reading the lto file. >> >> > >> >> > Yes. I expect this will take the whole first half of the project, >> >> > after this you >> >> > should be somewhat familiar with the infrastructure as well. With the >> >> > existing dumping infrastructure it should be possible to dump the >> >> > callgraph and individual function bodies. >> >> > >> >> > > >> >> > > 2. Extending LTO dump infrastructure: >> >> > > >> >> > > GCC already seems to have dump infrastructure for pretty-printing tree >> >> > > nodes, gimple statements etc. 
However I suppose we’d need to extend >> >> > > that for >> >> > > dumping pass summaries ? For instance, should we add a new hook say >> >> > > “dump” >> >> > > to ipa_opt_pass_d that’d dump the pass >> >> > > summary ? >> >> > >> >> > That sounds like a good idea indeed. I'm not sure if this is the most >> >> > interesting >> >> > missing part - I guess we'll find out once a dump tool is available. >> >> >> >> Concerning the LTO file format, my longer term aim is to make the symbol >> >> table sections (symtab used by lto-plugin as well as the callgraph >> >> section, >> >> and hopefully also the Gimple streams) documented and well behaved >> >> without changing the format in every revision. >> >> >> >> On the other hand the summaries used by individual passes are intended to >> >> be >> >> pass specific and evolving as individual passes become stronger/new >> >> passes >> >> are added. >> >> >> >> It is quite a lot of work to stabilize the gimple representation to this >> >> extent. >> >> For callgraph&symbol table this is however more realistic. That would mean >> >> to move some of the existing random stuff streamed there into summaries and >> >> additionally cleaning up/rewriting lto-cgraph so the on-disk format actually makes >> >> sense. >> >> >> >> I will be happy to help with any steps in this direction as well. >> >> >> >> Honza
Re: GSOC 2018 - Textual LTO dump tool project
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni > wrote: > > Hi, > > > > Thank you Richard and Honza for the suggestions. If I understand correctly, > > the issue is that LTO file format keeps changing per compiler versions, so > > we need a more “stable” representation and the first step for that would be > > to “stabilize” representations for lto-cgraph and symbol table ? > > Yes. Note the issue is that the current format is a 1:1 representation of > the internal representation -- which means it is the internal representation > that changes frequently across releases. I'm not sure how Honza wants > to deal with those changes in the context of a "stable" IL format. Given > we haven't been able to provide a stable API to plugins I think it's much > harder to provide a stable streaming format for all the IL details > > > Could you > > please elaborate on what initial steps need to be taken in this regard, and > > if it’s feasible within GSoC timeframe ? > > I don't think it is feasible in the GSoC timeframe (nor do I think it's > feasible > at all ...) I skipped this; with the GSoC timeframe I fully agree. With feasibility at all, not so much - LLVM documents its bitcode to a reasonable extent: https://llvm.org/docs/BitCodeFormat.html The reason why I mentioned it is that I would like to use this as an excuse to get things incrementally cleaned up and it would be nice to keep it in mind while working on this. Honza > > > Thanks! > > > > > > I am trying to break down the project into milestones for the proposal. So > > far, I have identified the following objectives: > > > > 1] Creating a separate driver, that can read LTO object files. Following > > Richard’s estimate, I’d leave around first half of the period for this task. > > > > Would that be OK ? > > Yes. > > > Coming to 2nd half: > > > > 2] Dumping pass summaries. > > > > 3] Stabilizing lto-cgraph and symbol table.
> > So I'd instead do > > 3] Enhance the user-interface of the driver > > like providing a way to list all function bodies, a way to dump > the IL of a single function body, a way to create a dot graph file > for the cgraph in the file, etc. > > Basically while there's a lot of dumping infrastructure in GCC > it may not always fit the needs of a LTO IL dumping tool 1:1 > and may need refactoring enhancement. > > Richard. > > > > > Thanks, > > > > Hrishikesh > > > > > > > > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka wrote: > >> > >> Hello, > >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni > >> > wrote: > >> > > Hello everyone, > >> > > > >> > > > >> > > Thanks for your suggestions and engaging response. > >> > > > >> > > Based on the feedback I think that the scope of this project comprises > >> > > of > >> > > following three indicative actions: > >> > > > >> > > > >> > > 1. Creating separate driver i.e. separate dump tool that uses lto > >> > > object API > >> > > for reading the lto file. > >> > > >> > Yes. I expect this will take the whole first half of the project, > >> > after this you > >> > should be somewhat familiar with the infrastructure as well. With the > >> > existing dumping infrastructure it should be possible to dump the > >> > callgraph and individual function bodies. > >> > > >> > > > >> > > 2. Extending LTO dump infrastructure: > >> > > > >> > > GCC already seems to have dump infrastructure for pretty-printing tree > >> > > nodes, gimple statements etc. However I suppose we’d need to extend > >> > > that for > >> > > dumping pass summaries ? For instance, should we add a new hook say > >> > > “dump” > >> > > to ipa_opt_pass_d that’d dump the pass > >> > > summary ? > >> > > >> > That sounds like a good idea indeed. I'm not sure if this is the most > >> > interesting > >> > missing part - I guess we'll find out once a dump tool is available. 
> >> Concerning the LTO file format, my longer term aim is to make the symbol > >> table sections (symtab used by lto-plugin as well as the callgraph > >> section, > >> and hopefully also the Gimple streams) documented and well behaved > >> without changing the format in every revision. > >> > >> On the other hand the summaries used by individual passes are intended to > >> be > >> pass specific and evolving as individual passes become stronger/new > >> passes > >> are added. > >> > >> It is quite a lot of work to stabilize the gimple representation to this > >> extent. > >> For callgraph&symbol table this is however more realistic. That would mean > >> to > >> move some of the existing random stuff streamed there into summaries and > >> additionally > >> cleaning up/rewriting lto-cgraph so the on-disk format actually makes > >> sense. > >> > >> I will be happy to help with any steps in this direction as well. > >> > >> Honza > > > >
Re: How big (and fast) is going to be GCC 8?
> On Tue, Mar 6, 2018 at 11:12 AM, Martin Liška wrote: > > Hello. > > > > Many significant changes have landed in mainline and will be released as GCC > > 8.1. > > I decided to use the various GCC configs we have and test how their > > configurations differ > > in size and also binary size. > > > > This is the first part, where I measured binary size; a speed comparison will > > follow. > > Configuration names should be self-explanatory; the 'system-*' builds are done > > without bootstrap with my system compiler (GCC 7.3.0). All builds are done > > on my Intel Haswell machine. > > So from the numbers I see that bootstrap causes an 8% bigger binary compared > to non-bootstrap using GCC 7.3 at -O2 when including debug info, and 1.2% > larger stripped. That means trunk generates larger code. It is a bit odd indeed, because size stats from specs seem to imply otherwise. It would be nice to work that out. Also I am surprised that LTO increases text size even for the non-plugin build. It should not happen. These issues are generally hard to debug though. I will try to take a look. I will send similar stats for my firefox experiments. If you have scripts to collect them, they would be welcome. Thanks for looking into this! Honza > > What is missing is a speed comparison of the various binaries -- you could > try measuring this by doing a make all-gcc for a non-bootstrap config > (so it uses -O2 -g and doesn't build target libs with the built compiler). > > Richard. > > > Feel free to reply if you need any explanation. > > Martin
Re: GSOC 2018 - Textual LTO dump tool project
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni > wrote: > > Hi, > > > > Thank you Richard and Honza for the suggestions. If I understand correctly, > > the issue is that LTO file format keeps changing per compiler versions, so > > we need a more “stable” representation and the first step for that would be > > to “stabilize” representations for lto-cgraph and symbol table ? > > Yes. Note the issue is that the current format is a 1:1 representation of > the internal representation -- which means it is the internal representation > that changes frequently across releases. I'm not sure how Honza wants > to deal with those changes in the context of a "stable" IL format. Given > we haven't been able to provide a stable API to plugins I think it's much > harder to provide a stable streaming format for all the IL details Well, because I think it would be good for us to formalize our IL more - document it properly, remove stuff which is not necessary and get an API+file representation. Those things are connected to each other and will need work. If you look at how much things change, it is not very frequent that we change what is in our CFG (I changed the profile this release), how gimple tuples are represented, or what gimple instructions we have. I think those parts are reasonably well defined. Even though I changed the profile this release, it is a relatively localized type of change. I am still more commonly changing the symbol table as it needs to adapt for all LTO details, but I hope to be basically done. What is more in flux are the trees, which we will hopefully deal with by defining gimple types, now that early debug is done. What we can do realistically now is to first aim to stream the better defined parts in externally parseable sections which do have documentation. So far the only externally parseable section is the plugin symbol table, but we should be able to do so with reasonable effort for symbol tables, CFGs and gimple instruction streams.
In parallel we can incrementally deal with trees, hopefully mostly by getting rid of them (moving symbol names etc. to the symbol table so it can live w/o declarations, having gimple types etc.) > > > Could you > > please elaborate on what initial steps need to be taken in this regard, and > > if it’s feasible within GSoC timeframe ? > > I don't think it is feasible in the GSoC timeframe (nor do I think it's > feasible > at all ...) > > > Thanks! > > > > > > I am trying to break down the project into milestones for the proposal. So > > far, I have identified the following objectives: > > > > 1] Creating a separate driver, that can read LTO object files. Following > > Richard’s estimate, I’d leave around first half of the period for this task. > > > > Would that be OK ? > > Yes. Yes, it looks good to me too. > > > Coming to 2nd half: > > > > 2] Dumping pass summaries. > > > > 3] Stabilizing lto-cgraph and symbol table. > > So I'd instead do > > 3] Enhance the user-interface of the driver > > like providing a way to list all function bodies, a way to dump > the IL of a single function body, a way to create a dot graph file > for the cgraph in the file, etc. > > Basically while there's a lot of dumping infrastructure in GCC > it may not always fit the needs of a LTO IL dumping tool 1:1 > and may need refactoring enhancement. I would agree here - dumping pass summaries would be nice, but we already have that more or less. All IPA passes dump their summary into the beginning of their dump file and I find that relatively sufficient to deal with, mostly because summaries are quite simple. It is much harder to deal with the global stream of trees and with function bodies themselves. Honza > > Richard.
> > > > > Thanks, > > > > Hrishikesh > > > > > > > > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka wrote: > >> > >> Hello, > >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni > >> > wrote: > >> > > Hello everyone, > >> > > > >> > > > >> > > Thanks for your suggestions and engaging response. > >> > > > >> > > Based on the feedback I think that the scope of this project comprises > >> > > of > >> > > following three indicative actions: > >> > > > >> > > > >> > > 1. Creating separate driver i.e. separate dump tool that uses lto > >> > > object API > >> > > for reading the lto file. > >> > > >> > Yes. I expect this will take the whole first half of the project, > >> > after this you > >> > should be somewhat familiar with the infrastructure as well. With the > >> > existing dumping infrastructure it should be possible to dump the > >> > callgraph and individual function bodies. > >> > > >> > > > >> > > 2. Extending LTO dump infrastructure: > >> > > > >> > > GCC already seems to have dump infrastructure for pretty-printing tree > >> > > nodes, gimple statements etc. However I suppose we’d need to extend > >> > > that for > >> > > dumping pass summaries ? For instance, should we add a new hook say > >> > > “dump” > >> > > to ipa_opt_pass_d that’d dump the pass > >> > > summ
Re: GSOC 2018 - Textual LTO dump tool project
On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni wrote: > Hi, > > Thank you Richard and Honza for the suggestions. If I understand correctly, > the issue is that LTO file format keeps changing per compiler versions, so > we need a more “stable” representation and the first step for that would be > to “stabilize” representations for lto-cgraph and symbol table ? Yes. Note the issue is that the current format is a 1:1 representation of the internal representation -- which means it is the internal representation that changes frequently across releases. I'm not sure how Honza wants to deal with those changes in the context of a "stable" IL format. Given we haven't been able to provide a stable API to plugins I think it's much harder to provide a stable streaming format for all the IL details > Could you > please elaborate on what initial steps need to be taken in this regard, and > if it’s feasible within GSoC timeframe ? I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible at all ...) > Thanks! > > > I am trying to break down the project into milestones for the proposal. So > far, I have identified the following objectives: > > 1] Creating a separate driver, that can read LTO object files. Following > Richard’s estimate, I’d leave around first half of the period for this task. > > Would that be OK ? Yes. > Coming to 2nd half: > > 2] Dumping pass summaries. > > 3] Stabilizing lto-cgraph and symbol table. So I'd instead do 3] Enhance the user-interface of the driver like providing a way to list all function bodies, a way to dump the IL of a single function body, a way to create a dot graph file for the cgraph in the file, etc. Basically while there's a lot of dumping infrastructure in GCC it may not always fit the needs of a LTO IL dumping tool 1:1 and may need refactoring enhancement. Richard. 
> > Thanks, > > Hrishikesh > > > > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka wrote: >> >> Hello, >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni >> > wrote: >> > > Hello everyone, >> > > >> > > >> > > Thanks for your suggestions and engaging response. >> > > >> > > Based on the feedback I think that the scope of this project comprises >> > > of >> > > following three indicative actions: >> > > >> > > >> > > 1. Creating separate driver i.e. separate dump tool that uses lto >> > > object API >> > > for reading the lto file. >> > >> > Yes. I expect this will take the whole first half of the project, >> > after this you >> > should be somewhat familiar with the infrastructure as well. With the >> > existing dumping infrastructure it should be possible to dump the >> > callgraph and individual function bodies. >> > >> > > >> > > 2. Extending LTO dump infrastructure: >> > > >> > > GCC already seems to have dump infrastructure for pretty-printing tree >> > > nodes, gimple statements etc. However I suppose we’d need to extend >> > > that for >> > > dumping pass summaries ? For instance, should we add a new hook say >> > > “dump” >> > > to ipa_opt_pass_d that’d dump the pass >> > > summary ? >> > >> > That sounds like a good idea indeed. I'm not sure if this is the most >> > interesting >> > missing part - I guess we'll find out once a dump tool is available. >> >> Concerning the LTO file format, my longer term aim is to make the symbol >> table sections (symtab used by lto-plugin as well as the callgraph >> section, >> and hopefully also the Gimple streams) documented and well behaved >> without changing the format in every revision. >> >> On the other hand the summaries used by individual passes are intended to >> be >> pass specific and evolving as individual passes become stronger/new >> passes >> are added. >> >> It is quite a lot of work to stabilize the gimple representation to this >> extent. >> For callgraph&symbol table this is however more realistic. That would mean >> to >> move some of the existing random stuff streamed there into summaries and >> additionally >> cleaning up/rewriting lto-cgraph so the on-disk format actually makes >> sense. >> >> I will be happy to help with any steps in this direction as well. >> >> Honza > >
Re: eliminate dead stores across functions
On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni wrote: > Hi, > For the following test-case, > > int a; > > __attribute__((noinline)) > static void foo() > { > a = 3; > } > > int main() > { > a = 4; > foo (); > return a; > } > > I assume it's safe to remove "a = 4" since 'a' would be overwritten > by call to foo ? > IIUC, ipa-reference pass does mod/ref analysis to compute side-effects > of function call, > so could we perhaps use ipa_reference_get_not_written_global() in dse > pass to check if a global variable will be killed on call to a > function ? If not, I suppose we could write a similar ipa pass that > computes the set of killed global variables per function but I am not > sure if that's the correct approach. Do you think the situation happens often enough to make this worthwhile? ipa-reference doesn't compute must-def, only may-def and may-use IIRC. Richard. > Thanks, > Prathamesh
Re: GSOC 2018 - Textual LTO dump tool project
Hi, Thank you Richard and Honza for the suggestions. If I understand correctly, the issue is that LTO file format keeps changing per compiler versions, so we need a more “stable” representation and the first step for that would be to “stabilize” representations for lto-cgraph and symbol table ? Could you please elaborate on what initial steps need to be taken in this regard, and if it’s feasible within GSoC timeframe ? Thanks! I am trying to break down the project into milestones for the proposal. So far, I have identified the following objectives: 1] Creating a separate driver, that can read LTO object files. Following Richard’s estimate, I’d leave around first half of the period for this task. Would that be OK ? Coming to 2nd half: 2] Dumping pass summaries. 3] Stabilizing lto-cgraph and symbol table. Thanks, Hrishikesh On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka wrote: > Hello, > > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni > > wrote: > > > Hello everyone, > > > > > > > > > Thanks for your suggestions and engaging response. > > > > > > Based on the feedback I think that the scope of this project comprises > of > > > following three indicative actions: > > > > > > > > > 1. Creating separate driver i.e. separate dump tool that uses lto > object API > > > for reading the lto file. > > > > Yes. I expect this will take the whole first half of the project, > > after this you > > should be somewhat familiar with the infrastructure as well. With the > > existing dumping infrastructure it should be possible to dump the > > callgraph and individual function bodies. > > > > > > > > 2. Extending LTO dump infrastructure: > > > > > > GCC already seems to have dump infrastructure for pretty-printing tree > > > nodes, gimple statements etc. However I suppose we’d need to extend > that for > > > dumping pass summaries ? For instance, should we add a new hook say > “dump” > > > to ipa_opt_pass_d that’d dump the pass > > > summary ? 
> > > > That sounds like a good idea indeed. I'm not sure if this is the most > > interesting > > missing part - I guess we'll find out once a dump tool is available. > > Concerning the LTO file format, my longer term aim is to make the symbol > table sections (symtab used by lto-plugin as well as the callgraph section, > and hopefully also the Gimple streams) documented and well behaved > without changing the format in every revision. > > On the other hand the summaries used by individual passes are intended to > be > pass specific and evolving as individual passes become stronger/new passes > are added. > > It is quite a lot of work to stabilize the gimple representation to this > extent. > For callgraph&symbol table this is however more realistic. That would mean > to > move some of the existing random stuff streamed there into summaries and > additionally > cleaning up/rewriting lto-cgraph so the on-disk format actually makes > sense. > > I will be happy to help with any steps in this direction as well. > > Honza >
eliminate dead stores across functions
Hi, For the following test-case,

int a;

__attribute__((noinline))
static void foo()
{
  a = 3;
}

int main()
{
  a = 4;
  foo ();
  return a;
}

I assume it's safe to remove "a = 4" since 'a' would be overwritten by call to foo ? IIUC, ipa-reference pass does mod/ref analysis to compute side-effects of function call, so could we perhaps use ipa_reference_get_not_written_global() in dse pass to check if a global variable will be killed on call to a function ? If not, I suppose we could write a similar ipa pass that computes the set of killed global variables per function but I am not sure if that's the correct approach. Thanks, Prathamesh
Re: How big (and fast) is going to be GCC 8?
On Tue, Mar 6, 2018 at 11:12 AM, Martin Liška wrote: > Hello. > > Many significant changes have landed in mainline and will be released as GCC > 8.1. > I decided to use the various GCC configs we have and test how their configurations > differ > in size and also binary size. > > This is the first part, where I measured binary size; a speed comparison will follow. > Configuration names should be self-explanatory; the 'system-*' builds are done > without bootstrap with my system compiler (GCC 7.3.0). All builds are done > on my Intel Haswell machine. So from the numbers I see that bootstrap causes an 8% bigger binary compared to non-bootstrap using GCC 7.3 at -O2 when including debug info, and 1.2% larger stripped. That means trunk generates larger code. What is missing is a speed comparison of the various binaries -- you could try measuring this by doing a make all-gcc for a non-bootstrap config (so it uses -O2 -g and doesn't build target libs with the built compiler). Richard. > Feel free to reply if you need any explanation. > Martin
How big (and fast) is going to be GCC 8?
Hello. Many significant changes have landed in mainline and will be released as GCC 8.1. I decided to use the various GCC configs we have and test how their configurations differ in size, and in particular binary size. This is the first part, where I measured binary size; a speed comparison will follow. Configuration names should be self-explanatory; the 'system-*' builds are done without bootstrap with my system compiler (GCC 7.3.0). All builds are done on my Intel Haswell machine. Feel free to reply if you need any explanation. Martin

Attachments:
gcc-8-build-stats.ods (application/vnd.oasis.opendocument.spreadsheet)
gcc-8-build-stats.pdf.bz2 (application/bzip)