Re: [PATCH 3/5] IPA ICF pass
On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:

While a plain Firefox -flto build works fine, the LTO/PGO build fails with:

lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
        ../../gcc/gcc/ipa-utils.c:540
0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
        ../../gcc/gcc/ipa-icf.c:753
0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
        ../../gcc/gcc/ipa-icf.c:2706
0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
        ../../gcc/gcc/ipa-icf.c:2098
0xf1d3f1 ipa_icf_driver
        ../../gcc/gcc/ipa-icf.c:2784
0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
        ../../gcc/gcc/ipa-icf.c:2831

The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range.

Thanks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage?

(without ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     1412 kB ( 0%) ggc
 phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall   403069 kB (12%) ggc
 phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall  2944210 kB (88%) ggc
 phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall        0 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall        0 kB ( 0%) ggc
 garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall        0 kB ( 0%) ggc
 callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall      166 kB ( 0%) ggc
 ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall        0 kB ( 0%) ggc
 ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall        0 kB ( 0%) ggc
 ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall    10365 kB ( 0%) ggc
 ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall   173701 kB ( 5%) ggc
 ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall   532704 kB (16%) ggc
 ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall        0 kB ( 0%) ggc
 ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall        0 kB ( 0%) ggc
 ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall  2226088 kB (66%) ggc
 ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall        0 kB ( 0%) ggc
 ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall    14226 kB ( 0%) ggc
 ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall        0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall   364151 kB (11%) ggc
 ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall     8196 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall    12716 kB ( 0%) ggc
 whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall        1 kB ( 0%) ggc
 whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall        0 kB ( 0%) ggc
 whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall     4938 kB ( 0%) ggc
 ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall        0 kB ( 0%) ggc
 ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall        0 kB ( 0%) ggc
 ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall        0 kB ( 0%) ggc
 tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall        0 kB ( 0%) ggc
 callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall        0 kB ( 0%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall        0 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall        0 kB ( 0%) ggc
 unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall        0 kB ( 0%) ggc
 TOTAL                   :  92.91             4.29           151.73            3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

(with ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     1412 kB ( 0%) ggc
 phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall  1468975 kB (33%) ggc
 phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall  2944210 kB (67%) ggc
 phase stream out        :   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall       12 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 (
Re: [PATCH 3/5] IPA ICF pass
On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:

It seems that in this case we reject too many equality candidates? I think the original numbers were about 4-5%, but later some equivalences were disabled because of devirt/aliasing issues.

Do you compare it with gold ICF enabled? There are quite a few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one.

Forgot to post the binary size numbers (in bytes):

              | gold's icf off | gold's icf on
--------------+----------------+--------------
gcc's icf off |       79793880 |      74881040
gcc's icf on  |       78043608 |      73612800

--
Markus
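The relative savings in the size table above can be checked with a few lines of C. This is a minimal sketch; the helper name `percent_saved` is ours, not something from the thread:

```c
#include <assert.h>

/* Percentage of bytes saved going from `before` to `after`.
   Assumes the change shrank (or kept) the binary: after <= before. */
static double
percent_saved (unsigned long before, unsigned long after)
{
  assert (before > 0 && after <= before);
  return 100.0 * (double) (before - after) / (double) before;
}
```

With the numbers above, `percent_saved (79793880, 78043608)` is about 2.19 (GCC's ICF alone), `percent_saved (79793880, 74881040)` is about 6.16 (gold's ICF alone), and `percent_saved (74881040, 73612800)` is about 1.69 (GCC's ICF on top of gold's).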
Re: Avoid privatization of TLS variables
I may be guilty of missing a crucial point here, but: why do we care about having a small limit of static TLS variables? We surely could allocate, say, a megabyte of static TLS for each thread. We already allocate 64M for the thread-local malloc arena, after all. It doesn't cost anything beyond a little address space. What am I missing?

Andrew.
Re: Avoid privatization of TLS variables
On 27/09/14 08:56, Andrew Haley wrote:

I may be guilty of missing a crucial point here, but: why do we care about having a small limit of static TLS variables? We surely could allocate, say, a megabyte of static TLS for each thread. We already allocate 64M for the thread-local malloc arena,

On 64-bit systems, I mean.

after all. It doesn't cost anything beyond a little address space. What am I missing?

Andrew.
Re: RFA: one more version of the patch for PR61360
Hi Vlad,

Vladimir Makarov <vmaka...@redhat.com> writes:

I guess we achieved the consensus about the following patch to fix PR61360

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

The patch was successfully bootstrapped and tested (w/wo -march=amdfam10) on x86/x86-64.

Is it ok to commit to trunk?

2014-09-26  Vladimir Makarov  <vmaka...@redhat.com>

	PR target/61360
	* lra.c (lra): Remove call of recog_init.
	* recog.c (constrain_operands): Permit reg for memory constraint
	when LRA is used.
	* config/i386/i386.md (*float<SWI48:mode><MODEF:mode>2_sse): Enable
	first alternative independently on RA stage.

Many thanks for doing this! I hadn't realised when seeing the bug originally that LRA made the recog.c part possible. Obviously I can't approve it, but this approach seems much cleaner to me.

Richard
RE: [PATCH] Fix PR preprocessor/58893 access to uninitialized memory
Hmm, the original message bounced; resent without HTML.

From: bernd.edlin...@hotmail.de
To: l...@redhat.com; gcc-patches@gcc.gnu.org
CC: jos...@codesourcery.com
Subject: RE: [PATCH] Fix PR preprocessor/58893 access to uninitialized memory
Date: Sat, 27 Sep 2014 11:42:29 +0200

On Fri, 26 Sep 2014 12:48:44, Jeff Law wrote:

On 09/26/14 06:21, Bernd Edlinger wrote:

Hi,

this patch fixes PR58893, which is an access to uninitialized memory that may or may not crash in linemap_resolve_location, or just print error messages with a bogus location.

When the first -include file is processed, we have the case where pfile->cur_token == pfile->cur_run->base; this is called directly by the front end. However, in the case of the second -include file, this is called from _cpp_lex_token -> _cpp_get_fresh_line -> cpp_push_include, with pfile->cur_token != pfile->cur_run->base, and with pfile->cur_token[-1].src_loc and the token not (yet) initialized. The problem is that when the include file cannot be found, we need src_loc to be initialized to some safe value: 0 means UNKNOWN_LOCATION.

Regarding the hunk in cpp_diagnostic, which is not directly involved in this bug but is still obviously wrong: the line src_loc = pfile->cur_run->prev->limit->src_loc is probably unreachable, but will crash if it is ever executed. See:

_cpp_init_tokenrun (tokenrun *run, unsigned int count)
{
  run->base = XNEWVEC (cpp_token, count);
  run->limit = run->base + count;
  run->next = NULL;
}

so limit points at the end of the run.

Boot-strapped and regression-tested on x86_64-linux-gnu.
Ok for trunk?

Thanks
Bernd.

changelog-pr58893.txt

2014-09-26  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	PR preprocessor/58893
	* errors.c (cpp_diagnostic): Fix possible out of bounds access.
	* files.c (_cpp_stack_include): Initialize src_loc for IT_CMDLINE.

patch-pr58893.diff

--- libcpp/errors.c	2014-01-02 23:24:45.0 +0100
+++ libcpp/errors.c	2014-09-24 10:30:33.708048505 +0200
@@ -48,10 +48,7 @@ cpp_diagnostic (cpp_reader * pfile, int
      current run -- that is invalid.  */
   else if (pfile->cur_token == pfile->cur_run->base)
     {
-      if (pfile->cur_run->prev != NULL)
-	src_loc = pfile->cur_run->prev->limit->src_loc;
-      else
-	src_loc = 0;
+      src_loc = 0;
     }
   else
     {
--- libcpp/files.c	2014-05-21 20:54:12.0 +0200
+++ libcpp/files.c	2014-09-24 10:35:47.191117490 +0200
@@ -991,6 +991,9 @@ _cpp_stack_include (cpp_reader *pfile, c
   _cpp_file *file;
   bool stacked;
 
+  if (type == IT_CMDLINE && pfile->cur_token != pfile->cur_run->base)
+    pfile->cur_token[-1].src_loc = 0;

Comment before this change. Someone not familiar with this code is going to have no idea why these two lines exist.

Ok, I added a comment now, do you like it?

Please try to include a testcase. If you're having trouble reproducing on the trunk, you could use MALLOC_PERTURB per c#8 in the bug report. If there's a way to set environment variables in our testing framework, that may be a reasonable way to test (if you need to do that, limit testing to linux targets, as we'll have a dependency on glibc features).

For whatever reason, the first -include must end with a pragma as in the PR, and MALLOC_PERTURB_ must be set to something. Then we get an ICE; otherwise we get an error message without a line number. I tried to make this a valid test case, but that might be less trivial than it looks at first sight.

I tried to set MALLOC_PERTURB_=123 globally, like this:

MALLOC_PERTURB_=123 make -k check

but then this happened:

WARNING: program timed out.
FAIL: gcc.c-torture/unsorted/dump-noaddr.c, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -dumpbase dump1/dump-noaddr.c -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.000i.cgraph, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.003t.original, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.032t.profile_estimate, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.253t.statistics, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
WARNING: program timed out.
FAIL: gcc.c-torture/unsorted/dump-noaddr.c, -O3 -g -dumpbase dump1/dump-noaddr.c -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.000i.cgraph, -O3 -g comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.003t.original, -O3 -g comparison
FAIL:
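The out-of-bounds read in the removed cpp_diagnostic line follows from the `_cpp_init_tokenrun` code quoted above: `limit` is `base + count`, one element past the end of the run. A minimal sketch with simplified, hypothetical types (`tok`, `tok_run`; these are not libcpp's real definitions) shows the difference between `limit->src_loc` and the last valid token, `limit[-1]`. The patch itself sidesteps the question by using 0 (UNKNOWN_LOCATION) instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for libcpp's cpp_token and tokenrun;
   only the fields needed to show the indexing are kept. */
typedef unsigned int loc_t;
typedef struct { loc_t src_loc; } tok;
typedef struct tok_run {
  struct tok_run *prev, *next;
  tok *base, *limit;
} tok_run;

/* Same shape as _cpp_init_tokenrun: limit is base + count, so it points
   one element past the last token of the run. */
static void
init_run (tok_run *run, unsigned int count)
{
  run->base = calloc (count, sizeof (tok));
  assert (run->base != NULL);
  run->limit = run->base + count;
  run->prev = run->next = NULL;
}

/* Location of the last token in RUN.  Reading run->limit->src_loc would
   touch memory past the allocation; the last valid token is limit[-1]. */
static loc_t
last_token_loc (const tok_run *run)
{
  assert (run->limit > run->base);
  return run->limit[-1].src_loc;
}
```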
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 01:27 AM, Jan Hubicka wrote:

While a plain Firefox -flto build works fine, the LTO/PGO build fails with:

lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
        ../../gcc/gcc/ipa-utils.c:540
0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
        ../../gcc/gcc/ipa-icf.c:753
0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
        ../../gcc/gcc/ipa-icf.c:2706
0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
        ../../gcc/gcc/ipa-icf.c:2098
0xf1d3f1 ipa_icf_driver
        ../../gcc/gcc/ipa-icf.c:2784
0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
        ../../gcc/gcc/ipa-icf.c:2831

The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range.

Most of the problem comes from the groups of candidates that are built according to a hash. The hash value is based on the number of arguments, the number of BBs, the number of gimple statements and the types of these statements. It groups functions into classes. In WPA (before the body of any function is loaded) I get the following histogram:

Dump after WPA based types groups
Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items
Class size histogram [num of members]: number of classes
[1]: 86453 classes
[2]: 5680 classes
[3]: 1541 classes
[4]: 915 classes
[5]: 446 classes
[6]: 346 classes
[7]: 200 classes
[8]: 181 classes
[9]: 154 classes
[10]: 109 classes
[11]: 87 classes
[12]: 87 classes
[13]: 68 classes
[14]: 58 classes
[15]: 58 classes
[16]: 41 classes
[17]: 25 classes
[18]: 33 classes
[19]: 28 classes
[20]: 25 classes
[21]: 19 classes
[22]: 30 classes
[23]: 24 classes
[24]: 33 classes
[25]: 17 classes
[26]: 15 classes
[27]: 10 classes
[28]: 13 classes
[29]: 18 classes
[30]: 10 classes

It means that each class with more than one member needs to be iterated and its functions compared. And yes, there's the root of the problem.
I have to load the function body to perform a deep function comparison. As you can see, we have almost 200k functions, more than half of which are situated in a group with more than one member. So the 1GB of extra memory usage is caused by these bodies:

Init called for 105004 items (54.84%).

The memory footprint could be significantly reduced if one could load a body and then release it so that the memory is actually freed. I asked Honza about it, but it looks like the GGC mechanism cannot easily be forced to release it.

Thanks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage? It seems that in this case we reject too many equality candidates? I think the original numbers were about 4-5%, but later some equivalences were disabled because of devirt/aliasing issues. Do you compare it with gold ICF enabled? There are quite a few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one.

You are right, the numbers were quite promising, but over time I had to reduce the aggressiveness of the pass. As Honza said, it can be improved step by step.

One thing that Martin can try is to hook into lto-symtab and check that the COMDAT functions that are known to be the same pass the equality check. I suppose we will learn interesting things this way.

Good point, I will try it.

Martin

I think the patch adds quite important infrastructure for gimple semantic equality checking and function merging. I went through the majority of the code and I think it is mostly ready for mainline (i.e. cleaner than what we have in tree-ssa-tailmerge), so I hope we can finish the review process next week. We will need a better cost/benefit ratio to enable it for -O2; that is something I would really like to see for 5.0, but it seems easier to handle this incrementally.

Thank you for the review,
Martin

Honza
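The grouping step described in this message (hash cheap properties such as argument count, basic-block count, statement count and statement kinds, then deeply compare only functions that share a class) can be sketched in C. This is a simplified illustration with hypothetical types and a made-up hash, not the ipa-icf.c implementation. In particular, it treats one hash value as one class, whereas the dump shows 97204 classes for 88725 unique hash values, so the real pass splits classes further. The dump is consistent with this kind of counting: 191457 - 86453 singleton items = 105004 items, the number for which Init was called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-function summary holding the cheap properties
   that feed the hash. */
typedef struct {
  unsigned n_args, n_bbs, n_stmts;
  unsigned stmt_kinds;   /* stand-in for a hash of the statement types */
} func_summary;

static unsigned long
summary_hash (const func_summary *f)
{
  unsigned long h = 5381;
  h = h * 33 + f->n_args;
  h = h * 33 + f->n_bbs;
  h = h * 33 + f->n_stmts;
  h = h * 33 + f->stmt_kinds;
  return h;
}

static int
cmp_hash (const void *a, const void *b)
{
  unsigned long x = *(const unsigned long *) a;
  unsigned long y = *(const unsigned long *) b;
  return (x > y) - (x < y);
}

/* Group N summaries by hash.  Returns the number of classes and stores in
   *NEEDS_BODY how many items share a class with at least one other item,
   i.e. how many bodies would have to be loaded for the deep comparison. */
static size_t
count_classes (const func_summary *fs, size_t n, size_t *needs_body)
{
  size_t classes = 0, shared = 0;
  unsigned long *h;

  *needs_body = 0;
  if (n == 0)
    return 0;
  h = malloc (n * sizeof *h);
  assert (h != NULL);
  for (size_t i = 0; i < n; i++)
    h[i] = summary_hash (&fs[i]);
  qsort (h, n, sizeof *h, cmp_hash);

  /* Equal hashes are adjacent after sorting; each run is one class. */
  for (size_t i = 0; i < n;)
    {
      size_t j = i + 1;
      while (j < n && h[j] == h[i])
        j++;
      classes++;
      if (j - i > 1)
        shared += j - i;
      i = j;
    }
  free (h);
  *needs_body = shared;
  return classes;
}
```

With five summaries of which three are identical, `count_classes` reports 3 classes and 3 bodies to load; the two singletons never need their bodies.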
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 07:59 AM, Markus Trippelsdorf wrote:

On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:

While a plain Firefox -flto build works fine, the LTO/PGO build fails with:

lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
        ../../gcc/gcc/ipa-utils.c:540
0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
        ../../gcc/gcc/ipa-icf.c:753
0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
        ../../gcc/gcc/ipa-icf.c:2706
0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
        ../../gcc/gcc/ipa-icf.c:2098
0xf1d3f1 ipa_icf_driver
        ../../gcc/gcc/ipa-icf.c:2784
0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
        ../../gcc/gcc/ipa-icf.c:2831

The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range.

Thanks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage?

(without ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     1412 kB ( 0%) ggc
 phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall   403069 kB (12%) ggc
 phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall  2944210 kB (88%) ggc
 phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall        0 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall        0 kB ( 0%) ggc
 garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall        0 kB ( 0%) ggc
 callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall      166 kB ( 0%) ggc
 ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall        0 kB ( 0%) ggc
 ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall        0 kB ( 0%) ggc
 ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall    10365 kB ( 0%) ggc
 ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall   173701 kB ( 5%) ggc
 ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall   532704 kB (16%) ggc
 ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall        0 kB ( 0%) ggc
 ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall        0 kB ( 0%) ggc
 ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall  2226088 kB (66%) ggc
 ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall        0 kB ( 0%) ggc
 ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall    14226 kB ( 0%) ggc
 ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall        0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall   364151 kB (11%) ggc
 ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall     8196 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall    12716 kB ( 0%) ggc
 whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall        1 kB ( 0%) ggc
 whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall        0 kB ( 0%) ggc
 whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall     4938 kB ( 0%) ggc
 ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall        0 kB ( 0%) ggc
 ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall        0 kB ( 0%) ggc
 ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall        0 kB ( 0%) ggc
 tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall        0 kB ( 0%) ggc
 callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall        0 kB ( 0%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall        0 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall        0 kB ( 0%) ggc
 unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall        0 kB ( 0%) ggc
 TOTAL                   :  92.91             4.29           151.73            3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

(with ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     1412 kB ( 0%) ggc
 phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall  1468975 kB (33%) ggc
 phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall  2944210 kB (67%) ggc
 phase stream out        :   4.52 ( 4%) usr
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 09:47 AM, Markus Trippelsdorf wrote:

On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:

It seems that in this case we reject too many equality candidates? I think the original numbers were about 4-5%, but later some equivalences were disabled because of devirt/aliasing issues.

Do you compare it with gold ICF enabled? There are quite a few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one.

Forgot to post the binary size numbers (in bytes):

              | gold's icf off | gold's icf on
--------------+----------------+--------------
gcc's icf off |       79793880 |      74881040
gcc's icf on  |       78043608 |      73612800

Thanks once more! Gold ICF is quite strong; I will check which functions are not caught by IPA ICF. These data show that IPA ICF can reduce the binary by 2.19%. I know that it's quite a small improvement, but keep in mind that the pass can only reduce the size of .text (and, to a small extent, related sections). Here are the stats for libxul.so (please ignore the last 3 columns):

Section name       Start     Size in B  Size       Portion  Disk read in B  Disk read  Sec. portion
                   0         0          0.00 B     0.00%    0               0.00 B     0.00%
.note.gnu.build-i  512       36         36.00 B    0.00%    0               0.00 B     0.00%
.dynsym            552       81192      79.29 KB   0.08%    0               0.00 B     0.00%
.dynstr            81744     90859      88.73 KB   0.09%    0               0.00 B     0.00%
.hash              172608    21752      21.24 KB   0.02%    0               0.00 B     0.00%
.gnu.version       194360    6766       6.61 KB    0.01%    0               0.00 B     0.00%
.gnu.version_d     201128    56         56.00 B    0.00%    0               0.00 B     0.00%
.gnu.version_r     201184    1216       1.19 KB    0.00%    0               0.00 B     0.00%
.rela.dyn          202400    8198208    7.82 MB    8.56%    0               0.00 B     0.00%
.rela.plt          8400608   70272      68.62 KB   0.07%    0               0.00 B     0.00%
.init              8470880   26         26.00 B    0.00%    0               0.00 B     0.00%
.plt               8470912   46864      45.77 KB   0.05%    0               0.00 B     0.00%
.text              8517776   39014333   37.21 MB   40.72%   0               0.00 B     0.00%
.fini              47532112  9          9.00 B     0.00%    0               0.00 B     0.00%
.rodata            47532288  15258560   14.55 MB   15.93%   0               0.00 B     0.00%
.eh_frame          62790848  6203564    5.92 MB    6.47%    0               0.00 B     0.00%
.eh_frame_hdr      68994412  1088012    1.04 MB    1.14%    0               0.00 B     0.00%
.tbss              70082560  4          4.00 B     0.00%    0               0.00 B     0.00%
.dynamic           70082560  1104       1.08 KB    0.00%    0               0.00 B     0.00%
.got               70083664  1384       1.35 KB    0.00%    0               0.00 B     0.00%
.got.plt           70085048  23448      22.90 KB   0.02%    0               0.00 B     0.00%
.data              70108544  811616     792.59 KB  0.85%    0               0.00 B     0.00%
.jcr               70920160  8          8.00 B     0.00%    0               0.00 B     0.00%
.tm_clone_table    70920168  0          0.00 B     0.00%    0               0.00 B     0.00%
.fini_array        70920168  8          8.00 B     0.00%    0               0.00 B     0.00%
.init_array        70920176  16         16.00 B    0.00%    0               0.00 B     0.00%
.data.rel.ro.loca  70920192  3938880    3.76 MB    4.11%    0               0.00 B     0.00%
.data.rel.ro       74859072  269216     262.91 KB  0.28%    0               0.00 B     0.00%
.bss               75128320  1844246    1.76 MB    1.92%    0               0.00 B     0.00%
.debug_line        75128288  517        517.00 B   0.00%    0               0.00 B     0.00%
.debug_info        75128805  817        817.00 B   0.00%    0               0.00 B     0.00%
.debug_abbrev      75129622  438        438.00 B   0.00%    0
Re: [Ping] Port of VTV for Cygwin and MinGW
Hi Patrick,

the mingw/cygwin part of your patch looks fine to me. Nevertheless, I have one question for you: do you already have FSF papers for gcc? I asked an overseer and he didn't find you on the list.

Regards,
Kai
Re: [Patch, Fortran] Add CO_BROADCAST
The failures for gfortran.dg/coarray_collectives_9.f90 are fixed with the following patch:

--- ../_clean/gcc/testsuite/gfortran.dg/coarray_collectives_9.f90	2014-09-25 12:14:05.0 +0200
+++ gcc/testsuite/gfortran.dg/coarray_collectives_9.f90	2014-09-27 13:03:41.0 +0200
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options -fcoarray=single }
+! { dg-options -fcoarray=single -fmax-errors=40 }
 !
 !
 ! CO_BROADCAST/CO_REDUCE
@@ -29,7 +29,7 @@ program test
   call co_reduce(abc) ! { dg-error Missing actual argument 'operator' in call to 'co_reduce' }
   call co_broadcast(1, source_image=1) ! { dg-error 'a' argument of 'co_broadcast' intrinsic at .1. must be a variable }
   call co_reduce(a=1, operator=red_f) ! { dg-error 'a' argument of 'co_reduce' intrinsic at .1. must be a variable }
-  call co_reduce(a=val, operator=red_f2) ! { dg-error OPERATOR argument at (1) must be a PURE function }
+  call co_reduce(a=val, operator=red_f2) ! { dg-error OPERATOR argument at \\(1\\) must be a PURE function }
 
   call co_broadcast(val, source_image=[1,2]) ! { dg-error must be a scalar }
   call co_broadcast(val, source_image=1.0) ! { dg-error must be INTEGER }
@@ -49,14 +49,14 @@ program test
   call co_reduce(val, red_f, stat=[1,2]) ! { dg-error must be a scalar }
   call co_reduce(val, red_f, stat=1.0) ! { dg-error must be INTEGER }
   call co_reduce(val, red_f, stat=1) ! { dg-error must be a variable }
-  call co_reduce(val, red_f, stat=i, result_image=1) ! OK
-  call co_reduce(val, red_f, stat=i, errmsg=errmsg, result_image=1) ! OK
+  call co_reduce(val, red_f, stat=i, result_image=1) ! { dg-error CO_REDUCE at \\(1\\) is not yet implemented }
+  call co_reduce(val, red_f, stat=i, errmsg=errmsg, result_image=1) ! { dg-error CO_REDUCE at \\(1\\) is not yet implemented }
   call co_reduce(val, red_f, stat=i, errmsg=[errmsg], result_image=1) ! { dg-error must be a scalar }
   call co_reduce(val, red_f, stat=i, errmsg=5, result_image=1) ! { dg-error must be CHARACTER }
   call co_reduce(val, red_f, errmsg=abc) ! { dg-error must be a variable }
   call co_reduce(val, red_f, stat=i8) ! { dg-error The stat= argument at .1. must be a kind=4 integer variable }
   call co_reduce(val, red_f, errmsg=msg4) ! { dg-error The errmsg= argument at .1. must be a default-kind character variable }
-  call co_broadcasr(vec(idx), 1) ! { dg-error Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_sum shall not have a vector subscript }
-  call co_reduce(vec([1,3,2]), red_f) ! { dg-error Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_min shall not have a vector subscript }
+  call co_broadcasr(vec(idx), 1) ! OK?
+  call co_reduce(vec([1,3,2]), red_f) ! { dg-error Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_reduce shall not have a vector subscript }
 end program test

Am I missing something?

Cheers,

Dominique
Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching
The new tests fail on darwin:

/opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: -mnop-mcount is not implemented for -fPIC

and gcc.target/i386/record-mcount.c fails because mcount_loc is not found in the assembly.

TIA

Dominique
[PATCH] PR libitm/61164: use always_inline consistently
2014-09-27  Gleb Fotengauer-Malinovskiy  <gle...@altlinux.org>

libitm/
	PR libitm/61164
	* local_atomic (__always_inline): Add inline.
	(__calculate_memory_order): Remove inline.
	(atomic_thread_fence): Likewise.
	(atomic_signal_fence): Likewise.
	(atomic_flag_test_and_set_explicit): Likewise.
	(atomic_flag_clear_explicit): Likewise.
	(atomic_flag_test_and_set): Likewise.
	(atomic_flag_clear): Likewise.
---
 libitm/local_atomic | 24
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libitm/local_atomic b/libitm/local_atomic
index c3e079f..ae35ada 100644
--- a/libitm/local_atomic
+++ b/libitm/local_atomic
-  inline __always_inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   {
     __atomic_thread_fence (__m);
@@ -1544,38 +1544,38 @@ namespace std // _GLIBCXX_VISIBILITY(default)
 
   // Function definitions, atomic_flag operations.
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(atomic_flag* __a, memory_order __m) noexcept
   { return __a->test_and_set(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(volatile atomic_flag* __a, memory_order __m) noexcept
   { return __a->test_and_set(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(atomic_flag* __a, memory_order __m) noexcept
   { __a->clear(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(volatile atomic_flag* __a, memory_order __m) noexcept
   { __a->clear(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(volatile atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(volatile atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
--
glebfm
Re: [C++ PATCH] Fix -Wlogical-not-parentheses (PR c++/62199)
On Fri, 22 Aug 2014, Marc Glisse wrote:

On Fri, 22 Aug 2014, Jason Merrill wrote:

On 08/22/2014 03:24 PM, Marc Glisse wrote:

Note that there is a patch waiting for a review that makes us accept !v for vector v:

Ah, indeed. I still think we might as well treat vectors the same as other types here.

Ok, now that it is a conscious choice, it feels much safer :-) Though depending on where exactly this is called, it would be funny if we warned for !v==-1 but not for !v==true, when the possible values for the elements of !v are actually {-1,0}. I guess I'll have to test after Marek commits.

Sadly, this is exactly what is happening. With my patch,

typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
void f (veci *a) { *a = !*a == -1; }

gives:

warning: logical not is only applied to the left hand side of comparison

I also get the warning if I replace -1 with 0 or with a vector, but not with true. This seems like the reverse of what is desirable.

I don't see how to change that: the warning is super-early (it warns for templates that are not instantiated), and we may not know yet whether the lhs is a vector. I guess people using vectors in such a strange construct can just always add parentheses, and it should be rare that anyone writes !vec==true and thus misses a useful warning.

--
Marc Glisse
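For scalars, the construct this warning targets, `!a == b`, parses as `(!a) == b`, not `!(a == b)`. A small sketch (the function names are hypothetical) shows how far apart the two readings can be; writing the parentheses explicitly, as suggested for the vector case, documents which one is meant:

```c
#include <assert.h>

/* What `!a == b` actually means: negate a first, then compare. */
static int
not_then_compare (int a, int b)
{
  return (!a) == b;
}

/* What the writer of `!a == b` usually intended. */
static int
compare_then_not (int a, int b)
{
  return !(a == b);
}
```

With a = 0 and b = 2 the two readings disagree (0 versus 1); with a = 5 and b = 0 they happen to agree, which is part of why such mistakes go unnoticed.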
Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue
On 09/25/14 07:03, Chen Gang wrote:

We need to use VOID instead of SI; otherwise, when a real VOIDmode operand comes in, it does not match SImode, which causes the issue. This patch fixes the issue and passes the testsuite.

The related test code ('void' will cause CALL instead of SET):

typedef void (*T)(void);
f1 ()
{
  ((T) 0)();
}

The related error:

[root@localhost gcc]# ./cc1 /tmp/calls.c -o /tmp/1.s
 f1
Analyzing compilation unit
Performing interprocedural optimizations
 *free_lang_data visibility early_local_cleanups free-inline-summary whole-program inlineAssembling functions:
 f1
/tmp/calls.c: In function 'f1':
/tmp/calls.c:5:1: error: unrecognizable insn:
 }
 ^
(call_insn 5 2 8 2 (parallel [
            (call (mem:SI (const_int 0 [0]) [0 MEM[(void (*T29c) (void))0B] S4 A32])
                (const_int 24 [0x18]))
            (clobber (reg:SI 15 r15))
        ]) /tmp/calls.c:4 -1
     (nil)
    (nil))
/tmp/calls.c:5:1: internal compiler error: in extract_insn, at recog.c:2204
0xb0e71b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../../gcc/gcc/rtl-error.c:109
0xb0e75c _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../../gcc/gcc/rtl-error.c:117
0xac552b extract_insn(rtx_def*)
        ../../gcc/gcc/recog.c:2204
0x8b919e instantiate_virtual_regs_in_insn
        ../../gcc/gcc/function.c:1614
0x8ba347 instantiate_virtual_regs
        ../../gcc/gcc/function.c:1934
0x8ba452 execute
        ../../gcc/gcc/function.c:1983
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

Is this test case (or a similar one) in the gcc test suite? If not, can you please add it to the test suite.
2014-09-25  Chen Gang  <gang.chen.5...@gmail.com>

	* config/microblaze/microblaze.md (call_internal1): Use VOID
	instead of SI to fix ((void (*)(void)) 0)() issue
---
 gcc/config/microblaze/microblaze.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/microblaze/microblaze.md b/gcc/config/microblaze/microblaze.md
index b971737..3b4faf4 100644
--- a/gcc/config/microblaze/microblaze.md
+++ b/gcc/config/microblaze/microblaze.md
@@ -2062,7 +2062,7 @@
   (set_attr length4)])
 
 (define_insn call_internal1
-  [(call (mem (match_operand:SI 0 call_insn_simple_operand ri))
+  [(call (mem (match_operand:VOID 0 call_insn_simple_operand ri))
          (match_operand:SI 1 i))
    (clobber (reg:SI R_SR))]

I've verified that your patch does not cause any test suite regressions.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
[PATCH 2/2] Make -Q --help print param defaults and min/max values
From: Andi Kleen <a...@linux.intel.com>

Make -Q --help print the --param default, min and max values, similar to how it prints the defaults for other flags. This is useful to let an option autotuner automatically query all the needed information about gcc params (previously it needed to access the .def file in the source).

gcc/:

2014-09-26  Andi Kleen  <a...@linux.intel.com>

	* opts.c (print_filtered_help): Print --param min/max/default
	with -Q.
---
 gcc/opts.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/opts.c b/gcc/opts.c
index 0a49bc0..5cb5a39 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -953,6 +953,7 @@ print_filtered_help (unsigned int include_flags,
   const char *help;
   bool found = false;
   bool displayed = false;
+  char new_help[128];
 
   if (include_flags == CL_PARAMS)
     {
@@ -971,6 +972,15 @@ print_filtered_help (unsigned int include_flags,
 	  /* Get the translation.  */
 	  help = _(help);
 
+	  if (!opts->x_quiet_flag)
+	    {
+	      snprintf (new_help, sizeof (new_help),
+			_("default %d minimum %d maximum %d"),
+			compiler_params[i].default_value,
+			compiler_params[i].min_value,
+			compiler_params[i].max_value);
+	      help = new_help;
+	    }
 	  wrap_help (help, param, strlen (param), columns);
 	}
       putchar ('\n');
@@ -985,7 +995,6 @@ print_filtered_help (unsigned int include_flags,
 
   for (i = 0; i < cl_options_count; i++)
     {
-      char new_help[128];
       const struct cl_option *option = cl_options + i;
       unsigned int len;
       const char *opt;
--
2.1.1
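The new code in print_filtered_help boils down to formatting three integers into a fixed-size buffer with snprintf. A standalone sketch, assuming a hypothetical `param_info` struct in place of GCC's `compiler_params` entries and omitting the `_()` translation wrapper:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an entry of GCC's compiler_params array. */
struct param_info
{
  const char *name;
  int default_value, min_value, max_value;
};

/* Format the default/min/max triple into BUF (SIZE bytes), in the same
   shape as the patch's 128-byte new_help buffer.  Returns the length of
   the text, or -1 if it did not fit and was truncated. */
static int
format_param_values (char *buf, size_t size, const struct param_info *p)
{
  int n;

  assert (buf != NULL && size > 0);
  n = snprintf (buf, size, "default %d minimum %d maximum %d",
                p->default_value, p->min_value, p->max_value);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

For a parameter with default 10, minimum 0 and maximum 100 this produces "default 10 minimum 0 maximum 100" (32 characters). Three ints need at most 33 characters plus the fixed words, so 128 bytes is ample for the untranslated string; checking snprintf's return value still costs nothing.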
[PATCH 1/2] Remove -fshort-double
From: Andi Kleen a...@linux.intel.com

-fshort-double has crashed the compiler since 4.6 (see PR60410).
Since it's an obscure option that apparently nobody uses, the best
way to fix it seems to be to just remove it.

This prevents constant ICEs when running a GCC optimization-flags
autotuner.

gcc/testsuite/:

2014-09-26  Andi Kleen  a...@linux.intel.com

	PR target/60410
	* gcc.dg/lto/pr55113_0.c: Remove.

gcc/:

2014-09-26  Andi Kleen  a...@linux.intel.com

	PR target/60410
	* doc/invoke.texi: Remove -fshort-double.
	* lto-wrapper.c (merge_and_complain): Ditto.
	(run_gcc): Ditto.

gcc/c-family/:

2014-09-26  Andi Kleen  a...@linux.intel.com

	PR target/60410
	* c-common.c (c_common_nodes_and_builtins): Remove -fshort-double.
	* c.opt: Ditto.

gcc/lto/:

2014-09-26  Andi Kleen  a...@linux.intel.com

	PR target/60410
	* lto-lang.c (lto_init): Remove -fshort-double.
---
 gcc/c-family/c-common.c              |  2 +-
 gcc/c-family/c.opt                   |  4 ----
 gcc/doc/invoke.texi                  | 10 +---------
 gcc/lto-wrapper.c                    |  3 ---
 gcc/lto/lto-lang.c                   |  2 +-
 gcc/testsuite/gcc.dg/lto/pr55113_0.c | 14 --------------
 6 files changed, 3 insertions(+), 32 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/lto/pr55113_0.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index a9e0191..7a529a2 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5325,7 +5325,7 @@ c_common_nodes_and_builtins (void)
   tree va_list_ref_type_node;
   tree va_list_arg_type_node;

-  build_common_tree_nodes (flag_signed_char, flag_short_double);
+  build_common_tree_nodes (flag_signed_char, false);

   /* Define `int' and `char' first so that dbx will output them first.  */
   record_builtin_type (RID_INT, NULL, integer_type_node);
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 72ac2ed..d6a9698 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1251,10 +1251,6 @@ frtti
 C++ ObjC++ Optimization Var(flag_rtti) Init(1)
 Generate run time type descriptor information

-fshort-double
-C ObjC C++ ObjC++ LTO Optimization Var(flag_short_double)
-Use the same size for double as for float
-
 fshort-enums
 C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums)
 Use the narrowest integer type possible for enumeration types
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0c3f4be..b2b667d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1094,7 +1094,7 @@ See S/390 and zSeries Options.
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
 -freg-struct-return -fshort-enums @gol
--fshort-double -fshort-wchar @gol
+-fshort-wchar @gol
 -fverbose-asm -fpack-struct[=@var{n}] -fstack-check @gol
 -fstack-limit-register=@var{reg} -fstack-limit-symbol=@var{sym} @gol
 -fno-stack-limit -fsplit-stack @gol
@@ -22598,14 +22598,6 @@ is equivalent to the smallest integer type that has enough room.
 code that is not binary compatible with code generated without that switch.
 Use it to conform to a non-default application binary interface.

-@item -fshort-double
-@opindex fshort-double
-Use the same size for @code{double} as for @code{float}.
-
-@strong{Warning:} the @option{-fshort-double} switch causes GCC to generate
-code that is not binary compatible with code generated without that switch.
-Use it to conform to a non-default application binary interface.
-
 @item -fshort-wchar
 @opindex fshort-wchar
 Override the underlying type for @samp{wchar_t} to be @samp{short
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 08fd090..a2ce79c 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -275,7 +275,6 @@ merge_and_complain (struct cl_decoded_option **decoded_options,

 	case OPT_freg_struct_return:
 	case OPT_fpcc_struct_return:
-	case OPT_fshort_double:
 	  for (j = 0; j < *decoded_options_count; ++j)
 	    if ((*decoded_options)[j].opt_index == foption->opt_index)
 	      break;
@@ -500,7 +499,6 @@ run_gcc (unsigned argc, char *argv[])
 	case OPT_fgnu_tm:
 	case OPT_freg_struct_return:
 	case OPT_fpcc_struct_return:
-	case OPT_fshort_double:
 	case OPT_ffp_contract_:
 	case OPT_fwrapv:
 	case OPT_ftrapv:
@@ -573,7 +571,6 @@ run_gcc (unsigned argc, char *argv[])

 	case OPT_freg_struct_return:
 	case OPT_fpcc_struct_return:
-	case OPT_fshort_double:
 	  /* Ignore these, they are determined by the input files.
 	     ??? We fail to diagnose a possible mismatch here.  */
 	  continue;
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index 9e8524a..57b4f71 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -1165,7 +1165,7 @@ lto_init (void)
   flag_generate_lto = (flag_wpa != NULL);

   /* Create the basic integer types.  */
-  build_common_tree_nodes (flag_signed_char, flag_short_double);
+
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Thanks for the explanation. I have changed loop_depth into a short integer, hoping that we can save some memory :-)

Attached please find the updated patch. Bootstrapped and reg-tested on x86_64-suse-linux. Please do a final review once the assignment is ready.

As for the new list-walking interface, I chose the function no_equiv and tried the checked-cast approach. The bad news is that GCC failed to bootstrap with the following change:

Index: ira.c
===================================================================
--- ira.c	(revision 215536)
+++ ira.c	(working copy)
@@ -3242,12 +3242,12 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
 	  void *data ATTRIBUTE_UNUSED)
 {
   int regno;
-  rtx list;
+  rtx_insn_list *list;

   if (!REG_P (reg))
     return;
   regno = REGNO (reg);
-  list = reg_equiv[regno].init_insns;
+  list = as_a <rtx_insn_list *> (reg_equiv[regno].init_insns);
   if (list == const0_rtx)
     return;
   reg_equiv[regno].init_insns = const0_rtx;
@@ -3258,9 +3258,9 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
     return;
   ira_reg_equiv[regno].defined_p = false;
   ira_reg_equiv[regno].init_insns = NULL;
-  for (; list; list = XEXP (list, 1))
+  for (; list; list = list->next ())
     {
-      rtx insn = XEXP (list, 0);
+      rtx_insn *insn = list->insn ();
       remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX));
     }
 }

Error message:

...
checking for suffix of object files... configure: error: in `/home/yangfei/gcc-devel/build/x86_64-unknown-linux-gnu/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
make[2]: *** [configure-stage1-target-libgcc] Error 1
make[2]: Leaving directory `/home/yangfei/gcc-devel/build'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/yangfei/gcc-devel/build'
make: *** [all] Error 2

I think the code change is OK. Did I miss anything?

Cheers,
Felix

On Sat, Sep 27, 2014 at 5:03 AM, Jeff Law l...@redhat.com wrote:

On 09/26/14 07:57, Felix Yang wrote:

Hi Jeff,

Thanks for the suggestions. I updated the patch accordingly. 1.
Both my employer (Huawei) and I have signed the copyright assignments with the FSF. The assignments were sent by post two days ago and should hopefully reach the FSF within a week. Maybe it's OK to commit this patch now?

Not really. It needs to be accepted by the FSF before we can include the work.

2. I am not turning member loop_depth of struct equivalence into a short integer, as GCC APIs such as bb_loop_depth return a loop's depth as a 32-bit integer.

There are already other places that assume loops don't nest that deep. Please go ahead and change it. And there's no need to explicitly mark the unused bits. That's just a maintenance nightmare in the long term anyway :-)

3. I find it rather difficult to use the new type and interfaces for walking the init_insns list in this patch. The type of the init_insns list is rtx, not rtx_insn_list *. It seems we would need to change a lot in order to use the new interface. I'm not clear why it was not adjusted during the transition to the new interface. Anyway, I think it's better to fix that issue in a separate patch. OK?

The right way to go is to add a checked cast when we have some code that is using the old interface and other code using the new interface. It's actually a pretty easy change. The checked casts effectively mark the limits of where we've been able to push the RTL typesafety work. Long term, as we push the typesafety work further into the compiler, many/most of the checked casts will go away.

Unfortunately, that won't work in this case because other code wants to store (const0_rtx) into the insn list. (const0_rtx) isn't an INSN, so the checked cast fails and we get a nice abort/ICE. Conceptually we just need another marker that is an INSN, and we might as well convert the whole file to use the new interface at that point. Consider the request pulled.

The const0_rtx problem may be why this wasn't converted in the first place. Or it may simply have been a time problem.
David's done 250 patches around RTL typesafety, but he also has other work to do ;-)

4. This bug is only reproducible with my locally customized GCC version, so I don't have a testcase.

OK. I'll do a final review when I get notice about the copyright assignment from the FSF.

jeff

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 215658)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,14 @@
+2014-09-26  Felix Yang  felix.y...@huawei.com
+	    Jeff Law  l...@redhat.com
+
+	* ira.c (struct equivalence): Change member is_arg_equivalence and replace
+	into boolean bitfields; turn member loop_depth into a short integer; add new
+	member no_equiv
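Jeff's explanation of why the checked cast fails can be illustrated with a toy model. This is a sketch only: node, insn_list, is_insn_list and checked_as_insn_list are hypothetical stand-ins, not GCC's rtx_def, rtx_insn_list or the as_a <> template. It shows why a list head that may hold a non-INSN sentinel (the role const0_rtx plays in reg_equiv[].init_insns) cannot go through an asserting downcast.

```cpp
#include <cassert>

// Toy "codes": a real list node versus a sentinel marker.
enum node_code { INSN_LIST, CONST_INT_MARKER };

struct node {
  node_code code;
};

// A typed list node, analogous in spirit to rtx_insn_list.
struct insn_list : node {
  int insn_uid;      // stands in for the insn pointer
  insn_list *next_;  // next element, or nullptr
  insn_list *next() const { return next_; }
};

// Non-asserting type query, analogous in spirit to is_a <>.
bool is_insn_list(const node *n) { return n != nullptr && n->code == INSN_LIST; }

// Checked downcast: aborts (as an as_a <> failure would ICE) when handed
// a node that is not a list, such as the sentinel marker.
insn_list *checked_as_insn_list(node *n) {
  assert(n == nullptr || is_insn_list(n));
  return static_cast<insn_list *>(n);
}

// Once the cast succeeds, the walk uses the typed next() accessor.
int count_insns(node *head) {
  int count = 0;
  for (insn_list *l = checked_as_insn_list(head); l != nullptr; l = l->next())
    ++count;
  return count;
}
```

Passing a CONST_INT_MARKER node to checked_as_insn_list trips the assertion, which mirrors the abort Felix hit; a sentinel that is itself a list/INSN node would pass the check.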
Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching
On Sat, Sep 27, 2014 at 01:21:29PM +0200, Dominique Dhumieres wrote:

The new tests fail on darwin:

/opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: -mnop-mcount is not implemented for -fPIC

and gcc.target/i386/record-mcount.c fails because mcount_loc is not found in the assembly.

Sorry. Here's a patch. I'll install it as obvious unless someone complains.

Also I hope it's the last mcount patch for now :-)

-Andi

diff --git a/gcc/testsuite/gcc.target/i386/nop-mcount.c b/gcc/testsuite/gcc.target/i386/nop-mcount.c
index 2592231..942cae0 100644
--- a/gcc/testsuite/gcc.target/i386/nop-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/nop-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mnop-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */
diff --git a/gcc/testsuite/gcc.target/i386/record-mcount.c b/gcc/testsuite/gcc.target/i386/record-mcount.c
index dae413e..26b0dbc 100644
--- a/gcc/testsuite/gcc.target/i386/record-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/record-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mrecord-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mrecord-mcount" } */
 /* { dg-final { scan-assembler "mcount_loc" } } */
 /* Origin: Andi Kleen */
Re: [RFC/PATCH] Fix-it hints
On 25 September 2014 11:34, Dodji Seketeli do...@redhat.com wrote:

When the caret line is disabled with -fno-diagnostics-show-caret, the fix-it hint is printed as:

gcc/testsuite/g++.dg/template/crash83.C:5:21: error: an explicit specialization must be preceded by 'template <>'
gcc/testsuite/g++.dg/template/crash83.C:5:21: fixit: template <>

The latter form may allow an IDE (such as emacs) to automatically apply the fix.

Nice. Is the fixit: prefix used by other compilers too? Or are there variations from compiler to compiler?

Clang seems to do:

fix-it:"t.cpp":{7:25-7:29}:"Gamma"

http://clang.llvm.org/docs/UsersManual.html#cmdoption-fdiagnostics-parseable-fixits

where the location is a half-open range (insertions are marked as {7:25-7:25}). However, this is not the standard GNU format for ranges, and the last time I tried to update that (to support multiple ranges), RMS was not in favour of adding the Clang flavour. Quoting the filename also does not seem to be what GNU mandates. I have no idea why they use {} around the location range, and I'm not sure whether they support just 7:25, which seems nicer. Also, I don't know which consumers of Clang's format GCC would be interested in supporting. I have no idea about other compilers that support fix-it hints. Comments?

Currently, fix-it hints are limited to insertions at one single location, whereas Clang allows insertions, deletions, and replacements at arbitrary location ranges.

Do you have examples of each of these kinds of fix-it hints (deletions, replacements at location ranges)? I think it'd be nice to have an idea of what needs to be done, even if we are not doing it in extenso right now.
I think in the case of deletions, they just underline the text to be deleted:

typename3.C:3:7: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
const const long long long int x;
      ^~
fix-it:"typename3.C":{3:7-3:13}:""

In the case of replacements, there does not seem to be any visible difference from an insertion:

typename3.C:4:33: note: change this ',' to a ';' to call 'f'
const const long long long int x,
                                ^
                                ;
fix-it:"typename3.C":{4:33-4:34}:";"

Cheers,

Manuel.
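The Clang format discussed in this thread encodes a half-open source range plus replacement text: an insertion is an empty range, and a deletion has empty replacement text. A minimal sketch of emitting such lines, assuming a hypothetical fixit struct and the quoted-filename form shown in Clang's user manual; it performs no escaping of quotes or special characters, which a real implementation would need.

```cpp
#include <cassert>
#include <string>

// Hypothetical fix-it record: replace the half-open range
// [line1:col1, line2:col2) in FILE with TEXT.
struct fixit {
  std::string file;
  int line1, col1, line2, col2;
  std::string text;
};

// Render in the style of Clang's parseable fix-its, e.g.
// fix-it:"t.cpp":{7:25-7:29}:"Gamma".  No escaping is performed.
std::string format_fixit(const fixit &f) {
  return "fix-it:\"" + f.file + "\":{" + std::to_string(f.line1) + ":" +
         std::to_string(f.col1) + "-" + std::to_string(f.line2) + ":" +
         std::to_string(f.col2) + "}:\"" + f.text + "\"";
}

// An insertion replaces an empty range.
fixit make_insertion(const std::string &file, int line, int col,
                     const std::string &text) {
  return fixit{file, line, col, line, col, text};
}

// A deletion replaces a range with empty text.
fixit make_deletion(const std::string &file, int line, int col1, int col2) {
  return fixit{file, line, col1, line, col2, ""};
}
```

Replacements need no special constructor: they are simply a non-empty range with non-empty text, which is why they look the same as insertions in the caret output.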
Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching
On 27 Sept. 2014 at 18:45, Dominique d'Humières domi...@lps.ens.fr wrote:

I think the patch for gcc.target/i386/nop-mcount.c should be

--- ../_clean/gcc/testsuite/gcc.target/i386/nop-mcount.c	2014-09-26 23:29:45.0 +0200
+++ gcc/testsuite/gcc.target/i386/nop-mcount.c	2014-09-27 18:43:40.0 +0200
@@ -1,5 +1,5 @@
 /* Test -mnop-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target { *-*-linux* && nonpic } } } */
 /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */

Dominique

On 27 Sept. 2014 at 17:35, Andi Kleen a...@firstfloor.org wrote:

On Sat, Sep 27, 2014 at 01:21:29PM +0200, Dominique Dhumieres wrote:

The new tests fail on darwin:

/opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: -mnop-mcount is not implemented for -fPIC

and gcc.target/i386/record-mcount.c fails because mcount_loc is not found in the assembly.

Sorry. Here's a patch. I'll install it as obvious unless someone complains.
Also I hope it's the last mcount patch for now :-)

-Andi

diff --git a/gcc/testsuite/gcc.target/i386/nop-mcount.c b/gcc/testsuite/gcc.target/i386/nop-mcount.c
index 2592231..942cae0 100644
--- a/gcc/testsuite/gcc.target/i386/nop-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/nop-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mnop-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */
diff --git a/gcc/testsuite/gcc.target/i386/record-mcount.c b/gcc/testsuite/gcc.target/i386/record-mcount.c
index dae413e..26b0dbc 100644
--- a/gcc/testsuite/gcc.target/i386/record-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/record-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mrecord-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mrecord-mcount" } */
 /* { dg-final { scan-assembler "mcount_loc" } } */
 /* Origin: Andi Kleen */
Re: [PATCH v2] PR libitm/61164: use always_inline consistently
2014-09-27  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org

libitm/
	PR libitm/61164
	* local_atomic (__always_inline): Add inline.
	(__calculate_memory_order): Remove inline.
	(atomic_thread_fence): Likewise.
	(atomic_signal_fence): Likewise.
	(atomic_flag_test_and_set_explicit): Likewise.
	(atomic_flag_clear_explicit): Likewise.
	(atomic_flag_test_and_set): Likewise.
	(atomic_flag_clear): Likewise.
---
Sorry, the previous patch was incomplete.

 libitm/local_atomic | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libitm/local_atomic b/libitm/local_atomic
index c3e079f..ae35ada 100644
--- a/libitm/local_atomic
+++ b/libitm/local_atomic
-  inline __always_inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   {
     __atomic_thread_fence (__m);
@@ -1544,38 +1544,38 @@ namespace std // _GLIBCXX_VISIBILITY(default)
   // Function definitions, atomic_flag operations.
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(atomic_flag* __a,
 				    memory_order __m) noexcept
   { return __a->test_and_set(__m); }

-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(volatile atomic_flag* __a,
 				    memory_order __m) noexcept
   { return __a->test_and_set(__m); }

-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(atomic_flag* __a, memory_order __m) noexcept
   { __a->clear(__m); }

-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(volatile atomic_flag* __a,
 			     memory_order __m) noexcept
   { __a->clear(__m); }

-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }

-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(volatile atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }

-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }

-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(volatile atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
-- 
glebfm
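The ChangeLog moves inline into the __always_inline macro itself, so use sites get both the keyword and the attribute without spelling inline again. A minimal sketch of that shape, using a hypothetical macro name MY_ALWAYS_INLINE (user code should not define double-underscore names, which are reserved for the implementation); the actual definition in libitm's local_atomic may differ in detail.

```cpp
#include <cassert>

// The macro supplies both 'inline' and the GCC attribute.  Use sites must
// not add 'inline' again: a repeated 'inline' specifier in one declaration
// is not allowed in C++.
#define MY_ALWAYS_INLINE inline __attribute__((always_inline))

MY_ALWAYS_INLINE int twice(int x) { return 2 * x; }

MY_ALWAYS_INLINE bool is_even(int x) { return x % 2 == 0; }
```

Defining the macro this way is what lets every use site in the diff drop its separate inline keyword while keeping the functions inline.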
Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching
On Sat, Sep 27, 2014 at 06:45:21PM +0200, Dominique d'Humières wrote: I think the patch for gcc.target/i386/nop-mcount.c should be True. Thanks. -Andi
[PATCH 2/2] Remove x86 cmpstrnsi
From: Andi Kleen a...@linux.intel.com

In my tests, the optimized out-of-line glibc strcmp is always faster
than inline rep; cmpsb, even for small strings. The Intel optimization
manual also recommends against using it. So remove the cmpstrnsi
pattern.

Tested on Sandy Bridge and Westmere Intel CPUs.

gcc/:

2014-09-27  Andi Kleen  a...@linux.intel.com

	* config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders.
---
 gcc/config/i386/i386.md | 85 -
 1 file changed, 85 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 98df8e1..1d2f1a5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16097,91 +16097,6 @@
 	     (const_string "*")))
    (set_attr "mode" "QI")])

-(define_expand "cmpstrnsi"
-  [(set (match_operand:SI 0 "register_operand")
-	(compare:SI (match_operand:BLK 1 "general_operand")
-		    (match_operand:BLK 2 "general_operand")))
-   (use (match_operand 3 "general_operand"))
-   (use (match_operand 4 "immediate_operand"))]
-  ""
-{
-  rtx addr1, addr2, out, outlow, count, countreg, align;
-
-  if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
-    FAIL;
-
-  /* Can't use this if the user has appropriated ecx, esi or edi.  */
-  if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
-    FAIL;
-
-  out = operands[0];
-  if (!REG_P (out))
-    out = gen_reg_rtx (SImode);
-
-  addr1 = copy_addr_to_reg (XEXP (operands[1], 0));
-  addr2 = copy_addr_to_reg (XEXP (operands[2], 0));
-  if (addr1 != XEXP (operands[1], 0))
-    operands[1] = replace_equiv_address_nv (operands[1], addr1);
-  if (addr2 != XEXP (operands[2], 0))
-    operands[2] = replace_equiv_address_nv (operands[2], addr2);
-
-  count = operands[3];
-  countreg = ix86_zero_extend_to_Pmode (count);
-
-  /* %%% Iff we are testing strict equality, we can use known alignment
-     to good advantage.  This may be possible with combine, particularly
-     once cc0 is dead.  */
-  align = operands[4];
-
-  if (CONST_INT_P (count))
-    {
-      if (INTVAL (count) == 0)
-	{
-	  emit_move_insn (operands[0], const0_rtx);
-	  DONE;
-	}
-      emit_insn (gen_cmpstrnqi_nz_1 (addr1, addr2, countreg, align,
-				     operands[1], operands[2]));
-    }
-  else
-    {
-      rtx (*gen_cmp) (rtx, rtx);
-
-      gen_cmp = (TARGET_64BIT
-		 ? gen_cmpdi_1 : gen_cmpsi_1);
-
-      emit_insn (gen_cmp (countreg, countreg));
-      emit_insn (gen_cmpstrnqi_1 (addr1, addr2, countreg, align,
-				  operands[1], operands[2]));
-    }
-
-  outlow = gen_lowpart (QImode, out);
-  emit_insn (gen_cmpintqi (outlow));
-  emit_move_insn (out, gen_rtx_SIGN_EXTEND (SImode, outlow));
-
-  if (operands[0] != out)
-    emit_move_insn (operands[0], out);
-
-  DONE;
-})
-
-;; Produce a tri-state integer (-1, 0, 1) from condition codes.
-
-(define_expand "cmpintqi"
-  [(set (match_dup 1)
-	(gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_dup 2)
-	(ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (parallel [(set (match_operand:QI 0 "register_operand")
-		   (minus:QI (match_dup 1)
-			     (match_dup 2)))
-	      (clobber (reg:CC FLAGS_REG))])]
-  ""
-{
-  operands[1] = gen_reg_rtx (QImode);
-  operands[2] = gen_reg_rtx (QImode);
-})
-
 ;; memcmp recognizers.  The `cmpsb' opcode does nothing if the count is
 ;; zero.  Emit extra code to make sure that a zero-length compare is EQ.
-- 
2.1.1
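The removed cmpintqi expander turns the flags left by repz cmpsb into a tri-state -1/0/1 by computing the unsigned greater-than and less-than conditions (gtu and ltu, i.e. what seta/setb produce) and subtracting them. The same idea in portable C++, as a sketch of a memcmp-style comparison of N bytes; it is not the code GCC emitted, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Tri-state from two boolean conditions, mirroring the removed
// cmpintqi expander: (a >u b) - (a <u b) yields 1, 0 or -1.
int tristate(unsigned char a, unsigned char b) {
  int gtu = a > b;  // what "seta" would set
  int ltu = a < b;  // what "setb" would set
  return gtu - ltu;
}

// Compare N bytes like "repz cmpsb": stop at the first mismatching pair
// and turn it into -1/0/1.  A zero-length compare is equal (returns 0).
int compare_bytes(const unsigned char *s1, const unsigned char *s2,
                  std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    if (s1[i] != s2[i])
      return tristate(s1[i], s2[i]);
  return 0;
}
```

When the caller only needs the truth value (equal or not), the subtraction is redundant, which is the situation the companion cmpstrn* peepholes were meant to clean up.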
[PATCH 1/2] Remove i386 cmpstrnsi peephole
From: Andi Kleen a...@linux.intel.com

The peephole that removes the code to compute a tristate for cmpstrnsi
when only a boolean jump is needed never triggers in my tests. Just
remove it.

gcc/:

2014-09-27  Andi Kleen  a...@linux.intel.com

	* config/i386/i386.md: Remove peepholes for cmpstrn*.
---
 gcc/config/i386/i386.md | 77 -
 1 file changed, 77 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 004302d..98df8e1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16297,83 +16297,6 @@
 			(const_string "0")
 			(const_string "*")))
    (set_attr "prefix_rep" "1")])
-
-;; Peephole optimizations to clean up after cmpstrn*.  This should be
-;; handled in combine, but it is not currently up to the task.
-;; When used for their truth value, the cmpstrn* expanders generate
-;; code like this:
-;;
-;;   repz cmpsb
-;;   seta 	%al
-;;   setb 	%dl
-;;   cmpb 	%al, %dl
-;;   jcc	label
-;;
-;; The intermediate three instructions are unnecessary.
-
-;; This one handles cmpstrn*_nz_1...
-(define_peephole2
-  [(parallel[
-     (set (reg:CC FLAGS_REG)
-	  (compare:CC (mem:BLK (match_operand 4 "register_operand"))
-		      (mem:BLK (match_operand 5 "register_operand"))))
-     (use (match_operand 6 "register_operand"))
-     (use (match_operand:SI 3 "immediate_operand"))
-     (clobber (match_operand 0 "register_operand"))
-     (clobber (match_operand 1 "register_operand"))
-     (clobber (match_operand 2 "register_operand"))])
-   (set (match_operand:QI 7 "register_operand")
-	(gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_operand:QI 8 "register_operand")
-	(ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (reg FLAGS_REG)
-	(compare (match_dup 7) (match_dup 8)))
-  ]
-  "peep2_reg_dead_p (4, operands[7]) && peep2_reg_dead_p (4, operands[8])"
-  [(parallel[
-     (set (reg:CC FLAGS_REG)
-	  (compare:CC (mem:BLK (match_dup 4))
-		      (mem:BLK (match_dup 5))))
-     (use (match_dup 6))
-     (use (match_dup 3))
-     (clobber (match_dup 0))
-     (clobber (match_dup 1))
-     (clobber (match_dup 2))])])
-
-;; ...and this one handles cmpstrn*_1.
-(define_peephole2
-  [(parallel[
-     (set (reg:CC FLAGS_REG)
-	  (if_then_else:CC (ne (match_operand 6 "register_operand")
-			       (const_int 0))
-	    (compare:CC (mem:BLK (match_operand 4 "register_operand"))
-			(mem:BLK (match_operand 5 "register_operand")))
-	    (const_int 0)))
-     (use (match_operand:SI 3 "immediate_operand"))
-     (use (reg:CC FLAGS_REG))
-     (clobber (match_operand 0 "register_operand"))
-     (clobber (match_operand 1 "register_operand"))
-     (clobber (match_operand 2 "register_operand"))])
-   (set (match_operand:QI 7 "register_operand")
-	(gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_operand:QI 8 "register_operand")
-	(ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (reg FLAGS_REG)
-	(compare (match_dup 7) (match_dup 8)))
-  ]
-  "peep2_reg_dead_p (4, operands[7]) && peep2_reg_dead_p (4, operands[8])"
-  [(parallel[
-     (set (reg:CC FLAGS_REG)
-	  (if_then_else:CC (ne (match_dup 6)
-			       (const_int 0))
-	    (compare:CC (mem:BLK (match_dup 4))
-			(mem:BLK (match_dup 5)))
-	    (const_int 0)))
-     (use (match_dup 3))
-     (use (reg:CC FLAGS_REG))
-     (clobber (match_dup 0))
-     (clobber (match_dup 1))
-     (clobber (match_dup 2))])])

 ;; Conditional move instructions.
-- 
2.1.1
[PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
Hello,

This patch enables the streaming of the LTO bytecode needed by the offload target, using the existing LTO infrastructure. It creates a new prefix for the section names (.gnu.target_lto_) and streams out the functions and variables with the omp declare target attribute, including the functions for outlined '#pragma omp target' regions. The offload compiler (under #ifdef ACCEL_COMPILER) reads and compiles these new sections.

But I have doubts regarding the offload_lto_mode switch. Why I added it: the outlined target regions (say omp_fn0) contain references from the parent functions. That's correct when we stream out the host-side version of omp_fn0. But for the target version there are no parent functions, so node->used_from_other_partition gets an incorrect value (always 1), and the offload compiler crashes when streaming in.

Another solution is to leave referenced_from_other_partition_p and reachable_from_other_partition_p unchanged; then used_from_other_partition will have an incorrect value for target regions, but the offload compiler will simply ignore it. Which approach is better?

Anyway, it's now bootstrapped and regtested on i686-linux and x86_64-linux.

2014-09-27  Ilya Verbin  ilya.ver...@intel.com
	    Ilya Tocar  ilya.to...@intel.com
	    Andrey Turetskiy  andrey.turets...@intel.com
	    Bernd Schmidt  ber...@codesourcery.com

gcc/
	* cgraph.h (symtab_node): Add need_dump flag.
	* cgraphunit.c: Include lto-section-names.h.
	(initialize_offload): New function.
	(ipa_passes): Initialize offload and call ipa_write_summaries if there
	is something to write to OMP_SECTION_NAME_PREFIX sections.
	(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
	flag_openmp.
	(inline_free_summary): Always remove hooks.
	* lto-cgraph.c (lto_set_symtab_encoder_in_partition): Exit if there
	is no need to encode the node.
	(referenced_from_other_partition_p, reachable_from_other_partition_p):
	Ignore references from non-target functions to target functions if we
	are streaming out target-side bytecode (offload lto mode).
	(select_what_to_dump): New function.
	* lto-section-names.h (OMP_SECTION_NAME_PREFIX): Define.
	(section_name_prefix): Declare.
	* lto-streamer.c (offload_lto_mode): New variable.
	(section_name_prefix): New variable.
	(lto_get_section_name): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-streamer.h (select_what_to_dump): Declare.
	(offload_lto_mode): Declare.
	* omp-low.c (is_targetreg_ctx): New function.
	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
	(expand_omp_target): Set mark_force_output for the target functions.
	(lower_omp_critical): Add target attribute for omp critical symbol.
	* passes.c (ipa_write_summaries): Call select_what_to_dump.
gcc/lto/
	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead
	of LTO_SECTION_NAME_PREFIX.
	* lto-partition.c (add_symbol_to_partition_1): Always set
	node->need_dump to true.
	(lto_promote_cross_file_statics): Call select_what_to_dump.
	* lto.c (lto_section_with_id): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	(read_cgraph_and_symbols): Read OMP_SECTION_NAME_PREFIX sections, if
	being built as an offload compiler.

Thanks,
  -- Ilya
---

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 7481906..9ab970d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -444,6 +444,11 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;

+  /* Set when symbol needs to be dumped into LTO bytecode for LTO,
+     or in pragma omp target case, for separate compilation targeting
+     a different architecture.  */
+  unsigned need_dump : 1;
+
   /* Ordering of all symtab entries.  */
   int order;

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b854e4b..4ab4c57 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"

 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
   free (nodes);
 }

+/* Check whether there is at least one function or global variable to offload.
+   */
+
+static bool
+initialize_offload (void)
+{
+  bool have_offload = false;
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+      {
+	have_offload = true;
+	break;
+      }
+
+
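The ChangeLog replaces the fixed LTO_SECTION_NAME_PREFIX in lto_get_section_name and related code with a section_name_prefix variable, so the same streaming code can write either the host LTO sections or the new .gnu.target_lto_ offload sections. A minimal sketch of that switch; lto_prefix, offload_prefix and make_section_name are hypothetical names, and the real lto_get_section_name composes names from section types and symbols in ways not modelled here.

```cpp
#include <cassert>
#include <string>

// Host LTO prefix and the offload prefix introduced by the patch.
const char *const lto_prefix = ".gnu.lto_";
const char *const offload_prefix = ".gnu.target_lto_";

// Selected per compilation, analogous in spirit to the patch's
// section_name_prefix variable; defaults to the host LTO prefix.
const char *section_name_prefix = lto_prefix;

// Hypothetical helper: every section name is built from the current prefix,
// so switching the prefix redirects all streamed sections at once.
std::string make_section_name(const std::string &suffix) {
  return std::string(section_name_prefix) + suffix;
}
```

Keeping the prefix in one variable means the writer and the offload compiler's reader only have to agree on which prefix is active, rather than each call site choosing its own.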
Re: [RFC/PATCH] More precise diagnostic locations: dynamic locations for columns vs explicit offset
On 25 September 2014 13:39, Dodji Seketeli do...@redhat.com wrote:

2) In the Fortran FE, which gives quite precise location information by tracking the characters that it wants to warn about instead of relying on the line-map machinery.

So with this feature, the Fortran FE would then use the then more generic diagnostics machinery, right?

This is the plan. The benefits will be more shared code, deleting a lot of duplicated stuff in the Fortran FE, and supporting in Fortran all the goodies of GCC (#pragmas, color, options printing, macro unwinder). However, the Fortran FE still has a long way to go to make use of all the features of the common diagnostics machinery.

On the bright side, the common diagnostics machinery already supports all features of the Fortran FE except for offset locations, multiple locations (and multiple carets), and buffered diagnostics (well, the diagnostics machinery does buffer, but there is no API to clear the buffer without printing it). I think a bunch of FE diagnostic calls could already use the common machinery (some already do). Unfortunately, most Fortran diagnostics use offset locations because the FE does not track the locations of tokens with line-maps (I think the locations are computed but not stored or passed down to the diagnostic functions).

The work to be done is not even technically difficult. The lack of progress on this is due, as always, to lack of time by the people currently working on GCC and/or lack of new contributors.

Cheers,

Manuel.
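An offset location, in the sense used in this thread, is a base position plus a character offset within the same source line, which is how the Fortran FE points at the exact character it wants to warn about. A minimal sketch of what that means for caret placement; offset_location and caret_line are hypothetical and unrelated to GCC's line-map API, and tab expansion is ignored.

```cpp
#include <cassert>
#include <string>

// Hypothetical: a 1-based column on some line plus an extra offset,
// in characters, within that same line.
struct offset_location {
  int column;
  int offset;
};

// Build the caret line printed under the source line: spaces up to the
// effective column, then '^'.
std::string caret_line(const offset_location &loc) {
  int effective_column = loc.column + loc.offset;  // still 1-based
  assert(effective_column >= 1);
  return std::string(effective_column - 1, ' ') + "^";
}
```

Supporting this in the common machinery would mean either carrying such an offset alongside a location or creating a distinct location for each character, which is the trade-off the thread's subject line refers to.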
Re: [PATCH 2/2] Remove x86 cmpstrnsi
On Sat, 2014-09-27 at 11:10 -0700, Andi Kleen wrote:

From: Andi Kleen a...@linux.intel.com

In my tests the optimized glibc out of line strcmp is always faster than using inline rep ; cmpsb, even for small strings. The Intel optimization manual also recommends to not use it. So remove the cmpstrnsi instruction. Tested on Sandy Bridge, Westmere Intel CPUs.

gcc/:

2014-09-27  Andi Kleen  a...@linux.intel.com

	* config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders.

This has been mentioned a while ago, e.g.
https://gcc.gnu.org/ml/gcc/2002-10/msg01616.html
https://gcc.gnu.org/ml/gcc/2003-04/msg00166.html

Instead of just completely removing it, how about disabling it for newer CPU types if not optimizing for size?

Cheers,
Oleg
Re: [PATCH 2/2] Remove x86 cmpstrnsi
On Sat, Sep 27, 2014 at 08:45:18PM +0200, Oleg Endo wrote: On Sat, 2014-09-27 at 11:10 -0700, Andi Kleen wrote: From: Andi Kleen a...@linux.intel.com In my tests the optimized glibc out of line strcmp is always faster than using inline rep ; cmpsb, even for small strings. The Intel optimization manual also recommends to not use it. So remove the cmpstrnsi instruction. Tested on Sandy Bridge, Westmere Intel CPUs. gcc/: 2014-09-27 Andi Kleen a...@linux.intel.com * config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders. This has been mentioned a while ago, e.g. https://gcc.gnu.org/ml/gcc/2002-10/msg01616.html https://gcc.gnu.org/ml/gcc/2003-04/msg00166.html Instead of just completely removing it, how about disabling it for newer CPU types if not optimizing for size? I believe it was slow even on old CPUs. But back then glibc may have been even slower. Not sure it is worth keeping it for -Os, especially given that parts of it have already bitrotted. -Andi
Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue
I guess it is not in our testsuite; otherwise, after this patch, the new results should be a little better than the old ones (at present, they are the same). I shall try to add a related case to the testsuite within this month.

Thanks

Michael Eager ea...@eagerm.com wrote:

On 09/25/14 07:03, Chen Gang wrote:

We need to use VOID instead of SI; otherwise, when a real VOIDmode operand comes along, it does not match SImode, which causes the issue. This patch fixes the issue and passes the testsuite.

The related test code ('void' will cause CALL instead of SET):

  typedef void (*T)(void);
  f1 ()
  {
    ((T) 0)();
  }

The related error:

  [root@localhost gcc]# ./cc1 /tmp/calls.c -o /tmp/1.s
   f1
  Analyzing compilation unit
  Performing interprocedural optimizations
   <*free_lang_data> <visibility> <early_local_cleanups> <free-inline-summary> <whole-program> <inline>Assembling functions:
   f1
  /tmp/calls.c: In function 'f1':
  /tmp/calls.c:5:1: error: unrecognizable insn:
   }
   ^
  (call_insn 5 2 8 2 (parallel [
              (call (mem:SI (const_int 0 [0]) [0 MEM[(void (*<T29c>) (void))0B] S4 A32])
                  (const_int 24 [0x18]))
              (clobber (reg:SI 15 r15))
          ]) /tmp/calls.c:4 -1
       (nil)
      (nil))
  /tmp/calls.c:5:1: internal compiler error: in extract_insn, at recog.c:2204
  0xb0e71b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
  	../../gcc/gcc/rtl-error.c:109
  0xb0e75c _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
  	../../gcc/gcc/rtl-error.c:117
  0xac552b extract_insn(rtx_def*)
  	../../gcc/gcc/recog.c:2204
  0x8b919e instantiate_virtual_regs_in_insn
  	../../gcc/gcc/function.c:1614
  0x8ba347 instantiate_virtual_regs
  	../../gcc/gcc/function.c:1934
  0x8ba452 execute
  	../../gcc/gcc/function.c:1983
  Please submit a full bug report,
  with preprocessed source if appropriate.
  Please include the complete backtrace with any bug report.
  See http://gcc.gnu.org/bugs.html for instructions.

Is this test case (or a similar one) in the gcc test suite? If not, can you please add it to the test suite.
2014-09-25  Chen Gang  gang.chen.5...@gmail.com

	* config/microblaze/microblaze.md (call_internal1): Use VOID
	instead of SI to fix ((void (*)(void)) 0)() issue
---
 gcc/config/microblaze/microblaze.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/microblaze/microblaze.md b/gcc/config/microblaze/microblaze.md
index b971737..3b4faf4 100644
--- a/gcc/config/microblaze/microblaze.md
+++ b/gcc/config/microblaze/microblaze.md
@@ -2062,7 +2062,7 @@
   (set_attr length 4)])
 
 (define_insn call_internal1
-  [(call (mem (match_operand:SI 0 call_insn_simple_operand ri))
+  [(call (mem (match_operand:VOID 0 call_insn_simple_operand ri))
          (match_operand:SI 1 i))
   (clobber (reg:SI R_SR))]

I've verified that your patch does not cause any test suite regressions.

-- 
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
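The patch description argues that a VOIDmode operand (here the callee address `(const_int 0)`) cannot match an operand written as `match_operand:SI`, while `match_operand:VOID` accepts it. A deliberately simplified toy model of that reasoning might look like the following. It is not GCC's `recog`: the enum, the struct, and `toy_mode_matches` are hypothetical, and the matching rule (VOIDmode in the pattern accepts any mode, otherwise the modes must be equal) is an assumption taken from the patch description rather than a statement about every predicate.

```c
#include <stdbool.h>

/* Hypothetical toy modes: only the two the patch talks about.  */
enum toy_mode { TOY_VOIDmode, TOY_SImode };

/* Hypothetical stand-in for an RTL operand.  A CONST_INT such as
   (const_int 0) carries no mode of its own, i.e. VOIDmode.  */
struct toy_operand {
  enum toy_mode mode;
  long value;
};

/* Assumed, simplified rule mirroring the patch description:
   - a pattern operand written with VOIDmode accepts any operand mode;
   - otherwise the operand's mode must equal the pattern's mode.  */
static bool
toy_mode_matches (enum toy_mode pattern_mode, const struct toy_operand *op)
{
  return pattern_mode == TOY_VOIDmode || pattern_mode == op->mode;
}
```

With the callee `{ TOY_VOIDmode, 0 }`, `toy_mode_matches (TOY_SImode, &op)` is false (the "unrecognizable insn" situation before the patch) and `toy_mode_matches (TOY_VOIDmode, &op)` is true (the situation after it). An SImode register operand matches either pattern.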
Re: [PATCH][AArch64] LR register not used in leaf functions
On 23/09/14 01:58, Jiong Wang wrote:
On 22/09/14 16:43, Kugan wrote:
AArch64 has the same issue ARM had, where the LR register was not used in leaf functions. This was reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. For AArch64, the testcase needs more live ranges before LR_REGNUM is needed, i.e. the testcase in the PR needs additional loops up to r31 for AArch64 to show this. The same fix (from the thread https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) that went into ARM should apply to AArch64 as well.

Regression tested on qemu for aarch64-none-linux-gnu with no new regressions.

Is this OK for trunk?

This is still a partial fix. LR should be a caller-saved register that is free to use, provided it is saved properly across function calls.

Indeed. This should be improved in the generic code. Right now, if a hard register is used in EPILOGUE_USES, it conflicts with all live ranges until a call site kills it. I think we should have this patch until the generic code can be improved.

Thanks,
Kugan

I had a very similar patch to this sitting in my local tree, undergoing various benchmark analyses.

-- 
Jiong

Thanks,
Kugan

gcc/ChangeLog:

2014-09-23  Kugan Vivekanandarajah  kug...@linaro.org

	* config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
	epilogue_completed is true.
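The ChangeLog entry describes making `EPILOGUE_USES` report LR only once `epilogue_completed` is set, following the earlier ARM fix. A minimal standalone sketch of that idea, assuming stubbed-out definitions: `LR_REGNUM` is taken as 30 (x30 on AArch64) and `epilogue_completed` is modeled as a plain global. The exact text of the macro in `aarch64.h` may differ. Per the thread, a hard register reported by `EPILOGUE_USES` conflicts with all live ranges until a call kills it, so reporting LR only after the epilogue exists lets the register allocator use LR in leaf functions.

```c
/* Hypothetical stand-ins so the sketch compiles outside GCC.  */
#define LR_REGNUM 30              /* x30 is the link register on AArch64 */
static int epilogue_completed;    /* set once the epilogue has been emitted */

/* Sketch of the ChangeLog entry: LR counts as used by the epilogue only
   after the epilogue exists.  Before that (e.g. during register
   allocation) LR is not treated as live to the end of the function.  */
#define EPILOGUE_USES(REGNO) \
  (epilogue_completed && (REGNO) == LR_REGNUM)
```

As noted in the thread, this remains a partial fix until the generic code handles such registers better.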
Re: [PATCH] combine: Allow substituting the target reg of a clobber
On Mon, Sep 22, 2014 at 04:20:12PM -0600, Jeff Law wrote:
Can you add a testcase which shows the 3-insn combination from PR62151 applying?

I've tried to make a stable future-proof testcase that does such a three-insn combination. Not easy at all.

But now it dawns on me: do you just want the actual testcase from the PR? (Well, fixed so that it is valid C, I suppose). With a test that combine does its job, of course? Not sure how to test that, but maybe I'll learn. Or is a test showing the testcase working after the change good enough?

Segher
Re: [PATCH 3/5] IPA ICF pass
Hi.

Thank you, Markus, for presenting the numbers; they correspond to what I measured. If I see correctly, the IPA ICF pass takes about 7 seconds; the rest is distributed between the verifier (not interesting for a release version of the compiler) and 'phase opt and generate'. No idea what can make the difference?

'phase opt and generate' just combines all the optimization times together, so it is the same 7 seconds as in the ICF pass :)

1GB of function bodies just to eliminate 2-3% of code seems quite a lot. Do you have any idea how many of those turn out to be different? It would be nice to be able to release the duplicate bodies from memory after the equivalence was established.

Honza

Martin
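Honza's suggestion is to free a function body as soon as it is known to be a duplicate. A toy illustration of that idea follows; it is hypothetical and not how `ipa-icf.c` is structured. Each function is a name plus a byte buffer standing in for its body; a hash serves as a cheap first filter, a full `memcmp` confirms equality (hashes can collide), and a confirmed duplicate becomes an alias of the first equivalent function with its body released immediately.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy "function": a name and an owned body buffer.  */
struct toy_fn {
  const char *name;
  unsigned char *body;      /* NULL once the body has been released */
  size_t len;
  uint64_t hash;            /* filled in by fold_identical */
  struct toy_fn *alias_of;  /* non-NULL once folded into another fn */
};

/* 64-bit FNV-1a over the body bytes.  */
static uint64_t
body_hash (const unsigned char *p, size_t len)
{
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < len; i++)
    {
      h ^= p[i];
      h *= 1099511628211ULL;
    }
  return h;
}

/* Fold identical bodies and return the number of body bytes freed.
   Quadratic pairwise scan for brevity; a real pass would bucket
   candidates by hash first.  */
static size_t
fold_identical (struct toy_fn *fns, size_t n)
{
  size_t released = 0;

  for (size_t i = 0; i < n; i++)
    fns[i].hash = body_hash (fns[i].body, fns[i].len);

  for (size_t i = 0; i < n; i++)
    {
      if (fns[i].alias_of)
        continue;                  /* already folded; no body left */
      for (size_t j = i + 1; j < n; j++)
        {
          struct toy_fn *d = &fns[j];
          if (d->alias_of || d->len != fns[i].len || d->hash != fns[i].hash)
            continue;
          if (d->len != 0 && memcmp (d->body, fns[i].body, d->len) != 0)
            continue;              /* hash collision, not a duplicate */
          /* Equivalence established: alias and free the body now.  */
          d->alias_of = &fns[i];
          released += d->len;
          free (d->body);
          d->body = NULL;
        }
    }
  return released;
}
```

For example, with three functions whose bodies are "ret 42", "ret 7" and "ret 42", the third is folded into the first and its 6 body bytes are freed right away instead of staying resident until the end of the pass.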
Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue
And excuse me, I am not very familiar with adding testsuite cases, so while I am trying, any related ideas, suggestions, or completions are welcome.

Thanks.

On 9/28/14 3:08, Chen Gang wrote:
I guess it is not in our testsuite; otherwise, after this patch, the new results should be a little better than the old ones (at present, they are the same). I shall try to add a related case to the testsuite within this month.

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
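Since ideas on adding the testcase are requested, one possible shape is a compile-only DejaGnu test built from the reproducer earlier in the thread. This is a sketch, not a committed test: the location (for example a new file under `gcc/testsuite/gcc.dg/`) and its name are assumptions, and `f1` is given an explicit `void` return type so the file does not rely on implicit `int`.

```c
/* { dg-do compile } */

/* Hypothetical regression test for the microblaze call_internal1 fix:
   calling through a null void (*)(void) pointer yields a CALL (not a SET)
   whose address is (const_int 0).  Before the patch, microblaze reported
   "unrecognizable insn" for it.  */

typedef void (*T) (void);

void
f1 (void)
{
  ((T) 0) ();
}
```

Because the file is plain C, placing it in a generic directory would also compile it on other targets, where it should simply build cleanly; a target-specific directory would restrict it to microblaze.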