Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt wrote: > On 01/05/2014 12:35 PM, Felix Yang wrote: >> >> Ping? >> Cheers, > > > I have a different patch which I'll submit next week after some more > testing. The assert in cfgrtl is unnecessarily broad and really only needs > to trigger if -freorder-blocks-and-partition; there's nothing wrong with > entering cfglayout after normal bb-reorder. Currently -freorder-blocks-and-partition is the default for x86. I assume that hw-doloop is not enabled for any i386 targets, which is why we haven't seen this? And will this mean that -freorder-blocks-and-partition cannot be used for the targets that use hw-doloop? If so, should -freorder-blocks-and-partition be prevented with a warning for those targets? > > I've also tested that Blackfin still benefits from the hw-doloop reordering > code and generates more hardware loops if it's enabled. So we want to be > able to run it at -O2. I looked at hw-doloop briefly and since it seems to be doing some manual bb reordering I guess it can't simply be moved before bbro. It seems like a better long-term solution would be to make bbro hw-doloop-aware as Felix suggested earlier. Teresa > > > Bernd > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch] libgcov.c re-factoring
On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka wrote: >> 2014-01-03 Rong Xu >> >> * gcc/gcov-io.c (gcov_var): Move from gcov-io.h. >> (gcov_position): Ditto. >> (gcov_is_error): Ditto. >> (gcov_rewrite): Ditto. >> * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.h, and libgcov >> only part to libgcc/libgcov.h. >> * libgcc/libgcov-driver.c: Use libgcov.h. >> (buffer_fn_data): Use xmalloc instead of malloc. >> (gcov_exit_merge_gcda): Ditto. >> * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto. >> * libgcc/libgcov.h: New common header files for libgcov-*.h. >> * libgcc/libgcov-interface.c: Use libgcov.h >> * libgcc/libgcov-merge.c: Ditto. >> * libgcc/libgcov-profiler.c: Ditto. >> * libgcc/Makefile.in: Add dependence to libgcov.h > > OK, with the licence changes and... >> >> Index: gcc/gcov-io.c >> === >> --- gcc/gcov-io.c (revision 206100) >> +++ gcc/gcov-io.c (working copy) >> @@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns >> static void gcov_allocate (unsigned); >> #endif >> >> +/* Optimum number of gcov_unsigned_t's read from or written to disk. */ >> +#define GCOV_BLOCK_SIZE (1 << 10) >> + >> +GCOV_LINKAGE struct gcov_var >> +{ >> + FILE *file; >> + gcov_position_t start; /* Position of first byte of block */ >> + unsigned offset; /* Read/write position within the block. */ >> + unsigned length; /* Read limit in the block. */ >> + unsigned overread; /* Number of words overread. */ >> + int error; /* < 0 overflow, > 0 disk error. */ >> + int mode;/* < 0 writing, > 0 reading */ >> +#if IN_LIBGCOV >> + /* Holds one block plus 4 bytes, thus all coverage reads & writes >> + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE >> + to and from the disk. libgcov never backtracks and only writes 4 >> + or 8 byte objects. */ >> + gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1]; >> +#else >> + int endian; /* Swap endianness. */ >> + /* Holds a variable length block, as the compiler can write >> + strings and needs to backtrack. */ >> + size_t alloc; >> + gcov_unsigned_t *buffer; >> +#endif >> +} gcov_var; >> + >> +/* Save the current position in the gcov file. */ >> +static inline gcov_position_t >> +gcov_position (void) >> +{ >> + gcc_assert (gcov_var.mode > 0); >> + return gcov_var.start + gcov_var.offset; >> +} >> + >> +/* Return nonzero if the error flag is set. */ >> +static inline int >> +gcov_is_error (void) >> +{ >> + return gcov_var.file ? gcov_var.error : 1; >> +} >> + >> +#if IN_LIBGCOV >> +/* Move to beginning of file and initialize for writing. */ >> +GCOV_LINKAGE inline void >> +gcov_rewrite (void) >> +{ >> + gcc_assert (gcov_var.mode > 0); > > I would turn those two asserts into checking asserts so they do not > bloat the runtime lib. Ok, but note that there are a number of other gcc_assert already in gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should I go ahead and change all of them in gcov-io.c? Thanks, Teresa > > Thanks, > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch] libgcov.c re-factoring
On Mon, Jan 6, 2014 at 9:49 AM, Teresa Johnson wrote: > On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka wrote: >>> 2014-01-03 Rong Xu >>> >>> * gcc/gcov-io.c (gcov_var): Move from gcov-io.h. >>> (gcov_position): Ditto. >>> (gcov_is_error): Ditto. >>> (gcov_rewrite): Ditto. >>> * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.h, and libgcov >>> only part to libgcc/libgcov.h. >>> * libgcc/libgcov-driver.c: Use libgcov.h. >>> (buffer_fn_data): Use xmalloc instead of malloc. >>> (gcov_exit_merge_gcda): Ditto. >>> * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto. >>> * libgcc/libgcov.h: New common header files for libgcov-*.h. >>> * libgcc/libgcov-interface.c: Use libgcov.h >>> * libgcc/libgcov-merge.c: Ditto. >>> * libgcc/libgcov-profiler.c: Ditto. >>> * libgcc/Makefile.in: Add dependence to libgcov.h >> >> OK, with the licence changes and... >>> >>> Index: gcc/gcov-io.c >>> === >>> --- gcc/gcov-io.c (revision 206100) >>> +++ gcc/gcov-io.c (working copy) >>> @@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns >>> static void gcov_allocate (unsigned); >>> #endif >>> >>> +/* Optimum number of gcov_unsigned_t's read from or written to disk. */ >>> +#define GCOV_BLOCK_SIZE (1 << 10) >>> + >>> +GCOV_LINKAGE struct gcov_var >>> +{ >>> + FILE *file; >>> + gcov_position_t start; /* Position of first byte of block */ >>> + unsigned offset; /* Read/write position within the block. */ >>> + unsigned length; /* Read limit in the block. */ >>> + unsigned overread; /* Number of words overread. */ >>> + int error; /* < 0 overflow, > 0 disk error. */ >>> + int mode;/* < 0 writing, > 0 reading */ >>> +#if IN_LIBGCOV >>> + /* Holds one block plus 4 bytes, thus all coverage reads & writes >>> + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE >>> + to and from the disk. libgcov never backtracks and only writes 4 >>> + or 8 byte objects. */ >>> + gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1]; >>> +#else >>> + int endian; /* Swap endianness. */ >>> + /* Holds a variable length block, as the compiler can write >>> + strings and needs to backtrack. */ >>> + size_t alloc; >>> + gcov_unsigned_t *buffer; >>> +#endif >>> +} gcov_var; >>> + >>> +/* Save the current position in the gcov file. */ >>> +static inline gcov_position_t >>> +gcov_position (void) >>> +{ >>> + gcc_assert (gcov_var.mode > 0); >>> + return gcov_var.start + gcov_var.offset; >>> +} >>> + >>> +/* Return nonzero if the error flag is set. */ >>> +static inline int >>> +gcov_is_error (void) >>> +{ >>> + return gcov_var.file ? gcov_var.error : 1; >>> +} >>> + >>> +#if IN_LIBGCOV >>> +/* Move to beginning of file and initialize for writing. */ >>> +GCOV_LINKAGE inline void >>> +gcov_rewrite (void) >>> +{ >>> + gcc_assert (gcov_var.mode > 0); >> >> I would turn those two asserts into checking asserts so they do not >> bloat the runtime lib. > > Ok, but note that there are a number of other gcc_assert already in > gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should > I go ahead and change all of them in gcov-io.c? Actually, I tried changing these two, but gcc_checking_assert is undefined in libgcov.a. Ok to commit without this change? Teresa > > Thanks, > Teresa > >> >> Thanks, >> Honza > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch] libgcov.c re-factoring
On Wed, Jan 8, 2014 at 6:34 AM, Jan Hubicka wrote: >> >> Actually, I tried changing these two, but gcc_checking_assert is >> undefined in libgcov.a. Ok to commit without this change? > > OK. > incrementally can you please define gcov_nonruntime_assert that will wind into > gcc_assert for code within gcc/coverage tools and into nothing for libgcov > runtime and we can change those offenders to that. Ok, committed as r206435. Will send the assert patch in a follow-up later this week. Teresa > > Honza >> >> Teresa >> >> > >> > Thanks, >> > Teresa >> > >> >> >> >> Thanks, >> >> Honza >> > >> > >> > >> > -- >> > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
On Tue, Jan 7, 2014 at 8:07 AM, Bernd Schmidt wrote: > On 01/05/2014 05:10 PM, Teresa Johnson wrote: >> >> On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt >> wrote: >>> >>> I have a different patch which I'll submit next week after some more >>> testing. The assert in cfgrtl is unnecessarily broad and really only >>> needs >>> to trigger if -freorder-blocks-and-partition; there's nothing wrong with >>> entering cfglayout after normal bb-reorder. >> >> >> Currently -freorder-blocks-and-partition is the default for x86. I >> assume that hw-doloop is not enabled for any i386 targets, which is >> why we haven't seen this? > > > Precisely. > > >> And will this mean that -freorder-blocks-and-partition cannot be used >> for the targets that use hw-doloop? If so, should >> -freorder-blocks-and-partition be prevented with a warning for those >> targets? > > > If someone explicitly chooses that option we can turn off the reordering in > hw-doloop. That should happen sufficiently rarely that it isn't a problem. > That's what the patch below does - bootstraped on x86_64-linux, tested there > and with bfin-elf. Ok? Ok, looks good to me. > > >>> I've also tested that Blackfin still benefits from the hw-doloop >>> reordering >>> code and generates more hardware loops if it's enabled. So we want to be >>> able to run it at -O2. >> >> >> I looked at hw-doloop briefly and since it seems to be doing some >> manual bb reordering I guess it can't simply be moved before bbro. It >> seems like a better long-term solution would be to make bbro >> hw-doloop-aware as Felix suggested earlier. > > > Maybe. It could be argued that the code in hw-doloop is relevant only for a > small class of targets so it should only be enabled for them. In any case, > that's not stage 3 material and two ports are broken... Ok, that makes sense. Thanks, Teresa > > > Bernd > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Patch] Avoid gcc_assert in libgcov
As suggested by Honza, avoid bloating libgcov from gcc_assert by using a new macro gcov_nonruntime_assert in gcov-io.c that is only mapped to gcc_assert when not in libgcov. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2014-01-09 Teresa Johnson * gcov-io.c (gcov_position): Use gcov_nonruntime_assert. (gcov_is_error): Ditto. (gcov_rewrite): Ditto. (gcov_open): Ditto. (gcov_write_words): Ditto. (gcov_write_length): Ditto. (gcov_read_words): Ditto. (gcov_read_summary): Ditto. (gcov_sync): Ditto. (gcov_seek): Ditto. (gcov_histo_index): Ditto. (static void gcov_histogram_merge): Ditto. (compute_working_sets): Ditto. * gcov-io.h (gcov_nonruntime_assert): Define. Index: gcov-io.c === --- gcov-io.c (revision 206435) +++ gcov-io.c (working copy) @@ -67,7 +67,7 @@ GCOV_LINKAGE struct gcov_var static inline gcov_position_t gcov_position (void) { - gcc_assert (gcov_var.mode > 0); + gcov_nonruntime_assert (gcov_var.mode > 0); return gcov_var.start + gcov_var.offset; } @@ -83,7 +83,7 @@ gcov_is_error (void) GCOV_LINKAGE inline void gcov_rewrite (void) { - gcc_assert (gcov_var.mode > 0); + gcov_nonruntime_assert (gcov_var.mode > 0); gcov_var.mode = -1; gcov_var.start = 0; gcov_var.offset = 0; @@ -133,7 +133,7 @@ gcov_open (const char *name, int mode) s_flock.l_pid = getpid (); #endif - gcc_assert (!gcov_var.file); + gcov_nonruntime_assert (!gcov_var.file); gcov_var.start = 0; gcov_var.offset = gcov_var.length = 0; gcov_var.overread = -1u; @@ -291,14 +291,14 @@ gcov_write_words (unsigned words) { gcov_unsigned_t *result; - gcc_assert (gcov_var.mode < 0); + gcov_nonruntime_assert (gcov_var.mode < 0); #if IN_LIBGCOV if (gcov_var.offset >= GCOV_BLOCK_SIZE) { gcov_write_block (GCOV_BLOCK_SIZE); if (gcov_var.offset) { - gcc_assert (gcov_var.offset == 1); + gcov_nonruntime_assert (gcov_var.offset == 1); memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4); } } @@ -393,9 +393,9 @@ gcov_write_length (gcov_position_t position) gcov_unsigned_t length; gcov_unsigned_t *buffer; - gcc_assert (gcov_var.mode < 0); - gcc_assert (position + 2 <= gcov_var.start + gcov_var.offset); - gcc_assert (position >= gcov_var.start); + gcov_nonruntime_assert (gcov_var.mode < 0); + gcov_nonruntime_assert (position + 2 <= gcov_var.start + gcov_var.offset); + gcov_nonruntime_assert (position >= gcov_var.start); offset = position - gcov_var.start; length = gcov_var.offset - offset - 2; buffer = (gcov_unsigned_t *) &gcov_var.buffer[offset]; @@ -481,14 +481,14 @@ gcov_read_words (unsigned words) const gcov_unsigned_t *result; unsigned excess = gcov_var.length - gcov_var.offset; - gcc_assert (gcov_var.mode > 0); + gcov_nonruntime_assert (gcov_var.mode > 0); if (excess < words) { gcov_var.start += gcov_var.offset; #if IN_LIBGCOV if (excess) { - gcc_assert (excess == 1); + gcov_nonruntime_assert (excess == 1); memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4); } #else @@ -497,7 +497,7 @@ gcov_read_words (unsigned words) gcov_var.offset = 0; gcov_var.length = excess; #if IN_LIBGCOV - gcc_assert (!gcov_var.length || gcov_var.length == 1); + gcov_nonruntime_assert (!gcov_var.length || gcov_var.length == 1); excess = GCOV_BLOCK_SIZE; #else if (gcov_var.length + words > gcov_var.alloc) @@ -614,7 +614,7 @@ gcov_read_summary (struct gcov_summary *summary) while (!cur_bitvector) { h_ix = bv_ix * 32; - gcc_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE); + gcov_nonruntime_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE); cur_bitvector = histo_bitvector[bv_ix++]; } while (!(cur_bitvector & 0x1)) @@ -622,7 +622,7 @@ gcov_read_summary (struct gcov_summary *summary) h_ix++; cur_bitvector >>= 1; } - gcc_assert (h_ix < GCOV_HISTOGRAM_SIZE); + gcov_nonruntime_assert (h_ix < GCOV_HISTOGRAM_SIZE); csum->histogram[h_ix].num_counters = gcov_read_unsigned (); csum->histogram[h_ix].min_value = gcov_read_counter (); @@ -642,7 +642,7 @@ gcov_read_summary (struct gcov_summary *summary) GCOV_LINKAGE void gcov_sync (gcov_position_t base, gcov_unsigned_t length) { - gcc_assert (gcov_var.mode > 0); + gcov_nonruntime_assert (gcov_var.mode > 0); base += length; if (base - gcov_var.start <= gcov_var.length) gcov_var.offset = base - gcov_var.start; @@ -661,7 +661,7 @@ gcov_sync (gcov_position_t base, gcov_unsig
Re: [Patch] Avoid gcc_assert in libgcov
On Thu, Jan 9, 2014 at 6:56 AM, Jan Hubicka wrote: >> As suggested by Honza, avoid bloating libgcov from gcc_assert by using >> a new macro gcov_nonruntime_assert in gcov-io.c that is only mapped to >> gcc_assert when not in libgcov. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2014-01-09 Teresa Johnson >> >> * gcov-io.c (gcov_position): Use gcov_nonruntime_assert. >> (gcov_is_error): Ditto. >> (gcov_rewrite): Ditto. >> (gcov_open): Ditto. >> (gcov_write_words): Ditto. >> (gcov_write_length): Ditto. >> (gcov_read_words): Ditto. >> (gcov_read_summary): Ditto. >> (gcov_sync): Ditto. >> (gcov_seek): Ditto. >> (gcov_histo_index): Ditto. >> (static void gcov_histogram_merge): Ditto. >> (compute_working_sets): Ditto. >> * gcov-io.h (gcov_nonruntime_assert): Define. >> > >> @@ -481,14 +481,14 @@ gcov_read_words (unsigned words) >>const gcov_unsigned_t *result; >>unsigned excess = gcov_var.length - gcov_var.offset; >> >> - gcc_assert (gcov_var.mode > 0); >> + gcov_nonruntime_assert (gcov_var.mode > 0); >>if (excess < words) >> { >>gcov_var.start += gcov_var.offset; >> #if IN_LIBGCOV >>if (excess) >> { >> - gcc_assert (excess == 1); >> + gcov_nonruntime_assert (excess == 1); > > It probably makes no sense to put nonruntime access into IN_LIBGCOV defines. You are right - there were several that were in IN_LIBGCOV defines that I can just remove. > >> memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4); >> } >> #else >> @@ -497,7 +497,7 @@ gcov_read_words (unsigned words) >>gcov_var.offset = 0; >>gcov_var.length = excess; >> #if IN_LIBGCOV >> - gcc_assert (!gcov_var.length || gcov_var.length == 1); >> + gcov_nonruntime_assert (!gcov_var.length || gcov_var.length == 1); >>excess = GCOV_BLOCK_SIZE; >> #else >>if (gcov_var.length + words > gcov_var.alloc) >> @@ -614,7 +614,7 @@ gcov_read_summary (struct gcov_summary *summary) >>while (!cur_bitvector) >> { >>h_ix = bv_ix * 32; >> - gcc_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE); >> + gcov_nonruntime_assert (bv_ix < >> GCOV_HISTOGRAM_BITVECTOR_SIZE); >>cur_bitvector = histo_bitvector[bv_ix++]; >> } >>while (!(cur_bitvector & 0x1)) >> @@ -622,7 +622,7 @@ gcov_read_summary (struct gcov_summary *summary) >>h_ix++; >>cur_bitvector >>= 1; >> } >> - gcc_assert (h_ix < GCOV_HISTOGRAM_SIZE); >> + gcov_nonruntime_assert (h_ix < GCOV_HISTOGRAM_SIZE); > > How many of those asserts can be triggered by a corrupted gcda file? > I would like to make libgcov more safe WRT file corruptions, too, so in that > case we should produce an error message. In that case should we call gcov_error when IN_LIBGCOV? One possibility would be to simply make gcov_nonruntime_assert be defined as if (!EXPR) gcov_error in the IN_LIBGCOV case. But I think what you wanted here was to reduce libgcov bloat by removing calls altogether, which this wouldn't solve. But if we want to call gcov_error in some cases, I think I need to add another macro that will either do gcc_assert when !IN_LIBGCOV and "if (!EXPR) gcov_error" when IN_LIBGCOV. Is that what you had in mind? Thanks, Teresa > > The rest of changes seems OK. > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead
This patch unsets -freorder-blocks-and-partition when -fprofile-use is not specified. Function splitting was not actually being performed in that case, as probably_never_executed_bb_p does not distinguish any basic blocks as being cold vs hot when there is no profile data. Leaving it enabled, however, causes the assembly code generator to create (empty) cold sections and labels, leading to unnecessary size overhead. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2015-09-24 Teresa Johnson * opts.c (finish_options): Unset -freorder-blocks-and-partition if not using profile. Index: opts.c === --- opts.c (revision 228062) +++ opts.c (working copy) @@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g opts->x_flag_reorder_blocks = 1; } + /* Disable -freorder-blocks-and-partition when -fprofile-use is not in + effect. Function splitting was not actually being performed in that case, + as probably_never_executed_bb_p does not distinguish any basic blocks as + being cold vs hot when there is no profile data. Leaving it enabled, + however, causes the assembly code generator to create (empty) cold + sections and labels, leading to unnecessary size overhead. */ if (opts->x_flag_reorder_blocks_and_partition + && !opts_set->x_flag_profile_use) +opts->x_flag_reorder_blocks_and_partition = 0; + + if (opts->x_flag_reorder_blocks_and_partition && !opts_set->x_flag_reorder_functions) opts->x_flag_reorder_functions = 1; -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead
On Fri, Sep 25, 2015 at 6:58 AM, Bernd Schmidt wrote: > On 09/24/2015 07:16 PM, Teresa Johnson wrote: >> >> This patch unsets -freorder-blocks-and-partition when -fprofile-use >> is not specified. Function splitting was not actually being performed >> in that case, as probably_never_executed_bb_p does not distinguish >> any basic blocks as being cold vs hot when there is no profile data. >> >> Leaving it enabled, however, causes the assembly code generator to create >> (empty) cold sections and labels, leading to unnecessary size overhead. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2015-09-24 Teresa Johnson >> >> * opts.c (finish_options): Unset -freorder-blocks-and-partition >> if not using profile. > > > Hmm, I'd noticed I was getting that enabled by default. It looks like you > added this default with: > > 2013-11-19 Teresa Johnson > >* common/config/i386/i386-common.c: Enable >-freorder-blocks-and-partition at -O2 and up for x86. >* doc/invoke.texi: Update -freorder-blocks-and-partition default. >* opts.c (finish_options): Only warn if -freorder-blocks-and- >partition was set on command line. > > (Note that this ChangeLog entry should have mentioned > ix86_option_optimization_table as the variable that was changed). Yeah, looks like I accidentally left the function name out of the i386-common.c entry. I can fix the ChangeLog entry. > > What has changed between then and now? Also, do we not use > estimates/heuristics when not using a profile? Nothing has changed - splitting effectively never kicked in without a profile. Honza and I had discussed the idea that static profile heuristics could eventually be used to distinguish hot vs cold bbs, but that hasn't happened. What I didn't notice until recently was the size increase in the .o files from varasm adding in unnecessary sections and labels when this option was on. Unless and until static heuristics are used to distinguish cold bbs in probably_never_executed_bb_p, I don't think it makes sense to do anything finer grained that just disabling the option. Teresa > > > Bernd -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead
On Fri, Sep 25, 2015 at 7:34 AM, Jan Hubicka wrote: >> > What has changed between then and now? Also, do we not use >> > estimates/heuristics when not using a profile? >> >> Nothing has changed - splitting effectively never kicked in without a >> profile. Honza and I had discussed the idea that static profile >> heuristics could eventually be used to distinguish hot vs cold bbs, > > Yep, the basic idea was to move EH clenaups to the cold section for start. > The > cleanup code can get surprisingly large for some C++ apps. > >> but that hasn't happened. What I didn't notice until recently was the >> size increase in the .o files from varasm adding in unnecessary >> sections and labels when this option was on. Unless and until static > > Perhaps we also may just teach varasm to not output those when function is not > split. I am happy with the patch as it is because it is pointless to run the > code when no splitting happens. Right, that will need to happen if splitting is tuned for static frequencies. For now I committed this patch. I also fixed the old change log entry that Bernd pointed out. Thanks, Teresa > > Honza >> heuristics are used to distinguish cold bbs in >> probably_never_executed_bb_p, I don't think it makes sense to do >> anything finer grained that just disabling the option. >> >> Teresa >> >> > >> > >> > Bernd >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead
Woops, we crossed wires. I just committed this patch. Would you like me to revert it? Teresa On Fri, Sep 25, 2015 at 9:57 AM, Andi Kleen wrote: > Teresa Johnson writes: > >> This patch unsets -freorder-blocks-and-partition when -fprofile-use >> is not specified. Function splitting was not actually being performed >> in that case, as probably_never_executed_bb_p does not distinguish >> any basic blocks as being cold vs hot when there is no profile data. > > Actually I'm experimenting with a patch to fix that by allowing > function splitting even without profile feed back. See PR66890 > which has the patch. I would prefer to keep and fix it. > > -Andi > > > -- > a...@linux.intel.com -- Speaking for myself only -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead
This patch backports r228136 from trunk to Google/4_9. The only difference is that I have added the option disabling to our Google-specific routine for removing profile options when the specified profile gcda file is not found. Passes regression tests. Ok for google/4_9? Thanks, Teresa 2015-09-25 Teresa Johnson Google ref b/24265250 * opts.c (finish_options): Unset -freorder-blocks-and-partition if not using profile. (set_profile_use_options): Unset -freorder-blocks-and-partition if input profile not found and it wasn't explicitly set. Index: opts.c === --- opts.c (revision 228063) +++ opts.c (working copy) @@ -787,6 +787,16 @@ finish_options (struct gcc_options *opts, struct g } if (opts->x_flag_reorder_blocks_and_partition + && !opts_set->x_flag_profile_use) +opts->x_flag_reorder_blocks_and_partition = 0; + + /* Disable -freorder-blocks-and-partition when -fprofile-use is not in + effect. Function splitting was not actually being performed in that case, + as probably_never_executed_bb_p does not distinguish any basic blocks as + being cold vs hot when there is no profile data. Leaving it enabled, + however, causes the assembly code generator to create (empty) cold + sections and labels, leading to unnecessary size overhead. */ + if (opts->x_flag_reorder_blocks_and_partition && !opts_set->x_flag_reorder_functions) opts->x_flag_reorder_functions = 1; @@ -1365,6 +1375,8 @@ set_profile_use_options (struct gcc_options *opts, (PARAM_MAX_INLINE_INSNS_AUTO, default_param_value (PARAM_MAX_INLINE_INSNS_AUTO), opts->x_param_values, opts_set->x_param_values); + if (!opts_set->x_flag_reorder_blocks_and_partition) +opts->x_flag_reorder_blocks_and_partition = false; } if (!opts_set->x_flag_branch_probabilities || reset) -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Reduce max-vartrack-size
Reduce the maximum variable tracking size by 20% to avoid extreme compilation times. Ok for google-4_9? 2015-10-13 Teresa Johnson Google ref b/24569916 * params.def (PARAM_MAX_VARTRACK_SIZE): Reduce default to 40M. Index: params.def === --- params.def (revision 228063) +++ params.def (working copy) @@ -1160,7 +1160,7 @@ DEFPARAM (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO, DEFPARAM (PARAM_MAX_VARTRACK_SIZE, "max-vartrack-size", "Max. size of var tracking hash tables", - 5000, 0, 0) + 4000, 0, 0) /* Set maximum recursion depth for var tracking expression expansion and resolution. */ -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH]Add -fprofile-use option for check_effective_target_freorder.
Looks good to me, but I don't have approval rights. Thanks for fixing, Teresa On Mon, Oct 26, 2015 at 2:00 AM, Renlin Li wrote: > Hi all, > > After r228136, flag_reorder_blocks_and_partition is canceled when > -fprofile-use is not specified. > > In this case check_effective_target_freorder() is not able to check the > proper target support. > This is a simple patch to add "-fprofile-use" option that effective target > check. > > Okay to commit on the trunk? > > Regards, > Renlin Li > > gcc/testsuite/ChangeLog: > > 2015-10-26 Renlin Li > > * lib/target-supports.exp (check_effective_target_freorder): Add > -fprofile-use flag. -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Remove overly-aggressive LIPO assert
Remove an assert that was overly-strict and already partially redundant with an immediately prior assert. In this case we had a hidden visibility function clone that was created after the LIPO link due to indirect call promotion. It is a cgraph_is_aux_decl_external node. Fixes failures and passes regression tests. Ok for Google branch? 2015-12-02 Teresa Johnson Google ref b/25925223. * l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict assert. Index: l-ipo.c === --- l-ipo.c (revision 231131) +++ l-ipo.c (working copy) @@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d gcc_assert (DECL_EXTERNAL (decl) || cgraph_is_aux_decl_external (n) || DECL_VIRTUAL_P (decl)); - gcc_assert (/* This is the case for explicit extern instantiation, - when cgraph node is not created before link. */ - DECL_EXTERNAL (decl)); cgraph_link_node (n); return n; } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Remove overly-aggressive LIPO assert
Ping. Thanks, Teresa On Wed, Dec 2, 2015 at 12:46 PM, Teresa Johnson wrote: > Remove an assert that was overly-strict and already partially redundant > with an immediately prior assert. In this case we had a hidden visibility > function clone that was created after the LIPO link due to indirect call > promotion. It is a cgraph_is_aux_decl_external node. > > Fixes failures and passes regression tests. Ok for Google branch? > > 2015-12-02 Teresa Johnson > > Google ref b/25925223. > * l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict > assert. > > Index: l-ipo.c > === > --- l-ipo.c (revision 231131) > +++ l-ipo.c (working copy) > @@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d >gcc_assert (DECL_EXTERNAL (decl) >|| cgraph_is_aux_decl_external (n) >|| DECL_VIRTUAL_P (decl)); > - gcc_assert (/* This is the case for explicit extern > instantiation, > - when cgraph node is not created before link. */ > - DECL_EXTERNAL (decl)); > cgraph_link_node (n); >return n; > } > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute
On Fri, Aug 8, 2014 at 3:22 PM, Yi Yang wrote: > Friendly ping. Sorry, was OOO. The solution of preventing splitting for named sections is good - but it looks like there is already code that should prevent this. See the user_defined_section_attribute check here - why is that not set? Looks like it should be set in handle_section_attribute() in c-family/c-common.c. Teresa > > > On Wed, Aug 6, 2014 at 5:20 PM, Dehao Chen wrote: >> >> OK for google-4_8 and google-4_9. David and Teresa may have further >> comments. >> >> Dehao >> >> On Wed, Aug 6, 2014 at 3:36 PM, Yi Yang wrote: >> > This currently puts split sections together again in the specified >> > section and breaks DWARF output. This patch disables the partitioning >> > for such functions. >> > >> > -- >> > >> > 2014-08-06 Yi Yang >> > >> > gcc: >> > * bb-reorder.c (gate_handle_partition_blocks): Add a check for >> > "section" >> > attribute. >> > >> > diff --git gcc/bb-reorder.c gcc/bb-reorder.c >> > index fa6f62f..09449c6 100644 >> > --- gcc/bb-reorder.c >> > +++ gcc/bb-reorder.c >> > @@ -2555,6 +2555,7 @@ gate_handle_partition_blocks (void) >> > we are going to omit the reordering. */ >> >&& optimize_function_for_speed_p (cfun) >> >&& !DECL_ONE_ONLY (current_function_decl) >> > + && !DECL_SECTION_NAME (current_function_decl) >> >&& !user_defined_section_attribute); >> > } > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute
Ok, thanks. This seems reasonable. Can you send the patch to trunk as well? Teresa On Mon, Aug 11, 2014 at 12:35 PM, Yi Yang wrote: > Patch v2 > > .. > > diff --git gcc/bb-reorder.c gcc/bb-reorder.c > index fa6f62f..a1b3e65 100644 > --- gcc/bb-reorder.c > +++ gcc/bb-reorder.c > @@ -95,7 +95,6 @@ > #include "expr.h" > #include "params.h" > #include "diagnostic-core.h" > -#include "toplev.h" /* user_defined_section_attribute */ > #include "tree-pass.h" > #include "df.h" > #include "bb-reorder.h" > @@ -2555,7 +2554,7 @@ gate_handle_partition_blocks (void) > we are going to omit the reordering. */ >&& optimize_function_for_speed_p (cfun) >&& !DECL_ONE_ONLY (current_function_decl) > - && !user_defined_section_attribute); > + && !DECL_SECTION_NAME (current_function_decl)); > } > > /* This function is the main 'entrance' for the optimization that > diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c > index 65c25bf..9923928 100644 > --- gcc/c-family/c-common.c > +++ gcc/c-family/c-common.c > @@ -7378,8 +7378,6 @@ handle_section_attribute (tree *node, tree > ARG_UNUSED (name), tree args, > >if (targetm_common.have_named_sections) > { > - user_defined_section_attribute = true; > - >if ((TREE_CODE (decl) == FUNCTION_DECL > || TREE_CODE (decl) == VAR_DECL) >&& TREE_CODE (TREE_VALUE (args)) == STRING_CST) > diff --git gcc/final.c gcc/final.c > index 9af0b2b..38c90b2 100644 > --- gcc/final.c > +++ gcc/final.c > @@ -4501,8 +4501,6 @@ rest_of_handle_final (void) > >assemble_end_function (current_function_decl, fnname); > > - user_defined_section_attribute = false; > - >/* Free up reg info memory. */ >free_reg_info (); > > diff --git gcc/toplev.c gcc/toplev.c > index 9b8d313..4c8c965 100644 > --- gcc/toplev.c > +++ gcc/toplev.c > @@ -153,11 +153,6 @@ HOST_WIDE_INT random_seed; > the support provided depends on the backend. */ > rtx stack_limit_rtx; > > -/* True if the user has tagged the function with the 'section' > - attribute. */ > - > -bool user_defined_section_attribute = false; > - > struct target_flag_state default_target_flag_state; > #if SWITCHABLE_TARGET > struct target_flag_state *this_target_flag_state = > &default_target_flag_state; > diff --git gcc/toplev.h gcc/toplev.h > index 0290be3..65e38e7 100644 > --- gcc/toplev.h > +++ gcc/toplev.h > @@ -53,11 +53,6 @@ extern void target_reinit (void); > /* A unique local time stamp, might be zero if none is available. */ > extern unsigned local_tick; > > -/* True if the user has tagged the function with the 'section' > - attribute. */ > - > -extern bool user_defined_section_attribute; > - > /* See toplev.c. */ > extern int flag_rerun_cse_after_global_opts; > > -- > > On Mon, Aug 11, 2014 at 12:22 PM, Yi Yang wrote: >> Thanks for pointing out this! >> >> It seems to me that this user_defined_section_attribute variable is >> useless now and should be removed. It is intended to work in this way: >> >> for each function { >>parse into tree (setting user_defined_section_attribute) >>do tree passes >>do RTL passes (using user_defined_section_attribute) >>resetting (user_defined_section_attribute = false) >> } >> >> But now GCC works this way: >> >> for each function { >>parse into tree (setting user_defined_section_attribute) >> } >> do IPA passes >> for each function { >>do tree passes >>do RTL passes (using user_defined_section_attribute) >>resetting (user_defined_section_attribute = false) >> } >> >> Therefore the first function will use the actual >> user_defined_section_attribute of the last function, and all other >> functions will always see user_defined_section_attribute=0. >> >> I suggest removing user_defined_section_attribute and simply check >> !DECL_SECTION_NAME (current_function_decl). >> >> On Mon, Aug 11, 2014 at 8:00 AM, Teresa Johnson wrote: >>> On Fri, Aug 8, 2014 at 3:22 PM, Yi Yang wrote: >>>> Friendly ping. >>> >>> Sorry, was OOO. >>> >>> The solution of preventing splitting for named sections is good - but >>> it looks like there is already code that should prevent this. See the >>> user_defined_section_attribute check here - why is that not set? Looks >>> like it should be set in handle_section_attribute() in >>> c-family/c-common.
Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute
On Mon, Aug 11, 2014 at 1:44 PM, Yi Yang wrote: > I ported this to trunk. > > Shall I commit this patch to the Google 4_8/4_9 branches first? Ok with me. Teresa > > On Mon, Aug 11, 2014 at 12:46 PM, Teresa Johnson wrote: >> Ok, thanks. This seems reasonable. Can you send the patch to trunk as well? >> Teresa >> >> On Mon, Aug 11, 2014 at 12:35 PM, Yi Yang wrote: >>> Patch v2 >>> >>> .. >>> >>> diff --git gcc/bb-reorder.c gcc/bb-reorder.c >>> index fa6f62f..a1b3e65 100644 >>> --- gcc/bb-reorder.c >>> +++ gcc/bb-reorder.c >>> @@ -95,7 +95,6 @@ >>> #include "expr.h" >>> #include "params.h" >>> #include "diagnostic-core.h" >>> -#include "toplev.h" /* user_defined_section_attribute */ >>> #include "tree-pass.h" >>> #include "df.h" >>> #include "bb-reorder.h" >>> @@ -2555,7 +2554,7 @@ gate_handle_partition_blocks (void) >>> we are going to omit the reordering. */ >>>&& optimize_function_for_speed_p (cfun) >>>&& !DECL_ONE_ONLY (current_function_decl) >>> - && !user_defined_section_attribute); >>> + && !DECL_SECTION_NAME (current_function_decl)); >>> } >>> >>> /* This function is the main 'entrance' for the optimization that >>> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c >>> index 65c25bf..9923928 100644 >>> --- gcc/c-family/c-common.c >>> +++ gcc/c-family/c-common.c >>> @@ -7378,8 +7378,6 @@ handle_section_attribute (tree *node, tree >>> ARG_UNUSED (name), tree args, >>> >>>if (targetm_common.have_named_sections) >>> { >>> - user_defined_section_attribute = true; >>> - >>>if ((TREE_CODE (decl) == FUNCTION_DECL >>> || TREE_CODE (decl) == VAR_DECL) >>>&& TREE_CODE (TREE_VALUE (args)) == STRING_CST) >>> diff --git gcc/final.c gcc/final.c >>> index 9af0b2b..38c90b2 100644 >>> --- gcc/final.c >>> +++ gcc/final.c >>> @@ -4501,8 +4501,6 @@ rest_of_handle_final (void) >>> >>>assemble_end_function (current_function_decl, fnname); >>> >>> - user_defined_section_attribute = false; >>> - >>>/* Free up reg info memory. */ >>>free_reg_info (); >>> >>> diff --git gcc/toplev.c gcc/toplev.c >>> index 9b8d313..4c8c965 100644 >>> --- gcc/toplev.c >>> +++ gcc/toplev.c >>> @@ -153,11 +153,6 @@ HOST_WIDE_INT random_seed; >>> the support provided depends on the backend. */ >>> rtx stack_limit_rtx; >>> >>> -/* True if the user has tagged the function with the 'section' >>> - attribute. */ >>> - >>> -bool user_defined_section_attribute = false; >>> - >>> struct target_flag_state default_target_flag_state; >>> #if SWITCHABLE_TARGET >>> struct target_flag_state *this_target_flag_state = >>> &default_target_flag_state; >>> diff --git gcc/toplev.h gcc/toplev.h >>> index 0290be3..65e38e7 100644 >>> --- gcc/toplev.h >>> +++ gcc/toplev.h >>> @@ -53,11 +53,6 @@ extern void target_reinit (void); >>> /* A unique local time stamp, might be zero if none is available. */ >>> extern unsigned local_tick; >>> >>> -/* True if the user has tagged the function with the 'section' >>> - attribute. */ >>> - >>> -extern bool user_defined_section_attribute; >>> - >>> /* See toplev.c. */ >>> extern int flag_rerun_cse_after_global_opts; >>> >>> -- >>> >>> On Mon, Aug 11, 2014 at 12:22 PM, Yi Yang wrote: >>>> Thanks for pointing out this! >>>> >>>> It seems to me that this user_defined_section_attribute variable is >>>> useless now and should be removed. It is intended to work in this way: >>>> >>>> for each function { >>>>parse into tree (setting user_defined_section_attribute) >>>>do tree passes >>>>do RTL passes (using user_defined_section_attribute) >>>>resetting (user_defined_section_attribute = false) >>>> } >>>> >>>> But now GCC works this way: >>>> >>>> for each function { >>>>parse into tree (setting user_defined_section_attribute) >>>> } >>>>
Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting
On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang wrote: > This bug is caused by my last patch, which did not differentiate > between explicit section names (via attributes) and implicit section > names (via -ffunction-section). > > This patch fixes that. > > -- > > diff --git gcc/bb-reorder.c gcc/bb-reorder.c > index 8f8c420..2115b01 100644 > --- gcc/bb-reorder.c > +++ gcc/bb-reorder.c > @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void) > we are going to omit the reordering. */ > && optimize_function_for_speed_p (cfun) > && !DECL_ONE_ONLY (current_function_decl) > - && !DECL_SECTION_NAME (current_function_decl)); > + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); > } > > /* This function is the main 'entrance' for the optimization that > diff --git gcc/tree.h gcc/tree.h > index 817507f..738675a 100644 > --- gcc/tree.h > +++ gcc/tree.h > @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl { > #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ >(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) > > +/* Speficy whether the section name was explicitly set with decl_attributes. > */ > +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ > + (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \ > + !!DECL_SECTION_NAME(NODE)) Personally, I think it is clearer to simply write this as: #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ (DECL_SECTION_NAME(NODE) != NULL_TREE \ && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) Teresa > + > struct GTY(()) tree_decl_with_vis { > struct tree_decl_with_rtl common; > tree assembler_name; > -- -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Google/4_9] A couple gcov-tool fixes
Fix a couple problems found during testing. Backport from trunk (r212694) failed to fixup gcov_read_counter invocations in google-specific code. Also, forward port r211800 from google/4_8 to tolerate differences after COMDAT fixup. Passes manual testing, ok if passes regression tests? Thanks, Teresa 2014-08-14 Teresa Johnson * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter, Relax the check after COMDAT fixup. (__gcov_merge_icall_topn): Use gcov_get_counter. Index: libgcov-merge.c === --- libgcov-merge.c (revision 213975) +++ libgcov-merge.c (working copy) @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c gcc_assert (!(n_counters % 2)); for (i = 0; i < n_counters; i += 2) { - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); /* Note that global id counter may never have been set if no calls were made from this call-site. */ @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c else if (__gcov_is_gid_insane (global_id)) global_id = counters[i]; - gcc_assert (counters[i] == global_id); + /* In the case of inconsistency, use the src's target. */ + if (counters[i] != global_id) +fprintf (stderr, "Warning: Inconsistent call targets in" + " direct-call profile.\n"); } else if (global_id) counters[i] = global_id; @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi } /* Skip the number_of_eviction entry. */ - gcov_read_counter (); + gcov_get_counter (); for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) { int found = 0; - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); for (m = 0; m < j; m += 2) { if (tmp_array[m] == global_id) -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Google/4_9] A couple gcov-tool fixes
On Thu, Aug 14, 2014 at 11:36 AM, Xinliang David Li wrote: > Ok. > > The interfaces of counter reading/getting now becomes confusing. Should it > be better documented somewhere so that developer knows what is the right one > to use in a certain context? I think it is documented in libgcov.h and accesses should be through the new gcov_get_counter interface, but these just got missed when it was ported from trunk. Teresa > > David > > > > On Thu, Aug 14, 2014 at 11:27 AM, Teresa Johnson > wrote: >> >> Fix a couple problems found during testing. >> >> Backport from trunk (r212694) failed to fixup gcov_read_counter >> invocations in google-specific code. Also, forward port >> r211800 from google/4_8 to tolerate differences after COMDAT >> fixup. >> >> Passes manual testing, ok if passes regression tests? >> >> Thanks, >> Teresa >> >> 2014-08-14 Teresa Johnson >> >> * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter, >> Relax the check after COMDAT fixup. >> (__gcov_merge_icall_topn): Use gcov_get_counter. >> >> Index: libgcov-merge.c >> === >> --- libgcov-merge.c (revision 213975) >> +++ libgcov-merge.c (working copy) >> @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c >>gcc_assert (!(n_counters % 2)); >>for (i = 0; i < n_counters; i += 2) >> { >> - gcov_type global_id = gcov_read_counter (); >> - gcov_type call_count = gcov_read_counter (); >> + gcov_type global_id = gcov_get_counter_target (); >> + gcov_type call_count = gcov_get_counter (); >> >>/* Note that global id counter may never have been set if no calls >> were >> made from this call-site. */ >> @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c >>else if (__gcov_is_gid_insane (global_id)) >> global_id = counters[i]; >> >> - gcc_assert (counters[i] == global_id); >> + /* In the case of inconsistency, use the src's target. */ >> + if (counters[i] != global_id) >> +fprintf (stderr, "Warning: Inconsistent call targets in" >> + " direct-call profile.\n"); >> } >>else if (global_id) >> counters[i] = global_id; >> @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi >> } >> >>/* Skip the number_of_eviction entry. */ >> - gcov_read_counter (); >> + gcov_get_counter (); >>for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) >> { >>int found = 0; >> - gcov_type global_id = gcov_read_counter (); >> - gcov_type call_count = gcov_read_counter (); >> + gcov_type global_id = gcov_get_counter_target (); >> + gcov_type call_count = gcov_get_counter (); >>for (m = 0; m < j; m += 2) >> { >>if (tmp_array[m] == global_id) >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting
On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang wrote: > Patch v2. > > Trunk no longer set SECTION_NAME for implicit section names, so this > probably does not apply to trunk. It's probably not necessary for > trunk either. > > Tested for Google 4.8(albeit unnecessary) and 4.9 branch. > > diff --git gcc/bb-reorder.c gcc/bb-reorder.c > index a1b3e65..b9a829e 100644 > --- gcc/bb-reorder.c > +++ gcc/bb-reorder.c > @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void) > we are going to omit the reordering. */ >&& optimize_function_for_speed_p (cfun) >&& !DECL_ONE_ONLY (current_function_decl) > - && !DECL_SECTION_NAME (current_function_decl)); > + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); > } > > /* This function is the main 'entrance' for the optimization that > diff --git gcc/tree.h gcc/tree.h > index b656b7b..308eef8 100644 > --- gcc/tree.h > +++ gcc/tree.h > @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree); > #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ >(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) > > +/* Speficy whether the section name was explicitly set with decl_attributes. > */ Typo "Specify". Otherwise looks ok to me. Thanks, Teresa > +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ > + (DECL_SECTION_NAME(NODE) != NULL_TREE \ > + && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) > + > extern tree decl_debug_expr_lookup (tree); > extern void decl_debug_expr_insert (tree, tree); > > -- > > On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson wrote: >> On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang wrote: >>> This bug is caused by my last patch, which did not differentiate >>> between explicit section names (via attributes) and implicit section >>> names (via -ffunction-section). >>> >>> This patch fixes that. >>> >>> -- >>> >>> diff --git gcc/bb-reorder.c gcc/bb-reorder.c >>> index 8f8c420..2115b01 100644 >>> --- gcc/bb-reorder.c >>> +++ gcc/bb-reorder.c >>> @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void) >>> we are going to omit the reordering. */ >>> && optimize_function_for_speed_p (cfun) >>> && !DECL_ONE_ONLY (current_function_decl) >>> - && !DECL_SECTION_NAME (current_function_decl)); >>> + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); >>> } >>> >>> /* This function is the main 'entrance' for the optimization that >>> diff --git gcc/tree.h gcc/tree.h >>> index 817507f..738675a 100644 >>> --- gcc/tree.h >>> +++ gcc/tree.h >>> @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl { >>> #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ >>>(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) >>> >>> +/* Speficy whether the section name was explicitly set with >>> decl_attributes. */ >>> +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ >>> + (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \ >>> + !!DECL_SECTION_NAME(NODE)) >> >> Personally, I think it is clearer to simply write this as: >> >> #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ >> (DECL_SECTION_NAME(NODE) != NULL_TREE \ >> && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) >> >> Teresa >> >>> + >>> struct GTY(()) tree_decl_with_vis { >>> struct tree_decl_with_rtl common; >>> tree assembler_name; >>> -- >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Fix AutoFDO LIPO ICE due to eliminated abstract origin
This patch ensures we don't prematurely delete an abstract origin node, leading to creating a new one after LIPO linking when we invoke similar handling in symtab_remove_unreachable_nodes, which then does not have a resolved node. It makes the handling of such nodes equivalent to that in symtab_remove_unreachable_nodes. Tested with regression tests and internal benchmarks. Ok for google/4_9 2014-08-23 Teresa Johnson Google ref b/16731481 * cgraphunit.c (analyze_functions): Don't remove origin node. Index: cgraphunit.c === --- cgraphunit.c(revision 214320) +++ cgraphunit.c(working copy) @@ -1051,6 +1051,7 @@ analyze_functions (void) struct cgraph_node *origin_node = cgraph_get_node (DECL_ABSTRACT_ORIGIN (decl)); origin_node->used_as_abstract_origin = true; + enqueue_node (origin_node); } } else -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Fix -fopt-info seg fault in AutoFDO LIPO mode
Fixes seg fault when using -fopt-info with AutoFDO LIPO. Ensure that the mapping between module name and ident is setup. Tested with regression tests and internal benchmarks. Ok for google/4_9 2014-08-23 Teresa Johnson Google ref b/17124135 * auto-profile.c (read_aux_modules): Record the module name/ident pair. * coverage.c (record_module_name): Make non-static. * l-ipo.h (record_module_name): Ditto. Index: auto-profile.c === --- auto-profile.c (revision 214320) +++ auto-profile.c (working copy) @@ -958,6 +958,7 @@ read_aux_modules (void) module_infos = XCNEWVEC (gcov_module_info *, num_aux_modules + 1); module_infos[0] = module; primary_module_id = module->ident; + record_module_name (module->ident, lbasename (in_fnames[0])); if (aux_modules == NULL) return; unsigned curr_module = 1, max_group = PARAM_VALUE (PARAM_MAX_LIPO_GROUP); @@ -1004,6 +1005,7 @@ read_aux_modules (void) } module_infos[curr_module++] = aux_module; add_input_filename (*iter); + record_module_name (aux_module->ident, lbasename (*iter)); } } Index: coverage.c === --- coverage.c (revision 214320) +++ coverage.c (working copy) @@ -658,7 +658,7 @@ typedef struct { static vec *mod_names; -static void +void record_module_name (unsigned int mod_id, const char *name) { mod_id_to_name_t t; Index: l-ipo.h === --- l-ipo.h (revision 214320) +++ l-ipo.h (working copy) @@ -63,6 +63,7 @@ int equivalent_struct_types_for_tbaa (const_tree t void lipo_link_and_fixup (void); extern void copy_defined_module_set (tree, tree); extern bool is_parsing_done_p (void); +extern void record_module_name (unsigned int, const char *); extern const char* get_module_name (unsigned int); #endif -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fix AutoFDO LIPO ICE due to eliminated abstract origin
The ICE is, because we create a new node when querying the node during symtab_remove_unreachable_nodes after LIPO linking is complete, and then try to access its resolved node, for which there is none. Trunk has the same code in these two locations, but the creation of a new node after deleting the first does not cause any problems. Teresa On Sat, Aug 23, 2014 at 11:50 AM, Xinliang David Li wrote: > Is it a problem specific to LIPO? > > David > > On Sat, Aug 23, 2014 at 11:41 AM, Teresa Johnson wrote: >> This patch ensures we don't prematurely delete an abstract origin node, >> leading >> to creating a new one after LIPO linking when we invoke similar handling in >> symtab_remove_unreachable_nodes, which then does not have a resolved node. It >> makes the handling of such nodes equivalent to that in >> symtab_remove_unreachable_nodes. >> >> Tested with regression tests and internal benchmarks. Ok for google/4_9 >> >> 2014-08-23 Teresa Johnson >> >> Google ref b/16731481 >> * cgraphunit.c (analyze_functions): Don't remove origin node. >> >> Index: cgraphunit.c >> === >> --- cgraphunit.c(revision 214320) >> +++ cgraphunit.c(working copy) >> @@ -1051,6 +1051,7 @@ analyze_functions (void) >> struct cgraph_node *origin_node >> = cgraph_get_node (DECL_ABSTRACT_ORIGIN (decl)); >> origin_node->used_as_abstract_origin = true; >> + enqueue_node (origin_node); >> } >> } >> else >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Fixup varpool references after LIPO linking
This patch fixes up varpool nodes after LIPO linking as we were doing for cgraph node references. While, here I made some fixes to the cgraph fixup as well (mark address taken as appropriate) and removed old references. The latter exposed an issue with resolved cgraph nodes getting eliminated when they were only referenced from vtables and the LIPO linking selected an external copy as the resolved node. Addressed this by forcing the LIPO linking to prefer the non-external copy. Passes regression testing, internal benchmark testing in progress. Ok for google/4_9 if that succeeds? Teresa 2014-08-28 Teresa Johnson Google ref b/17038802. * l-ipo.c (resolve_cgraph_node): Pick non-external node. (fixup_reference_list): Fixup varpool references, remove old references, mark cgraph nodes as address taken as needed. Index: l-ipo.c === --- l-ipo.c (revision 213975) +++ l-ipo.c (working copy) @@ -1564,6 +1564,15 @@ resolve_cgraph_node (struct cgraph_sym **slot, str (*slot)->rep_decl = decl2; return; } + /* Similarly, pick the non-external symbol, since external + symbols may be eliminated by symtab_remove_unreachable_nodes + after ipa inlining (see process_references). */ + if (DECL_EXTERNAL (decl1) && !DECL_EXTERNAL (decl2)) +{ + (*slot)->rep_node = node; + (*slot)->rep_decl = decl2; + return; +} has_prof1 = has_profile_info (decl1); bool is_aux1 = cgraph_is_auxiliary (decl1); @@ -2304,31 +2313,44 @@ fixup_reference_list (struct varpool_node *node) int i; struct ipa_ref *ref; struct ipa_ref_list *list = &node->ref_list; - vec new_refered; + vec new_refered; vec new_refered_type; - struct cgraph_node *c; + struct symtab_node *sym_node; enum ipa_ref_use use_type = IPA_REF_LOAD; new_refered.create (10); new_refered_type.create (10); for (i = 0; ipa_ref_list_reference_iterate (list, i, ref); i++) { - if (!is_a (ref->referred)) -continue; - - struct cgraph_node *cnode = ipa_ref_node (ref); - struct cgraph_node *r_cnode -= cgraph_lipo_get_resolved_node (cnode->decl); - if (r_cnode != cnode) + if (is_a (ref->referred)) { + struct cgraph_node *cnode = ipa_ref_node (ref); + struct cgraph_node *r_cnode += cgraph_lipo_get_resolved_node (cnode->decl); new_refered.safe_push (r_cnode); use_type = ref->use; new_refered_type.safe_push ((int) use_type); + gcc_assert (use_type != IPA_REF_ADDR + || cnode->global.inlined_to + || cnode->address_taken); + if (use_type == IPA_REF_ADDR) +cgraph_mark_address_taken_node (r_cnode); } + else if (is_a (ref->referred)) +{ + struct varpool_node *var = ipa_ref_varpool_node (ref); + struct varpool_node *r_var = real_varpool_node (var->decl); + new_refered.safe_push (r_var); + use_type = ref->use; + new_refered_type.safe_push ((int) use_type); +} + else +gcc_assert (false); } - for (i = 0; new_refered.iterate (i, &c); ++i) + ipa_remove_all_references (&node->ref_list); + for (i = 0; new_refered.iterate (i, &sym_node); ++i) { - ipa_record_reference (node, c, + ipa_record_reference (node, sym_node, (enum ipa_ref_use) new_refered_type[i], NULL); } } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fixup varpool references after LIPO linking
On Thu, Aug 28, 2014 at 1:40 PM, Xinliang David Li wrote: > > Do you know why the previous check is not enough ? > > cgraph_can_remove_if_no_direct_calls_and_refs_p (struct cgraph_node *node) This will return true for the external node, but it also returns true for the non-external node. The non-external node is a COMDAT, as as the comments in that routine indicate, COMDATs can be removed even if they are externally_visible. Teresa > > David > > > On Thu, Aug 28, 2014 at 1:29 PM, Teresa Johnson > wrote: >> >> This patch fixes up varpool nodes after LIPO linking as we were doing >> for cgraph node references. While, here I made some fixes to the >> cgraph fixup as well (mark address taken as appropriate) and removed >> old references. The latter exposed an issue with resolved cgraph nodes >> getting eliminated when they were only referenced from vtables and the >> LIPO linking selected an external copy as the resolved node. Addressed >> this by forcing the LIPO linking to prefer the non-external copy. >> >> Passes regression testing, internal benchmark testing in progress. Ok >> for google/4_9 if that succeeds? >> >> Teresa >> >> 2014-08-28 Teresa Johnson >> >> Google ref b/17038802. >> * l-ipo.c (resolve_cgraph_node): Pick non-external node. >> (fixup_reference_list): Fixup varpool references, remove old >> references, mark cgraph nodes as address taken as needed. >> >> Index: l-ipo.c >> === >> --- l-ipo.c (revision 213975) >> +++ l-ipo.c (working copy) >> @@ -1564,6 +1564,15 @@ resolve_cgraph_node (struct cgraph_sym **slot, str >>(*slot)->rep_decl = decl2; >>return; >> } >> + /* Similarly, pick the non-external symbol, since external >> + symbols may be eliminated by symtab_remove_unreachable_nodes >> + after ipa inlining (see process_references). */ >> + if (DECL_EXTERNAL (decl1) && !DECL_EXTERNAL (decl2)) >> +{ >> + (*slot)->rep_node = node; >> + (*slot)->rep_decl = decl2; >> + return; >> +} >> >>has_prof1 = has_profile_info (decl1); >>bool is_aux1 = cgraph_is_auxiliary (decl1); >> @@ -2304,31 +2313,44 @@ fixup_reference_list (struct varpool_node *node) >>int i; >>struct ipa_ref *ref; >>struct ipa_ref_list *list = &node->ref_list; >> - vec new_refered; >> + vec new_refered; >>vec new_refered_type; >> - struct cgraph_node *c; >> + struct symtab_node *sym_node; >>enum ipa_ref_use use_type = IPA_REF_LOAD; >> >>new_refered.create (10); >>new_refered_type.create (10); >>for (i = 0; ipa_ref_list_reference_iterate (list, i, ref); i++) >> { >> - if (!is_a (ref->referred)) >> -continue; >> - >> - struct cgraph_node *cnode = ipa_ref_node (ref); >> - struct cgraph_node *r_cnode >> -= cgraph_lipo_get_resolved_node (cnode->decl); >> - if (r_cnode != cnode) >> + if (is_a (ref->referred)) >> { >> + struct cgraph_node *cnode = ipa_ref_node (ref); >> + struct cgraph_node *r_cnode >> += cgraph_lipo_get_resolved_node (cnode->decl); >>new_refered.safe_push (r_cnode); >>use_type = ref->use; >>new_refered_type.safe_push ((int) use_type); >> + gcc_assert (use_type != IPA_REF_ADDR >> + || cnode->global.inlined_to >> + || cnode->address_taken); >> + if (use_type == IPA_REF_ADDR) >> +cgraph_mark_address_taken_node (r_cnode); >> } >> + else if (is_a (ref->referred)) >> +{ >> + struct varpool_node *var = ipa_ref_varpool_node (ref); >> + struct varpool_node *r_var = real_varpool_node (var->decl); >> + new_refered.safe_push (r_var); >> + use_type = ref->use; >> + new_refered_type.safe_push ((int) use_type); >> +} >> + else >> +gcc_assert (false); >> } >> - for (i = 0; new_refered.iterate (i, &c); ++i) >> + ipa_remove_all_references (&node->ref_list); >> + for (i = 0; new_refered.iterate (i, &sym_node); ++i) >> { >> - ipa_record_reference (node, c, >> + ipa_record_reference (node, sym_node, >> (enum ipa_ref_use) new_refered_type[i], >> NULL); >> } >> } >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
Ping. Thanks, Teresa On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson wrote: > On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka wrote: >>> Here is the patch updated to use the new parameter from r203830. >>> Passed bootstrap and regression tests. >>> >>> 2013-10-18 Jan Hubicka >>> Teresa Johnson >>> >>> * predict.c (handle_missing_profiles): New function. >>> (counts_to_freqs): Don't overwrite estimated frequencies >>> when function has no profile counts. >>> * predict.h (handle_missing_profiles): Declare. >>> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. >>> >>> Index: predict.c >>> === >>> --- predict.c (revision 203830) >>> +++ predict.c (working copy) >>> @@ -2758,6 +2758,40 @@ estimate_loops (void) >>>BITMAP_FREE (tovisit); >>> } >>> >> >> You need block comment. It should explain the problem of COMDATs and how >> they are handled. >> It is not an obvious thing. > > Done. > >> >>> +void >>> +handle_missing_profiles (void) >>> +{ >>> + struct cgraph_node *node; >>> + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >> extra line >>> + /* See if 0 count function has non-0 count callers. In this case we >>> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >>> + FOR_EACH_DEFINED_FUNCTION (node) >>> +{ >>> + struct cgraph_edge *e; >>> + gcov_type call_count = 0; >>> + struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl); >> Extra line >>> + if (node->count) >>> +continue; >>> + for (e = node->callers; e; e = e->next_caller) >>> +call_count += e->count; >> What about checking if the sum is way off even for non-0 counts. >> I.e. for case where function was inlined to some calls but not to others? In >> that case we may want to scale up the counts (with some resonable care for >> capping) > > In this patch I am not changing any counts, so I am leaving this one > for follow-on work (even for the routines missing counts completely > like these, we don't apply any counts, just mark them as guessed. I > have a follow-on patch to send once this goes in that does apply > counts to these 0-count routines only when we decide to inline as we > discussed). > >> >> Also I think the code really should propagate - i.e. have simple worklist and >> look for functions that are COMDAT and are called by other COMDAT functions >> where we decided to drop the profile. Having non-trivial chains of comdats >> is >> common thing. > > Done. > >> >> What about outputting user visible warning/error on the incosnsistency when >> no COMDAT function is involved same way as we do for BB profile? > > Done. This one caught the fact that we have this situation for "extern > template" functions as well when I tested on cpu2006. I added in a > check to ignore those when issuing the warning (they are not emitted > thus don't get any profile counts). > > Updated patch below. > > Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on > profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk? > > Thanks, > Teresa > > 2013-10-30 Teresa Johnson > > * predict.c (drop_profile): New function. > (handle_missing_profiles): Ditto. > (counts_to_freqs): Don't overwrite estimated frequencies > when function has no profile counts. > * predict.h (handle_missing_profiles): Declare. > * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. > > Index: predict.c > === > --- predict.c (revision 204213) > +++ predict.c (working copy) > @@ -2765,6 +2765,107 @@ estimate_loops (void) >BITMAP_FREE (tovisit); > } > > +/* Drop the profile for NODE to guessed, and update its frequency based on > + whether it is expected to be HOT. */ > + > +static void > +drop_profile (struct cgraph_node *node, bool hot) > +{ > + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); > + > + if (dump_file) > +fprintf (dump_file, > + "Dropping 0 profile for %s/%i. %s based on calls.\n", > + cgraph_node_name (node), node->order, > + hot ? "Function is h
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher wrote: > On Wed, Apr 24, 2013 at 12:58 AM, Sriraman Tallam wrote: >> Hi, >> >> This patch generates labels for cold function parts that are split when >> using the option -freorder-blocks-and-partition. The cold label name >> is generated by suffixing ".cold" to the assembler name of the hot >> function. >> >> This is useful when getting back traces from gdb when the cold function >> part does get executed. > > Can you show how this changes your gdb back traces? Can you not > already let GDB inform you that it's in another .text section? > > I don't like this label idea much. It seems so arbitrary, unless > you're only interested in knowing when a piece of function from > another .text section is executed. But I'm personally more interested > in the whole function, i.e. also in not-so-hot (or even cold) blocks > that are not in the .text.unlikely section. That means somehow passing > the profile-use information from the compiler to the debugger. > > So what I would really like to have instead, is some kind of DWARF > annotations for profile information in the function, with some way in > GDB to inspect and modify the values, and to add break points based on > execution count threshold. > > I haven't really thought this through yet, but I have this dream of > having the data of a gcno file encoded in DWARF instead, and to have > profile information included in the FDO-compiled binary to add the > ability to verify that those parts of the program that you expect to > be hot are really also considered hot by the compiler. The only way to > check this now is by inspecting some dumps and that's IMHO not very > practical. > > Ciao! > Steven I'd like to revisit this old patch from Sri to generate a label for the split out cold section, now that all the patches to fix the failures and insanities from -freorder-blocks-and-partition are in (and I want to send a patch to enable it by default with fdo). The problem is that the split code doesn't currently have any label to identify where it came from, so in the final binary it appears to be part of whatever non-cold function the linker happens to place it after. This confuses tools like objdump, profilers and the debugger. For example, from mcf with -freorder-blocks-and-partition: main: .LFB40: .cfi_startproc ... jne .L27 < jump to main's text.unlikely ... .cfi_endproc .section.text.unlikely .cfi_startproc .L27: <--- start of main's text.unlikely ... $ objdump -d mcf 00400aa0 : ... 400b1a: 0f 85 1b fb ff ff jne40063b < jump to main's text.unlikely 004004c0 : ... 40063b: bf 06 8a 49 00 mov$0x498a06,%edi<--- start of main's text.unlikely ... 40065e: e9 0d 05 00 00 jmpq 400b70 <--- jump back to main's hot text Note that objdump thinks we are jumping to/from another function. If we end up executing the cold section (e.g. due to non-representative profiles), profiling tools and gdb are going to think we are executing replace_weaker_arc. With Sri's patch, this becomes much clearer: 00400aa0 : ... 400b1a: 0f 85 1b fb ff ff jne40063b 0040063b : 40063b: bf 06 8a 49 00 mov$0x498a06,%edi ... 40065e: e9 0d 05 00 00 jmpq 400b70 Steven, I think the ideas you describe above about passing profile information in DWARF and doing some special handling in gdb to use that seem orthogonal to this. Given that, is this patch ok for trunk (pending successful re-runs of regression testing) Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
On Fri, Nov 8, 2013 at 8:14 AM, Steven Bosscher wrote: > On Fri, Nov 8, 2013 at 4:45 PM, Teresa Johnson wrote: >> On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher wrote: >> I'd like to revisit this old patch from Sri to generate a label for >> the split out cold section, now that all the patches to fix the >> failures and insanities from -freorder-blocks-and-partition are in >> (and I want to send a patch to enable it by default with fdo). The >> problem is that the split code doesn't currently have any label to >> identify where it came from, so in the final binary it appears to be >> part of whatever non-cold function the linker happens to place it >> after. This confuses tools like objdump, profilers and the debugger. > > Surely there are .loc directives in the code to clarify where the code > came from? So at least the debugger should know what function the code > belongs to. > >> For example, from mcf with -freorder-blocks-and-partition: >> >> main: >> .LFB40: >> .cfi_startproc >> ... >> jne .L27 < jump to main's text.unlikely >> ... >> .cfi_endproc >> .section.text.unlikely >> .cfi_startproc >> .L27: <--- start of main's text.unlikely >> ... >> >> $ objdump -d mcf >> >> 00400aa0 : >> ... >> 400b1a: 0f 85 1b fb ff ff jne40063b >>< jump to main's text.unlikely >> >> 004004c0 : >> ... >> 40063b: bf 06 8a 49 00 mov$0x498a06,%edi<--- >> start of main's text.unlikely >> ... >> 40065e: e9 0d 05 00 00 jmpq 400b70 >> <--- jump back to main's hot text >> >> >> Note that objdump thinks we are jumping to/from another function. If >> we end up executing the cold section (e.g. due to non-representative >> profiles), profiling tools and gdb are going to think we are executing >> replace_weaker_arc. With Sri's patch, this becomes much clearer: >> >> 00400aa0 : >> ... >> 400b1a: 0f 85 1b fb ff ff jne40063b >> >> 0040063b : >> 40063b: bf 06 8a 49 00 mov$0x498a06,%edi >> ... >> 40065e: e9 0d 05 00 00 jmpq 400b70 >> > > Does -ffunction-sections help here? No. It just moves the cold section to a different spot (so the function it appears part of is different, but same issue). > > >> Steven, I think the ideas you describe above about passing profile >> information in DWARF and doing some special handling in gdb to use >> that seem orthogonal to this. > > It's not just about profile information but about what function some > code belongs to. I just cannot believe that there isn't already > something in DWARF to mark address ranges as being part of a given > function. Cary, any ideas here? > > >> Given that, is this patch ok for trunk (pending successful re-runs of >> regression testing) > > > IMHO, no. This is conceptually wrong. You create artificial named > labels in the compiler that might clash with user labels. Is that > likely to happen? No. But it could and therefore the approach of > adding functionname.cold labels is wrong. Is there a format for compiler-defined labels that would not be able to clash with other user-generated labels? Teresa > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
On Tue, Nov 5, 2013 at 6:02 AM, Teresa Johnson wrote: > Ping. > Thanks, Teresa > I updated the patch to include the follow-on work to apply call counts to 0-weight functions using the estimated frequencies when inlining. This helps avoid 0-weight blocks of inlined code which are particularly bad for function splitting. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk? Thanks, Teresa 2013-11-08 Teresa Johnson Jan Hubicka * predict.c (drop_profile): New function. (handle_missing_profiles): Ditto. (counts_to_freqs): Don't overwrite estimated frequencies when function has no profile counts. * predict.h (handle_missing_profiles): Declare. * tree-inline.c (freqs_to_counts): New function. (copy_cfg_body): Invoke freqs_to_counts as needed. * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. Index: predict.c === --- predict.c (revision 204516) +++ predict.c (working copy) @@ -2765,6 +2765,107 @@ estimate_loops (void) BITMAP_FREE (tovisit); } +/* Drop the profile for NODE to guessed, and update its frequency based on + whether it is expected to be HOT. */ + +static void +drop_profile (struct cgraph_node *node, bool hot) +{ + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + + if (dump_file) +fprintf (dump_file, + "Dropping 0 profile for %s/%i. %s based on calls.\n", + cgraph_node_name (node), node->order, + hot ? "Function is hot" : "Function is normal"); + /* We only expect to miss profiles for functions that are reached + via non-zero call edges in cases where the function may have + been linked from another module or library (COMDATs and extern + templates). See the comments below for handle_missing_profiles. */ + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) +warning (0, + "Missing counts for called function %s/%i", + cgraph_node_name (node), node->order); + + profile_status_for_function (fn) + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); + node->frequency + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; +} + +/* In the case of COMDAT routines, multiple object files will contain the same + function and the linker will select one for the binary. In that case + all the other copies from the profile instrument binary will be missing + profile counts. Look for cases where this happened, due to non-zero + call counts going to 0-count functions, and drop the profile to guessed + so that we can use the estimated probabilities and avoid optimizing only + for size. + + The other case where the profile may be missing is when the routine + is not going to be emitted to the object file, e.g. for "extern template" + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in + all other cases of non-zero calls to 0-count functions. */ + +void +handle_missing_profiles (void) +{ + struct cgraph_node *node; + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); + vec worklist; + worklist.create (64); + + /* See if 0 count function has non-0 count callers. In this case we + lost some profile. Drop its function profile to PROFILE_GUESSED. */ + FOR_EACH_DEFINED_FUNCTION (node) +{ + struct cgraph_edge *e; + gcov_type call_count = 0; + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + + if (node->count) +continue; + for (e = node->callers; e; e = e->next_caller) +call_count += e->count; + if (call_count + && fn && fn->cfg + && (call_count * unlikely_count_fraction >= profile_info->runs)) +{ + bool maybe_hot = maybe_hot_count_p (NULL, call_count); + + drop_profile (node, maybe_hot); + worklist.safe_push (node); +} +} + + /* Propagate the profile dropping to other 0-count COMDATs that are + potentially called by COMDATs we already dropped the profile on. */ + while (worklist.length () > 0) +{ + struct cgraph_edge *e; + + node = worklist.pop (); + for (e = node->callees; e; e = e->next_caller) +{ + struct cgraph_node *callee = e->callee; + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); + + if (callee->count > 0) +continue; + if (DECL_COMDAT (callee->decl) && fn && fn->cfg + && profile_status_for_function (fn) == PROFILE_READ) +{ + /* Since there are no non-0 call counts to this function, + we don't know for sur
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka wrote: >> 2013-11-08 Teresa Johnson >> Jan Hubicka >> >> * predict.c (drop_profile): New function. >> (handle_missing_profiles): Ditto. >> (counts_to_freqs): Don't overwrite estimated frequencies >> when function has no profile counts. >> * predict.h (handle_missing_profiles): Declare. >> * tree-inline.c (freqs_to_counts): New function. >> (copy_cfg_body): Invoke freqs_to_counts as needed. >> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. >> >> Index: predict.c >> === >> --- predict.c (revision 204516) >> +++ predict.c (working copy) >> @@ -2765,6 +2765,107 @@ estimate_loops (void) >>BITMAP_FREE (tovisit); >> } >> >> +/* Drop the profile for NODE to guessed, and update its frequency based on >> + whether it is expected to be HOT. */ >> + >> +static void >> +drop_profile (struct cgraph_node *node, bool hot) >> +{ >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (dump_file) >> +fprintf (dump_file, >> + "Dropping 0 profile for %s/%i. %s based on calls.\n", >> + cgraph_node_name (node), node->order, >> + hot ? "Function is hot" : "Function is normal"); >> + /* We only expect to miss profiles for functions that are reached >> + via non-zero call edges in cases where the function may have >> + been linked from another module or library (COMDATs and extern >> + templates). See the comments below for handle_missing_profiles. */ >> + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) >> +warning (0, >> + "Missing counts for called function %s/%i", >> + cgraph_node_name (node), node->order); >> + >> + profile_status_for_function (fn) >> + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >> + node->frequency >> + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; >> +} >> + >> +/* In the case of COMDAT routines, multiple object files will contain the >> same >> + function and the linker will select one for the binary. In that case >> + all the other copies from the profile instrument binary will be missing >> + profile counts. Look for cases where this happened, due to non-zero >> + call counts going to 0-count functions, and drop the profile to guessed >> + so that we can use the estimated probabilities and avoid optimizing only >> + for size. >> + >> + The other case where the profile may be missing is when the routine >> + is not going to be emitted to the object file, e.g. for "extern template" >> + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in >> + all other cases of non-zero calls to 0-count functions. */ >> + >> +void >> +handle_missing_profiles (void) >> +{ >> + struct cgraph_node *node; >> + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >> + vec worklist; >> + worklist.create (64); >> + >> + /* See if 0 count function has non-0 count callers. In this case we >> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >> + FOR_EACH_DEFINED_FUNCTION (node) >> +{ >> + struct cgraph_edge *e; >> + gcov_type call_count = 0; >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (node->count) >> +continue; >> + for (e = node->callers; e; e = e->next_caller) >> +call_count += e->count; >> + if (call_count >> + && fn && fn->cfg >> + && (call_count * unlikely_count_fraction >= profile_info->runs)) >> +{ >> + bool maybe_hot = maybe_hot_count_p (NULL, call_count); >> + >> + drop_profile (node, maybe_hot); >> + worklist.safe_push (node); >> +} > > Perhaps we should add an note/error on mishandled profile when function is > not COMDAT? > That way we may notice further bugs in this area. I have a warning like that already in drop_profile(). Is that sufficient? Also, Steven Bosscher suggested putting that warning under OPT_Wdisabled_optimization instead of '0', what do you prefer for that? >> +} >> + >> + /* Propagate the pro
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka wrote: >> I have a warning like that already in drop_profile(). Is that > > I think it should be warning (or silent) for COMDAT and error/note for > other functions (depending on flag_profile_correction). > I guess drop_profile is better place for it indeed. I don't want to warn for COMDAT since this is currently an expected issue in that case, so I'm afraid it will be too noisy (but there is a note emitted to the dump file to track how often this occurs). So currently the warning is only emitted in cases where we don't expect it to occur (e.g. non-comdat). > >> sufficient? Also, Steven Bosscher suggested putting that warning under >> OPT_Wdisabled_optimization instead of '0', what do you prefer for >> that? > > It is a bit different case than other disabled optimizations we have (where > the optimization does not happen because of --param tunable limits), but I > am fine with both alternatives. > The warning may end up quite noisy so having way to silence it definitely > makes sense. I didn't find any warnings being emitted when running this for spec or internal benchmarks, so how about instead of a warning, I handle it like you suggest above: depending on the setting of flag_profile_correction either error or emit a note to the dump file under MSG_MISSED_OPTIMIZATION? >> >> >> +} >> >> + >> >> + /* Propagate the profile dropping to other 0-count COMDATs that are >> >> + potentially called by COMDATs we already dropped the profile on. */ >> >> + while (worklist.length () > 0) >> >> +{ >> >> + struct cgraph_edge *e; >> >> + >> >> + node = worklist.pop (); >> >> + for (e = node->callees; e; e = e->next_caller) >> >> +{ >> >> + struct cgraph_node *callee = e->callee; >> >> + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >> >> + >> >> + if (callee->count > 0) >> >> +continue; >> >> + if (DECL_COMDAT (callee->decl) && fn && fn->cfg >> >> + && profile_status_for_function (fn) == PROFILE_READ) >> > >> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on >> > profile estimate >> > to give us false only for really known to be unlikely paths? (i.e. EH >> > handling, noreturns >> > etc.) >> >> Ok, let me try this. >> >> >> +{ >> >> + /* Since there are no non-0 call counts to this function, >> >> + we don't know for sure whether it is hot. Indicate to >> >> + the drop_profile routine that function should be marked >> >> + normal, rather than hot. */ >> >> + drop_profile (node, false); >> >> + worklist.safe_push (callee); >> >> +} >> >> +} >> >> +} >> >> + worklist.release (); >> > >> > I would add a pointer set so we avoid duplicates in worklist. This is >> > potentially quadratic >> > for large programs. >> >> I'll add a visited_nodes set to keep track of processed nodes so they >> don't get added to the worklist multiple times. > > Perhaps you can also track this by testing profile_status_for_function. Both > solutions are fine for me. Ah - I just realized I am already checking profile_status_for_function above and adding to the worklist if it is still PROFILE_READ. Since I call drop_profile before adding to the worklist, we can't end up adding multiple times. So I don't think there is currently an issue with this. Thanks, Teresa > > Honza >> >> Thanks, >> Teresa >> >> > >> > OK, with these changes. >> > >> > Honza >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka wrote: >> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka wrote: >> >> I have a warning like that already in drop_profile(). Is that >> > >> > I think it should be warning (or silent) for COMDAT and error/note for >> > other functions (depending on flag_profile_correction). >> > I guess drop_profile is better place for it indeed. >> >> I don't want to warn for COMDAT since this is currently an expected >> issue in that case, so I'm afraid it will be too noisy (but there is a >> note emitted to the dump file to track how often this occurs). So >> currently the warning is only emitted in cases where we don't expect >> it to occur (e.g. non-comdat). > > I see, I misread it. > The non-comdat warning IMO ought go the same way as the other profile > confussions. > I guess something like > if (!flag_profile_correction) > error (...); > like existing cases in profile.c Ok, will do (emitting a note to the dump file as in other cases in profile.c that do this same thing). >> >> I didn't find any warnings being emitted when running this for spec or >> internal benchmarks, so how about instead of a warning, I handle it >> like you suggest above: depending on the setting of >> flag_profile_correction either error or emit a note to the dump file >> under MSG_MISSED_OPTIMIZATION? > > Yep, lets just go with error with !flag_profile_correction and being silent > otherwise. > If we want to go with warnings, we need to change all the similar places > in profile.c and elsewhere. >> >> Ah - I just realized I am already checking profile_status_for_function >> above and adding to the worklist if it is still PROFILE_READ. Since I >> call drop_profile before adding to the worklist, we can't end up >> adding multiple times. So I don't think there is currently an issue >> with this. > > Yep, indeed. > > The patch is OK with the error message as discussed above. Ok, will do that and fix a couple of minor issues mentioned by Steven (missing comment and incorrect indentation in one spot). Will retest just to make sure I didn't miss any warnings which will now be errors. Thanks, Teresa > > Thanks. > Honza >> >> Thanks, >> Teresa >> >> > >> > Honza >> >> >> >> Thanks, >> >> Teresa >> >> >> >> > >> >> > OK, with these changes. >> >> > >> >> > Honza >> >> >> >> >> >> >> >> -- >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
On Fri, Nov 8, 2013 at 8:20 AM, Teresa Johnson wrote: > On Fri, Nov 8, 2013 at 8:14 AM, Steven Bosscher wrote: >> On Fri, Nov 8, 2013 at 4:45 PM, Teresa Johnson wrote: >>> On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher wrote: >>> I'd like to revisit this old patch from Sri to generate a label for >>> the split out cold section, now that all the patches to fix the >>> failures and insanities from -freorder-blocks-and-partition are in >>> (and I want to send a patch to enable it by default with fdo). The >>> problem is that the split code doesn't currently have any label to >>> identify where it came from, so in the final binary it appears to be >>> part of whatever non-cold function the linker happens to place it >>> after. This confuses tools like objdump, profilers and the debugger. >> >> Surely there are .loc directives in the code to clarify where the code >> came from? So at least the debugger should know what function the code >> belongs to. >> >>> For example, from mcf with -freorder-blocks-and-partition: >>> >>> main: >>> .LFB40: >>> .cfi_startproc >>> ... >>> jne .L27 < jump to main's text.unlikely >>> ... >>> .cfi_endproc >>> .section.text.unlikely >>> .cfi_startproc >>> .L27: <--- start of main's text.unlikely >>> ... >>> >>> $ objdump -d mcf >>> >>> 00400aa0 : >>> ... >>> 400b1a: 0f 85 1b fb ff ff jne40063b >>>< jump to main's text.unlikely >>> >>> 004004c0 : >>> ... >>> 40063b: bf 06 8a 49 00 mov$0x498a06,%edi<--- >>> start of main's text.unlikely >>> ... >>> 40065e: e9 0d 05 00 00 jmpq 400b70 >>> <--- jump back to main's hot text >>> >>> >>> Note that objdump thinks we are jumping to/from another function. If >>> we end up executing the cold section (e.g. due to non-representative >>> profiles), profiling tools and gdb are going to think we are executing >>> replace_weaker_arc. With Sri's patch, this becomes much clearer: >>> >>> 00400aa0 : >>> ... >>> 400b1a: 0f 85 1b fb ff ff jne40063b >>> >>> 0040063b : >>> 40063b: bf 06 8a 49 00 mov$0x498a06,%edi >>> ... >>> 40065e: e9 0d 05 00 00 jmpq 400b70 >>> >> >> Does -ffunction-sections help here? > > No. It just moves the cold section to a different spot (so the > function it appears part of is different, but same issue). > >> >> >>> Steven, I think the ideas you describe above about passing profile >>> information in DWARF and doing some special handling in gdb to use >>> that seem orthogonal to this. >> >> It's not just about profile information but about what function some >> code belongs to. I just cannot believe that there isn't already >> something in DWARF to mark address ranges as being part of a given >> function. > > Cary, any ideas here? I spoke with Cary a little bit (Cary, feel free to add to this). Regarding this question and the follow-up email: > Isn't this something that should be expressed in DWARF with > DW_AT_ranges? See DWARF4, section 2.17, > > Does GCC generate such ranges? GCC does generate these ranges. However, according to Cary many tools do not rely on dwarf info for locating the corresponding function name, they just look at the symbols to identify what function an address resides in. Nor would we want tools such as objdump and profilers to rely on dwarf for locating the function names as this would not work for binaries that were generated without -g options or had their debug info stripped. > >> >> >>> Given that, is this patch ok for trunk (pending successful re-runs of >>> regression testing) >> >> >> IMHO, no. This is conceptually wrong. You create artificial named >> labels in the compiler that might clash with user labels. Is that >> likely to happen? No. But it could and therefore the approach of >> adding functionname.cold labels is wrong. > > Is there a format for compiler-defined labels that would not be able > to clash with other user-generated labels? My understanding is that the "." in the generated name ensures that it will not clash with user-generated labels which cannot contain ".". So this should not be an issue. Thanks, Teresa > > Teresa > >> >> Ciao! >> Steven > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson wrote: > On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka wrote: >>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka wrote: >>> >> I have a warning like that already in drop_profile(). Is that >>> > >>> > I think it should be warning (or silent) for COMDAT and error/note for >>> > other functions (depending on flag_profile_correction). >>> > I guess drop_profile is better place for it indeed. >>> >>> I don't want to warn for COMDAT since this is currently an expected >>> issue in that case, so I'm afraid it will be too noisy (but there is a >>> note emitted to the dump file to track how often this occurs). So >>> currently the warning is only emitted in cases where we don't expect >>> it to occur (e.g. non-comdat). >> >> I see, I misread it. >> The non-comdat warning IMO ought go the same way as the other profile >> confussions. >> I guess something like >> if (!flag_profile_correction) >> error (...); >> like existing cases in profile.c > > Ok, will do (emitting a note to the dump file as in other cases in > profile.c that do this same thing). > >>> >>> I didn't find any warnings being emitted when running this for spec or >>> internal benchmarks, so how about instead of a warning, I handle it >>> like you suggest above: depending on the setting of >>> flag_profile_correction either error or emit a note to the dump file >>> under MSG_MISSED_OPTIMIZATION? >> >> Yep, lets just go with error with !flag_profile_correction and being silent >> otherwise. >> If we want to go with warnings, we need to change all the similar places >> in profile.c and elsewhere. >>> >>> Ah - I just realized I am already checking profile_status_for_function >>> above and adding to the worklist if it is still PROFILE_READ. Since I >>> call drop_profile before adding to the worklist, we can't end up >>> adding multiple times. So I don't think there is currently an issue >>> with this. >> >> Yep, indeed. >> >> The patch is OK with the error message as discussed above. > > Ok, will do that and fix a couple of minor issues mentioned by Steven > (missing comment and incorrect indentation in one spot). Will retest > just to make sure I didn't miss any warnings which will now be errors. > > Thanks, > Teresa > >> >> Thanks. >> Honza >>> >>> Thanks, >>> Teresa >>> >>> > >>> > Honza >>> >> >>> >> Thanks, >>> >> Teresa >>> >> >>> >> > >>> >> > OK, with these changes. >>> >> > >>> >> > Honza This was committed earlier this morning as r204704. Then I hit the new error when doing an lto-bootstrap profiledbootstrap for an unrelated change. I'd like to change this error to a warning for now to avoid breaking the lto profiledbootstrap while I investigate. I don't think this really needs to be a hard error at this point. Running regression tests and lto profiledbootstrap right now. Ok for trunk if testing succeeds? 2013-11-12 Teresa Johnson * predict.c (drop_profile): Error is currently too strict. Index: predict.c ======= --- predict.c (revision 204704) +++ predict.c (working copy) @@ -2792,8 +2792,8 @@ drop_profile (struct cgraph_node *node, bool hot) cgraph_node_name (node), node->order); } else -error ("Missing counts for called function %s/%i", - cgraph_node_name (node), node->order); +warning (0, "Missing counts for called function %s/%i", + cgraph_node_name (node), node->order); } profile_status_for_function (fn) Thanks, Teresa >>> >> >>> >> >>> >> >>> >> -- >>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
On Tue, Nov 12, 2013 at 11:08 AM, Teresa Johnson wrote: > On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson wrote: >> On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka wrote: >>>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka wrote: >>>> >> I have a warning like that already in drop_profile(). Is that >>>> > >>>> > I think it should be warning (or silent) for COMDAT and error/note for >>>> > other functions (depending on flag_profile_correction). >>>> > I guess drop_profile is better place for it indeed. >>>> >>>> I don't want to warn for COMDAT since this is currently an expected >>>> issue in that case, so I'm afraid it will be too noisy (but there is a >>>> note emitted to the dump file to track how often this occurs). So >>>> currently the warning is only emitted in cases where we don't expect >>>> it to occur (e.g. non-comdat). >>> >>> I see, I misread it. >>> The non-comdat warning IMO ought go the same way as the other profile >>> confussions. >>> I guess something like >>> if (!flag_profile_correction) >>> error (...); >>> like existing cases in profile.c >> >> Ok, will do (emitting a note to the dump file as in other cases in >> profile.c that do this same thing). >> >>>> >>>> I didn't find any warnings being emitted when running this for spec or >>>> internal benchmarks, so how about instead of a warning, I handle it >>>> like you suggest above: depending on the setting of >>>> flag_profile_correction either error or emit a note to the dump file >>>> under MSG_MISSED_OPTIMIZATION? >>> >>> Yep, lets just go with error with !flag_profile_correction and being silent >>> otherwise. >>> If we want to go with warnings, we need to change all the similar places >>> in profile.c and elsewhere. >>>> >>>> Ah - I just realized I am already checking profile_status_for_function >>>> above and adding to the worklist if it is still PROFILE_READ. Since I >>>> call drop_profile before adding to the worklist, we can't end up >>>> adding multiple times. So I don't think there is currently an issue >>>> with this. >>> >>> Yep, indeed. >>> >>> The patch is OK with the error message as discussed above. >> >> Ok, will do that and fix a couple of minor issues mentioned by Steven >> (missing comment and incorrect indentation in one spot). Will retest >> just to make sure I didn't miss any warnings which will now be errors. >> >> Thanks, >> Teresa >> >>> >>> Thanks. >>> Honza >>>> >>>> Thanks, >>>> Teresa >>>> >>>> > >>>> > Honza >>>> >> >>>> >> Thanks, >>>> >> Teresa >>>> >> >>>> >> > >>>> >> > OK, with these changes. >>>> >> > >>>> >> > Honza > > This was committed earlier this morning as r204704. Then I hit the new > error when doing an lto-bootstrap profiledbootstrap for an unrelated > change. I'd like to change this error to a warning for now to avoid > breaking the lto profiledbootstrap while I investigate. I don't think > this really needs to be a hard error at this point. More info on the lto bootstrap failure: /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1: warning: Missing counts for called function pex_child_error.isra.1/73 [enabled by default] } This is an error handling routine that invokes _exit. Looking at the ipa-profile dump, I can see that all the counts read in for pex_child_error have the value 0. But there is a non-zero callsite, that not surprisingly complains of a profile insanity in the bb's outgoing edge weights, since pex_child_error never returns: ;; basic block 38, loop depth 1, count 192, freq 5000 ;; Invalid sum of outgoing counts 0, should be 192 ... pex_child_error.isra.1D.5005 (_92, executable_40(D), [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c : 677] "execv", _91); ;;succ: 3 [100.0%] (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE) Not sure why the counts were all 0 for this routine though, I wouldn't think the early exit should prevent us from updating the counts. But IMO the best thing to do here is to issue a warning, since I don't want more issues like this to cause compile errors when we handled them fine before. Let me kn
Re: [PATCH] decide edge's hotness when there is profile info
On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka wrote: >> More info on the lto bootstrap failure: >> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1: >> warning: Missing counts for called function pex_child_error.isra.1/73 >> [enabled by default] >> } >> >> This is an error handling routine that invokes _exit. Looking at the >> ipa-profile dump, I can see that all the counts read in for >> pex_child_error have the value 0. But there is a non-zero callsite, >> that not surprisingly complains of a profile insanity in the bb's >> outgoing edge weights, since pex_child_error never returns: >> >> ;; basic block 38, loop depth 1, count 192, freq 5000 >> ;; Invalid sum of outgoing counts 0, should be 192 >> ... >> pex_child_error.isra.1D.5005 (_92, executable_40(D), >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c : >> 677] "execv", _91); >> ;;succ: 3 [100.0%] (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE) >> >> Not sure why the counts were all 0 for this routine though, I wouldn't >> think the early exit should prevent us from updating the counts. But >> IMO the best thing to do here is to issue a warning, since I don't >> want more issues like this to cause compile errors when we handled >> them fine before. >> >> Let me know if the patch below is ok. > > OK, so _exit actually makes us to miss streaming of the path leading to the > error message? Handling of vfork is somewhat broken, as I noticed with Martin > Liska. One problem is that gcov_exit is not clearing the counts. With vfork > being used in addition to execvs we stream out the data but the counts stays > in > the oriignal process memory and then are streamed again. If execve fails and > leads to _exit I think we can get the miscompare you see. > > Does calling gcov_clear within gcov_exit help? (It is what I have in our > tree) I thought this was already handled by the __gcov_execv wrapper around execv which invokes __gcov_flush that in turns does a gcov_exit followed by gcov_clear? I think the issue is that we register gcov_exit with atexit(), but _exit does not execute any atexit functions by definition. So that would explain why the counters from pex_child_error didn't get dumped. The caller bb invokes execv before invoking pex_child_error: execv (executable, to_ptr32 (argv)); pex_child_error (obj, executable, "execv", errno); So the counters of the caller bb are dumped (hence the caller bb has non-zero counts) and cleared, and the pex_child_error does not dump its counters since it uses _exit instead of exit and therefore doesn't invoke gcov_exit atexit. If that is what is going on, then I don't see how adding a call to gcov_clear to gcov_exit would help. Dealing with situations like this is why I think it is better to convert this to a warning. Teresa > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] decide edge's hotness when there is profile info
On Tue, Nov 12, 2013 at 2:35 PM, Jan Hubicka wrote: >> On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka wrote: >> >> More info on the lto bootstrap failure: >> >> >> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1: >> >> warning: Missing counts for called function pex_child_error.isra.1/73 >> >> [enabled by default] >> >> } >> >> >> >> This is an error handling routine that invokes _exit. Looking at the >> >> ipa-profile dump, I can see that all the counts read in for >> >> pex_child_error have the value 0. But there is a non-zero callsite, >> >> that not surprisingly complains of a profile insanity in the bb's >> >> outgoing edge weights, since pex_child_error never returns: >> >> >> >> ;; basic block 38, loop depth 1, count 192, freq 5000 >> >> ;; Invalid sum of outgoing counts 0, should be 192 >> >> ... >> >> pex_child_error.isra.1D.5005 (_92, executable_40(D), >> >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c : >> >> 677] "execv", _91); >> >> ;;succ: 3 [100.0%] >> >> (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE) >> >> >> >> Not sure why the counts were all 0 for this routine though, I wouldn't >> >> think the early exit should prevent us from updating the counts. But >> >> IMO the best thing to do here is to issue a warning, since I don't >> >> want more issues like this to cause compile errors when we handled >> >> them fine before. >> >> >> >> Let me know if the patch below is ok. >> > >> > OK, so _exit actually makes us to miss streaming of the path leading to the >> > error message? Handling of vfork is somewhat broken, as I noticed with >> > Martin >> > Liska. One problem is that gcov_exit is not clearing the counts. With >> > vfork >> > being used in addition to execvs we stream out the data but the counts >> > stays in >> > the oriignal process memory and then are streamed again. If execve fails >> > and >> > leads to _exit I think we can get the miscompare you see. >> > >> > Does calling gcov_clear within gcov_exit help? (It is what I have in our >> > tree) >> >> I thought this was already handled by the __gcov_execv wrapper around >> execv which invokes __gcov_flush that in turns does a gcov_exit >> followed by gcov_clear? >> >> I think the issue is that we register gcov_exit with atexit(), but >> _exit does not execute any atexit functions by definition. So that >> would explain why the counters from pex_child_error didn't get dumped. >> The caller bb invokes execv before invoking pex_child_error: >> execv (executable, to_ptr32 (argv)); >> pex_child_error (obj, executable, "execv", errno); >> So the counters of the caller bb are dumped (hence the caller bb has >> non-zero counts) and cleared, and the pex_child_error does not dump >> its counters since it uses _exit instead of exit and therefore doesn't >> invoke gcov_exit atexit. > > Hmm, I see, the problem here is that execv gets BB splitted (and profile is > correct) > but before we get into ipa_profile the BBs get re-merged and we incorrectly > put > count of 1 to pex_child_error. > > I suppose this will always happen when you have function terminating program > (execve) before other function call. Perhaps we can warn only when the > difference in counts > is greater than number of train runs? Ok, that sounds good. Here is the new patch. Is this ok for trunk if testing (bootstrap regression and lto profiledbootstrap) succeeds? Thanks, Teresa 2013-11-13 Teresa Johnson * predict.c (drop_profile): Error is currently too strict. (handle_missing_profiles): Pass call_count to drop_profile. Index: predict.c === --- predict.c (revision 204704) +++ predict.c (working copy) @@ -2766,12 +2766,17 @@ estimate_loops (void) } /* Drop the profile for NODE to guessed, and update its frequency based on - whether it is expected to be HOT. */ + whether it is expected to be hot given the CALL_COUNT. */ static void -drop_profile (struct cgraph_node *node, bool hot) +drop_profile (struct cgraph_node *node, gcov_type call_count) { struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + /* In the case where this was called by another function with a + dropped profile, call_count will be 0. Since there are no +
[PATCH] Fix PR ipa/58862 (overflow in edge_badness computation)
The following fixes PR ipa/58862, which caused failures in lto profiledbootstrap and in several spec cpu2006 profile-use builds. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that it fixed the lto profiledbootstrap and cpu2006 failures. Ok for trunk? Thanks, Teresa 2013-11-13 Teresa Johnson PR ipa/58862 * ipa-inline.c (edge_badness): Fix overflow. Index: ipa-inline.c === --- ipa-inline.c(revision 204703) +++ ipa-inline.c(working copy) @@ -909,7 +909,7 @@ edge_badness (struct cgraph_edge *edge, bool dump) /* Capping edge->count to max_count. edge->count can be larger than max_count if an inline adds new edges which increase max_count after max_count is computed. */ - int edge_count = edge->count > max_count ? max_count : edge->count; + gcov_type edge_count = edge->count > max_count ? max_count : edge->count; sreal_init (&relbenefit_real, relbenefit, 0); sreal_init (&growth_real, growth, 0); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
When testing with -freorder-blocks-and-partition enabled, I hit a verification failure in an LTO profiledbootstrap. Edge forwarding performed when we went into cfg layout mode after bb reordering (during compgotos) created a situation where a hot block was then dominated by a cold block and was therefore remarked as cold. Because bb reorder was complete at that point, it was not moved in the physical layout, and we incorrectly went in and out of the cold section multiple times. The following patch addresses that by fixing the layout when we move blocks to the cold section after bb reordering is complete. Tested with an LTO profiledbootstrap with -freorder-blocks-and-partition enabled. Ok for trunk? Thanks, Teresa 2013-11-15 Teresa Johnson * cfgrtl.c (fixup_partitions): Reorder blocks if necessary. Index: cfgrtl.c === --- cfgrtl.c(revision 204792) +++ cfgrtl.c(working copy) @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3. If not see #include "ggc.h" #include "tree-pass.h" #include "df.h" +#include "pointer-set.h" /* Holds the interesting leading and trailing notes for the function. Only applicable if the CFG is in cfglayout mode. */ @@ -2332,14 +2333,69 @@ fixup_partitions (void) forwarding and unreachable block deletion. */ vec bbs_to_fix = find_partition_fixes (false); + unsigned new_cold_count = bbs_to_fix.length(); + if (! new_cold_count) +return; + /* Do the partition fixup after all necessary blocks have been converted to cold, so that we only update the region crossings the minimum number of places, which can require forcing edges to be non fallthru. */ + struct pointer_set_t *bbs_moved_to_cold = pointer_set_create (); while (! bbs_to_fix.is_empty ()) { bb = bbs_to_fix.pop (); fixup_new_cold_bb (bb); + pointer_set_insert (bbs_moved_to_cold, bb); } + + /* If this happens after bb reordering is done we need to adjust the layout + via the next_bb pointers. Otherwise we will incorrectly have multiple + switches in and out of the cold section, which is illegal and will be + flagged by verify_flow_info. Currently, the compgotos pass goes back into + and out of cfglayout mode after bb reordering is complete. When + going into cfglayout mode, cfg optimizations may detect opportunities + for edge forwarding, which could leave a block in the hot section + dominated by a cold block, which will cause the above code to move + into to the cold section. By adjusting the next_bb/prev_bb pointers we + ensure that fixup_reorder_chain correctly relinks the blocks on exit + from cfglayout mode. */ + if (crtl->bb_reorder_complete) +{ + /* We should only be doing optimizations that require moving + blocks between sections after reordering when we are in cfglayout + mode. */ + gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT); + + /* Walk through in layout order, rip out each bb that was moved to + the cold section and physically move it to the end of the cold + section in layout order. Doing it in this order will guarantee + that any fallthrough blocks that were both reassigned to cold stay + in the right order. */ + unsigned moved_bbs = 0; + FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb) +{ + if (BB_PARTITION (bb) != BB_COLD_PARTITION) +continue; + + /* If this was already a cold bb (it is not in the set + bbs_moved_to_cold), then we have hit the original cold + section blocks and we are done. */ + if (!pointer_set_contains (bbs_moved_to_cold, bb)) +{ + /* Make sure we processed all the newly cold bbs. */ + gcc_assert (moved_bbs == new_cold_count); + break; +} + + /* Remove bb from current layout position. */ + unlink_block (bb); + /* Move bb to the end of the cold section. */ + link_block (bb, EXIT_BLOCK_PTR->prev_bb); + + moved_bbs++; +} +} +pointer_set_destroy (bbs_moved_to_cold); } /* Verify, in the basic block chain, that there is at most one switch -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka wrote: >> When testing with -freorder-blocks-and-partition enabled, I hit a >> verification failure in an LTO profiledbootstrap. Edge forwarding >> performed when we went into cfg layout mode after bb reordering >> (during compgotos) created a situation where a hot block was then >> dominated by a cold block and was therefore remarked as cold. Because >> bb reorder was complete at that point, it was not moved in the >> physical layout, and we incorrectly went in and out of the cold >> section multiple times. >> >> The following patch addresses that by fixing the layout when we move >> blocks to the cold section after bb reordering is complete. >> >> Tested with an LTO profiledbootstrap with >> -freorder-blocks-and-partition enabled. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2013-11-15 Teresa Johnson >> >> * cfgrtl.c (fixup_partitions): Reorder blocks if necessary. > > computed_gotos just unfactors unified blocks that we use to avoid CFGs with > O(n^2) edges. This is mostly to avoid problems with nonlinearity of other > passes > and to reduce the quadratic memory use case to one function at a time. > > I wonder if it won't be cleaner to simply unfactor those just before > pass_reorder_blocks. > > Computed gotos are used e.g. in libjava interpreter to optimize the tight > interpretting > loop. I think those cases would benefit from having at least > scheduling/reordering > and alignments done right. > > Of course it depends on how bad the compile time implications are (I think in > addition > to libjava, there was a lucier's testcase that made us to go for this trick) , > but I would prefer it over ading yet another hack into cfgrtl... > We also may just avoid cfglayout cleanup_cfg while doing computed gotos... Note I haven't done an extensive check to see if compgotos is the only phase that goes back into cfglayout mode after bb reordering is done, that's just the one that hit this. Eventually it might be good to prevent going into cfglayout mode after bb reordering. For now we could either fix up the layout as I am doing here. Or as you suggest, prevent some cleanup/cfg optimization after bb reordering is done. I thought about preventing the forwarding optimization after bb reordering when splitting was on initially, but didn't want enabling -freorder-blocks-and-partition to unnecessarily prevent optimization. The reordering seemed reasonably straightforward so I went with that solution in this patch. Let me know if you'd rather have the solution of preventing the forwarding (or maybe all all of try_optimize_cfg to be safe) under -freorder-blocks-and-partition after bb reordering. Thanks, Teresa > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka wrote: >> When testing with -freorder-blocks-and-partition enabled, I hit a >> verification failure in an LTO profiledbootstrap. Edge forwarding >> performed when we went into cfg layout mode after bb reordering >> (during compgotos) created a situation where a hot block was then >> dominated by a cold block and was therefore remarked as cold. Because >> bb reorder was complete at that point, it was not moved in the >> physical layout, and we incorrectly went in and out of the cold >> section multiple times. >> >> The following patch addresses that by fixing the layout when we move >> blocks to the cold section after bb reordering is complete. >> >> Tested with an LTO profiledbootstrap with >> -freorder-blocks-and-partition enabled. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2013-11-15 Teresa Johnson >> >> * cfgrtl.c (fixup_partitions): Reorder blocks if necessary. > > computed_gotos just unfactors unified blocks that we use to avoid CFGs with > O(n^2) edges. This is mostly to avoid problems with nonlinearity of other > passes > and to reduce the quadratic memory use case to one function at a time. > > I wonder if it won't be cleaner to simply unfactor those just before > pass_reorder_blocks. > > Computed gotos are used e.g. in libjava interpreter to optimize the tight > interpretting > loop. I think those cases would benefit from having at least > scheduling/reordering > and alignments done right. > > Of course it depends on how bad the compile time implications are (I think in > addition > to libjava, there was a lucier's testcase that made us to go for this trick) , > but I would prefer it over ading yet another hack into cfgrtl... > We also may just avoid cfglayout cleanup_cfg while doing computed gotos... I am testing a new patch right now that simply moves compgotos to just before bb reordering, and ads an assert to cfg_layout_initialize to detect when we attempt that after bb reordering. I looked at the other places that go into cfg layout and I think compgotos is currently the only one after bb reordering. >From a bb layout perspective it seems like it would be beneficial to do compgotos before layout. Was the current position just to try to reduce compile time by keeping the block unified as long as possible? For your comment "I think those cases would benefit from having at least scheduling/reordering and alignments done right." do you mean that it would be better from a code quality perspective for those to have compgotos done earlier before those passes? That seems to make sense to me as well. I'm doing an lto profiledbootstrap with the change right now. Is there other testing that should be done for this change? Can you point me at lucier's testcase that you refer to above? I found that PR15242 was the bug that inspired adding compgotos, but it is a small test case so I'm not sure what I will learn from trying that other than whether compgotos still kicks in as expected. Thanks, Teresa > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher wrote: > On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote: >> From a bb layout perspective it seems like it would be beneficial to >> do compgotos before layout. Was the current position just to try to >> reduce compile time by keeping the block unified as long as possible? > > It was more a hack that got out of hand. Apparently it hurt some > interpreters (branch prediction!) when the unified computed goto is > not "unfactored". There was a PR for this, and the unfactoring code I > added only fixed part of the problem. > > >> For your comment "I think those cases would benefit from having at >> least scheduling/reordering and alignments done right." do you mean >> that it would be better from a code quality perspective for those to >> have compgotos done earlier before those passes? That seems to make >> sense to me as well. > > Theoretically: Yes, perhaps. In practice there isn't much to gain. > Unfactoring before bb-reorder is probably the most helpful thing, to > get better results for basic block alignment and placement. But > scheduling punts on computed gotos (or explodes in some non-linear > bottleneck). > > What used to help is profile-based branch speculation, i.e. > > if (*target_addr == most_frequent_target_addr) > goto most_frequent_target_add; > else > goto *target_addr; > > But I'm not sure if our value profiling based optimizations still have > this case. > > >> I'm doing an lto profiledbootstrap with the change right now. Is there >> other testing that should be done for this change? Can you point me at >> lucier's testcase that you refer to above? I found that PR15242 was >> the bug that inspired adding compgotos, but it is a small test case so >> I'm not sure what I will learn from trying that other than whether >> compgotos still kicks in as expected. > > ISTR it's http://gcc.gnu.org/PR26854 Ok, I have confirmed that moving compgotos earlier, to just before bb reordering, passes an LTO profiledbootstrap with -freorder-blocks-and-partition forced on (with my prior patch to fixup the layout removed). I also added an assert to ensure we aren't going into cfglayout mode after bb reordering is complete. Currently I am running a normal bootstrap and regression test without -freorder-blocks-and-partition force enabled. I confirmed that we still apply compgotos and successfully unfactor the computed goto in the testcase from PR15242. I also tried the two test cases in PR26854 (all.i and compiler.i) both with -O1 -fschedule-insns2 (which shouldn't actually be affected since compgotos/bbro not on) and with -O3 -fschedule-insns, as described in the bug report. I compared the time and mem reports before and after my change to the compgotos pass placement and there is no significant difference in the time or memory used from bb reordering or other downstream passes. Here is the patch I am testing with a normal bootstrap and regression test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming that passes? Thanks, Teresa 2013-11-18 Teresa Johnson * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we try to go into cfglayout after bb reordering. * gcc/passes.def: Move compgotos before bb reordering since it goes into cfglayout. Index: gcc/cfgrtl.c === --- gcc/cfgrtl.c(revision 204977) +++ gcc/cfgrtl.c(working copy) @@ -4204,6 +4204,8 @@ cfg_layout_initialize (unsigned int flags) rtx x; basic_block bb; + gcc_assert (!crtl->bb_reorder_complete); + initialize_original_copy_tables (); cfg_layout_rtl_register_cfg_hooks (); Index: gcc/passes.def === --- gcc/passes.def (revision 204977) +++ gcc/passes.def (working copy) @@ -387,6 +387,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_regrename); NEXT_PASS (pass_cprop_hardreg); NEXT_PASS (pass_fast_rtl_dce); + NEXT_PASS (pass_duplicate_computed_gotos); NEXT_PASS (pass_reorder_blocks); NEXT_PASS (pass_branch_target_load_optimize2); NEXT_PASS (pass_leaf_regs); @@ -398,7 +399,6 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_stack_regs_run); POP_INSERT_PASSES () NEXT_PASS (pass_compute_alignments); - NEXT_PASS (pass_duplicate_computed_gotos); NEXT_PASS (pass_variable_tracking); NEXT_PASS (pass_free_cfg); NEXT_PASS (pass_machine_reorg); > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
On Mon, Nov 18, 2013 at 12:23 PM, Jeff Law wrote: > On 11/18/13 12:33, Teresa Johnson wrote: >> >> On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher >> wrote: >>> >>> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote: >>>> >>>> From a bb layout perspective it seems like it would be beneficial to >>>> do compgotos before layout. Was the current position just to try to >>>> reduce compile time by keeping the block unified as long as possible? >>> >>> >>> It was more a hack that got out of hand. Apparently it hurt some >>> interpreters (branch prediction!) when the unified computed goto is >>> not "unfactored". There was a PR for this, and the unfactoring code I >>> added only fixed part of the problem. >>> >>> >>>> For your comment "I think those cases would benefit from having at >>>> least scheduling/reordering and alignments done right." do you mean >>>> that it would be better from a code quality perspective for those to >>>> have compgotos done earlier before those passes? That seems to make >>>> sense to me as well. >>> >>> >>> Theoretically: Yes, perhaps. In practice there isn't much to gain. >>> Unfactoring before bb-reorder is probably the most helpful thing, to >>> get better results for basic block alignment and placement. But >>> scheduling punts on computed gotos (or explodes in some non-linear >>> bottleneck). >>> >>> What used to help is profile-based branch speculation, i.e. >>> >>> if (*target_addr == most_frequent_target_addr) >>>goto most_frequent_target_add; >>> else >>>goto *target_addr; >>> >>> But I'm not sure if our value profiling based optimizations still have >>> this case. >>> >>> >>>> I'm doing an lto profiledbootstrap with the change right now. Is there >>>> other testing that should be done for this change? Can you point me at >>>> lucier's testcase that you refer to above? I found that PR15242 was >>>> the bug that inspired adding compgotos, but it is a small test case so >>>> I'm not sure what I will learn from trying that other than whether >>>> compgotos still kicks in as expected. >>> >>> >>> ISTR it's http://gcc.gnu.org/PR26854 >> >> >> Ok, I have confirmed that moving compgotos earlier, to just before bb >> reordering, passes an LTO profiledbootstrap with >> -freorder-blocks-and-partition forced on (with my prior patch to fixup >> the layout removed). I also added an assert to ensure we aren't going >> into cfglayout mode after bb reordering is complete. Currently I am >> running a normal bootstrap and regression test without >> -freorder-blocks-and-partition force enabled. >> >> I confirmed that we still apply compgotos and successfully unfactor >> the computed goto in the testcase from PR15242. >> >> I also tried the two test cases in PR26854 (all.i and compiler.i) both >> with -O1 -fschedule-insns2 (which shouldn't actually be affected since >> compgotos/bbro not on) and with -O3 -fschedule-insns, as described in >> the bug report. I compared the time and mem reports before and after >> my change to the compgotos pass placement and there is no significant >> difference in the time or memory used from bb reordering or other >> downstream passes. >> >> Here is the patch I am testing with a normal bootstrap and regression >> test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming >> that passes? >> >> Thanks, >> Teresa >> >> 2013-11-18 Teresa Johnson >> >> * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we >> try to go into cfglayout after bb reordering. >> * gcc/passes.def: Move compgotos before bb reordering >> since it goes into cfglayout. > > This is fine once the bootstrap/regression testing passes. The bootstrap/regression test passed. Added a comment as Honza suggested and committed as r204985. Thanks, Teresa > > Thanks, > jeff -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH i386] Enable -freorder-blocks-and-partition
This patch enables -freorder-blocks-and-partition by default for x86 at -O2 and up. It is showing some modest gains in cpu2006 performance with profile feedback and -O2 on an Intel Westmere system. Specifically, I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were configured with --enable-languages=all,obj-c++ and tested for both 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". It would be good to enable this for additional targets as a follow on, but it needs more testing for both correctness and performance on those other targets (i.e for correctness because I see a number of places in other config/*/*.c files that do some special handling under this option for different targets or simply disable it, so I am not sure how well-tested it is under different architectural constraints). Ok for trunk? Thanks, Teresa 2013-11-19 Teresa Johnson * common/config/i386/i386-common.c: Enable -freorder-blocks-and-partition at -O2 and up for x86. * opts.c (finish_options): Only warn if -freorder-blocks-and- partition was set on command line. Index: common/config/i386/i386-common.c === --- common/config/i386/i386-common.c(revision 205001) +++ common/config/i386/i386-common.c(working copy) @@ -789,6 +789,8 @@ static const struct default_options ix86_option_op { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, +/* Enable function splitting at -O2 and higher. */ +{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 }, /* Turn off -fschedule-insns by default. It tends to make the problem with not enough registers even worse. */ { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, Index: opts.c === --- opts.c (revision 205001) +++ opts.c (working copy) @@ -737,9 +737,10 @@ finish_options (struct gcc_options *opts, struct g && opts->x_flag_reorder_blocks_and_partition && (ui_except == UI_SJLJ || ui_except >= UI_TARGET)) { - inform (loc, - "-freorder-blocks-and-partition does not work " - "with exceptions on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) +inform (loc, +"-freorder-blocks-and-partition does not work " +"with exceptions on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } @@ -752,9 +753,10 @@ finish_options (struct gcc_options *opts, struct g && opts->x_flag_reorder_blocks_and_partition && (ui_except == UI_SJLJ || ui_except >= UI_TARGET)) { - inform (loc, - "-freorder-blocks-and-partition does not support " - "unwind info on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) +inform (loc, +"-freorder-blocks-and-partition does not support " +"unwind info on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } @@ -769,9 +771,10 @@ finish_options (struct gcc_options *opts, struct g && targetm_common.unwind_tables_default && (ui_except == UI_SJLJ || ui_except >= UI_TARGET { - inform (loc, - "-freorder-blocks-and-partition does not work " - "on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) +inform (loc, +"-freorder-blocks-and-partition does not work " +"on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH i386] Enable -freorder-blocks-and-partition
On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka wrote: > Martin, > can you, please, generate the updated systemtap with > -freorder-blocks-and-partition enabled? > > I am in favour of enabling this - it is usefull pass and it is pointless ot > have passes that are not enabled by default. > Is there reason why this would not work on other ELF target? Is it working > with Darwin and Windows? I don't know how to test these (I don't see any machines listed in the gcc compile farm of those types). For Windows, I assume you mean MinGW, which should be enabled as it is under i386. Should I disable it there and for Darwin? > >> This patch enables -freorder-blocks-and-partition by default for x86 >> at -O2 and up. It is showing some modest gains in cpu2006 performance >> with profile feedback and -O2 on an Intel Westmere system. Specifically, >> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk >> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. > > This actually sounds very good ;) > > Lets see how the systemtap graphs goes. If we will end up with problem > of too many accesses to cold section, I would suggest making cold section > subdivided into .unlikely and .unlikely.part (we could have better name) > with the second consisting only of unlikely parts of hot&normal functions. > > This should reduce the problems we are seeing with mistakely identifying > code to be cold because of roundoff errors (and it probably makes sense > in general, too). > We will however need to update gold and ld for that. Note that I don't think this would help much unless the linker is changed to move the cold split section close to the hot section. There is probably some fine-tuning we could do eventually in the linker under -ffunction-sections without putting the split portions in a separate section. I.e. clump the split parts together within unlikely. But hopefully this can all be done later on as follow-on work to boost the performance further. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal >> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were >> configured with --enable-languages=all,obj-c++ and tested for both >> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". >> >> It would be good to enable this for additional targets as a follow on, >> but it needs more testing for both correctness and performance on those >> other targets (i.e for correctness because I see a number of places >> in other config/*/*.c files that do some special handling under this >> option for different targets or simply disable it, so I am not sure >> how well-tested it is under different architectural constraints). >> >> Ok for trunk? >> >> Thanks, >> Teresa >> >> 2013-11-19 Teresa Johnson >> >> * common/config/i386/i386-common.c: Enable >> -freorder-blocks-and-partition at -O2 and up for x86. >> * opts.c (finish_options): Only warn if -freorder-blocks-and- >> partition was set on command line. > > You probably mis doc/invoke.texi update. > Thank you for working on this! Yes, thanks. Here is the patch with the invoke.texi update. Teresa 2013-11-19 Teresa Johnson * common/config/i386/i386-common.c: Enable -freorder-blocks-and-partition at -O2 and up for x86. * doc/invoke.texi: Update -freorder-blocks-and-partition default. * opts.c (finish_options): Only warn if -freorder-blocks-and- partition was set on command line. Index: common/config/i386/i386-common.c === --- common/config/i386/i386-common.c(revision 205001) +++ common/config/i386/i386-common.c(working copy) @@ -789,6 +789,8 @@ static const struct default_options ix86_option_op { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, +/* Enable function splitting at -O2 and higher. */ +{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 }, /* Turn off -fschedule-insns by default. It tends to make the problem with not enough registers even worse. */ { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, Index: doc/invoke.texi === --- doc/invoke.texi (revision 205001) +++ doc/invoke.texi (working copy) @@ -8172,6 +8172,8 @@ exception handling, for linkonce sections, for fun section attribute and on any architecture that does not support named sections. +Enabled for x86 at levels @option{-O2}, @option{-O3}. + @item -freorder-functions @opindex freorder-functions Reorder
Re: [PATCH i386] Enable -freorder-blocks-and-partition
On Tue, Nov 19, 2013 at 1:00 PM, Andi Kleen wrote: > Teresa Johnson writes: > >> This patch enables -freorder-blocks-and-partition by default for x86 >> at -O2 and up. It is showing some modest gains in cpu2006 performance >> with profile feedback and -O2 on an Intel Westmere system. Specifically, >> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk >> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. > > One thing that worries me is what this will do to profilers. > > I had to hack some assembler code using out of line sections > to able to handle the libunwind based perf dwarf unwinder. > > My understanding is that these out of line sections can be > only described properly in dwarf3, and there's still some > dwarf2 based unwinder code around. > > So this may cause problems with profiling and debugging. > > It's probably still a good idea, just may need some extra > care. > > -Andi Sri has approval for a patch that should address this by giving the split cold sections a label. It should go in today as well: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02143.html Thanks, Teresa > > -- > a...@linux.intel.com -- Speaking for myself only -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH i386] Enable -freorder-blocks-and-partition
On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law wrote: > On 11/19/13 10:24, Teresa Johnson wrote: >> >> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka wrote: >>> >>> Martin, >>> can you, please, generate the updated systemtap with >>> -freorder-blocks-and-partition enabled? >>> >>> I am in favour of enabling this - it is usefull pass and it is pointless >>> ot >>> have passes that are not enabled by default. >>> Is there reason why this would not work on other ELF target? Is it >>> working >>> with Darwin and Windows? >> >> >> I don't know how to test these (I don't see any machines listed in the >> gcc compile farm of those types). For Windows, I assume you mean >> MinGW, which should be enabled as it is under i386. Should I disable >> it there and for Darwin? >> >>> >>>> This patch enables -freorder-blocks-and-partition by default for x86 >>>> at -O2 and up. It is showing some modest gains in cpu2006 performance >>>> with profile feedback and -O2 on an Intel Westmere system. Specifically, >>>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk >>>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. >>> >>> >>> This actually sounds very good ;) >>> >>> Lets see how the systemtap graphs goes. If we will end up with problem >>> of too many accesses to cold section, I would suggest making cold section >>> subdivided into .unlikely and .unlikely.part (we could have better name) >>> with the second consisting only of unlikely parts of hot&normal >>> functions. >>> >>> This should reduce the problems we are seeing with mistakely identifying >>> code to be cold because of roundoff errors (and it probably makes sense >>> in general, too). >>> We will however need to update gold and ld for that. >> >> >> Note that I don't think this would help much unless the linker is >> changed to move the cold split section close to the hot section. There >> is probably some fine-tuning we could do eventually in the linker >> under -ffunction-sections without putting the split portions in a >> separate section. I.e. clump the split parts together within unlikely. >> But hopefully this can all be done later on as follow-on work to boost >> the performance further. >> >>>> >>>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal >>>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were >>>> configured with --enable-languages=all,obj-c++ and tested for both >>>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". >>>> >>>> It would be good to enable this for additional targets as a follow on, >>>> but it needs more testing for both correctness and performance on those >>>> other targets (i.e for correctness because I see a number of places >>>> in other config/*/*.c files that do some special handling under this >>>> option for different targets or simply disable it, so I am not sure >>>> how well-tested it is under different architectural constraints). >>>> >>>> Ok for trunk? >>>> >>>> Thanks, >>>> Teresa >>>> >>>> 2013-11-19 Teresa Johnson >>>> >>>> * common/config/i386/i386-common.c: Enable >>>> -freorder-blocks-and-partition at -O2 and up for x86. >>>> * opts.c (finish_options): Only warn if -freorder-blocks-and- >>>> partition was set on command line. >>> >>> >>> You probably mis doc/invoke.texi update. >>> Thank you for working on this! >> >> >> Yes, thanks. Here is the patch with the invoke.texi update. >> >> Teresa >> >> >> 2013-11-19 Teresa Johnson >> >> * common/config/i386/i386-common.c: Enable >> -freorder-blocks-and-partition at -O2 and up for x86. >> * doc/invoke.texi: Update -freorder-blocks-and-partition default. >> * opts.c (finish_options): Only warn if -freorder-blocks-and- >> partition was set on command line. > > This is fine. Glad to see we're getting some good benefits from this code. Ok, thanks. It is in as r205058. > > FWIW, I would expect the PA to show benefits from this work -- HP showed > good results for block positioning and procedure splitting eons ago. A > secondary effect you would expect to see would be an increase in the number > of long branch stubs (static count), but a decrease in the number of those > stubs that actually execute. Measuring those effects would obviously be > out-of-scope. > > I totally understand that the PA is irrelevant these days, but seeing good > results on another architecture would go a long way to answering the > question "should this be enabled by default across the board?" Yep, we saw benefits for IPF on hp-ux as well. It would be great to eventually enable this across the board and only selectively disable. I know there were people interested in trying this with ARM, that would be a good relevant arch to try this with now to see if it can be enabled more widely. Teresa > > jeff > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR target/59233 (ICE on *-apple-darwin* with -m32 exposed by -freorder-blocks-and-partition)
On Thu, Nov 21, 2013 at 3:31 PM, Steven Bosscher wrote: > On Thu, Nov 21, 2013 at 8:57 PM, Teresa Johnson wrote: >> There are two problems I am fixing here (see >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59233 for full analysis). >> >> The first is the original ICE in crossjump optimization, which was >> exposed by enabling -freorder-blocks-and-partition which was >> conservatively preventing some upstream block combining optimizations. >> The issue is that the block ended in a NOTE_INSN_DELETED, which >> old_insns_match_p could not handle. I am fixing this by passing >> old_insns_match_p the last instruction in the block that it knows how >> to handle. >> >> The second issue this exposed was that we were unnecessarily marking >> landing pad branches EDGE_PRESERVE since >> flag_reorder_blocks_and_partition was on, even though this was -Os and >> we will gate partitioning off. > > So we keep an edge to a landing pad... Why is this a problem? It just seems silly to limit optimization unnecessarily when we can deduce that we won't actually attempt partitioning. > > >> * bb-reorder.c (do_partition_blocks): New function. >> (gate_handle_partition_blocks): Call do_partition_blocks. >> * bb-reorder.h (do_partition_blocks): Declare. >> * except.c (dw2_build_landing_pads): Call do_partition_blocks. > > Exporting this gate function from bb-reorder.c shouldn't be necessary. > Better fix this at the root, in except.c. How would we do that? I didn't want to clone the logic that determines whether we will partition, so I thought outlining the checks from the gate function and exporting them would be cleaner. But I am open to suggestions. > > >> * cfgcleanup.c (outgoing_edges_match): Walk up past note instructions >> not understood by old_insns_match_p. > > This part is OK. Ok, let me get this part in first, since it will address the ICE. Thanks, Teresa > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Fix PR target/59233 (ICE on *-apple-darwin* with -m32 exposed by -freorder-blocks-and-partition)
There are two problems I am fixing here (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59233 for full analysis). The first is the original ICE in crossjump optimization, which was exposed by enabling -freorder-blocks-and-partition which was conservatively preventing some upstream block combining optimizations. The issue is that the block ended in a NOTE_INSN_DELETED, which old_insns_match_p could not handle. I am fixing this by passing old_insns_match_p the last instruction in the block that it knows how to handle. The second issue this exposed was that we were unnecessarily marking landing pad branches EDGE_PRESERVE since flag_reorder_blocks_and_partition was on, even though this was -Os and we will gate partitioning off. Added a method for checking whether we will really invoke partitioning. Fixes the original problem (confirmed via a cross-compiler for x86_64-apple-darwin10) and was bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2013-11-21 Teresa Johnson * bb-reorder.c (do_partition_blocks): New function. (gate_handle_partition_blocks): Call do_partition_blocks. * bb-reorder.h (do_partition_blocks): Declare. * cfgcleanup.c (outgoing_edges_match): Walk up past note instructions not understood by old_insns_match_p. * except.c (dw2_build_landing_pads): Call do_partition_blocks. Index: bb-reorder.c === --- bb-reorder.c(revision 205063) +++ bb-reorder.c(working copy) @@ -2532,8 +2532,13 @@ make_pass_duplicate_computed_gotos (gcc::context * return new pass_duplicate_computed_gotos (ctxt); } -static bool -gate_handle_partition_blocks (void) +/* This function will return true if hot/cold partitioning will + be performed, and is declared non-static so that other passes + (namely landing pad expansion in except.c) can more accurately + determine whether they need to assume that partitioning will occur. */ + +bool +do_partition_blocks (void) { /* The optimization to partition hot/cold basic blocks into separate sections of the .o file does not work well with linkonce or with @@ -2548,6 +2553,12 @@ make_pass_duplicate_computed_gotos (gcc::context * && !user_defined_section_attribute); } +static bool +gate_handle_partition_blocks (void) +{ + return do_partition_blocks (); +} + /* This function is the main 'entrance' for the optimization that partitions hot and cold basic blocks into separate sections of the .o file (to improve performance and cache locality). Ideally it Index: bb-reorder.h === --- bb-reorder.h(revision 205063) +++ bb-reorder.h(working copy) @@ -37,4 +37,5 @@ extern int get_uncond_jump_length (void); extern void insert_section_boundary_note (void); +extern bool do_partition_blocks (void); #endif Index: cfgcleanup.c === --- cfgcleanup.c(revision 205063) +++ cfgcleanup.c(working copy) @@ -1743,12 +1743,20 @@ outgoing_edges_match (int mode, basic_block bb1, b } } + /* Find the last non-debug non-note instruction in each bb, except + stop when we see the NOTE_INSN_BASIC_BLOCK, as old_insns_match_p + handles that case specially. old_insns_match_p does not handle + other types of instruction notes. */ rtx last1 = BB_END (bb1); rtx last2 = BB_END (bb2); - if (DEBUG_INSN_P (last1)) -last1 = prev_nondebug_insn (last1); - if (DEBUG_INSN_P (last2)) -last2 = prev_nondebug_insn (last2); + while (!NOTE_INSN_BASIC_BLOCK_P (last1) && + (DEBUG_INSN_P (last1) || NOTE_P (last1))) +last1 = PREV_INSN (last1); + while (!NOTE_INSN_BASIC_BLOCK_P (last2) && + (DEBUG_INSN_P (last2) || NOTE_P (last2))) +last2 = PREV_INSN (last2); + gcc_assert (last1 && last2); + /* First ensure that the instructions match. There may be many outgoing edges so this test is generally cheaper. */ if (old_insns_match_p (mode, last1, last2) != dir_both) Index: except.c === --- except.c(revision 205063) +++ except.c(working copy) @@ -125,6 +125,7 @@ along with GCC; see the file COPYING3. If not see #include "except.h" #include "hard-reg-set.h" #include "basic-block.h" +#include "bb-reorder.h" #include "output.h" #include "dwarf2asm.h" #include "dwarf2out.h" @@ -1014,7 +1015,7 @@ dw2_build_landing_pads (void) new landing pads later, which means that we need to hold on to the post-landing-pad block. Prevent it from being merged away. We'll remove this bit after partitioning. */ - if (flag_reorder_blocks_and_partition) + if (do_partition_blocks ()) e_flags |= EDGE_P
[Google] Increase max-lipo-mem default to 4000000
The current default is 300, but increasing to 4M gives better performance on internal applications and benchmarks. Passes regression testing. Ok for google/4_8? Thanks, Teresa 2013-11-24 Teresa Johnson * params.def (PARAM_MAX_LIPO_MEMORY): Increase default to 4M. Index: params.def === --- params.def (revision 205234) +++ params.def (working copy) @@ -972,7 +972,7 @@ DEFPARAM (PARAM_MAX_LIPO_GROUP, DEFPARAM (PARAM_MAX_LIPO_MEMORY, "max-lipo-mem", "don't import aux files if memory consumption exceeds this value", - 300, 0, 0) + 400, 0, 0) /* In LIPO profile-gen, use this parameter to record the cutoff value used at profile collection runtime. */ -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Handle missing BINFO for LIPO
We may have missing BINFO on a type if that type is a builtin, since in LIPO mode we will reset builtin types to their original tree nodes before parsing subsequent modules. Handle incomplete information by returning false so we won't put an entry in the type inheritance graph for optimization. Passes regression tests. Ok for google branches? Teresa 2014-10-07 Teresa Johnson Google ref b/16511102. * ipa-devirt.c (polymorphic_type_binfo_p): Handle missing BINFO. Index: ipa-devirt.c === --- ipa-devirt.c(revision 215830) +++ ipa-devirt.c(working copy) @@ -177,7 +177,10 @@ static inline bool polymorphic_type_binfo_p (tree binfo) { /* See if BINFO's type has an virtual table associtated with it. */ - return BINFO_VTABLE (TYPE_BINFO (BINFO_TYPE (binfo))); + tree type_binfo = TYPE_BINFO (BINFO_TYPE (binfo)); + if (!type_binfo) +return false; + return BINFO_VTABLE (type_binfo); } /* One Definition Rule hashtable helpers. */ -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Fix PR bootstrap/63432 in jump threading
This patch addresses PR bootstrap/63432 which was an insanity in the probabilities created during jump threading. This was caused by threading multiple times along the same path leading to the second jump thread path being corrupted, which in turn caused the profile update code to fail. There was code in mark_threaded_blocks that intended to catch and suppress these cases of threading multiple times along the same path, but it was sensitive to the order in which the paths were discovered and recorded. This patch changes the detection to do two passes and removes the ordering sensitivity. Also, while fixing this I realized that the previous method of checking the entry BB's profile count was not an accurate way to determine whether the function has any non-zero profile counts. Created a new routine to walk the path and see if it has all zero profile counts and estimated frequencies. Bootstrapped and tested on x86_64-unknown-linux-gnu. Also did an LTO profiledbootstrap. Ok for trunk? Thanks, Teresa 2014-10-07 Teresa Johnson PR bootstrap/63432. * tree-ssa-threadupdate.c (estimated_freqs_path): New function. (ssa_fix_duplicate_block_edges): Invoke it. (mark_threaded_blocks): Make two passes to avoid ordering dependences. Index: tree-ssa-threadupdate.c === --- tree-ssa-threadupdate.c (revision 215830) +++ tree-ssa-threadupdate.c (working copy) @@ -959,6 +959,43 @@ update_joiner_offpath_counts (edge epath, basic_bl } +/* Check if the paths through RD all have estimated frequencies but zero + profile counts. This is more accurate than checking the entry block + for a zero profile count, since profile insanities sometimes creep in. */ + +static bool +estimated_freqs_path (struct redirection_data *rd) +{ + edge e = rd->incoming_edges->e; + vec *path = THREAD_PATH (e); + edge ein; + edge_iterator ei; + bool non_zero_freq = false; + FOR_EACH_EDGE (ein, ei, e->dest->preds) +{ + if (ein->count) +return false; + non_zero_freq |= ein->src->frequency != 0; +} + + for (unsigned int i = 1; i < path->length (); i++) +{ + edge epath = (*path)[i]->e; + if (epath->src->count) +return false; + non_zero_freq |= epath->src->frequency != 0; + edge esucc; + FOR_EACH_EDGE (esucc, ei, epath->src->succs) +{ + if (esucc->count) +return false; + non_zero_freq |= esucc->src->frequency != 0; +} +} + return non_zero_freq; +} + + /* Invoked for routines that have guessed frequencies and no profile counts to record the block and edge frequencies for paths through RD in the profile count fields of those blocks and edges. This is because @@ -1058,9 +1095,11 @@ ssa_fix_duplicate_block_edges (struct redirection_ data we first take a snapshot of the existing block and edge frequencies by copying them into the empty profile count fields. These counts are then used to do the incremental updates, and cleared at the end of this - routine. */ + routine. If the function is marked as having a profile, we still check + to see if the paths through RD are using estimated frequencies because + the routine had zero profile counts. */ bool do_freqs_to_counts = (profile_status_for_fn (cfun) != PROFILE_READ - || !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count); + || estimated_freqs_path (rd)); if (do_freqs_to_counts) freqs_to_counts_path (rd); @@ -2077,35 +2116,52 @@ mark_threaded_blocks (bitmap threaded_blocks) /* Now iterate again, converting cases where we want to thread through a joiner block, but only if no other edge on the path - already has a jump thread attached to it. */ + already has a jump thread attached to it. We do this in two passes, + to avoid situations where the order in the paths vec can hide overlapping + threads (the path is recorded on the incoming edge, so we would miss + cases where the second path starts at a downstream edge on the same + path). First record all joiner paths, deleting any in the unexpected + case where there is already a path for that incoming edge. */ for (i = 0; i < paths.length (); i++) { vec *path = paths[i]; if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK) +{ + /* Attach the path to the starting edge if none is yet recorded. */ + if ((*path)[0]->e->aux == NULL) +(*path)[0]->e->aux = path; + else if (dump_file && (dump_flags & TDF_DETAILS)) + dump_jump_thread_path (dump_file, *path, false); +} +} + /* Second, look for paths that have any other jump thread attached to + them, and either finish converting them or cancel them. */ + for (i =
[Google] Put time profiling under param, disable by default
Guard the new time profiling under a parameter, off by default for Google since it isn't used and the extra instrumentation was adding bloat. Enabled time profiling for two test cases that explicitly test it. For func_reorder_gold_plugin_1.C the change in instrumentation had an effect on the ordering of the callgraph group in the linker plugin dump. The test case already appeared to try to catch different orders, but missed the new one, so I added a leading ".*" to catch it (already had a trailing ".*"). old: # Callgraph group : main _Z3barv _Z3foov new: # Callgraph group : _Z3foov _Z3barv main Passes regression tests. Ok for google/4_9? 2014-10-12 Teresa Johnson Google ref b/17945455. gcc/ * params.def (PARAM_PROFILE_VALUES_TIME): New parameter. * value-prof.c (gimple_find_values_to_profile): Use new param. gcc/testsuite: * gcc.dg/tree-prof/time-profiler-1.c: Use new parameter. * gcc.dg/tree-prof/time-profiler-2.c: Ditto. * g++.dg/tree-prof/func_reorder_gold_plugin_1.C: Fix dumper scan. Index: params.def === --- params.def (revision 215830) +++ params.def (working copy) @@ -1101,6 +1101,11 @@ DEFPARAM (PARAM_PROFILE_GENERATE_SAMPLING_PERIOD, "sampling rate with -fprofile-generate-sampling", 100, 0, 20) +DEFPARAM (PARAM_PROFILE_VALUES_TIME, + "profile-values-time", + "Enable time profiling when value profiling", + 0, 0, 1) + DEFPARAM (PARAM_COVERAGE_CALLBACK, "coverage-callback", "callback a user-define function when for arc counter increments.", Index: value-prof.c === --- value-prof.c(revision 215830) +++ value-prof.c(working copy) @@ -2355,7 +2355,10 @@ gimple_find_values_to_profile (histogram_values *v for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) gimple_values_to_profile (gsi_stmt (gsi), values); - values->safe_push (gimple_alloc_histogram_value (cfun, HIST_TYPE_TIME_PROFILE, 0, 0)); + if (PARAM_VALUE (PARAM_PROFILE_VALUES_TIME)) +values->safe_push (gimple_alloc_histogram_value (cfun, + HIST_TYPE_TIME_PROFILE, + 0, 0)); FOR_EACH_VEC_ELT (*values, i, hist) { Index: testsuite/gcc.dg/tree-prof/time-profiler-1.c === --- testsuite/gcc.dg/tree-prof/time-profiler-1.c(revision 215830) +++ testsuite/gcc.dg/tree-prof/time-profiler-1.c(working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -fdump-ipa-profile" } */ +/* { dg-options "-O2 -fdump-ipa-profile --param=profile-values-time=1" } */ __attribute__ ((noinline)) int foo() Index: testsuite/gcc.dg/tree-prof/time-profiler-2.c === --- testsuite/gcc.dg/tree-prof/time-profiler-2.c(revision 215830) +++ testsuite/gcc.dg/tree-prof/time-profiler-2.c(working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -fdump-ipa-profile" } */ +/* { dg-options "-O2 -fdump-ipa-profile --param=profile-values-time=1" } */ #include Index: testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C === --- testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C (revision 215830) +++ testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C (working copy) @@ -38,7 +38,7 @@ int main () /* { dg-final-use { scan-assembler "\.string \"Weight 1000 1000\"" } } */ /* { dg-final-use { scan-assembler "\.string \"Weight 1001 1001\"" } } */ /* Check if main is next to foo or bar */ -/* { dg-final-use { scan-file linker.dump "Callgraph group : *\(_Z3foov main|main _Z3foov|_Z3barv main|main _Z3barv\).*\n" } } */ +/* { dg-final-use { scan-file linker.dump "Callgraph group : *\.*(_Z3foov main|main _Z3foov|_Z3barv main|main _Z3barv\).*\n" } } */ /* { dg-final-use { scan-file linker.dump ".text\..*\._Z9notcalledv entry count = 0 computed = 0 max count = 0" } } */ /* { dg-final-use { scan-file linker.dump "Moving .* section\\(s\\) to new segment" } } */ /* { dg-final-use { cleanup-saved-temps } } */ -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Better tolerance of incoming profile insanities in jump threading
The below patch fixes the overflow detection when recomputing probabilities after jump threading, in case of incoming profile insanities. It detects more cases where the computation will overflow not only the max probability but the max int and possibly wrap around. LTO profilebootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2014-10-14 Teresa Johnson PR bootstrap/63432 * tree-ssa-threadupdate.c (recompute_probabilities): Better overflow checking. Index: tree-ssa-threadupdate.c === --- tree-ssa-threadupdate.c (revision 216150) +++ tree-ssa-threadupdate.c (working copy) @@ -871,21 +871,23 @@ recompute_probabilities (basic_block bb) edge_iterator ei; FOR_EACH_EDGE (esucc, ei, bb->succs) { - if (bb->count) + if (!bb->count) +continue; + + /* Prevent overflow computation due to insane profiles. */ + if (esucc->count < bb->count) esucc->probability = GCOV_COMPUTE_SCALE (esucc->count, bb->count); - if (esucc->probability > REG_BR_PROB_BASE) -{ - /* Can happen with missing/guessed probabilities, since we -may determine that more is flowing along duplicated -path than joiner succ probabilities allowed. -Counts and freqs will be insane after jump threading, -at least make sure probability is sane or we will -get a flow verification error. -Not much we can do to make counts/freqs sane without -redoing the profile estimation. */ - esucc->probability = REG_BR_PROB_BASE; - } + else +/* Can happen with missing/guessed probabilities, since we + may determine that more is flowing along duplicated + path than joiner succ probabilities allowed. + Counts and freqs will be insane after jump threading, + at least make sure probability is sane or we will + get a flow verification error. + Not much we can do to make counts/freqs sane without + redoing the profile estimation. */ +esucc->probability = REG_BR_PROB_BASE; } } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Treat artificial from aux modules as non-external
The following patch fixes a LIPO issue with a virtual implicitly instantiated destructor from an aux module that was needed when we devirtualized a call from the primary module, but it was not available (not promoted to global) in aux module since it is DECL_ARTIFICIAL. Fixes compilation issue, passes regression tests. Ok for google/4_9? Thanks, Teresa 2014-10-16 Teresa Johnson Google ref b/17971995. * l-ipo.c (cgraph_is_aux_decl_external): Artificial functions may not be available in primary module. Index: l-ipo.c === --- l-ipo.c (revision 216286) +++ l-ipo.c (working copy) @@ -1140,8 +1140,11 @@ cgraph_is_aux_decl_external (struct cgraph_node *n /* Comdat or weak functions in aux modules are not external -- there is no guarantee that the definitition will be emitted - in the primary compilation of this auxiliary module. */ - if (DECL_COMDAT (decl) || DECL_WEAK (decl)) + in the primary compilation of this auxiliary module. + Functions marked artificial (e.g. an implicitly instantiated virtual + destructor) are also not guaranteed to be available in the primary module, + as they are not promoted by process_module_scope_static_func. */ + if (DECL_COMDAT (decl) || DECL_WEAK (decl) || DECL_ARTIFICIAL (decl)) return false; /* virtual functions won't be deleted in the primary module. */ -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Disable -fdevirtualize by default
Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Patch attached. Passes internal testing and regression tests. Ok for google/4_9? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Passes internal testing and regression tests. Ok for google/4_9? 2014-10-18 Teresa Johnson gcc/: Google ref b/17945455 * opts.c (finish_options): Disable -fdevirtualize gcc/testsuite/: Google ref b/17945455 * testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test. * testsuite/g++.dg/ipa/devirt-11.C: Ditto. * testsuite/g++.dg/ipa/devirt-12.C: Ditto. * testsuite/g++.dg/ipa/devirt-13.C: Ditto. * testsuite/g++.dg/ipa/devirt-14.C: Ditto. * testsuite/g++.dg/ipa/devirt-15.C: Ditto. * testsuite/g++.dg/ipa/devirt-16.C: Ditto. * testsuite/g++.dg/ipa/devirt-17.C: Ditto. * testsuite/g++.dg/ipa/devirt-18.C: Ditto. * testsuite/g++.dg/ipa/devirt-19.C: Ditto. * testsuite/g++.dg/ipa/devirt-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-20.C: Ditto. * testsuite/g++.dg/ipa/devirt-21.C: Ditto. * testsuite/g++.dg/ipa/devirt-22.C: Ditto. * testsuite/g++.dg/ipa/devirt-23.C: Ditto. * testsuite/g++.dg/ipa/devirt-24.C: Ditto. * testsuite/g++.dg/ipa/devirt-25.C: Ditto. * testsuite/g++.dg/ipa/devirt-26.C: Ditto. * testsuite/g++.dg/ipa/devirt-27.C: Ditto. * testsuite/g++.dg/ipa/devirt-28.C: Ditto. * testsuite/g++.dg/ipa/devirt-29.C: Ditto. * testsuite/g++.dg/ipa/devirt-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-30.C: Ditto. * testsuite/g++.dg/ipa/devirt-31.C: Ditto. * testsuite/g++.dg/ipa/devirt-39.C: Ditto. * testsuite/g++.dg/ipa/devirt-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-9.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-8.C: Ditto. * testsuite/g++.dg/ipa/devirt-d-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-g-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-1.C: Ditto. * testsuite/g++.dg/ipa/ivinline-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-3.C: Ditto. * testsuite/g++.dg/ipa/ivinline-4.C: Ditto. * testsuite/g++.dg/ipa/ivinline-5.C: Ditto. * testsuite/g++.dg/ipa/ivinline-7.C: Ditto. * testsuite/g++.dg/ipa/ivinline-8.C: Ditto. * testsuite/g++.dg/ipa/ivinline-9.C: Ditto. * testsuite/g++.dg/ipa/pr60600.C: Ditto. * testsuite/g++.dg/ipa/pr60640-4.C: Ditto. * testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto. * testsuite/g++.dg/opt/devirt1.C: Ditto. * testsuite/g++.dg/opt/devirt2.C: Ditto. * testsuite/g++.dg/opt/devirt3.C: Ditto. Index: opts.c === --- opts.c (revision 216286) +++ opts.c (working copy) @@ -485,7 +485,6 @@ static const struct default_options default_option { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 }, -{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 }, Index: testsuite/g++.dg/ipa/devirt-10.C === --- testsuite/g++.dg/ipa/devirt-10.C(revision 216286) +++ testsuite/g++.dg/ipa/devirt-10.C(working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining" } */ +/* { dg-options "-O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining -fdevirtualize" } */ class wxPaintEvent { }; struct wxDCBase { Index: testsuite/g++.dg/ipa/devirt-11.C === --- testsuite/g++.dg/ipa/devirt-11.C
[GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use
Increasing the number of early inliner iterations from 1 to 2 enables more indirect calls to be promoted/inlined before instrumentation. This in turn reduces the instrumentation overhead, particularly for more expensive indirect call topn profiling. Passes internal testing and regression tests. Ok for google/4_9? 2014-10-18 Teresa Johnson Google ref b/17934523 * opts.c (finish_options): Increase max-early-inliner-iterations to 2 for profile-gen and profile-use builds. Index: opts.c === --- opts.c (revision 216286) +++ opts.c (working copy) @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g opts->x_param_values, opts_set->x_param_values); } + if (opts->x_profile_arc_flag + || opts->x_flag_branch_probabilities) +{ + maybe_set_param_value + (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2, +opts->x_param_values, opts_set->x_param_values); +} + if (!(opts->x_flag_auto_profile || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities))) { -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use
On Sat, Oct 18, 2014 at 4:26 PM, Jan Hubicka wrote: >> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka wrote: >> >> The difference in instrumentation runtime is huge -- as topn profiler >> >> is pretty expensive to run. >> >> >> >> With FDO, it is probably better to make early inlining more aggressive >> >> in order to get more context sensitive profiling. >> > >> > I agree with that, I just would like to understand where increasing the >> > iterations >> > helps and if we can handle it without iterating (because Richi originally >> > requested to >> > drop the iteration for correcness issues) >> > Do you have some examples? >> >> We can do FDO experiment by shutting down einline. (Note that >> increasing iteration to 2 did not actually improve performance with >> our benchmarks). > > I would be more interested in case where increasing iteration to 2 actually > improves train run perfomrance. (einline was originally invented to make > profiling useable on tramp3d ;) > It seems to me that the cases handled by iteration are rather rare, so I am > suprised you get important benefit from these. Perhaps we miss something > obvious here. The specific case was actually a call to upper_bound in bits/stl_algo.h with a specialized compare function. In the more recent versions of upper_bound, the call to the comparator was outlined into __upper_bound. With only one iteration of early inlining, we were inlining __upper_bound into upper_bound and into the caller. But the indirect call to the comparator was not promoted until the fre2 pass, so it didn't get early inlined. With 2 iterations of early inlining, enough optimization is apparently done between iterations to propagate the actual target and promote the indirect call after we inline __upper_bound and upper_bound that it is inlined in the second iteration. Thanks, Teresa > > Honza >> >> David >> >> > Honza >> >> >> >> David >> >> >> >> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka wrote: >> >> >> Increasing the number of early inliner iterations from 1 to 2 enables >> >> >> more >> >> >> indirect calls to be promoted/inlined before instrumentation. This in >> >> >> turn >> >> >> reduces the instrumentation overhead, particularly for more expensive >> >> >> indirect >> >> >> call topn profiling. >> >> > >> >> > How much difference you get here? One posibility would be also to run >> >> > specialized >> >> > ipa-cp before profile instrumentation. >> >> > >> >> > Honza >> >> >> >> >> >> Passes internal testing and regression tests. Ok for google/4_9? >> >> >> >> >> >> 2014-10-18 Teresa Johnson >> >> >> >> >> >> Google ref b/17934523 >> >> >> * opts.c (finish_options): Increase >> >> >> max-early-inliner-iterations to 2 >> >> >> for profile-gen and profile-use builds. >> >> >> >> >> >> Index: opts.c >> >> >> === >> >> >> --- opts.c (revision 216286) >> >> >> +++ opts.c (working copy) >> >> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g >> >> >> opts->x_param_values, opts_set->x_param_values); >> >> >> } >> >> >> >> >> >> + if (opts->x_profile_arc_flag >> >> >> + || opts->x_flag_branch_probabilities) >> >> >> +{ >> >> >> + maybe_set_param_value >> >> >> + (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2, >> >> >> +opts->x_param_values, opts_set->x_param_values); >> >> >> +} >> >> >> + >> >> >>if (!(opts->x_flag_auto_profile >> >> >> || (opts->x_profile_arc_flag || >> >> >> opts->x_flag_branch_probabilities))) >> >> >> { >> >> >> >> >> >> >> >> >> -- >> >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | >> >> >> 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
Hi Honza, As David says, we will do some more experiments with the change you suggest and speculative devirtualization, but we needed to make this change in part to get an internal release out. One of the issues was a recent change to cp/decl2.c to make virtual function decls needed under flag_devirtualize. Thanks! Teresa On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka wrote: >> >> The virtual functions will be emitted in some modules, right? If they >> are hot, they will be included with the auxiliary modules. Note that >> LIPO indirect call profile will point to the comdat copy that is >> picked by the linker in the instrumentation build, so it guarantees to >> exist. > > If you have COMDAT virtual inline function that is used by module FOO and > LIPO's indirect inlining will work out that it goes to comdat copy of module > BAR > (that won the merging). It will make C++ parser to also parse BAR to get > the function, but callgarph builder will ignore it, too. > (this is new logic in analyze_function that with -fno-devirtualize will just > prevent all virtual functions not directly reachable to appear in symbol > table) > > I am surprised you hit the size limits with 4.9 only - for quite some time > we keep all virtual functions in callgarph until inlining. In fact 4.9 is > first > that works harder to drop them early (because I hit the problem with LTO > where they artifically bloat the size of LTO object files) > > Honza >> >> David >> >> >> > >> > Honza >> >> >> >> David >> >> >> >> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: >> >> >> Disabling devirtualization reduces code size, both for instrumentation >> >> >> (because >> >> >> many more virtual functions are kept longer and therefore >> >> >> instrumented) and for >> >> >> normal optimization. >> >> > >> >> > OK, with profile instrumentation (that you seem to try to minimize) i >> >> > can see >> >> > how you get noticeably more counters because virtual functions are kept >> >> > longer. >> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual >> >> > functions >> >> > than earlier compilers). >> >> > >> >> > Instead of disabling -fdevirtualize completely (that will get you more >> >> > indirect >> >> > calls and thus more topn profiling) you may consider just hacking >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets as >> >> > reachable. (see the conditional on before_inlining_p). >> >> > >> >> > Of course this will get you less devirtualization (but with LTO the >> >> > difference >> >> > should not be big - perhaps I could make switch for that for mainline) >> >> > and less >> >> > accurate profiles when you get speculative devirtualization via topn. >> >> > >> >> > I would be very interested to see how much difference this makes. >> >> > >> >> > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Fix LIPO resolved node reference fixup
This patch makes a fix to the reference fixups performed after LIPO node resolution, to better handle the case where we are updating the base address of a reference. Fixes google benchmark and passes regression tests. Ok for google/4_9? Thanks, Teresa 2014-10-24 Teresa Johnson Google ref b/18110567. * cgraphbuild.c (get_base_address_expr): New function. (fixup_ref): Update the op expression for new base address. Index: cgraphbuild.c === --- cgraphbuild.c (revision 216667) +++ cgraphbuild.c (working copy) @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool pointer_set_destroy (visited_nodes); } +/* Similar to get_base_address but returns the ADDR_EXPR pointing + to the base address corresponding to T. */ + +static tree +get_base_address_expr (tree t) +{ + while (handled_component_p (t)) +t = TREE_OPERAND (t, 0); + + if ((TREE_CODE (t) == MEM_REF + || TREE_CODE (t) == TARGET_MEM_REF) + && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR) +return TREE_OPERAND (t, 0); + + return NULL_TREE; +} + /* Update any function decl references in base ADDR of operand OP to refer to the resolved node. */ static bool fixup_ref (gimple, tree addr, tree op) { + tree orig_addr = addr; addr = get_base_address (addr); + /* If the address was changed, update the operand OP to be the + ADDR_EXPR pointing to the new base address. */ + if (orig_addr != addr) +op = get_base_address_expr (orig_addr); if (addr && TREE_CODE (addr) == FUNCTION_DECL) { gcc_assert (TREE_CODE (op) == ADDR_EXPR); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fix LIPO resolved node reference fixup
On Fri, Oct 24, 2014 at 10:55 AM, Xinliang David Li wrote: > When orgin_addr == addr, is there a guarantee that this assert: > > gcc_assert (TREE_OPERAND (op,0) == addr); > > is always true? It should be, that is the assumption of the code that we are trying to enforce with the assert. Teresa > > David > > > > On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson wrote: >> This patch makes a fix to the reference fixups performed after LIPO >> node resolution, to better handle the case where we are updating the >> base address of a reference. >> >> Fixes google benchmark and passes regression tests. Ok for google/4_9? >> >> Thanks, >> Teresa >> >> 2014-10-24 Teresa Johnson >> >> Google ref b/18110567. >> * cgraphbuild.c (get_base_address_expr): New function. >> (fixup_ref): Update the op expression for new base address. >> >> Index: cgraphbuild.c >> === >> --- cgraphbuild.c (revision 216667) >> +++ cgraphbuild.c (working copy) >> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool >>pointer_set_destroy (visited_nodes); >> } >> >> +/* Similar to get_base_address but returns the ADDR_EXPR pointing >> + to the base address corresponding to T. */ >> + >> +static tree >> +get_base_address_expr (tree t) >> +{ >> + while (handled_component_p (t)) >> +t = TREE_OPERAND (t, 0); >> + >> + if ((TREE_CODE (t) == MEM_REF >> + || TREE_CODE (t) == TARGET_MEM_REF) >> + && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR) >> +return TREE_OPERAND (t, 0); >> + >> + return NULL_TREE; >> +} >> + >> /* Update any function decl references in base ADDR of operand OP to refer >> to >> the resolved node. */ >> >> static bool >> fixup_ref (gimple, tree addr, tree op) >> { >> + tree orig_addr = addr; >>addr = get_base_address (addr); >> + /* If the address was changed, update the operand OP to be the >> + ADDR_EXPR pointing to the new base address. */ >> + if (orig_addr != addr) >> +op = get_base_address_expr (orig_addr); >>if (addr && TREE_CODE (addr) == FUNCTION_DECL) >> { >>gcc_assert (TREE_CODE (op) == ADDR_EXPR); >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fix LIPO resolved node reference fixup
Here is the new patch that walks op looking for the reference to addr. Passes internal benchmarks and regression tests. Ok for google/4_9? Thanks, Teresa 2014-10-27 Teresa Johnson Google ref b/18110567. * cgraphbuild.c (fixup_all_refs_1): New function. (fixup_all_refs): Ditto. (fixup_ref): Walk tree to find and replace ref. Index: cgraphbuild.c === --- cgraphbuild.c (revision 216735) +++ cgraphbuild.c (working copy) @@ -665,6 +665,45 @@ record_references_in_initializer (tree decl, bool pointer_set_destroy (visited_nodes); } +typedef struct _fixup_decl_info { + tree orig_decl; + tree new_decl; +} fixup_decl_info; + +/* Check the tree at TP to see if it contains the original decl stored in + DATA and if so replace it with the new decl. If original decl is + found set WALK_SUBTREES to 0 so the subtree under TP is not traversed. + Returns the updated parent tree T or NULL if no update performed. */ + +static tree +fixup_all_refs_1 (tree *tp, int *walk_subtrees, void *data) +{ + tree t = *tp; + fixup_decl_info *info = (fixup_decl_info *) data; + + /* The original function decl is always the first tree operand. */ + if (TREE_OPERAND (t,0) == info->orig_decl) +{ + TREE_OPERAND (t,0) = info->new_decl; + *walk_subtrees = 0; + return t; +} + return NULL_TREE; +} + +/* Walk the whole tree rooted at TP and invoke fixup_all_refs_1 to + replace any references to the original decl with the new decl + stored in INFO. */ + +static inline void +fixup_all_refs (tree *tp, fixup_decl_info *info) +{ + tree t = walk_tree (tp, fixup_all_refs_1, info, NULL); + /* This is invoked when we found the original decl, so we expect + to have replaced a reference. */ + gcc_assert (t != NULL_TREE); +} + /* Update any function decl references in base ADDR of operand OP to refer to the resolved node. */ @@ -674,13 +713,16 @@ fixup_ref (gimple, tree addr, tree op) addr = get_base_address (addr); if (addr && TREE_CODE (addr) == FUNCTION_DECL) { - gcc_assert (TREE_CODE (op) == ADDR_EXPR); - gcc_assert (TREE_OPERAND (op,0) == addr); struct cgraph_node *real_callee; real_callee = cgraph_lipo_get_resolved_node (addr); if (addr == real_callee->decl) return false; - TREE_OPERAND (op,0) = real_callee->decl; + /* We need to locate and update the tree operand within OP + that contains ADDR and update it to the real callee's decl. */ + fixup_decl_info info; + info.orig_decl = addr; + info.new_decl = real_callee->decl; + fixup_all_refs (&op, &info); } return false; } On Fri, Oct 24, 2014 at 11:46 AM, Teresa Johnson wrote: > On Fri, Oct 24, 2014 at 10:55 AM, Xinliang David Li > wrote: >> When orgin_addr == addr, is there a guarantee that this assert: >> >> gcc_assert (TREE_OPERAND (op,0) == addr); >> >> is always true? > > It should be, that is the assumption of the code that we are trying to > enforce with the assert. > > Teresa > >> >> David >> >> >> >> On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson >> wrote: >>> This patch makes a fix to the reference fixups performed after LIPO >>> node resolution, to better handle the case where we are updating the >>> base address of a reference. >>> >>> Fixes google benchmark and passes regression tests. Ok for google/4_9? >>> >>> Thanks, >>> Teresa >>> >>> 2014-10-24 Teresa Johnson >>> >>> Google ref b/18110567. >>> * cgraphbuild.c (get_base_address_expr): New function. >>> (fixup_ref): Update the op expression for new base address. >>> >>> Index: cgraphbuild.c >>> === >>> --- cgraphbuild.c (revision 216667) >>> +++ cgraphbuild.c (working copy) >>> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool >>>pointer_set_destroy (visited_nodes); >>> } >>> >>> +/* Similar to get_base_address but returns the ADDR_EXPR pointing >>> + to the base address corresponding to T. */ >>> + >>> +static tree >>> +get_base_address_expr (tree t) >>> +{ >>> + while (handled_component_p (t)) >>> +t = TREE_OPERAND (t, 0); >>> + >>> + if ((TREE_CODE (t) == MEM_REF >>> + || TREE_CODE (t) == TARGET_MEM_REF) >>> + && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR) >>> +return TREE_OPERAND (t, 0); >>> + >>> + return NULL_TREE; >>>
Re: [PATCH] rebuild frequency after vrp
On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen wrote: > Just tried with Teresa's patch, the ICE in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved. I will take a look to see why this wasn't fixed by my profile redesign. Teresa > > Dehao > > On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law wrote: >> On 06/02/14 10:17, Dehao Chen wrote: >>> >>> We need to rebuild frequency after vrp, otherwise the following code >>> in tree-ssa-threadupdate.c will make the frequency larger than >>> upper-bound. >>> >>>/* Excessive jump threading may make frequencies large enough >>> so >>> the computation overflows. */ >>>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) >>> rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); >>> >>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 >> >> Can you try this with Teresa's revamping of the jump threading frequency >> updates? It's still in my queue of things to review. >> >> Fixing this stuff in the updater would be better than rebuilding the >> frequencies, IMHO. >> >> Jeff >> -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] rebuild frequency after vrp
On Mon, Jun 2, 2014 at 11:04 AM, Teresa Johnson wrote: > On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen wrote: >> Just tried with Teresa's patch, the ICE in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved. > > I will take a look to see why this wasn't fixed by my profile redesign. Turns out this is a case of garbage-in, garbage-out. Jump threading is producing a block with a frequency >10K, but it is directly due to a block with an invalid frequency after the upstream ccp2 pass. After ccp2: ;; basic block 3, loop depth 1, count 0, freq 9100, maybe hot ;;prev block 2, next block 4, flags: (NEW, REACHABLE) ;;pred: 10 [91.0%] (TRUE_VALUE,EXECUTABLE) ... ;;succ: 5 [50.0%] (TRUE_VALUE,EXECUTABLE) ;;4 [50.0%] (FALSE_VALUE,EXECUTABLE) ;; basic block 4, loop depth 1, count 0, freq 6825, maybe hot ;; Invalid sum of incoming frequencies 4550, should be 6825 ;;prev block 3, next block 5, flags: (NEW, REACHABLE) ;;pred: 3 [50.0%] (FALSE_VALUE,EXECUTABLE) ;; , discriminator 4 ;;succ: 5 [100.0%] (FALLTHRU,EXECUTABLE) Jump threading introduced BB 13, with predecessors 3 and 4 shown above: ;; basic block 13, loop depth 1, count 0, freq 11375, maybe hot ;; Invalid sum of outgoing probabilities 40.0% ;;prev block 12, next block 1, flags: (NEW, REACHABLE) ;;pred: 4 [100.0%] (FALLTHRU,EXECUTABLE) ;;3 [50.0%] (TRUE_VALUE,EXECUTABLE) 11375 = 9100/2 + 6825 If bb 4 had the correct frequency of 4550 (9100/2), this block would have gotten the correct count of 9100. BB 13's successor has the correct frequency of 9100. Teresa > > Teresa > >> >> Dehao >> >> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law wrote: >>> On 06/02/14 10:17, Dehao Chen wrote: >>>> >>>> We need to rebuild frequency after vrp, otherwise the following code >>>> in tree-ssa-threadupdate.c will make the frequency larger than >>>> upper-bound. >>>> >>>>/* Excessive jump threading may make frequencies large enough >>>> so >>>> the computation overflows. */ >>>>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) >>>> rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); >>>> >>>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 >>> >>> Can you try this with Teresa's revamping of the jump threading frequency >>> updates? It's still in my queue of things to review. >>> >>> Fixing this stuff in the updater would be better than rebuilding the >>> frequencies, IMHO. >>> >>> Jeff >>> > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Fix ICE due to memory corruption in SRA
This patch fixes an ICE due to memory corruption discovered while building a large application with FDO and LIPO on the google branch. I don't have a small reproducer, but the same code appears on trunk, and I believe it could also silently result in incorrect code generation. The problem occurs if SRA is applied on a recursive call. In this case, the redirect_callers vec below contains the recursive edge from node->node. When rebuild_cgraph_edges is invoked, it will free the callee edges of node, including the one recorded in redirect_callers. In the case I looked at, after rebuilding the cgraph edges for node, the address recorded in redirect_callers now pointed to a different cgraph edge, and we later got an ICE because the (incorrect) callee that we tried to modify had the wrong number of arguments. To fix, I simply moved the collection of caller edges to after the cgraph edge rebuilding. Google ref b/15383777. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2014-06-02 Teresa Johnson * tree-sra.c (modify_function): Record caller nodes after rebuild. Index: tree-sra.c === --- tree-sra.c (revision 211139) +++ tree-sra.c (working copy) @@ -4925,12 +4925,15 @@ modify_function (struct cgraph_node *node, ipa_par { struct cgraph_node *new_node; bool cfg_changed; - vec redirect_callers = collect_callers_of_node (node); rebuild_cgraph_edges (); free_dominance_info (CDI_DOMINATORS); pop_cfun (); + /* This must be done after rebuilding cgraph edges for node above. + Otherwise any recursive calls to node that are recorded in + redirect_callers will be corrupted. */ + vec redirect_callers = collect_callers_of_node (node); new_node = cgraph_function_versioning (node, redirect_callers, NULL, NULL, false, NULL, NULL, "isra"); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Google/4_8] Fix testsuite/gcc.dg/ipa/ipa-sra-6.c
I just committed the following patch to google/4_8 as 211279. This fixes a test failure with my backport of tree-sra fix r211180 from trunk. It turns out that the bug I fixed affected this test case, but because the dumping format is slightly different between google/4_8 and both google/4_9 and trunk, it didn't show up as a test failure for either google/4_9 or trunk. Essentially, after my fix, both in google/4_8 and google/4_9 (and presumably trunk), I am getting more output in the eipa_sra dump output. Looks like we in fact were previously not properly updating a recursive call due to this same issue that I was fixing, although it didn't manifest as an ICE like in the case I fixed. With the fix, we now properly update a recursive call being optimized by SRA. The test case is expecting to see exactly one occurrence of "foo " in the dump output. In google/4_8, one of the new dump lines matches because it looks like: Adjusting call (0 -> 2) foo -> foo.isra.0 In google/4_9 and trunk, the additional dump lines don't match because the node's order number is being printed after the name: Adjusting call foo/0 -> foo.isra.0/2 2014-06-05 Teresa Johnson * testsuite/gcc.dg/ipa/ipa-sra-6.c: Update to handle recent tree-sra.c fix. Index: testsuite/gcc.dg/ipa/ipa-sra-6.c === --- testsuite/gcc.dg/ipa/ipa-sra-6.c (revision 210862) +++ testsuite/gcc.dg/ipa/ipa-sra-6.c (working copy) @@ -30,5 +30,5 @@ int main (int argc, char *argv[]) return foo (&cow, 0); } -/* { dg-final { scan-tree-dump-times "foo " 1 "eipa_sra" } } */ +/* { dg-final { scan-tree-dump-times "foo " 2 "eipa_sra" } } */ /* { dg-final { cleanup-tree-dump "eipa_sra" } } */ Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Google/4_8] Reduce memory overhead of LIPO COMDAT fixups
(cc'ing a few additional people to help with review as David is out and I'm not sure Rong is available) This patch greatly reduces the memory overhead of the new COMDAT fixup analysis, by changing the second level hash tables to linked lists. I found that almost none of the second level hash tables contained more than one entry, but each hash table required a lot of memory overhead. I also now query the fixup type before adding the checksums to the pointer sets during callgraph building, which would have enabled a workaround for the memory issue. Tested with regression tests and internal tests (see ref below for details). Google ref b/15415042. 2014-06-05 Teresa Johnson * dyn-ipa.c (struct lineno_checksum_alias): Replaced pointer set. (struct checksum_alias_info): Enabled linked list. (cfg_checksum_get_key): Removed. (find_cfg_checksum): New function. (cfg_checksum_insert): Operate on linked list. (checksum_set_insert): Ditto. (gcov_build_callgraph): Allow disabling checksum insertion. (gcov_find_new_ic_target): Operate on linked list. (gcov_fixup_counters_checksum): Ditto. (gcov_fixup_counters_lineno): Ditto. (__gcov_compute_module_groups): Compute fixup type earlier. Index: dyn-ipa.c === --- dyn-ipa.c (revision 211288) +++ dyn-ipa.c (working copy) @@ -79,16 +79,15 @@ struct dyn_cgraph unsigned num_nodes_executed; /* used by new algorithm */ struct modu_node *modu_nodes; - /* Set indexed by lineno_checksum, returns another dyn_pointer_set*, - indexed by cfg_checksum. That returns a checksum_alias_info struct. */ + /* Set indexed by lineno_checksum, returns a linked list of + checksum_alias_info structs. */ struct dyn_pointer_set *lineno_pointer_sets; }; /* Struct holding information for functions with the same lineno_checksum. */ struct lineno_checksum_alias { - /* Set indexed by cfg_checksum, holding a checksum_alias_info struct. */ - struct dyn_pointer_set *cfg_pointer_set; + struct checksum_alias_info *cfg_checksum_list; unsigned lineno_checksum; }; @@ -96,6 +95,7 @@ struct lineno_checksum_alias checksums. */ struct checksum_alias_info { + struct checksum_alias_info *next_cfg_checksum; struct checksum_alias *alias_list; unsigned cfg_checksum; }; @@ -205,6 +205,7 @@ pointer_set_create (unsigned (*get_key) (const voi static struct dyn_cgraph the_dyn_call_graph; static int total_zero_count = 0; static int total_insane_count = 0; +static int fixup_type = 0; enum GROUPING_ALGORITHM { @@ -374,14 +375,6 @@ lineno_checksum_get_key (const void *p) return ((const struct lineno_checksum_alias *) p)->lineno_checksum; } -/* The cfg_checksum value in P is the key for a cfg_pointer_set. */ - -static inline unsigned -cfg_checksum_get_key (const void *p) -{ - return ((const struct checksum_alias_info *) p)->cfg_checksum; -} - /* Create a new checksum_alias struct for function with GUID, FI_PTR, and ZERO_COUNTS flag. Prepends to list NEXT and returns new struct. */ @@ -398,28 +391,44 @@ new_checksum_alias (gcov_type guid, const struct g return alias; } -/* Insert a new checksum_alias struct into pointer set P for function with +/* Locate the checksum_alias_info in LIST that matches CFG_CHECKSUM. */ + +static struct checksum_alias_info * +find_cfg_checksum (struct checksum_alias_info *list, unsigned cfg_checksum) +{ + for (; list; list = list->next_cfg_checksum) +{ + if (list->cfg_checksum == cfg_checksum) +return list; +} + return NULL; +} + +/* Insert a new checksum_alias struct into LIST for function with CFG_CHECKSUM and associated GUID, FI_PTR, and ZERO_COUNTS flag. */ -static void -cfg_checksum_set_insert (struct dyn_pointer_set *p, unsigned cfg_checksum, - gcov_type guid, const struct gcov_fn_info *fi_ptr, - int zero_counts) +static struct checksum_alias_info * +cfg_checksum_insert (unsigned cfg_checksum, gcov_type guid, + const struct gcov_fn_info *fi_ptr, int zero_counts, + struct checksum_alias_info *list) { - struct checksum_alias_info **m = (struct checksum_alias_info **) -pointer_set_find_or_insert (p, cfg_checksum); - if (*m) + struct checksum_alias_info *alias_info; + alias_info = find_cfg_checksum (list, cfg_checksum); + if (alias_info) { - gcc_assert ((*m)->alias_list); - (*m)->alias_list = new_checksum_alias (guid, fi_ptr, zero_counts, - (*m)->alias_list); + gcc_assert (alias_info->alias_list); + alias_info->alias_list = new_checksum_alias (guid, fi_ptr, zero_counts, + alias_info->alias_list); + return list; } else { - *m = XNEW (struct checksu
[Google] Fix AFDO early inline ICEs due to DFE
These two patches fix multiple ICE that occurred due to DFE being recently enabled after AutoFDO LIPO linking. Passes regression and internal testing. Ok for Google/4_8? Teresa 2014-06-12 Teresa Johnson Dehao Chen Google ref b/15521327. * cgraphclones.c (cgraph_clone_edge): Use resolved node. * l-ipo.c (resolve_cgraph_node): Resolve to non-removable node. Index: cgraphclones.c === --- cgraphclones.c (revision 211386) +++ cgraphclones.c (working copy) @@ -94,6 +94,7 @@ along with GCC; see the file COPYING3. If not see #include "ipa-utils.h" #include "lto-streamer.h" #include "except.h" +#include "l-ipo.h" /* Create clone of E in the node N represented by CALL_EXPR the callgraph. */ struct cgraph_edge * @@ -118,7 +119,11 @@ cgraph_clone_edge (struct cgraph_edge *e, struct c if (call_stmt && (decl = gimple_call_fndecl (call_stmt))) { - struct cgraph_node *callee = cgraph_get_node (decl); + struct cgraph_node *callee; + if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done) +callee = cgraph_lipo_get_resolved_node (decl); + else +callee = cgraph_get_node (decl); gcc_checking_assert (callee); new_edge = cgraph_create_edge (n, callee, call_stmt, count, freq); } Index: l-ipo.c === --- l-ipo.c (revision 211386) +++ l-ipo.c (working copy) @@ -1542,6 +1542,18 @@ resolve_cgraph_node (struct cgraph_sym **slot, str gcc_assert (decl1_defined); add_define_module (*slot, decl2); + /* Pick the node that cannot be removed, to avoid a situation + where we remove the resolved node and later try to access + it for the remaining non-removable copy. E.g. one may be + extern and the other weak, only the extern copy can be removed. */ + if (cgraph_can_remove_if_no_direct_calls_and_refs_p ((*slot)->rep_node) + && !cgraph_can_remove_if_no_direct_calls_and_refs_p (node)) +{ + (*slot)->rep_node = node; + (*slot)->rep_decl = decl2; + return; +} + has_prof1 = has_profile_info (decl1); bool is_aux1 = cgraph_is_auxiliary (decl1); bool is_aux2 = cgraph_is_auxiliary (decl2); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Fix -femit-function-names in LIPO profile-use mode
Emit the proper module name in LIPO profile-use mode. Passes regression tests, ok for google branches? Thanks, Teresa 2014-06-24 Teresa Johnson * coverage.c (emit_function_name): Emit module name in LIPO mode. Index: coverage.c === --- coverage.c (revision 211893) +++ coverage.c (working copy) @@ -1882,7 +1882,9 @@ void emit_function_name (void) { fprintf (stderr, "Module %s FuncId %u Name %s\n", - main_input_file_name, + (L_IPO_COMP_MODE +? get_module_name (FUNC_DECL_MODULE_ID (cfun)) +: main_input_file_name), FUNC_DECL_FUNC_ID (cfun), IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (current_function_decl))); } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Add missing -fdump-* options
On Tue, May 13, 2014 at 8:19 AM, Xinliang David Li wrote: > On Tue, May 13, 2014 at 1:39 AM, Richard Biener > wrote: >> On Fri, May 9, 2014 at 5:54 PM, Teresa Johnson wrote: >>> I discovered that the support for the documented -fdump-* options >>> "optimized", "missed", "note" and "optall" was missing. Added that and >>> fixed a minor typo in the documentation. >>> >>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? >> >> I'm not sure they were intented for user-consumption. ISTR they >> are just an implementation detail exposed by -fopt-info-X (which is >> where they are documented). >> >> The typo fix is ok, also adding a comment before the dump flags >> definition to the above fact. >> >> David, do I remember correctly? > > I remember we talked about content filtering dump flags. Things like > > -fdump-xxx-ir <-- dump IR only > -fdump-xxx-transformation <-- optimization note > -fdump-xxx-debug <-- other debug traces > > Other than that, now I think 'details' and 'all' seem redundant. > 'verbose' flag/modifier can achieve the same effect depending on the > context. > > -fdump-xxx-ir-verbose <-- dump IR, and turn on IR modifiers such as > vops, lineno, etc > -fdump-xxx-transforamtion-verbose <-- dump transformations + missed > optimizations + notes > -fdump-xxx-debug-verbose <-- turn on detailed trace. The above proposal seems fine to me as a longer-term direction, but also seems somewhat orthogonal to the issue my patch is trying to solve in the short term, namely inconsistent documentation and behavior: 1) "optimized", "missed", "note" and "optall" are documented as being sub-options for -fdump-tree-* in doc/invoke.texi, but not implemented. 2) "optimized", "missed", "note" and "optall" are however enabled via -fdump-tree-all Could we at least fix these issues in the short term, as it doesn't affect the documented behavior (but rather adds the documented behavior)? Thanks, Teresa > > thanks, > > David > > >> >> Thanks, >> Richard. >> >>> Thanks, >>> Teresa >>> >>> 2014-05-09 Teresa Johnson >>> >>> * doc/invoke.texi: Fix typo. >>> * dumpfile.c: Add support for documented -fdump-* options >>> optimized/missed/note/optall. >>> >>> Index: doc/invoke.texi >>> === >>> --- doc/invoke.texi (revision 210157) >>> +++ doc/invoke.texi (working copy) >>> @@ -6278,7 +6278,7 @@ passes). >>> @item missed >>> Enable showing missed optimization information (only available in certain >>> passes). >>> -@item notes >>> +@item note >>> Enable other detailed optimization information (only available in >>> certain passes). >>> @item =@var{filename} >>> Index: dumpfile.c >>> === >>> --- dumpfile.c (revision 210157) >>> +++ dumpfile.c (working copy) >>> @@ -107,6 +107,10 @@ static const struct dump_option_value_info dump_op >>>{"nouid", TDF_NOUID}, >>>{"enumerate_locals", TDF_ENUMERATE_LOCALS}, >>>{"scev", TDF_SCEV}, >>> + {"optimized", MSG_OPTIMIZED_LOCATIONS}, >>> + {"missed", MSG_MISSED_OPTIMIZATION}, >>> + {"note", MSG_NOTE}, >>> + {"optall", MSG_ALL}, >>>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA >>> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE >>> | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)}, >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Improve -fdump-tree-all efficiency
The following patch fixes a big inefficiency when using -fdump-tree-all for large source files. I found that when using this option the compile time became unreasonably slow, and I traced this to the fact that dump_begin/dump_end are called around every function/class that are dumped via -fdump-tree-original and -fdump-class-hierarchy. I fixed this by opening the .original and .class dumps once before invoking the parser and closing them once after. For a file containing ~7000 each of functions and classes, the real time measured for the compile is: no dumping: 8s -fdump-tree-all, no patch: 10m30s -fdump-tree-all, my patch: 1m21s Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2014-06-26 Teresa Johnson * c-family/c-common.h (get_dump_info): Declare. * c-family/c-gimplify.c (c_genericize): Use saved dump files. * c-family/c-opts.c (c_common_parse_file): Begin and end dumps once around parsing invocation. (get_dump_info): New function. * cp/class.c (dump_class_hierarchy): Use saved dump files. (dump_vtable): Ditto. (dump_vtt): Ditto. Index: c-family/c-common.h === --- c-family/c-common.h (revision 211980) +++ c-family/c-common.h (working copy) @@ -835,6 +835,7 @@ extern bool c_common_post_options (const char **); extern bool c_common_init (void); extern void c_common_finish (void); extern void c_common_parse_file (void); +extern FILE *get_dump_info (int, int *); extern alias_set_type c_common_get_alias_set (tree); extern void c_register_builtin_type (tree, const char*); extern bool c_promoting_integer_type_p (const_tree); Index: c-family/c-gimplify.c === --- c-family/c-gimplify.c (revision 211980) +++ c-family/c-gimplify.c (working copy) @@ -123,7 +123,7 @@ c_genericize (tree fndecl) } /* Dump the C-specific tree IR. */ - dump_orig = dump_begin (TDI_original, &local_dump_flags); + dump_orig = get_dump_info (TDI_original, &local_dump_flags); if (dump_orig) { fprintf (dump_orig, "\n;; Function %s", @@ -140,8 +140,6 @@ c_genericize (tree fndecl) else print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl)); fprintf (dump_orig, "\n"); - - dump_end (TDI_original, dump_orig); } /* Dump all nested functions now. */ Index: c-family/c-opts.c === --- c-family/c-opts.c (revision 211980) +++ c-family/c-opts.c (working copy) @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see TARGET_FLT_EVAL_METHOD_NON_DEFAULT and TARGET_OPTF. */ #include "tm_p.h" /* For C_COMMON_OVERRIDE_OPTIONS. */ +#include "dumpfile.h" #ifndef DOLLARS_IN_IDENTIFIERS # define DOLLARS_IN_IDENTIFIERS true @@ -102,6 +103,12 @@ static size_t deferred_count; /* Number of deferred options scanned for -include. */ static size_t include_cursor; +/* Dump files/flags to use during parsing. */ +static FILE *original_dump_file = NULL; +static int original_dump_flags; +static FILE *class_dump_file = NULL; +static int class_dump_flags; + /* Whether any standard preincluded header has been preincluded. */ static bool done_preinclude; @@ -1088,6 +1095,10 @@ c_common_parse_file (void) for (;;) { c_finish_options (); + /* Open the dump files to use for the original and class dump output + here, to be used during parsing for the current file. */ + original_dump_file = dump_begin (TDI_original, &original_dump_flags); + class_dump_file = dump_begin (TDI_class, &class_dump_flags); pch_init (); push_file_scope (); c_parse_file (); @@ -1101,6 +1112,16 @@ c_common_parse_file (void) cpp_clear_file_cache (parse_in); this_input_filename = cpp_read_main_file (parse_in, in_fnames[i]); + if (original_dump_file) +{ + dump_end (TDI_original, original_dump_file); + original_dump_file = NULL; +} + if (class_dump_file) +{ + dump_end (TDI_class, class_dump_file); + class_dump_file = NULL; +} /* If an input file is missing, abandon further compilation. cpplib has issued a diagnostic. */ if (!this_input_filename) @@ -1108,6 +1129,23 @@ c_common_parse_file (void) } } +/* Returns the appropriate dump file for PHASE to dump with FLAGS. */ +FILE * +get_dump_info (int phase, int *flags) +{ + gcc_assert (phase == TDI_original || phase == TDI_class); + if (phase == TDI_original) +{ + *flags = original_dump_flags; + return original_dump_file; +} + else +{ + *flags = class_dump_flags; + return class_dump_file; +} +} +
Re: [PATCH] Add missing -fdump-* options
On Thu, Jun 26, 2014 at 12:40 AM, Richard Biener wrote: > On Wed, Jun 25, 2014 at 4:21 PM, Teresa Johnson wrote: >> On Tue, May 13, 2014 at 8:19 AM, Xinliang David Li >> wrote: >>> On Tue, May 13, 2014 at 1:39 AM, Richard Biener >>> wrote: >>>> On Fri, May 9, 2014 at 5:54 PM, Teresa Johnson >>>> wrote: >>>>> I discovered that the support for the documented -fdump-* options >>>>> "optimized", "missed", "note" and "optall" was missing. Added that and >>>>> fixed a minor typo in the documentation. >>>>> >>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? >>>> >>>> I'm not sure they were intented for user-consumption. ISTR they >>>> are just an implementation detail exposed by -fopt-info-X (which is >>>> where they are documented). >>>> >>>> The typo fix is ok, also adding a comment before the dump flags >>>> definition to the above fact. >>>> >>>> David, do I remember correctly? >>> >>> I remember we talked about content filtering dump flags. Things like >>> >>> -fdump-xxx-ir <-- dump IR only >>> -fdump-xxx-transformation <-- optimization note >>> -fdump-xxx-debug <-- other debug traces >>> >>> Other than that, now I think 'details' and 'all' seem redundant. >>> 'verbose' flag/modifier can achieve the same effect depending on the >>> context. >>> >>> -fdump-xxx-ir-verbose <-- dump IR, and turn on IR modifiers such as >>> vops, lineno, etc >>> -fdump-xxx-transforamtion-verbose <-- dump transformations + missed >>> optimizations + notes >>> -fdump-xxx-debug-verbose <-- turn on detailed trace. >> >> The above proposal seems fine to me as a longer-term direction, but >> also seems somewhat orthogonal to the issue my patch is trying to >> solve in the short term, namely inconsistent documentation and >> behavior: >> >> 1) "optimized", "missed", "note" and "optall" are documented as being >> sub-options for -fdump-tree-* in doc/invoke.texi, but not implemented. >> 2) "optimized", "missed", "note" and "optall" are however enabled via >> -fdump-tree-all >> >> Could we at least fix these issues in the short term, as it doesn't >> affect the documented behavior (but rather adds the documented >> behavior)? > > Sure. > > Richard. Thanks, retested and committed as r212040. Teresa > >> Thanks, >> Teresa >> >>> >>> thanks, >>> >>> David >>> >>> >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks, >>>>> Teresa >>>>> >>>>> 2014-05-09 Teresa Johnson >>>>> >>>>> * doc/invoke.texi: Fix typo. >>>>> * dumpfile.c: Add support for documented -fdump-* options >>>>> optimized/missed/note/optall. >>>>> >>>>> Index: doc/invoke.texi >>>>> === >>>>> --- doc/invoke.texi (revision 210157) >>>>> +++ doc/invoke.texi (working copy) >>>>> @@ -6278,7 +6278,7 @@ passes). >>>>> @item missed >>>>> Enable showing missed optimization information (only available in certain >>>>> passes). >>>>> -@item notes >>>>> +@item note >>>>> Enable other detailed optimization information (only available in >>>>> certain passes). >>>>> @item =@var{filename} >>>>> Index: dumpfile.c >>>>> === >>>>> --- dumpfile.c (revision 210157) >>>>> +++ dumpfile.c (working copy) >>>>> @@ -107,6 +107,10 @@ static const struct dump_option_value_info dump_op >>>>>{"nouid", TDF_NOUID}, >>>>>{"enumerate_locals", TDF_ENUMERATE_LOCALS}, >>>>>{"scev", TDF_SCEV}, >>>>> + {"optimized", MSG_OPTIMIZED_LOCATIONS}, >>>>> + {"missed", MSG_MISSED_OPTIMIZATION}, >>>>> + {"note", MSG_NOTE}, >>>>> + {"optall", MSG_ALL}, >>>>>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | >>>>> TDF_IPA >>>>> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE >>>>> | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | >>>>> TDF_SCEV)}, >>>>> >>>>> >>>>> -- >>>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] LIPO resolved node handling for inline clone references
This patch addresses an issue with looking up the resolved node rebuilding cgraph references during clone materialization. It is possible that in ipa-cp cases the resolved node was eliminated during inlining and indirect call promotion, but the clone still exists as a function call argument in an inline virtual clone. The clones would be removed later when we actually do the inline transformation, but in the meantime during clone materialization the cgraph references are rebuilt, and the temporary reference on the call still exists in the IR. Handle this by skipping such references in inline clones. Passes regression tests. Ok for google/4_9? Teresa 2014-07-01 Teresa Johnson Google ref b/15411384. * cgraphbuild.c (mark_address): Skip resolved node lookup in inline copies. Index: cgraphbuild.c === --- cgraphbuild.c (revision 211957) +++ cgraphbuild.c (working copy) @@ -389,8 +389,28 @@ mark_address (gimple stmt, tree addr, tree, void * addr = get_base_address (addr); if (TREE_CODE (addr) == FUNCTION_DECL) { + /* Before possibly cloning the node in cgraph_get_create_node, + save the current cgraph node for addr. */ + struct cgraph_node *first_clone = cgraph_get_node (addr); struct cgraph_node *node = cgraph_get_create_node (addr); - if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done) + /* In LIPO mode we use the resolved node. However, there is + a possibility that it may not exist at this point. This + can happen in cases of ipa-cp, where this is a reference + that will eventually go away during inline_transform when we + invoke cgraph_redirect_edge_call_stmt_to_callee to rewrite + the call_stmt and skip some arguments. It is possible + that earlier during inline_call the references to the original + non-cloned resolved node were all eliminated, and it was removed. + However, virtual clones may stick around until inline_transform, + due to references in other virtual clones, at which point they + will all be removed. In between inline_call and inline_transform, + however, we will materialize clones which would rebuild references + and end up here upon still seeing the reference on the call. + Handle this by skipping the resolved node lookup when the first + clone was marked global.inlined_to (i.e. it is a virtual clone, + the original is gone). */ + if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done + && first_clone && !first_clone->global.inlined_to) node = cgraph_lipo_get_resolved_node (addr); cgraph_mark_address_taken_node (node); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate
The following patch adds support for a new libgcov interface, __gcov_profiling_enabled, that can be used by applications to determine whether a binary has been instrumented for test coverage or profile optimization. Passes regression tests and manual testing with different options. Ok for google branches? Thanks, Teresa 2014-07-02 Teresa Johnson Google ref b/15378201. * gcc/tree-profile.c (gcov_test_coverage_decl): Declare. (tree_init_instrumentation): Initialize gcov_test_coverage_decl. * libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define. * libgcc/libgcov-interface.c (__gcov_profiling_enabled): New function. Index: gcc/tree-profile.c === --- gcc/tree-profile.c (revision 212044) +++ gcc/tree-profile.c (working copy) @@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL /* extern gcov_unsigned_t __gcov_profile_prefix */ static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE; +/* extern gcov_unsigned_t __gcov_test_coverage */ +static tree GTY(()) gcov_test_coverage_decl = NULL_TREE; + /* extern gcov_unsigned_t __gcov_sampling_period */ static GTY(()) tree gcov_sampling_period_decl = NULL_TREE; @@ -498,6 +501,27 @@ tree_init_instrumentation (void) DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr; varpool_finalize_decl (gcov_profile_prefix_decl); } + + if (!gcov_test_coverage_decl) +{ + /* Initialize __gcov_test_coverage to 1 if -ftest-coverage + specified, 0 otherwise. Used by libgcov to determine whether + a binary was instrumented for coverage or profile optimization. */ + gcov_test_coverage_decl = build_decl ( + UNKNOWN_LOCATION, + VAR_DECL, + get_identifier ("__gcov_test_coverage"), + get_gcov_unsigned_t ()); + TREE_PUBLIC (gcov_test_coverage_decl) = 1; + DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1; + DECL_COMDAT_GROUP (gcov_test_coverage_decl) + = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl); + TREE_STATIC (gcov_test_coverage_decl) = 1; + DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst ( + get_gcov_unsigned_t (), + flag_test_coverage ? 1 : 0); + varpool_finalize_decl (gcov_test_coverage_decl); +} } /* Initialization function for FDO sampling. */ Index: libgcc/libgcov-driver.c === --- libgcc/libgcov-driver.c (revision 212044) +++ libgcc/libgcov-driver.c (working copy) @@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void) char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled; extern void __gcov_flush (void); char *(*__gcov_dummy_ref6)(void) = &__gcov_flush; +extern unsigned int __gcov_profiling_enabled (void); +char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_enabled; /* Default callback function for profile instrumentation callback. */ extern void __coverage_callback (gcov_type, int); Index: libgcc/libgcov-interface.c === --- libgcc/libgcov-interface.c (revision 212044) +++ libgcc/libgcov-interface.c (working copy) @@ -116,6 +116,20 @@ __gcov_dump (void) set_gcov_dump_complete (); } +/* Emitted in coverage.c. */ +extern gcov_unsigned_t __gcov_test_coverage; + +unsigned int __gcov_profiling_enabled (void); + +/* Function that can be called from application to distinguish binaries + instrumented from coverage fro those instrumented for profiling + (e.g. -fprofile-generate). */ + +unsigned int __gcov_profiling_enabled (void) +{ + return !__gcov_test_coverage; +} + #endif /* L_gcov_dump */ #ifdef L_gcov_sampling -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate
On Wed, Jul 2, 2014 at 1:15 PM, Xinliang David Li wrote: > Should the interface be something like: > > int __gcov_profiling_for_test_coverage(void)? I was equating the term "profiling" with -fprofile-generate, as opposed to -ftest-coverage instrumentation. But I can change it to this if you think that is clearer. Teresa > > David > > > On Wed, Jul 2, 2014 at 12:55 PM, Teresa Johnson wrote: >> The following patch adds support for a new libgcov interface, >> __gcov_profiling_enabled, that can be used by applications to >> determine whether a binary has been instrumented for test coverage or >> profile optimization. >> >> Passes regression tests and manual testing with different options. Ok >> for google branches? >> >> Thanks, >> Teresa >> >> 2014-07-02 Teresa Johnson >> >> Google ref b/15378201. >> * gcc/tree-profile.c (gcov_test_coverage_decl): Declare. >> (tree_init_instrumentation): Initialize gcov_test_coverage_decl. >> * libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define. >> * libgcc/libgcov-interface.c (__gcov_profiling_enabled): New >> function. >> >> Index: gcc/tree-profile.c >> === >> --- gcc/tree-profile.c (revision 212044) >> +++ gcc/tree-profile.c (working copy) >> @@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL >> /* extern gcov_unsigned_t __gcov_profile_prefix */ >> static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE; >> >> +/* extern gcov_unsigned_t __gcov_test_coverage */ >> +static tree GTY(()) gcov_test_coverage_decl = NULL_TREE; >> + >> /* extern gcov_unsigned_t __gcov_sampling_period */ >> static GTY(()) tree gcov_sampling_period_decl = NULL_TREE; >> >> @@ -498,6 +501,27 @@ tree_init_instrumentation (void) >>DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr; >>varpool_finalize_decl (gcov_profile_prefix_decl); >> } >> + >> + if (!gcov_test_coverage_decl) >> +{ >> + /* Initialize __gcov_test_coverage to 1 if -ftest-coverage >> + specified, 0 otherwise. Used by libgcov to determine whether >> + a binary was instrumented for coverage or profile optimization. */ >> + gcov_test_coverage_decl = build_decl ( >> + UNKNOWN_LOCATION, >> + VAR_DECL, >> + get_identifier ("__gcov_test_coverage"), >> + get_gcov_unsigned_t ()); >> + TREE_PUBLIC (gcov_test_coverage_decl) = 1; >> + DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1; >> + DECL_COMDAT_GROUP (gcov_test_coverage_decl) >> + = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl); >> + TREE_STATIC (gcov_test_coverage_decl) = 1; >> + DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst ( >> + get_gcov_unsigned_t (), >> + flag_test_coverage ? 1 : 0); >> + varpool_finalize_decl (gcov_test_coverage_decl); >> +} >> } >> >> /* Initialization function for FDO sampling. */ >> Index: libgcc/libgcov-driver.c >> === >> --- libgcc/libgcov-driver.c (revision 212044) >> +++ libgcc/libgcov-driver.c (working copy) >> @@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void) >> char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled; >> extern void __gcov_flush (void); >> char *(*__gcov_dummy_ref6)(void) = &__gcov_flush; >> +extern unsigned int __gcov_profiling_enabled (void); >> +char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_enabled; >> >> /* Default callback function for profile instrumentation callback. */ >> extern void __coverage_callback (gcov_type, int); >> Index: libgcc/libgcov-interface.c >> === >> --- libgcc/libgcov-interface.c (revision 212044) >> +++ libgcc/libgcov-interface.c (working copy) >> @@ -116,6 +116,20 @@ __gcov_dump (void) >>set_gcov_dump_complete (); >> } >> >> +/* Emitted in coverage.c. */ >> +extern gcov_unsigned_t __gcov_test_coverage; >> + >> +unsigned int __gcov_profiling_enabled (void); >> + >> +/* Function that can be called from application to distinguish binaries >> + instrumented from coverage fro those instrumented for profiling >> + (e.g. -fprofile-generate). */ >> + >> +unsigned int __gcov_profiling_enabled (void) >> +{ >> + return !__gcov_test_coverage; >> +} >> + >> #endif /* L_gcov_dump */ >> >> #ifdef L_gcov_sampling >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate
New patch below. Retested. Ok for google branches? Thanks, Teresa 2014-07-02 Teresa Johnson Google ref b/15378201. * gcc/tree-profile.c (gcov_test_coverage_decl): Declare. (tree_init_instrumentation): Initialize gcov_test_coverage_decl. * libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define. * libgcc/libgcov-interface.c (__gcov_profiling_for_test_coverage): New function. Index: gcc/tree-profile.c === --- gcc/tree-profile.c (revision 212044) +++ gcc/tree-profile.c (working copy) @@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL /* extern gcov_unsigned_t __gcov_profile_prefix */ static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE; +/* extern gcov_unsigned_t __gcov_test_coverage */ +static tree GTY(()) gcov_test_coverage_decl = NULL_TREE; + /* extern gcov_unsigned_t __gcov_sampling_period */ static GTY(()) tree gcov_sampling_period_decl = NULL_TREE; @@ -498,6 +501,27 @@ tree_init_instrumentation (void) DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr; varpool_finalize_decl (gcov_profile_prefix_decl); } + + if (!gcov_test_coverage_decl) +{ + /* Initialize __gcov_test_coverage to 1 if -ftest-coverage + specified, 0 otherwise. Used by libgcov to determine whether + a binary was instrumented for coverage or profile optimization. */ + gcov_test_coverage_decl = build_decl ( + UNKNOWN_LOCATION, + VAR_DECL, + get_identifier ("__gcov_test_coverage"), + get_gcov_unsigned_t ()); + TREE_PUBLIC (gcov_test_coverage_decl) = 1; + DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1; + DECL_COMDAT_GROUP (gcov_test_coverage_decl) + = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl); + TREE_STATIC (gcov_test_coverage_decl) = 1; + DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst ( + get_gcov_unsigned_t (), + flag_test_coverage ? 1 : 0); + varpool_finalize_decl (gcov_test_coverage_decl); +} } /* Initialization function for FDO sampling. */ Index: libgcc/libgcov-driver.c === --- libgcc/libgcov-driver.c (revision 212044) +++ libgcc/libgcov-driver.c (working copy) @@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void) char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled; extern void __gcov_flush (void); char *(*__gcov_dummy_ref6)(void) = &__gcov_flush; +extern unsigned int __gcov_profiling_for_test_coverage (void); +char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_for_test_coverage; /* Default callback function for profile instrumentation callback. */ extern void __coverage_callback (gcov_type, int); Index: libgcc/libgcov-interface.c === --- libgcc/libgcov-interface.c (revision 212044) +++ libgcc/libgcov-interface.c (working copy) @@ -116,6 +116,20 @@ __gcov_dump (void) set_gcov_dump_complete (); } +/* Emitted in coverage.c. */ +extern gcov_unsigned_t __gcov_test_coverage; + +unsigned int __gcov_profiling_for_test_coverage (void); + +/* Function that can be called from application to distinguish binaries + instrumented for coverage from those instrumented for profile + optimization (e.g. -fprofile-generate). */ + +unsigned int __gcov_profiling_for_test_coverage (void) +{ + return __gcov_test_coverage; +} + #endif /* L_gcov_dump */ #ifdef L_gcov_sampling On Wed, Jul 2, 2014 at 1:25 PM, Xinliang David Li wrote: > The reason is that test coverage is using the same underlying > profiling. Returning false when calling '__gcov_profiling_enabled()) > in coverage mode is a little misleading. > > David > > On Wed, Jul 2, 2014 at 1:22 PM, Teresa Johnson wrote: >> On Wed, Jul 2, 2014 at 1:15 PM, Xinliang David Li wrote: >>> Should the interface be something like: >>> >>> int __gcov_profiling_for_test_coverage(void)? >> >> I was equating the term "profiling" with -fprofile-generate, as >> opposed to -ftest-coverage instrumentation. But I can change it to >> this if you think that is clearer. >> >> Teresa >> >>> >>> David >>> >>> >>> On Wed, Jul 2, 2014 at 12:55 PM, Teresa Johnson >>> wrote: >>>> The following patch adds support for a new libgcov interface, >>>> __gcov_profiling_enabled, that can be used by applications to >>>> determine whether a binary has been instrumented for test coverage or >>>> profile optimization. >>>> >>>> Passes regression tests and manual testing with different options. Ok >>>> for google branches? >>>> >>>> Thanks, >>>
Re: [PATCH] Handle more COMDAT profiling issues
Finally had some time to pick this up again. See responses and questions below. But it looks like you are commenting on a stale version of the patch. See the update I sent in March: https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01176.html On Sun, May 25, 2014 at 10:43 PM, Jan Hubicka wrote: >> This patch attempts to address the lost profile issue for COMDATs in >> more circumstances, exposed by function splitting. >> >> My earlier patch handled the case where the comdat had 0 counts since >> the linker kept the copy in a different module. In that case we >> prevent the guessed frequencies on 0-count functions from being >> dropped by counts_to_freq, and then later mark any reached via >> non-zero callgraph edges as guessed. Finally, when one such 0-count >> comdat is inlined the call count is propagated to the callee blocks >> using the guessed probabilities. >> >> However, in this case, there was a comdat that had a very small >> non-zero count, that was being inlined to a much hotter callsite. This >> could happen when there was a copy that was ipa-inlined >> in the profile gen compile, so the copy in that module gets some >> non-zero counts from the ipa inlined instance, but when the out of >> line copy was eliminated by the linker (selected from a different >> module). In this case the inliner was scaling the bb counts up quite a >> lot when inlining. The problem is that you most likely can't trust >> that the 0 count bbs in such a case are really not executed by the >> callsite it is being into, since the counts are very small and >> correspond to a different callsite. In some internal C++ applications >> I am seeing that we execute out of the split cold portion of COMDATs >> for this reason. > > With all the problems we have with COMDATs, it still seems to me that perhaps > best approach would be to merge them in runtime, as we discussed some time ago > and incrementally handle the problem how to get callstack sensitive profiles. I recently committed some functionality to a google branch to do some COMDAT profile merging at runtime, but that was to handle all-zero count copies and operates on top of the LIPO runtime infrastructure. We don't do any fixup in non-zero count COMDATs to avoid losing context-sensitive profiles. >> >> This problem is more complicated to address than the 0-count instance, >> because we need the cgraph to determine which functions to drop the >> profile on, and at that point the estimated frequencies have already >> been overwritten by counts_to_freqs. To handle this broader case, I >> have changed the drop_profile routine to simply re-estimate the >> probabilities/frequencies (and translate these into counts scaled from >> the incoming call edge counts). This unfortunately necessitates >> rebuilding the cgraph, to propagate the new synthesized counts and >> avoid checking failures downstream. But it will only be rebuilt if we >> dropped any profiles. With this solution, some of the older approach >> can be removed (e.g. propagating counts using the guessed >> probabilities during inlining). >> >> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu. >> Also tested on >> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? >> >> Thanks, >> Teresa > > 2014-02-12 Teresa Johnson > > * graphite.c (graphite_finalize): Pass new parameter. > * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. > * predict.c (tree_estimate_probability): New parameter. > (tree_estimate_probability_worker): Renamed from > tree_estimate_probability_driver. > (tree_reestimate_probability): New function. > (tree_estimate_probability_driver): Invoke > tree_estimate_probability_worker. > (freqs_to_counts): Move here from tree-inline.c. > (drop_profile): Re-estimated profiles when dropping counts. > (handle_missing_profiles): Drop for some non-zero functions as well, > get counts from bbs to support invocation before cgraph rebuild. > (counts_to_freqs): Remove code obviated by reestimation. > * predict.h (tree_estimate_probability): Update declaration. > * tree-inline.c (freqs_to_counts): Move to predict.c. > (copy_cfg_body): Remove code obviated by reestimation. > * tree-profile.c (tree_profiling): Invoke handle_missing_profiles > before cgraph rebuild. > > Index: params.def > === > --- params.def (revision 207436) > +++ params.def (working copy) > @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_B
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
Here is the new patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Teresa 2014-11-13 gcc: PR tree-optimization/63841 * tree.c (initializer_zerop): A constructor with no elements does not zero initialize. gcc/testsuite: PR tree-optimization/63841 * g++.dg/tree-ssa/pr63841.C: New test. Index: tree.c === --- tree.c (revision 217190) +++ tree.c (working copy) @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) { unsigned HOST_WIDE_INT idx; +if (TREE_CLOBBER_P (init)) + return false; FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) if (!initializer_zerop (elt)) return false; Index: testsuite/g++.dg/tree-ssa/pr63841.C === --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include +#include + +std::string __attribute__ ((noinline)) comp_test_write() { + std::string data; + + for (int i = 0; i < 2; ++i) { +char b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +std::string __attribute__ ((noinline)) comp_test_write_good() { + std::string data; + + char b; + for (int i = 0; i < 2; ++i) { +b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +int main() { + std::string good = comp_test_write_good(); + printf("expected: %hx\n", *(short*)good.c_str()); + + std::string bad = comp_test_write(); + printf("got: %hx\n", *(short*)bad.c_str()); + + return good != bad; +} On Wed, Nov 12, 2014 at 9:40 PM, Andrew Pinski wrote: > On Wed, Nov 12, 2014 at 9:38 PM, Teresa Johnson wrote: >> On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski wrote: >>> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson >>> wrote: >>>> Added testcase. Here is the new patch: >>>> >>>> 2014-11-12 >>>> >>>> gcc: >>>> PR tree-optimization/63841 >>>> * tree.c (initializer_zerop): A constructor with no elements >>>> does not zero initialize. >>> >>> Actually an empty constructor does zero initialize. A clobber does not. >> >> Ok, thanks, I wasn't sure. >> >>> >>> The check you want is TREE_CLOBBER_P instead. >> >> So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would >> return false, right? I will make that change and retest. > > Yes that is correct. Note TREE_CLOBBER_P already checks if it is an > constructor. > > Thanks, > Andrew > >> >> Thanks, >> Teresa >> >>> >>> Thanks, >>> Andrew >>> >>> >>>> >>>> gcc/testsuite: >>>> * g++.dg/tree-ssa/pr63841.C: New test. >>>> >>>> Index: tree.c >>>> === >>>> --- tree.c (revision 217190) >>>> +++ tree.c (working copy) >>>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) >>>>{ >>>> unsigned HOST_WIDE_INT idx; >>>> >>>> +if (!CONSTRUCTOR_NELTS (init)) >>>> + return false; >>>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) >>>> if (!initializer_zerop (elt)) >>>> return false; >>>> Index: testsuite/g++.dg/tree-ssa/pr63841.C >>>> === >>>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) >>>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) >>>> @@ -0,0 +1,38 @@ >>>> +/* { dg-do run } */ >>>> +/* { dg-options "-O2" } */ >>>> + >>>> +#include >>>> +#include >>>> + >>>> +std::string __attribute__ ((noinline)) comp_test_write() { >>>> + std::string data; >>>> + >>>> + for (int i = 0; i < 2; ++i) { >>>> +char b = 1 >> (i * 8); >>>> +data.append(&b, 1); >>>> + } >>>> + >>>> + return data; >>>> +} >>>> + >>>> +std::string __attribute__ ((noinline)) comp_test_write_good() { >>>> + std::string data; >>>> + >>>> + char b; >>>> + for (int i = 0; i < 2; ++i) {
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 6:32 AM, Richard Biener wrote: > On Thu, Nov 13, 2014 at 3:32 PM, Richard Biener > wrote: >> On Thu, Nov 13, 2014 at 3:20 PM, Teresa Johnson wrote: >>> Here is the new patch. Bootstrapped and tested on >>> x86_64-unknown-linux-gnu. OK for trunk? >> >> Ok for trunk and branches. > > Err - please fix the changelog entry wording to "A clobber does > not zero inintiaize" Thanks for catching that - copied from the original patch. Will commit to trunk now then to gcc-4_9 after regression testing with the branch. Thanks, Teresa > >> Thanks, >> Richard. >> >>> Thanks, >>> Teresa >>> >>> 2014-11-13 >>> >>> gcc: >>> PR tree-optimization/63841 >>> * tree.c (initializer_zerop): A constructor with no elements >>> does not zero initialize. >>> >>> gcc/testsuite: >>> PR tree-optimization/63841 >>> * g++.dg/tree-ssa/pr63841.C: New test. >>> >>> Index: tree.c >>> === >>> --- tree.c (revision 217190) >>> +++ tree.c (working copy) >>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) >>>{ >>> unsigned HOST_WIDE_INT idx; >>> >>> +if (TREE_CLOBBER_P (init)) >>> + return false; >>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) >>> if (!initializer_zerop (elt)) >>> return false; >>> Index: testsuite/g++.dg/tree-ssa/pr63841.C >>> === >>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) >>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) >>> @@ -0,0 +1,38 @@ >>> +/* { dg-do run } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +#include >>> +#include >>> + >>> +std::string __attribute__ ((noinline)) comp_test_write() { >>> + std::string data; >>> + >>> + for (int i = 0; i < 2; ++i) { >>> +char b = 1 >> (i * 8); >>> +data.append(&b, 1); >>> + } >>> + >>> + return data; >>> +} >>> + >>> +std::string __attribute__ ((noinline)) comp_test_write_good() { >>> + std::string data; >>> + >>> + char b; >>> + for (int i = 0; i < 2; ++i) { >>> +b = 1 >> (i * 8); >>> +data.append(&b, 1); >>> + } >>> + >>> + return data; >>> +} >>> + >>> +int main() { >>> + std::string good = comp_test_write_good(); >>> + printf("expected: %hx\n", *(short*)good.c_str()); >>> + >>> + std::string bad = comp_test_write(); >>> + printf("got: %hx\n", *(short*)bad.c_str()); >>> + >>> + return good != bad; >>> +} >>> >>> On Wed, Nov 12, 2014 at 9:40 PM, Andrew Pinski wrote: >>>> On Wed, Nov 12, 2014 at 9:38 PM, Teresa Johnson >>>> wrote: >>>>> On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski wrote: >>>>>> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson >>>>>> wrote: >>>>>>> Added testcase. Here is the new patch: >>>>>>> >>>>>>> 2014-11-12 >>>>>>> >>>>>>> gcc: >>>>>>> PR tree-optimization/63841 >>>>>>> * tree.c (initializer_zerop): A constructor with no elements >>>>>>> does not zero initialize. >>>>>> >>>>>> Actually an empty constructor does zero initialize. A clobber does not. >>>>> >>>>> Ok, thanks, I wasn't sure. >>>>> >>>>>> >>>>>> The check you want is TREE_CLOBBER_P instead. >>>>> >>>>> So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would >>>>> return false, right? I will make that change and retest. >>>> >>>> Yes that is correct. Note TREE_CLOBBER_P already checks if it is an >>>> constructor. >>>> >>>> Thanks, >>>> Andrew >>>> >>>>> >>>>> Thanks, >>>>> Teresa >>>>> >>>>>> >>>>>> Thanks, >>>>>> Andrew &
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 6:36 AM, Jakub Jelinek wrote: > On Thu, Nov 13, 2014 at 06:20:16AM -0800, Teresa Johnson wrote: >> Here is the new patch. Bootstrapped and tested on >> x86_64-unknown-linux-gnu. OK for trunk? >> >> Thanks, >> Teresa >> >> 2014-11-13 >> >> gcc: >> PR tree-optimization/63841 >> * tree.c (initializer_zerop): A constructor with no elements >> does not zero initialize. >> >> gcc/testsuite: >> PR tree-optimization/63841 >> * g++.dg/tree-ssa/pr63841.C: New test. >> >> Index: tree.c >> === >> --- tree.c (revision 217190) >> +++ tree.c (working copy) >> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) >>{ >> unsigned HOST_WIDE_INT idx; >> >> +if (TREE_CLOBBER_P (init)) >> + return false; > > Wrong formatting. Sorry, not sure I understand why? My mailer does tend to eat spaces, but it is indented the correct amount and I think has the right spaces within the line. > > Also, while this perhaps is useful, I'd say the right fix is that > strlen_optimize_stmt > should just ignore gimple_clobber_p (stmt) statements (well, call > the maybe_invalidate at the end for them). > So > - else if (is_gimple_assign (stmt)) > + else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) > in strlen_optimize_stmt. Ok, I have held off on my commit for now. I considered fixing this in tree-ssa-strlen, but thought this was a potentially larger problem with initializer_zerop, where it shouldn't be considering a clobber to be a zero init and might be affecting other callers as well. If we make the change to initializer_zerop is it still useful to change tree-strlen? Teresa > > Jakub -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek wrote: > On Thu, Nov 13, 2014 at 06:46:23AM -0800, Teresa Johnson wrote: >> >> --- tree.c (revision 217190) >> >> +++ tree.c (working copy) >> >> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) >> >>{ >> >> unsigned HOST_WIDE_INT idx; >> >> >> >> +if (TREE_CLOBBER_P (init)) >> >> + return false; >> > >> > Wrong formatting. >> >> Sorry, not sure I understand why? My mailer does tend to eat spaces, >> but it is indented the correct amount and I think has the right spaces >> within the line. > > I meant that you are using spaces instead of tabs and the unsigned line > above it being one char off suggested to me visually it has the tab in there > (apparently it is eaten too). Use MUA that doesn't eat it, or convince your > employer that tabs are important in e-mails? ;) > > If you commit tabs in there, it is fine with me. Changed the spaces to tabs to match the surrounding lines and committed to trunk (r217505). > >> Ok, I have held off on my commit for now. I considered fixing this in >> tree-ssa-strlen, but thought this was a potentially larger problem >> with initializer_zerop, where it shouldn't be considering a clobber to >> be a zero init and might be affecting other callers as well. > > As I said, I think your tree.c change is useful, but perhaps too risky > for the release branches, because we don't know what all it will affect? > >> If we make the change to initializer_zerop is it still useful to >> change tree-strlen? > > I think it is better to be clear on it that right now we want clobbers > to clobber the memory for strlen pass purposes, maybe in the future we want > to perform some optimizations for them as Richard suggested in the PR > (though, extending the lifetime in that case by removing the clobber > shouldn't be done unconditionally (i.e. limited to when we'd actually want > to benefit from the known length) and assuming we are close to the clobber > (i.e. it is fine to extend it a little bit, but not extend the lifetime > across multi-megabyte function e.g.). > And for release branches I'd really prefer tree-ssa-strlen.c change. Ok, I started testing the initializer_zerop change on the 4_9 branch, will also test the strlen fix and send that patch for review here when it completes. Thanks, Teresa > > Jakub -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 7:39 AM, Teresa Johnson wrote: > On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek wrote: >> And for release branches I'd really prefer tree-ssa-strlen.c change. > > Ok, I started testing the initializer_zerop change on the 4_9 branch, > will also test the strlen fix and send that patch for review here when > it completes. Here is the more conservative patch for 4_9. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for gcc-4_9 branch? Thanks, Teresa 2014-11-13 Teresa Johnson gcc: PR tree-optimization/63841 * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers. 2014-11-13 Teresa Johnson gcc/testsuite: PR tree-optimization/63841 * testsuite/g++.dg/tree-ssa/pr63841.C: New test. Index: tree-ssa-strlen.c === --- tree-ssa-strlen.c (revision 217503) +++ tree-ssa-strlen.c (working copy) @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) break; } } - else if (is_gimple_assign (stmt)) + else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) { tree lhs = gimple_assign_lhs (stmt); Index: testsuite/g++.dg/tree-ssa/pr63841.C === --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include +#include + +std::string __attribute__ ((noinline)) comp_test_write() { + std::string data; + + for (int i = 0; i < 2; ++i) { +char b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +std::string __attribute__ ((noinline)) comp_test_write_good() { + std::string data; + + char b; + for (int i = 0; i < 2; ++i) { +b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +int main() { + std::string good = comp_test_write_good(); + printf("expected: %hx\n", *(short*)good.c_str()); + + std::string bad = comp_test_write(); + printf("got: %hx\n", *(short*)bad.c_str()); + + return good != bad; +} -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 12:55 PM, Jakub Jelinek wrote: > On Thu, Nov 13, 2014 at 12:51:05PM -0800, Teresa Johnson wrote: >> On Thu, Nov 13, 2014 at 7:39 AM, Teresa Johnson wrote: >> > On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek wrote: >> >> And for release branches I'd really prefer tree-ssa-strlen.c change. >> > >> > Ok, I started testing the initializer_zerop change on the 4_9 branch, >> > will also test the strlen fix and send that patch for review here when >> > it completes. >> >> Here is the more conservative patch for 4_9. Bootstrapped and tested >> on x86_64-unknown-linux-gnu. Ok for gcc-4_9 branch? > > Ok, thanks. But I have a comment regarding the test below: > >> 2014-11-13 Teresa Johnson >> >> gcc: >> PR tree-optimization/63841 >> * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers. >> >> 2014-11-13 Teresa Johnson >> >> gcc/testsuite: >> PR tree-optimization/63841 >> * testsuite/g++.dg/tree-ssa/pr63841.C: New test. >> >> Index: tree-ssa-strlen.c >> === >> --- tree-ssa-strlen.c (revision 217503) >> +++ tree-ssa-strlen.c (working copy) >> @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) >> break; >> } >> } >> - else if (is_gimple_assign (stmt)) >> + else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) >> { >>tree lhs = gimple_assign_lhs (stmt); >> >> Index: testsuite/g++.dg/tree-ssa/pr63841.C >> === >> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) >> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) >> @@ -0,0 +1,38 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +#include >> +#include >> + >> +std::string __attribute__ ((noinline)) comp_test_write() { >> + std::string data; >> + >> + for (int i = 0; i < 2; ++i) { >> +char b = 1 >> (i * 8); >> +data.append(&b, 1); >> + } >> + >> + return data; >> +} >> + >> +std::string __attribute__ ((noinline)) comp_test_write_good() { >> + std::string data; >> + >> + char b; >> + for (int i = 0; i < 2; ++i) { >> +b = 1 >> (i * 8); >> +data.append(&b, 1); >> + } >> + >> + return data; >> +} >> + >> +int main() { >> + std::string good = comp_test_write_good(); >> + printf("expected: %hx\n", *(short*)good.c_str()); >> + >> + std::string bad = comp_test_write(); >> + printf("got: %hx\n", *(short*)bad.c_str()); > > Supposedly the printfs should have been removed and the #include > isn't needed then either. No need to clutter the test output and log files. > On the other side, tests should abort (); or __builtin_abort (); on failure, > not just return non-zero. Ok, sounds good. Here is the new patch. Confirmed that the test case passes with the fix, aborts without it. If this looks good I will also update the trunk version of the test. Thanks, Teresa 2014-11-13 Teresa Johnson gcc: PR tree-optimization/63841 * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers. 2014-11-13 Teresa Johnson gcc/testsuite: PR tree-optimization/63841 * testsuite/g++.dg/tree-ssa/pr63841.C: New test. Index: tree-ssa-strlen.c === --- tree-ssa-strlen.c (revision 217503) +++ tree-ssa-strlen.c (working copy) @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) break; } } - else if (is_gimple_assign (stmt)) + else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) { tree lhs = gimple_assign_lhs (stmt); Index: testsuite/g++.dg/tree-ssa/pr63841.C === --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include + +std::string __attribute__ ((noinline)) comp_test_write() { + std::string data; + + for (int i = 0; i < 2; ++i) { +char b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +std::string __attribute__ ((noinline)) comp_test_write_good() { + std::string data; + + char b; + for (int i = 0; i < 2; ++i) { +b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +int main() { + std::string good = comp_test_write_good(); + std::string bad = comp_test_write(); + + if (good != bad) +__builtin_abort (); +} > >> + >> + return good != bad; >> +} > > Jakub -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
On Thu, Nov 13, 2014 at 1:45 PM, Jakub Jelinek wrote: > On Thu, Nov 13, 2014 at 01:33:07PM -0800, Teresa Johnson wrote: >> > Supposedly the printfs should have been removed and the #include >> > isn't needed then either. No need to clutter the test output and log >> > files. >> > On the other side, tests should abort (); or __builtin_abort (); on >> > failure, >> > not just return non-zero. >> >> Ok, sounds good. Here is the new patch. Confirmed that the test case >> passes with the fix, aborts without it. >> >> If this looks good I will also update the trunk version of the test. > > LGTM, thanks. > > Please commit also the tree-ssa-strlen.c change to the trunk, it will not > hurt. Ok. It is on gcc-4_9 now, will commit both to trunk as soon as regression testing completes. Thanks, Teresa > >> 2014-11-13 Teresa Johnson >> >> gcc: >> PR tree-optimization/63841 >> * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers. >> >> 2014-11-13 Teresa Johnson >> >> gcc/testsuite: >> PR tree-optimization/63841 >> * testsuite/g++.dg/tree-ssa/pr63841.C: New test. >> >> Index: tree-ssa-strlen.c >> === >> --- tree-ssa-strlen.c (revision 217503) >> +++ tree-ssa-strlen.c (working copy) >> @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) >> break; >> } >> } >> - else if (is_gimple_assign (stmt)) >> + else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt)) >> { >>tree lhs = gimple_assign_lhs (stmt); >> >> Index: testsuite/g++.dg/tree-ssa/pr63841.C >> === >> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) >> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) >> @@ -0,0 +1,35 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +#include >> + >> +std::string __attribute__ ((noinline)) comp_test_write() { >> + std::string data; >> + >> + for (int i = 0; i < 2; ++i) { >> +char b = 1 >> (i * 8); >> +data.append(&b, 1); >> + } >> + >> + return data; >> +} >> + >> +std::string __attribute__ ((noinline)) comp_test_write_good() { >> + std::string data; >> + >> + char b; >> + for (int i = 0; i < 2; ++i) { >> +b = 1 >> (i * 8); >> +data.append(&b, 1); >> + } >> + >> + return data; >> +} >> + >> +int main() { >> + std::string good = comp_test_write_good(); >> + std::string bad = comp_test_write(); >> + >> + if (good != bad) >> +__builtin_abort (); >> +} > > Jakub -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
On Wed, Mar 18, 2015 at 9:25 PM, Xinliang David Li wrote: > get_local_tls_init_fn can be called from other contexts other than > 'handle_tls_init' -- is the added new assertion safe ? In fact there is still an issue here, for file static TLS vars. Will work on a fix and send the original test case (forgot to include in first patch) and the new one with the fixed patch. Teresa > > David > > On Wed, Mar 18, 2015 at 9:18 PM, Teresa Johnson > wrote: >> >> Ensure TLS variable wrapper and init functions are recorded at >> the module scope upon creation so that they are cleared when >> popping the module scope in between modules in LIPO mode. >> >> Also, do not generate a local TLS init function (__tls_init) >> for aux functions, only in the primary modules. >> >> Passes regression tests. Ok for Google branches? >> Teresa >> >> 2015-03-18 Teresa Johnson >> >> Google ref b/19618364. >> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module. >> (get_tls_init_fn): Record new global decl. >> (get_tls_wrapper_fn): Ditto. >> (handle_tls_init): Skip aux modules. >> >> Index: cp/decl2.c >> === >> --- cp/decl2.c (revision 221425) >> +++ cp/decl2.c (working copy) >> @@ -3036,6 +3036,9 @@ get_local_tls_init_fn (void) >>DECL_ARTIFICIAL (fn) = true; >>mark_used (fn); >>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); >> + /* In LIPO mode we should not generate local init functions for >> + the aux modules (see handle_tls_init). */ >> + gcc_assert (!L_IPO_IS_AUXILIARY_MODULE); >> } >>return fn; >> } >> @@ -3100,6 +3103,11 @@ get_tls_init_fn (tree var) >>DECL_BEFRIENDING_CLASSES (fn) = var; >> >>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); >> + /* In LIPO mode make sure we record the new global value so that it >> + is cleared before parsing the next aux module. */ >> + if (L_IPO_COMP_MODE && !is_parsing_done_p ()) >> +add_decl_to_current_module_scope (fn, >> + NAMESPACE_LEVEL >> (global_namespace)); >> } >>return fn; >> } >> @@ -3157,6 +3165,11 @@ get_tls_wrapper_fn (tree var) >>DECL_BEFRIENDING_CLASSES (fn) = var; >> >>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); >> + /* In LIPO mode make sure we record the new global value so that it >> + is cleared before parsing the next aux module. */ >> + if (L_IPO_COMP_MODE && !is_parsing_done_p ()) >> +add_decl_to_current_module_scope (fn, >> + NAMESPACE_LEVEL >> (global_namespace)); >> } >>return fn; >> } >> @@ -4179,6 +4192,14 @@ clear_decl_external (struct cgraph_node *node, voi >> static void >> handle_tls_init (void) >> { >> + /* In LIPO mode we should not generate local init functions for >> + aux modules. The wrapper will reference the variable's init function >> + that is defined in its own primary module. Otherwise there is >> + a name conflict between the primary and aux __tls_init functions, >> + and difficulty ensuring each TLS variable is initialized exactly >> once. */ >> + if (L_IPO_IS_AUXILIARY_MODULE) >> +return; >> + >>tree vars = prune_vars_needing_no_initialization (&tls_aggregates); >>if (vars == NULL_TREE) >> return; >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
New patch below. Passes regression tests plus internal application build. 2015-03-19 Teresa Johnson gcc/ Google ref b/19618364. * cp/decl2.c (get_local_tls_init_fn): Assert on aux module. (get_tls_init_fn): Promote non-public init functions if necessary in LIPO mode, record new global at module scope. (get_tls_wrapper_fn): Record new global at module scope. (handle_tls_init): Skip in aux module, setup alias in exported primary module. testsuite/ Google ref b/19618364. * testsuite/g++.dg/tree-prof/lipo/tls.h: New test. * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto. Index: cp/decl2.c === --- cp/decl2.c (revision 221425) +++ cp/decl2.c (working copy) @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. If not see #include "langhooks.h" #include "c-family/c-ada-spec.h" #include "asan.h" +#include "gcov-io.h" extern cpp_reader *parse_in; @@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var) static tree get_local_tls_init_fn (void) { + /* In LIPO mode we should not generate local init functions for + the aux modules (see handle_tls_init). */ + gcc_assert (!L_IPO_IS_AUXILIARY_MODULE); tree sname = get_identifier ("__tls_init"); tree fn = IDENTIFIER_GLOBAL_VALUE (sname); if (!fn) @@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var) if (!flag_extern_tls_init && DECL_EXTERNAL (var)) return NULL_TREE; + /* In LIPO mode we should not generate local init functions for + aux modules. The wrapper will reference the variable's init function + that is defined in its own primary module. Otherwise there is + a name conflict between the primary and aux __tls_init functions, + and difficulty ensuring each TLS variable is initialized exactly once. + Therefore, if this is an aux module or an exported primary module, we + need to ensure that the init function is available externally by + promoting it if it is not already public. This is similar to the + handling in promote_static_var_func, but we do it as the init function + is created to avoid the need to detect and properly promote this + artificial decl later on. */ + bool promote = L_IPO_IS_AUXILIARY_MODULE || + (L_IPO_COMP_MODE && PRIMARY_MODULE_EXPORTED); + #ifdef ASM_OUTPUT_DEF /* If the variable is internal, or if we can't generate aliases, - call the local init function directly. */ - if (!TREE_PUBLIC (var)) + call the local init function directly. Don't do this if we + are in LIPO mode an may need to export the init function. Note + that get_local_tls_init_fn will assert if it is called on an aux + module. */ + if (!TREE_PUBLIC (var) && !promote) #endif return get_local_tls_init_fn (); tree sname = mangle_tls_init_fn (var); + if (promote) +{ + const char *name = IDENTIFIER_POINTER (sname); + char *assembler_name = (char*) alloca (strlen (name) + 30); + sprintf (assembler_name, "%s.cmo.%u", name, current_module_id); + sname = get_identifier (assembler_name); +} + tree fn = IDENTIFIER_GLOBAL_VALUE (sname); if (!fn) { @@ -3072,7 +3101,18 @@ get_tls_init_fn (tree var) build_function_type (void_type_node, void_list_node)); SET_DECL_LANGUAGE (fn, lang_c); - TREE_PUBLIC (fn) = TREE_PUBLIC (var); + if (promote) +{ + TREE_PUBLIC (fn) = 1; + DECL_VISIBILITY (fn) = VISIBILITY_HIDDEN; + DECL_VISIBILITY_SPECIFIED (fn) = 1; +} + else +{ + TREE_PUBLIC (fn) = TREE_PUBLIC (var); + DECL_VISIBILITY (fn) = DECL_VISIBILITY (var); + DECL_VISIBILITY_SPECIFIED (fn) = DECL_VISIBILITY_SPECIFIED (var); +} DECL_ARTIFICIAL (fn) = true; DECL_COMDAT (fn) = DECL_COMDAT (var); DECL_EXTERNAL (fn) = DECL_EXTERNAL (var); @@ -3091,8 +3131,6 @@ get_tls_init_fn (tree var) else DECL_WEAK (fn) = DECL_WEAK (var); } - DECL_VISIBILITY (fn) = DECL_VISIBILITY (var); - DECL_VISIBILITY_SPECIFIED (fn) = DECL_VISIBILITY_SPECIFIED (var); DECL_DLLIMPORT_P (fn) = DECL_DLLIMPORT_P (var); DECL_IGNORED_P (fn) = 1; mark_used (fn); @@ -3100,6 +3138,11 @@ get_tls_init_fn (tree var) DECL_BEFRIENDING_CLASSES (fn) = var; SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); + /* In LIPO mode make sure we record the new global value so that it + is cleared before parsing the next aux module. */
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li wrote: > does generate_tls_wrapper also need to be suppressed for aux module? No, we generate the wrapper in the aux module and use it to access the TLS variable and invoke it's init function (which is defined in the variable's own module). Presumably we could generate that only in the original module and promote it if necessary, but generating it in the aux module does allow it to be inlined. This is essentially how the TLS accesses are handled for extern TLS variables defined elsewhere (wrapper generated in referring module). Teresa > > David > > On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson wrote: >> New patch below. Passes regression tests plus internal application build. >> >> 2015-03-19 Teresa Johnson >> >> gcc/ >> Google ref b/19618364. >> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module. >> (get_tls_init_fn): Promote non-public init functions if necessary >> in LIPO mode, record new global at module scope. >> (get_tls_wrapper_fn): Record new global at module scope. >> (handle_tls_init): Skip in aux module, setup alias in exported >> primary module. >> >> testsuite/ >> Google ref b/19618364. >> * testsuite/g++.dg/tree-prof/lipo/tls.h: New test. >> * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto. >> * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto. >> * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto. >> * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto. >> * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto. >> >> Index: cp/decl2.c >> === >> --- cp/decl2.c (revision 221425) >> +++ cp/decl2.c (working copy) >> @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. If not see >> #include "langhooks.h" >> #include "c-family/c-ada-spec.h" >> #include "asan.h" >> +#include "gcov-io.h" >> >> extern cpp_reader *parse_in; >> >> @@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var) >> static tree >> get_local_tls_init_fn (void) >> { >> + /* In LIPO mode we should not generate local init functions for >> + the aux modules (see handle_tls_init). */ >> + gcc_assert (!L_IPO_IS_AUXILIARY_MODULE); >>tree sname = get_identifier ("__tls_init"); >>tree fn = IDENTIFIER_GLOBAL_VALUE (sname); >>if (!fn) >> @@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var) >>if (!flag_extern_tls_init && DECL_EXTERNAL (var)) >> return NULL_TREE; >> >> + /* In LIPO mode we should not generate local init functions for >> + aux modules. The wrapper will reference the variable's init function >> + that is defined in its own primary module. Otherwise there is >> + a name conflict between the primary and aux __tls_init functions, >> + and difficulty ensuring each TLS variable is initialized exactly once. >> + Therefore, if this is an aux module or an exported primary module, we >> + need to ensure that the init function is available externally by >> + promoting it if it is not already public. This is similar to the >> + handling in promote_static_var_func, but we do it as the init function >> + is created to avoid the need to detect and properly promote this >> + artificial decl later on. */ >> + bool promote = L_IPO_IS_AUXILIARY_MODULE || >> + (L_IPO_COMP_MODE && PRIMARY_MODULE_EXPORTED); >> + >> #ifdef ASM_OUTPUT_DEF >>/* If the variable is internal, or if we can't generate aliases, >> - call the local init function directly. */ >> - if (!TREE_PUBLIC (var)) >> + call the local init function directly. Don't do this if we >> + are in LIPO mode an may need to export the init function. Note >> + that get_local_tls_init_fn will assert if it is called on an aux >> + module. */ >> + if (!TREE_PUBLIC (var) && !promote) >> #endif >> return get_local_tls_init_fn (); >> >>tree sname = mangle_tls_init_fn (var); >> + if (promote) >> +{ >> + const char *name = IDENTIFIER_POINTER (sname); >> + char *assembler_name = (char*) alloca (strlen (name) + 30); >> + sprintf (assembler_name, "%s.cmo.%u", name, current_module_id); >> + sname = get_identifier (assembler_name); >> +} >> + >>tree fn = IDENTIFIER_GLOBAL_VALUE (sname); >
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
On Thu, Mar 19, 2015 at 8:00 PM, Xinliang David Li wrote: > ok -- then there is an over assertion in get_local_tls_init_fn. The > method can be called for aux module -- the decl created will also be > promoted. No, it won't ever be called for an aux module. Both callsites are guarded (handle_tls_init has an early return at the top, and the callsite is guarded in get_tls_iniit_fn). > > Also can we rely on late promotion (special case of artificial > function __tls_init? This can avoid duplicated logic here. We don't need to promote __tls_init, but rather the init functions for each TLS variable (which are each aliased to __tls_init). I thought about both approaches, but promoting it as it was created seemed more straightforward. I can give that approach a try though if you prefer it (special casing DECL_ARTIFICIAL functions that start with _ZTH). Incidentally they are not marked static, so that would also need to be done to get them promoted. Teresa > > David > > On Thu, Mar 19, 2015 at 5:51 PM, Teresa Johnson wrote: >> On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li >> wrote: >>> does generate_tls_wrapper also need to be suppressed for aux module? >> >> No, we generate the wrapper in the aux module and use it to access the >> TLS variable and invoke it's init function (which is defined in the >> variable's own module). Presumably we could generate that only in the >> original module and promote it if necessary, but generating it in the >> aux module does allow it to be inlined. This is essentially how the >> TLS accesses are handled for extern TLS variables defined elsewhere >> (wrapper generated in referring module). >> >> Teresa >> >>> >>> David >>> >>> On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson >>> wrote: >>>> New patch below. Passes regression tests plus internal application build. >>>> >>>> 2015-03-19 Teresa Johnson >>>> >>>> gcc/ >>>> Google ref b/19618364. >>>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module. >>>> (get_tls_init_fn): Promote non-public init functions if necessary >>>> in LIPO mode, record new global at module scope. >>>> (get_tls_wrapper_fn): Record new global at module scope. >>>> (handle_tls_init): Skip in aux module, setup alias in exported >>>> primary module. >>>> >>>> testsuite/ >>>> Google ref b/19618364. >>>> * testsuite/g++.dg/tree-prof/lipo/tls.h: New test. >>>> * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto. >>>> * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto. >>>> * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto. >>>> * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto. >>>> * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto. >>>> >>>> Index: cp/decl2.c >>>> === >>>> --- cp/decl2.c (revision 221425) >>>> +++ cp/decl2.c (working copy) >>>> @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. If not see >>>> #include "langhooks.h" >>>> #include "c-family/c-ada-spec.h" >>>> #include "asan.h" >>>> +#include "gcov-io.h" >>>> >>>> extern cpp_reader *parse_in; >>>> >>>> @@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var) >>>> static tree >>>> get_local_tls_init_fn (void) >>>> { >>>> + /* In LIPO mode we should not generate local init functions for >>>> + the aux modules (see handle_tls_init). */ >>>> + gcc_assert (!L_IPO_IS_AUXILIARY_MODULE); >>>>tree sname = get_identifier ("__tls_init"); >>>>tree fn = IDENTIFIER_GLOBAL_VALUE (sname); >>>>if (!fn) >>>> @@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var) >>>>if (!flag_extern_tls_init && DECL_EXTERNAL (var)) >>>> return NULL_TREE; >>>> >>>> + /* In LIPO mode we should not generate local init functions for >>>> + aux modules. The wrapper will reference the variable's init function >>>> + that is defined in its own primary module. Otherwise there is >>>> + a name conflict between the primary and aux __tls_init functions, >>>> + and difficulty ensuring each TLS variable is initialized e
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
On Thu, Mar 19, 2015 at 10:38 PM, Xinliang David Li wrote: > On Thu, Mar 19, 2015 at 9:57 PM, Teresa Johnson wrote: >> On Thu, Mar 19, 2015 at 8:00 PM, Xinliang David Li >> wrote: >>> ok -- then there is an over assertion in get_local_tls_init_fn. The >>> method can be called for aux module -- the decl created will also be >>> promoted. >> >> No, it won't ever be called for an aux module. Both callsites are >> guarded (handle_tls_init has an early return at the top, and the >> callsite is guarded in get_tls_iniit_fn). >> >>> >>> Also can we rely on late promotion (special case of artificial >>> function __tls_init? This can avoid duplicated logic here. >> >> We don't need to promote __tls_init, but rather the init functions for >> each TLS variable (which are each aliased to __tls_init). > >> I thought >> about both approaches, but promoting it as it was created seemed more >> straightforward. I can give that approach a try though if you prefer >> it (special casing DECL_ARTIFICIAL functions that start with _ZTH). >> Incidentally they are not marked static, so that would also need to be >> done to get them promoted. > > For public tls symbols in aux module, tls wrappers will reference > _ZTHxx init functions which are aliases to __tls_init. If __tls_init > is not created for aux modules, what symbol will those _ZTHxx > funcitons aliased to? > > Is it better just let _tls_init function to be created as usual, but > let the standard promotion kick in later? The ZTHxx functions can > still be marked as alias to the promoted __tls_init? So there are 3 function decls being generated for TLS variables: 1) A single __tls_init function per module This initializes *all* TLS variables in the module, guarded by a single __tls_guard variable to ensure it is only done once. 2) One _ZTHxx init function per TLS variable Each of these is aliased to the module's __tls_init 3) One _ZTWxx wrapper function per TLS variable This is called in place of accesses to that TLS variable. It invokes the _ZTHxx init function for the variable to initialize it if necessary. I am concerned about generating the aux module __tls_init functions and then promoting them as they contain initializations for all TLS variables from the aux module - generating it each time the module is included as an aux module seems inefficient at best. What happens in the case where the TLS variable is declared extern and accessed by a separate module (and essentially is what happens with my patch when it is file static, promoted and accessed via an aux module) is this: The _ZTHxx init functions are aliased to __tls_init in the module containing the definition, and are externally-visible (after promoting in the file-static exported primary module case). The external module containing the reference (or the primary module that imported the defining module as an aux module in the LIPO case) generates a _ZTWxx wrapper for the TLS variable, which references its _ZTHxx init function. This ends up resolved to the exported/promoted _ZTHxx symbol in the defining module, which in turn is aliased to the defining module's __tls_init function. Teresa > > David > > > > >> >> Teresa >> >>> >>> David >>> >>> On Thu, Mar 19, 2015 at 5:51 PM, Teresa Johnson >>> wrote: >>>> On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li >>>> wrote: >>>>> does generate_tls_wrapper also need to be suppressed for aux module? >>>> >>>> No, we generate the wrapper in the aux module and use it to access the >>>> TLS variable and invoke it's init function (which is defined in the >>>> variable's own module). Presumably we could generate that only in the >>>> original module and promote it if necessary, but generating it in the >>>> aux module does allow it to be inlined. This is essentially how the >>>> TLS accesses are handled for extern TLS variables defined elsewhere >>>> (wrapper generated in referring module). >>>> >>>> Teresa >>>> >>>>> >>>>> David >>>>> >>>>> On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson >>>>> wrote: >>>>>> New patch below. Passes regression tests plus internal application build. >>>>>> >>>>>> 2015-03-19 Teresa Johnson >>>>>> >>>>>> gcc/ >>>>>> Google ref b/19618364. >>>>>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module. >>>>>> (get_tls
Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode
After offline discussions with David, I changed to the approach of generating __tls_init for the aux module, but then modifying the LIPO code to allow it to be statically promoted and treated as an aux function as a special case of an artificial decl. That works well. New patch below. Fixed the tests to validate rather than print the expected output. Ok for google branches? 2015-03-20 Teresa Johnson gcc/ Google ref b/19618364. * cp/decl2.c (get_local_tls_init_fn): Mark static, record new global at module scope. (get_tls_init_fn): Record new global at module scope. (get_tls_wrapper_fn): Ditto. (handle_tls_init): Suppress alias in aux module. * l-ipo.c (decl_artificial_nopromote): New function. (cgraph_is_aux_decl_external): Call new function. (process_module_scope_static_func): Ditto. testsuite/ Google ref b/19618364. * testsuite/g++.dg/tree-prof/lipo/tls.h: New test. * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto. * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto. Index: cp/decl2.c === --- cp/decl2.c (revision 221425) +++ cp/decl2.c (working copy) @@ -3033,9 +3033,15 @@ get_local_tls_init_fn (void) void_list_node)); SET_DECL_LANGUAGE (fn, lang_c); TREE_PUBLIC (fn) = false; + TREE_STATIC (fn) = true; DECL_ARTIFICIAL (fn) = true; mark_used (fn); SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); + /* In LIPO mode make sure we record the new global value so that it + is cleared before parsing the next aux module. */ + if (L_IPO_COMP_MODE && !is_parsing_done_p ()) +add_decl_to_current_module_scope (fn, + NAMESPACE_LEVEL (global_namespace)); } return fn; } @@ -3100,6 +3106,11 @@ get_tls_init_fn (tree var) DECL_BEFRIENDING_CLASSES (fn) = var; SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); + /* In LIPO mode make sure we record the new global value so that it + is cleared before parsing the next aux module. */ + if (L_IPO_COMP_MODE && !is_parsing_done_p ()) +add_decl_to_current_module_scope (fn, + NAMESPACE_LEVEL (global_namespace)); } return fn; } @@ -3157,6 +3168,11 @@ get_tls_wrapper_fn (tree var) DECL_BEFRIENDING_CLASSES (fn) = var; SET_IDENTIFIER_GLOBAL_VALUE (sname, fn); + /* In LIPO mode make sure we record the new global value so that it + is cleared before parsing the next aux module. */ + if (L_IPO_COMP_MODE && !is_parsing_done_p ()) +add_decl_to_current_module_scope (fn, + NAMESPACE_LEVEL (global_namespace)); } return fn; } @@ -4213,8 +4229,12 @@ handle_tls_init (void) one_static_initialization_or_destruction (var, init, true); #ifdef ASM_OUTPUT_DEF - /* Output init aliases even with -fno-extern-tls-init. */ - if (TREE_PUBLIC (var)) + /* Output init aliases even with -fno-extern-tls-init. Don't emit + aliases in LIPO aux modules, since the corresponding __tls_init + will be static promoted and deleted, so the variable's tls init + function will be resolved by its own primary module. An alias + would prevent the promoted aux __tls_init from being deleted. */ + if (TREE_PUBLIC (var) && !L_IPO_IS_AUXILIARY_MODULE) { tree single_init_fn = get_tls_init_fn (var); if (single_init_fn == NULL_TREE) Index: l-ipo.c === --- l-ipo.c (revision 221425) +++ l-ipo.c (working copy) @@ -1120,6 +1120,27 @@ cgraph_unify_type_alias_sets (void) htab_delete (type_hash_tab); } +/* Return true if DECL is an artificial function that we do not want + to promote and which may not be available in the primary module. + The sole exception is currently __tls_init. */ + +static bool +decl_artificial_nopromote (tree decl) +{ + if (!DECL_ARTIFICIAL (decl)) +return false; + + /* Handle the __tls_init function specially as we do want to promote it and + allow the aux module to be resolved to the version in the primary module. + We check if it is prefixed by __tls_init to catch it after promotion + as well from cgraph_is_aux_decl_external. */ + if (!strncmp (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)), "__tls_init", +10)) +return false; + + return true; +} + /* Return true if NODE->decl from an auxiliary module has external definition (and therefor
Re: [GOOGLE] Only save command-line options for LIPO
Resending in plain text mode. On Wed, Jul 8, 2015 at 11:34 AM, Teresa Johnson wrote: > This patch avoids saving optimization pragma or function level attributes in > the list of command line options saved in the LIPO module info structure. > Passes regression and internal testing. Ok for google/4_9? > > Thanks, > Teresa > > 2015-07-08 Teresa Johnson > > * gcc/c-family/c-common.c (parse_optimize_options): New parameter. > * gcc/opts-global.c (read_cmdline_options): Use new parameter to > guard saving of LIPO command line options. > (decode_options): New parameter. > * gcc/opts.h (extern void decode_options): Ditto. > * gcc/toplev.c (toplev_main): Ditto. > > Index: gcc/c-family/c-common.c > === > --- gcc/c-family/c-common.c (revision 225521) > +++ gcc/c-family/c-common.c (working copy) > @@ -9200,7 +9200,7 @@ parse_optimize_options (tree args, bool attr_p) > &decoded_options_count); >decode_options (&global_options, &global_options_set, > decoded_options, decoded_options_count, > - input_location, global_dc); > + input_location, global_dc, false); > >targetm.override_options_after_change(); > > Index: gcc/opts-global.c > === > --- gcc/opts-global.c (revision 225521) > +++ gcc/opts-global.c (working copy) > @@ -299,7 +299,9 @@ lipo_save_cl_args (struct cl_decoded_option *decod > the results of processing DECODED_OPTIONS and DECODED_OPTIONS_COUNT > in OPTS and OPTS_SET and using DC for diagnostic state. LANG_MASK > contains has a single bit set representing the current language. > - HANDLERS describes what functions to call for the options. */ > + HANDLERS describes what functions to call for the options. > + If COMMAND_LINE is true, this is being invoked for file level command > + line options, otherwise for an optimize pragma or function attribute. > */ > > static void > read_cmdline_options (struct gcc_options *opts, struct gcc_options > *opts_set, > @@ -308,7 +310,8 @@ read_cmdline_options (struct gcc_options *opts, st > location_t loc, > unsigned int lang_mask, > const struct cl_option_handlers *handlers, > - diagnostic_context *dc) > + diagnostic_context *dc, > + bool command_line) > { >unsigned int i; >int force_multi_module = 0; > @@ -341,7 +344,8 @@ read_cmdline_options (struct gcc_options *opts, st >read_cmdline_option (opts, opts_set, >decoded_options + i, loc, lang_mask, handlers, >dc); > - lipo_save_cl_args (decoded_options + i); > + if (command_line) > +lipo_save_cl_args (decoded_options + i); > } > } > > @@ -393,12 +397,14 @@ set_default_handlers (struct cl_option_handlers *h > /* Parse command line options and set default flag values. Do minimal > options processing. The decoded options are in *DECODED_OPTIONS > and *DECODED_OPTIONS_COUNT; settings go in OPTS, OPTS_SET and DC; > - the options are located at LOC. */ > + the options are located at LOC. If COMMAND_LINE is true, this is > + being invoked for file level command line options, otherwise for > + an optimize pragma or function attribute. */ > void > decode_options (struct gcc_options *opts, struct gcc_options *opts_set, > struct cl_decoded_option *decoded_options, > unsigned int decoded_options_count, > - location_t loc, diagnostic_context *dc) > + location_t loc, diagnostic_context *dc, bool command_line) > { >struct cl_option_handlers handlers; > > @@ -415,7 +421,7 @@ decode_options (struct gcc_options *opts, struct g >read_cmdline_options (opts, opts_set, > decoded_options, decoded_options_count, > loc, lang_mask, > - &handlers, dc); > + &handlers, dc, command_line); > >finish_options (opts, opts_set, loc); > } > Index: gcc/opts.h > === > --- gcc/opts.h (revision 225521) > +++ gcc/opts.h (working copy) > @@ -344,7 +344,8 @@ extern void decode_options (struct gcc_options *op > struct cl_decoded_option *decoded_options, > unsigned int decoded_options_count, >
Re: [PATCH]Partially fix PR61529, bound basic block frequency
Hi Renlin, Are the incoming edge counts or probabilities insane in this case? I guess the patch is ok if we need to do this to handle those incoming insanitiles. But I can't approve patches myself. However, this is a fix to code (r215739) committed after the ICE in the original bug report and in comment 2 were reported, so I wonder if it is just hiding the original problem. Originally this was reported to be due to r210538 - ccing Dehao who was the author of that patch. Dehao, did you get a chance to look at this bug and see why your change triggered it? It is possible that Dehao's patch simply amplified an even further upstream profile insanity, but it would be good to confirm. Thanks! Teresa On Wed, Oct 29, 2014 at 2:26 AM, Renlin Li wrote: > Hi all, > > This is a simple patch to fix ICE in comment 2 of PR61529: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61529 > > Bound checking code is added to make sure the frequency is within legal > range. > > As far as I have observed, r215830 patch fixes the glibc building ICE. And > this patch should fix the ICE while building the sample code in comment 2 > using aarch64-none-elf toolchain. Until now, all the ICEs reported in this > bug ticket should be fixed. > > x86_64-unknown-linux-gnu bootstrap and regression test have been done, no > new issue. > aarch64-none-elf toolchain has been test on the model. No new regression. > > Is this Okay for trunk? > > gcc/ChangeLog: > > 2014-10-29 Renlin Li > PR middle-end/61529 > * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq. -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH]Partially fix PR61529, bound basic block frequency
On Thu, Nov 6, 2014 at 7:09 AM, Renlin Li wrote: > > Hi Jeff, > > Test case has been added. With the patch, both x86_64-unknown-linux-gnu and > aarch64-none-elf compile the test case successfully. > > Okay to commit? > > > On 04/11/14 21:59, Jeff Law wrote: >> >> On 11/03/14 08:29, Renlin Li wrote: >>> >>> On 29/10/14 12:42, Teresa Johnson wrote: >>>> >>>> Hi Renlin, >>>> >>>> Are the incoming edge counts or probabilities insane in this case? I >>>> guess the patch is ok if we need to do this to handle those incoming >>>> insanitiles. But I can't approve patches myself. >>> >>> Not really, it's just a little bigger than the limit. >>> >>> For this particular test case, ABC is a threaded path. >>> B is the fallthrough basic block of A, D is a basic block split from B >>> (used to be a self loop). A, B and D have roughly the same frequency ( >>> 8281, 9100, 8281). >>> When calculating the path_in_freq, frequencies from AB and DB edges are >>> accumulated, and the final result is large than BB_FREQ_MAX. >>> >>> >>> A >>> 100% | >>> | 9% >>> -->B-->C >>> | | >>> |100%| 91% >>> | | >>> D The frequencies look insane given these probabilities. If most of the execution stays in the loop then B should have a much higher frequency than A. >>> >>> >>> >>> There are 2 suspicious points: >>> 1, The BD edge is not correctly guessed at the profile stage. However, >>> anyway it's heuristic, so I don't think, it's here the problem starts. >>> 2, The BD edge is not eliminated before jump threading. But the jump >>> threading pass will analysis the condition jump statement in B block (In >>> this particular case, the BD probability should be zero), and makes the >>> decision to thread it. >>> >>> Later in the dom pass, the BD edge is indeed removed. >> >> Can you add a testcase please? With a testcase, this patch is OK for >> the trunk. >> >> jeff >> > > x86_64-unknown-linux-gnu bootstrap and regression test have been done, no > new issue. > aarch64-none-elf toolchain has been test on the model. No new regression. > > gcc/ChangeLog: > > 2014-11-06 Renlin Li > PR middle-end/61529 > * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq. Please add a comment that this is needed due to insane incoming frequencies. > > gcc/testsuite/ChangeLog: > > 2014-11-06 Renlin Li > PR middle-end/61529 > * gcc.dg/pr61529.c: New. The 'b' variable is uninitialized. Also, 'd' and 'a' may end up uninitialized depending on the initial value of 'b'. Please initialize these. Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH]Partially fix PR61529, bound basic block frequency
Thanks for fixing the test case. Can you also add the comment I suggested to the source change? > Please add a comment that this is needed due to insane incoming frequencies. Thanks, Teresa On Thu, Nov 6, 2014 at 9:53 AM, Renlin Li wrote: > Hi Teresa, > > Thank you for the suggestion, updated! > >> Please add a comment that this is needed due to insane incoming >> frequencies. >> >> The 'b' variable is uninitialized. Also, 'd' and 'a' may end up >> uninitialized depending on the initial value of 'b'. Please initialize >> these. >> > > Test case has been added. With the patch, both x86_64-unknown-linux-gnu and > aarch64-none-elf compile the test case successfully. > > x86_64-unknown-linux-gnu bootstrap and regression test have been done, no > new issue. > aarch64-none-elf toolchain has been test on the model. No new regression. > > gcc/ChangeLog: > > 2014-11-06 Renlin Li > PR middle-end/61529 > * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq. > This is needed due to insane incoming frequencies. > > gcc/testsuite/ChangeLog: > > 2014-11-06 Renlin Li > PR middle-end/61529 > * gcc.dg/pr61529.c: New. -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PING][PATCH]Partially fix PR61529, bound basic block frequency
Hi Renlin, Looks like Jeff already approved it: >Can you add a testcase please? With a testcase, this patch is OK for the >trunk. Teresa On Mon, Nov 10, 2014 at 8:59 AM, Renlin Li wrote: > On 06/11/14 18:07, Renlin Li wrote: >> >> On 06/11/14 17:59, Teresa Johnson wrote: >>> >>> Thanks for fixing the test case. Can you also add the comment I >>> suggested to the source change? >>> >>>> Please add a comment that this is needed due to insane incoming >>>> frequencies. >>>> >> >> Sorry, I mistakenly add it to the ChangeLog. Should be correct now. >> >> >> x86_64-unknown-linux-gnu bootstrap and regression test have been done, no >> new issue. >> aarch64-none-elf toolchain has been test on the model. No new regression. >> >> gcc/ChangeLog: >> >> 2014-11-06 Renlin Li >> PR middle-end/61529 >> * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq. >> >> gcc/testsuite/ChangeLog: >> >> 2014-11-06 Renlin Li >> PR middle-end/61529 >> * gcc.dg/pr61529.c: New. > > > Hi, > > Can anybody help to review or approve this? > > Kind regards, > Renlin Li > > -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
This patch fixes an issue where tree-strlen was incorrectly removing a store of 0 into a string because it thought a prior CLOBBER (which is an empty constructor with no elements) was zero-initializing the string. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Teresa 2014-11-12 PR tree-optimization/63841 * tree.c (initializer_zerop): A constructor with no elements does not zero initialize. Index: tree.c === --- tree.c (revision 217190) +++ tree.c (working copy) @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) { unsigned HOST_WIDE_INT idx; +if (!CONSTRUCTOR_NELTS (init)) + return false; FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) if (!initializer_zerop (elt)) return false; -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize
Added testcase. Here is the new patch: 2014-11-12 gcc: PR tree-optimization/63841 * tree.c (initializer_zerop): A constructor with no elements does not zero initialize. gcc/testsuite: * g++.dg/tree-ssa/pr63841.C: New test. Index: tree.c === --- tree.c (revision 217190) +++ tree.c (working copy) @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) { unsigned HOST_WIDE_INT idx; +if (!CONSTRUCTOR_NELTS (init)) + return false; FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) if (!initializer_zerop (elt)) return false; Index: testsuite/g++.dg/tree-ssa/pr63841.C === --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0) +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy) @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include +#include + +std::string __attribute__ ((noinline)) comp_test_write() { + std::string data; + + for (int i = 0; i < 2; ++i) { +char b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +std::string __attribute__ ((noinline)) comp_test_write_good() { + std::string data; + + char b; + for (int i = 0; i < 2; ++i) { +b = 1 >> (i * 8); +data.append(&b, 1); + } + + return data; +} + +int main() { + std::string good = comp_test_write_good(); + printf("expected: %hx\n", *(short*)good.c_str()); + + std::string bad = comp_test_write(); + printf("got: %hx\n", *(short*)bad.c_str()); + + return good != bad; +} On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li wrote: > missing test case? > > David > > On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson wrote: >> This patch fixes an issue where tree-strlen was incorrectly removing a >> store of 0 into a string because it thought a prior CLOBBER (which is >> an empty constructor with no elements) was zero-initializing the >> string. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? >> >> Thanks, >> Teresa >> >> 2014-11-12 >> >> PR tree-optimization/63841 >> * tree.c (initializer_zerop): A constructor with no elements >> does not zero initialize. >> >> Index: tree.c >> === >> --- tree.c (revision 217190) >> +++ tree.c (working copy) >> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init) >>{ >> unsigned HOST_WIDE_INT idx; >> >> + if (!CONSTRUCTOR_NELTS (init)) >> + return false; >> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt) >> if (!initializer_zerop (elt)) >> return false; >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413