Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.

2014-01-05 Thread Teresa Johnson
On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt  wrote:
> On 01/05/2014 12:35 PM, Felix Yang wrote:
>>
>> Ping?
>> Cheers,
>
>
> I have a different patch which I'll submit next week after some more
> testing. The assert in cfgrtl is unnecessarily broad and really only needs
> to trigger if -freorder-blocks-and-partition; there's nothing wrong with
> entering cfglayout after normal bb-reorder.

Currently -freorder-blocks-and-partition is the default for x86. I
assume that hw-doloop is not enabled for any i386 targets, which is
why we haven't seen this?

And will this mean that -freorder-blocks-and-partition cannot be used
for the targets that use hw-doloop? If so, should
-freorder-blocks-and-partition be prevented with a warning for those
targets?

>
> I've also tested that Blackfin still benefits from the hw-doloop reordering
> code and generates more hardware loops if it's enabled. So we want to be
> able to run it at -O2.

I looked at hw-doloop briefly and since it seems to be doing some
manual bb reordering I guess it can't simply be moved before bbro. It
seems like a better long-term solution would be to make bbro
hw-doloop-aware as Felix suggested earlier.

Teresa

>
>
> Bernd
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch] libgcov.c re-factoring

2014-01-06 Thread Teresa Johnson
On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka  wrote:
>> 2014-01-03  Rong Xu  
>>
>> * gcc/gcov-io.c (gcov_var): Move from gcov-io.h.
>> (gcov_position): Ditto.
>> (gcov_is_error): Ditto.
>> (gcov_rewrite): Ditto.
>> * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.h, and libgcov
>> only part to libgcc/libgcov.h.
>> * libgcc/libgcov-driver.c: Use libgcov.h.
>> (buffer_fn_data): Use xmalloc instead of malloc.
>> (gcov_exit_merge_gcda): Ditto.
>> * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto.
>> * libgcc/libgcov.h: New common header files for libgcov-*.h.
>> * libgcc/libgcov-interface.c: Use libgcov.h
>> * libgcc/libgcov-merge.c: Ditto.
>> * libgcc/libgcov-profiler.c: Ditto.
>> * libgcc/Makefile.in: Add dependence to libgcov.h
>
> OK, with the licence changes and...
>>
>> Index: gcc/gcov-io.c
>> ===
>> --- gcc/gcov-io.c   (revision 206100)
>> +++ gcc/gcov-io.c   (working copy)
>> @@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns
>>  static void gcov_allocate (unsigned);
>>  #endif
>>
>> +/* Optimum number of gcov_unsigned_t's read from or written to disk.  */
>> +#define GCOV_BLOCK_SIZE (1 << 10)
>> +
>> +GCOV_LINKAGE struct gcov_var
>> +{
>> +  FILE *file;
>> +  gcov_position_t start;   /* Position of first byte of block */
>> +  unsigned offset; /* Read/write position within the block.  */
>> +  unsigned length; /* Read limit in the block.  */
>> +  unsigned overread;   /* Number of words overread.  */
>> +  int error;   /* < 0 overflow, > 0 disk error.  */
>> +  int mode;/* < 0 writing, > 0 reading */
>> +#if IN_LIBGCOV
>> +  /* Holds one block plus 4 bytes, thus all coverage reads & writes
>> + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
>> + to and from the disk. libgcov never backtracks and only writes 4
>> + or 8 byte objects.  */
>> +  gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1];
>> +#else
>> +  int endian;  /* Swap endianness.  */
>> +  /* Holds a variable length block, as the compiler can write
>> + strings and needs to backtrack.  */
>> +  size_t alloc;
>> +  gcov_unsigned_t *buffer;
>> +#endif
>> +} gcov_var;
>> +
>> +/* Save the current position in the gcov file.  */
>> +static inline gcov_position_t
>> +gcov_position (void)
>> +{
>> +  gcc_assert (gcov_var.mode > 0);
>> +  return gcov_var.start + gcov_var.offset;
>> +}
>> +
>> +/* Return nonzero if the error flag is set.  */
>> +static inline int
>> +gcov_is_error (void)
>> +{
>> +  return gcov_var.file ? gcov_var.error : 1;
>> +}
>> +
>> +#if IN_LIBGCOV
>> +/* Move to beginning of file and initialize for writing.  */
>> +GCOV_LINKAGE inline void
>> +gcov_rewrite (void)
>> +{
>> +  gcc_assert (gcov_var.mode > 0);
>
> I would turn those two asserts into checking asserts so they do not
> bloat the runtime lib.

Ok, but note that there are a number of other gcc_assert already in
gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should
I go ahead and change all of them in gcov-io.c?

Thanks,
Teresa

>
> Thanks,
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch] libgcov.c re-factoring

2014-01-08 Thread Teresa Johnson
On Mon, Jan 6, 2014 at 9:49 AM, Teresa Johnson  wrote:
> On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka  wrote:
>>> 2014-01-03  Rong Xu  
>>>
>>> * gcc/gcov-io.c (gcov_var): Move from gcov-io.h.
>>> (gcov_position): Ditto.
>>> (gcov_is_error): Ditto.
>>> (gcov_rewrite): Ditto.
>>> * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.h, and libgcov
>>> only part to libgcc/libgcov.h.
>>> * libgcc/libgcov-driver.c: Use libgcov.h.
>>> (buffer_fn_data): Use xmalloc instead of malloc.
>>> (gcov_exit_merge_gcda): Ditto.
>>> * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto.
>>> * libgcc/libgcov.h: New common header files for libgcov-*.h.
>>> * libgcc/libgcov-interface.c: Use libgcov.h
>>> * libgcc/libgcov-merge.c: Ditto.
>>> * libgcc/libgcov-profiler.c: Ditto.
>>> * libgcc/Makefile.in: Add dependence to libgcov.h
>>
>> OK, with the licence changes and...
>>>
>>> Index: gcc/gcov-io.c
>>> ===
>>> --- gcc/gcov-io.c   (revision 206100)
>>> +++ gcc/gcov-io.c   (working copy)
>>> @@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns
>>>  static void gcov_allocate (unsigned);
>>>  #endif
>>>
>>> +/* Optimum number of gcov_unsigned_t's read from or written to disk.  */
>>> +#define GCOV_BLOCK_SIZE (1 << 10)
>>> +
>>> +GCOV_LINKAGE struct gcov_var
>>> +{
>>> +  FILE *file;
>>> +  gcov_position_t start;   /* Position of first byte of block */
>>> +  unsigned offset; /* Read/write position within the block.  */
>>> +  unsigned length; /* Read limit in the block.  */
>>> +  unsigned overread;   /* Number of words overread.  */
>>> +  int error;   /* < 0 overflow, > 0 disk error.  */
>>> +  int mode;/* < 0 writing, > 0 reading */
>>> +#if IN_LIBGCOV
>>> +  /* Holds one block plus 4 bytes, thus all coverage reads & writes
>>> + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
>>> + to and from the disk. libgcov never backtracks and only writes 4
>>> + or 8 byte objects.  */
>>> +  gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1];
>>> +#else
>>> +  int endian;  /* Swap endianness.  */
>>> +  /* Holds a variable length block, as the compiler can write
>>> + strings and needs to backtrack.  */
>>> +  size_t alloc;
>>> +  gcov_unsigned_t *buffer;
>>> +#endif
>>> +} gcov_var;
>>> +
>>> +/* Save the current position in the gcov file.  */
>>> +static inline gcov_position_t
>>> +gcov_position (void)
>>> +{
>>> +  gcc_assert (gcov_var.mode > 0);
>>> +  return gcov_var.start + gcov_var.offset;
>>> +}
>>> +
>>> +/* Return nonzero if the error flag is set.  */
>>> +static inline int
>>> +gcov_is_error (void)
>>> +{
>>> +  return gcov_var.file ? gcov_var.error : 1;
>>> +}
>>> +
>>> +#if IN_LIBGCOV
>>> +/* Move to beginning of file and initialize for writing.  */
>>> +GCOV_LINKAGE inline void
>>> +gcov_rewrite (void)
>>> +{
>>> +  gcc_assert (gcov_var.mode > 0);
>>
>> I would turn those two asserts into checking asserts so they do not
>> bloat the runtime lib.
>
> Ok, but note that there are a number of other gcc_assert already in
> gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should
> I go ahead and change all of them in gcov-io.c?

Actually, I tried changing these two, but gcc_checking_assert is
undefined in libgcov.a. Ok to commit without this change?

Teresa

>
> Thanks,
> Teresa
>
>>
>> Thanks,
>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch] libgcov.c re-factoring

2014-01-08 Thread Teresa Johnson
On Wed, Jan 8, 2014 at 6:34 AM, Jan Hubicka  wrote:
>>
>> Actually, I tried changing these two, but gcc_checking_assert is
>> undefined in libgcov.a. Ok to commit without this change?
>
> OK.
> incrementally can you please define gcov_nonruntime_assert that will wind into
> gcc_assert for code within gcc/coverage tools and into nothing for libgcov
> runtime and we can change those offenders to that.

Ok, committed as r206435. Will send the assert patch in a follow-up
later this week.

Teresa

>
> Honza
>>
>> Teresa
>>
>> >
>> > Thanks,
>> > Teresa
>> >
>> >>
>> >> Thanks,
>> >> Honza
>> >
>> >
>> >
>> > --
>> > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.

2014-01-08 Thread Teresa Johnson
On Tue, Jan 7, 2014 at 8:07 AM, Bernd Schmidt  wrote:
> On 01/05/2014 05:10 PM, Teresa Johnson wrote:
>>
>> On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt 
>> wrote:
>>>
>>> I have a different patch which I'll submit next week after some more
>>> testing. The assert in cfgrtl is unnecessarily broad and really only
>>> needs
>>> to trigger if -freorder-blocks-and-partition; there's nothing wrong with
>>> entering cfglayout after normal bb-reorder.
>>
>>
>> Currently -freorder-blocks-and-partition is the default for x86. I
>> assume that hw-doloop is not enabled for any i386 targets, which is
>> why we haven't seen this?
>
>
> Precisely.
>
>
>> And will this mean that -freorder-blocks-and-partition cannot be used
>> for the targets that use hw-doloop? If so, should
>> -freorder-blocks-and-partition be prevented with a warning for those
>> targets?
>
>
> If someone explicitly chooses that option we can turn off the reordering in
> hw-doloop. That should happen sufficiently rarely that it isn't a problem.
> That's what the patch below does - bootstraped on x86_64-linux, tested there
> and with bfin-elf. Ok?

Ok, looks good to me.

>
>
>>> I've also tested that Blackfin still benefits from the hw-doloop
>>> reordering
>>> code and generates more hardware loops if it's enabled. So we want to be
>>> able to run it at -O2.
>>
>>
>> I looked at hw-doloop briefly and since it seems to be doing some
>> manual bb reordering I guess it can't simply be moved before bbro. It
>> seems like a better long-term solution would be to make bbro
>> hw-doloop-aware as Felix suggested earlier.
>
>
> Maybe. It could be argued that the code in hw-doloop is relevant only for a
> small class of targets so it should only be enabled for them. In any case,
> that's not stage 3 material and two ports are broken...

Ok, that makes sense. Thanks, Teresa

>
>
> Bernd
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Patch] Avoid gcc_assert in libgcov

2014-01-09 Thread Teresa Johnson
As suggested by Honza, avoid bloating libgcov from gcc_assert by using
a new macro gcov_nonruntime_assert in gcov-io.c that is only mapped to
gcc_assert when not in libgcov.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2014-01-09  Teresa Johnson  

* gcov-io.c (gcov_position): Use gcov_nonruntime_assert.
(gcov_is_error): Ditto.
(gcov_rewrite): Ditto.
(gcov_open): Ditto.
(gcov_write_words): Ditto.
(gcov_write_length): Ditto.
(gcov_read_words): Ditto.
(gcov_read_summary): Ditto.
(gcov_sync): Ditto.
(gcov_seek): Ditto.
(gcov_histo_index): Ditto.
(static void gcov_histogram_merge): Ditto.
(compute_working_sets): Ditto.
* gcov-io.h (gcov_nonruntime_assert): Define.

Index: gcov-io.c
===
--- gcov-io.c   (revision 206435)
+++ gcov-io.c   (working copy)
@@ -67,7 +67,7 @@ GCOV_LINKAGE struct gcov_var
 static inline gcov_position_t
 gcov_position (void)
 {
-  gcc_assert (gcov_var.mode > 0);
+  gcov_nonruntime_assert (gcov_var.mode > 0);
   return gcov_var.start + gcov_var.offset;
 }

@@ -83,7 +83,7 @@ gcov_is_error (void)
 GCOV_LINKAGE inline void
 gcov_rewrite (void)
 {
-  gcc_assert (gcov_var.mode > 0);
+  gcov_nonruntime_assert (gcov_var.mode > 0);
   gcov_var.mode = -1;
   gcov_var.start = 0;
   gcov_var.offset = 0;
@@ -133,7 +133,7 @@ gcov_open (const char *name, int mode)
   s_flock.l_pid = getpid ();
 #endif

-  gcc_assert (!gcov_var.file);
+  gcov_nonruntime_assert (!gcov_var.file);
   gcov_var.start = 0;
   gcov_var.offset = gcov_var.length = 0;
   gcov_var.overread = -1u;
@@ -291,14 +291,14 @@ gcov_write_words (unsigned words)
 {
   gcov_unsigned_t *result;

-  gcc_assert (gcov_var.mode < 0);
+  gcov_nonruntime_assert (gcov_var.mode < 0);
 #if IN_LIBGCOV
   if (gcov_var.offset >= GCOV_BLOCK_SIZE)
 {
   gcov_write_block (GCOV_BLOCK_SIZE);
   if (gcov_var.offset)
{
- gcc_assert (gcov_var.offset == 1);
+ gcov_nonruntime_assert (gcov_var.offset == 1);
  memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4);
}
 }
@@ -393,9 +393,9 @@ gcov_write_length (gcov_position_t position)
   gcov_unsigned_t length;
   gcov_unsigned_t *buffer;

-  gcc_assert (gcov_var.mode < 0);
-  gcc_assert (position + 2 <= gcov_var.start + gcov_var.offset);
-  gcc_assert (position >= gcov_var.start);
+  gcov_nonruntime_assert (gcov_var.mode < 0);
+  gcov_nonruntime_assert (position + 2 <= gcov_var.start + gcov_var.offset);
+  gcov_nonruntime_assert (position >= gcov_var.start);
   offset = position - gcov_var.start;
   length = gcov_var.offset - offset - 2;
   buffer = (gcov_unsigned_t *) &gcov_var.buffer[offset];
@@ -481,14 +481,14 @@ gcov_read_words (unsigned words)
   const gcov_unsigned_t *result;
   unsigned excess = gcov_var.length - gcov_var.offset;

-  gcc_assert (gcov_var.mode > 0);
+  gcov_nonruntime_assert (gcov_var.mode > 0);
   if (excess < words)
 {
   gcov_var.start += gcov_var.offset;
 #if IN_LIBGCOV
   if (excess)
{
- gcc_assert (excess == 1);
+ gcov_nonruntime_assert (excess == 1);
  memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4);
}
 #else
@@ -497,7 +497,7 @@ gcov_read_words (unsigned words)
   gcov_var.offset = 0;
   gcov_var.length = excess;
 #if IN_LIBGCOV
-  gcc_assert (!gcov_var.length || gcov_var.length == 1);
+  gcov_nonruntime_assert (!gcov_var.length || gcov_var.length == 1);
   excess = GCOV_BLOCK_SIZE;
 #else
   if (gcov_var.length + words > gcov_var.alloc)
@@ -614,7 +614,7 @@ gcov_read_summary (struct gcov_summary *summary)
   while (!cur_bitvector)
 {
   h_ix = bv_ix * 32;
-  gcc_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE);
+  gcov_nonruntime_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE);
   cur_bitvector = histo_bitvector[bv_ix++];
 }
   while (!(cur_bitvector & 0x1))
@@ -622,7 +622,7 @@ gcov_read_summary (struct gcov_summary *summary)
   h_ix++;
   cur_bitvector >>= 1;
 }
-  gcc_assert (h_ix < GCOV_HISTOGRAM_SIZE);
+  gcov_nonruntime_assert (h_ix < GCOV_HISTOGRAM_SIZE);

   csum->histogram[h_ix].num_counters = gcov_read_unsigned ();
   csum->histogram[h_ix].min_value = gcov_read_counter ();
@@ -642,7 +642,7 @@ gcov_read_summary (struct gcov_summary *summary)
 GCOV_LINKAGE void
 gcov_sync (gcov_position_t base, gcov_unsigned_t length)
 {
-  gcc_assert (gcov_var.mode > 0);
+  gcov_nonruntime_assert (gcov_var.mode > 0);
   base += length;
   if (base - gcov_var.start <= gcov_var.length)
 gcov_var.offset = base - gcov_var.start;
@@ -661,7 +661,7 @@ gcov_sync (gcov_position_t base, gcov_unsig

Re: [Patch] Avoid gcc_assert in libgcov

2014-01-14 Thread Teresa Johnson
On Thu, Jan 9, 2014 at 6:56 AM, Jan Hubicka  wrote:
>> As suggested by Honza, avoid bloating libgcov from gcc_assert by using
>> a new macro gcov_nonruntime_assert in gcov-io.c that is only mapped to
>> gcc_assert when not in libgcov.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2014-01-09  Teresa Johnson  
>>
>> * gcov-io.c (gcov_position): Use gcov_nonruntime_assert.
>> (gcov_is_error): Ditto.
>> (gcov_rewrite): Ditto.
>> (gcov_open): Ditto.
>> (gcov_write_words): Ditto.
>> (gcov_write_length): Ditto.
>> (gcov_read_words): Ditto.
>> (gcov_read_summary): Ditto.
>> (gcov_sync): Ditto.
>> (gcov_seek): Ditto.
>> (gcov_histo_index): Ditto.
>> (static void gcov_histogram_merge): Ditto.
>> (compute_working_sets): Ditto.
>> * gcov-io.h (gcov_nonruntime_assert): Define.
>>
>
>> @@ -481,14 +481,14 @@ gcov_read_words (unsigned words)
>>const gcov_unsigned_t *result;
>>unsigned excess = gcov_var.length - gcov_var.offset;
>>
>> -  gcc_assert (gcov_var.mode > 0);
>> +  gcov_nonruntime_assert (gcov_var.mode > 0);
>>if (excess < words)
>>  {
>>gcov_var.start += gcov_var.offset;
>>  #if IN_LIBGCOV
>>if (excess)
>> {
>> - gcc_assert (excess == 1);
>> + gcov_nonruntime_assert (excess == 1);
>
> It probably makes no sense to put nonruntime access into IN_LIBGCOV defines.

You are right - there were several that were in IN_LIBGCOV defines
that I can just remove.

>
>>   memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4);
>> }
>>  #else
>> @@ -497,7 +497,7 @@ gcov_read_words (unsigned words)
>>gcov_var.offset = 0;
>>gcov_var.length = excess;
>>  #if IN_LIBGCOV
>> -  gcc_assert (!gcov_var.length || gcov_var.length == 1);
>> +  gcov_nonruntime_assert (!gcov_var.length || gcov_var.length == 1);
>>excess = GCOV_BLOCK_SIZE;
>>  #else
>>if (gcov_var.length + words > gcov_var.alloc)
>> @@ -614,7 +614,7 @@ gcov_read_summary (struct gcov_summary *summary)
>>while (!cur_bitvector)
>>  {
>>h_ix = bv_ix * 32;
>> -  gcc_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE);
>> +  gcov_nonruntime_assert (bv_ix < 
>> GCOV_HISTOGRAM_BITVECTOR_SIZE);
>>cur_bitvector = histo_bitvector[bv_ix++];
>>  }
>>while (!(cur_bitvector & 0x1))
>> @@ -622,7 +622,7 @@ gcov_read_summary (struct gcov_summary *summary)
>>h_ix++;
>>cur_bitvector >>= 1;
>>  }
>> -  gcc_assert (h_ix < GCOV_HISTOGRAM_SIZE);
>> +  gcov_nonruntime_assert (h_ix < GCOV_HISTOGRAM_SIZE);
>
> How many of those asserts can be triggered by a corrupted gcda file?
> I would like to make libgcov more safe WRT file corruptions, too, so in that
> case we should produce an error message.

In that case should we call gcov_error when IN_LIBGCOV? One
possibility would be to simply make gcov_nonruntime_assert be defined
as if (!EXPR) gcov_error in the IN_LIBGCOV case. But I think what you
wanted here was to reduce libgcov bloat by removing calls altogether,
which this wouldn't solve. But if we want to call gcov_error in some
cases, I think I need to add another macro that will either do
gcc_assert when !IN_LIBGCOV and "if (!EXPR) gcov_error" when
IN_LIBGCOV. Is that what you had in mind?

Thanks,
Teresa

>
> The rest of changes seems OK.
>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-24 Thread Teresa Johnson
This patch unsets -freorder-blocks-and-partition when -fprofile-use
is not specified. Function splitting was not actually being performed
in that case, as probably_never_executed_bb_p does not distinguish
any basic blocks as being cold vs hot when there is no profile data.

Leaving it enabled, however, causes the assembly code generator to create
(empty) cold sections and labels, leading to unnecessary size overhead.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2015-09-24  Teresa Johnson  

* opts.c (finish_options): Unset -freorder-blocks-and-partition
if not using profile.

Index: opts.c
===
--- opts.c  (revision 228062)
+++ opts.c  (working copy)
@@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g
   opts->x_flag_reorder_blocks = 1;
 }

+  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
+ effect. Function splitting was not actually being performed in that case,
+ as probably_never_executed_bb_p does not distinguish any basic blocks as
+ being cold vs hot when there is no profile data. Leaving it enabled,
+ however, causes the assembly code generator to create (empty) cold
+ sections and labels, leading to unnecessary size overhead.  */
   if (opts->x_flag_reorder_blocks_and_partition
+  && !opts_set->x_flag_profile_use)
+opts->x_flag_reorder_blocks_and_partition = 0;
+
+  if (opts->x_flag_reorder_blocks_and_partition
   && !opts_set->x_flag_reorder_functions)
 opts->x_flag_reorder_functions = 1;


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson
On Fri, Sep 25, 2015 at 6:58 AM, Bernd Schmidt  wrote:
> On 09/24/2015 07:16 PM, Teresa Johnson wrote:
>>
>> This patch unsets -freorder-blocks-and-partition when -fprofile-use
>> is not specified. Function splitting was not actually being performed
>> in that case, as probably_never_executed_bb_p does not distinguish
>> any basic blocks as being cold vs hot when there is no profile data.
>>
>> Leaving it enabled, however, causes the assembly code generator to create
>> (empty) cold sections and labels, leading to unnecessary size overhead.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2015-09-24  Teresa Johnson  
>>
>>  * opts.c (finish_options): Unset -freorder-blocks-and-partition
>>  if not using profile.
>
>
> Hmm, I'd noticed I was getting that enabled by default. It looks like you
> added this default with:
>
> 2013-11-19  Teresa Johnson  
>
>* common/config/i386/i386-common.c: Enable
>-freorder-blocks-and-partition at -O2 and up for x86.
>* doc/invoke.texi: Update -freorder-blocks-and-partition default.
>* opts.c (finish_options): Only warn if -freorder-blocks-and-
>partition was set on command line.
>
> (Note that this ChangeLog entry should have mentioned
> ix86_option_optimization_table as the variable that was changed).

Yeah, looks like I accidentally left the function name out of the
i386-common.c entry. I can fix the ChangeLog entry.

>
> What has changed between then and now? Also, do we not use
> estimates/heuristics when not using a profile?

Nothing has changed - splitting effectively never kicked in without a
profile. Honza and I had discussed the idea that static profile
heuristics could eventually be used to distinguish hot vs cold bbs,
but that hasn't happened. What I didn't notice until recently was the
size increase in the .o files from varasm adding in unnecessary
sections and labels when this option was on. Unless and until static
heuristics are used to distinguish cold bbs in
probably_never_executed_bb_p, I don't think it makes sense to do
anything finer grained that just disabling the option.

Teresa

>
>
> Bernd



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson
On Fri, Sep 25, 2015 at 7:34 AM, Jan Hubicka  wrote:
>> > What has changed between then and now? Also, do we not use
>> > estimates/heuristics when not using a profile?
>>
>> Nothing has changed - splitting effectively never kicked in without a
>> profile. Honza and I had discussed the idea that static profile
>> heuristics could eventually be used to distinguish hot vs cold bbs,
>
> Yep, the basic idea was to move EH clenaups to the cold section for start.  
> The
> cleanup code can get surprisingly large for some C++ apps.
>
>> but that hasn't happened. What I didn't notice until recently was the
>> size increase in the .o files from varasm adding in unnecessary
>> sections and labels when this option was on. Unless and until static
>
> Perhaps we also may just teach varasm to not output those when function is not
> split.  I am happy with the patch as it is because it is pointless to run the
> code when no splitting happens.

Right, that will need to happen if splitting is tuned for static
frequencies. For now I committed this patch.

I also fixed the old change log entry that Bernd pointed out.

Thanks,
Teresa

>
> Honza
>> heuristics are used to distinguish cold bbs in
>> probably_never_executed_bb_p, I don't think it makes sense to do
>> anything finer grained that just disabling the option.
>>
>> Teresa
>>
>> >
>> >
>> > Bernd
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson
Woops, we crossed wires. I just committed this patch. Would you like
me to revert it?

Teresa

On Fri, Sep 25, 2015 at 9:57 AM, Andi Kleen  wrote:
> Teresa Johnson  writes:
>
>> This patch unsets -freorder-blocks-and-partition when -fprofile-use
>> is not specified. Function splitting was not actually being performed
>> in that case, as probably_never_executed_bb_p does not distinguish
>> any basic blocks as being cold vs hot when there is no profile data.
>
> Actually I'm experimenting with a patch to fix that by allowing
> function splitting even without profile feed back. See PR66890
> which has the patch. I would prefer to keep and fix it.
>
> -Andi
>
>
> --
> a...@linux.intel.com -- Speaking for myself only



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson
This patch backports r228136 from trunk to Google/4_9. The only
difference is that I have added the option disabling to our
Google-specific routine for removing profile options when the
specified profile gcda file is not found.

Passes regression tests. Ok for google/4_9?

Thanks,
Teresa

2015-09-25  Teresa Johnson  

Google ref b/24265250
* opts.c (finish_options): Unset -freorder-blocks-and-partition
if not using profile.
(set_profile_use_options): Unset -freorder-blocks-and-partition
if input profile not found and it wasn't explicitly set.

Index: opts.c
===
--- opts.c  (revision 228063)
+++ opts.c  (working copy)
@@ -787,6 +787,16 @@ finish_options (struct gcc_options *opts, struct g
 }

   if (opts->x_flag_reorder_blocks_and_partition
+  && !opts_set->x_flag_profile_use)
+opts->x_flag_reorder_blocks_and_partition = 0;
+
+  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
+ effect. Function splitting was not actually being performed in that case,
+ as probably_never_executed_bb_p does not distinguish any basic blocks as
+ being cold vs hot when there is no profile data. Leaving it enabled,
+ however, causes the assembly code generator to create (empty) cold
+ sections and labels, leading to unnecessary size overhead.  */
+  if (opts->x_flag_reorder_blocks_and_partition
   && !opts_set->x_flag_reorder_functions)
 opts->x_flag_reorder_functions = 1;

@@ -1365,6 +1375,8 @@ set_profile_use_options (struct gcc_options *opts,
(PARAM_MAX_INLINE_INSNS_AUTO,
  default_param_value (PARAM_MAX_INLINE_INSNS_AUTO),
 opts->x_param_values, opts_set->x_param_values);
+  if (!opts_set->x_flag_reorder_blocks_and_partition)
+opts->x_flag_reorder_blocks_and_partition = false;
}

   if (!opts_set->x_flag_branch_probabilities || reset)



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Reduce max-vartrack-size

2015-10-13 Thread Teresa Johnson
Reduce the maximum variable tracking size by 20% to avoid extreme
compilation times.

Ok for google-4_9?

2015-10-13  Teresa Johnson  

Google ref b/24569916
* params.def (PARAM_MAX_VARTRACK_SIZE): Reduce default to 40M.

Index: params.def
===
--- params.def (revision 228063)
+++ params.def (working copy)
@@ -1160,7 +1160,7 @@ DEFPARAM (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO,
 DEFPARAM (PARAM_MAX_VARTRACK_SIZE,
   "max-vartrack-size",
   "Max. size of var tracking hash tables",
-  5000, 0, 0)
+  4000, 0, 0)

 /* Set maximum recursion depth for var tracking expression expansion
and resolution.  */


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH]Add -fprofile-use option for check_effective_target_freorder.

2015-10-26 Thread Teresa Johnson
Looks good to me, but I don't have approval rights.
Thanks for fixing,
Teresa

On Mon, Oct 26, 2015 at 2:00 AM, Renlin Li  wrote:
> Hi all,
>
> After r228136, flag_reorder_blocks_and_partition is canceled when
> -fprofile-use is not specified.
>
> In this case check_effective_target_freorder() is not able to check the
> proper target support.
> This is a simple patch to add "-fprofile-use" option that effective target
> check.
>
> Okay to commit on the trunk?
>
> Regards,
> Renlin Li
>
> gcc/testsuite/ChangeLog:
>
> 2015-10-26  Renlin Li  
>
> * lib/target-supports.exp (check_effective_target_freorder): Add
> -fprofile-use flag.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Remove overly-aggressive LIPO assert

2015-12-02 Thread Teresa Johnson
Remove an assert that was overly-strict and already partially redundant
with an immediately prior assert. In this case we had a hidden visibility
function clone that was created after the LIPO link due to indirect call
promotion. It is a cgraph_is_aux_decl_external node.

Fixes failures and passes regression tests. Ok for Google branch?

2015-12-02  Teresa Johnson  

Google ref b/25925223.
* l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict
assert.

Index: l-ipo.c
===
--- l-ipo.c (revision 231131)
+++ l-ipo.c (working copy)
@@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d
   gcc_assert (DECL_EXTERNAL (decl)
   || cgraph_is_aux_decl_external (n)
   || DECL_VIRTUAL_P (decl));
-  gcc_assert (/* This is the case for explicit extern
instantiation,
- when cgraph node is not created before link.  */
-  DECL_EXTERNAL (decl));
   cgraph_link_node (n);
   return n;
 }


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Remove overly-aggressive LIPO assert

2015-12-04 Thread Teresa Johnson
Ping.
Thanks, Teresa

On Wed, Dec 2, 2015 at 12:46 PM, Teresa Johnson  wrote:
> Remove an assert that was overly-strict and already partially redundant
> with an immediately prior assert. In this case we had a hidden visibility
> function clone that was created after the LIPO link due to indirect call
> promotion. It is a cgraph_is_aux_decl_external node.
>
> Fixes failures and passes regression tests. Ok for Google branch?
>
> 2015-12-02  Teresa Johnson  
>
> Google ref b/25925223.
> * l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict
> assert.
>
> Index: l-ipo.c
> ===
> --- l-ipo.c (revision 231131)
> +++ l-ipo.c (working copy)
> @@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d
>gcc_assert (DECL_EXTERNAL (decl)
>|| cgraph_is_aux_decl_external (n)
>|| DECL_VIRTUAL_P (decl));
> -  gcc_assert (/* This is the case for explicit extern
> instantiation,
> - when cgraph node is not created before link.  */
> -  DECL_EXTERNAL (decl));
>    cgraph_link_node (n);
>return n;
>  }
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute

2014-08-11 Thread Teresa Johnson
On Fri, Aug 8, 2014 at 3:22 PM, Yi Yang  wrote:
> Friendly ping.

Sorry, was OOO.

The solution of preventing splitting for named sections is good - but
it looks like there is already code that should prevent this. See the
user_defined_section_attribute check here - why is that not set? Looks
like it should be set in handle_section_attribute() in
c-family/c-common.c.

Teresa

>
>
> On Wed, Aug 6, 2014 at 5:20 PM, Dehao Chen  wrote:
>>
>> OK for google-4_8 and google-4_9. David and Teresa may have further
>> comments.
>>
>> Dehao
>>
>> On Wed, Aug 6, 2014 at 3:36 PM, Yi Yang  wrote:
>> > This currently puts split sections together again in the specified
>> > section and breaks DWARF output. This patch disables the partitioning
>> > for such functions.
>> >
>> > --
>> >
>> > 2014-08-06  Yi Yang  
>> >
>> > gcc:
>> > * bb-reorder.c (gate_handle_partition_blocks): Add a check for
>> > "section"
>> > attribute.
>> >
>> > diff --git gcc/bb-reorder.c gcc/bb-reorder.c
>> > index fa6f62f..09449c6 100644
>> > --- gcc/bb-reorder.c
>> > +++ gcc/bb-reorder.c
>> > @@ -2555,6 +2555,7 @@ gate_handle_partition_blocks (void)
>> >   we are going to omit the reordering.  */
>> >&& optimize_function_for_speed_p (cfun)
>> >&& !DECL_ONE_ONLY (current_function_decl)
>> > +  && !DECL_SECTION_NAME (current_function_decl)
>> >&& !user_defined_section_attribute);
>> >  }
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute

2014-08-11 Thread Teresa Johnson
Ok, thanks. This seems reasonable. Can you send the patch to trunk as well?
Teresa

On Mon, Aug 11, 2014 at 12:35 PM, Yi Yang  wrote:
> Patch v2
>
> ..
>
> diff --git gcc/bb-reorder.c gcc/bb-reorder.c
> index fa6f62f..a1b3e65 100644
> --- gcc/bb-reorder.c
> +++ gcc/bb-reorder.c
> @@ -95,7 +95,6 @@
>  #include "expr.h"
>  #include "params.h"
>  #include "diagnostic-core.h"
> -#include "toplev.h" /* user_defined_section_attribute */
>  #include "tree-pass.h"
>  #include "df.h"
>  #include "bb-reorder.h"
> @@ -2555,7 +2554,7 @@ gate_handle_partition_blocks (void)
>   we are going to omit the reordering.  */
>&& optimize_function_for_speed_p (cfun)
>&& !DECL_ONE_ONLY (current_function_decl)
> -  && !user_defined_section_attribute);
> +  && !DECL_SECTION_NAME (current_function_decl));
>  }
>
>  /* This function is the main 'entrance' for the optimization that
> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
> index 65c25bf..9923928 100644
> --- gcc/c-family/c-common.c
> +++ gcc/c-family/c-common.c
> @@ -7378,8 +7378,6 @@ handle_section_attribute (tree *node, tree
> ARG_UNUSED (name), tree args,
>
>if (targetm_common.have_named_sections)
>  {
> -  user_defined_section_attribute = true;
> -
>if ((TREE_CODE (decl) == FUNCTION_DECL
> || TREE_CODE (decl) == VAR_DECL)
>&& TREE_CODE (TREE_VALUE (args)) == STRING_CST)
> diff --git gcc/final.c gcc/final.c
> index 9af0b2b..38c90b2 100644
> --- gcc/final.c
> +++ gcc/final.c
> @@ -4501,8 +4501,6 @@ rest_of_handle_final (void)
>
>assemble_end_function (current_function_decl, fnname);
>
> -  user_defined_section_attribute = false;
> -
>/* Free up reg info memory.  */
>free_reg_info ();
>
> diff --git gcc/toplev.c gcc/toplev.c
> index 9b8d313..4c8c965 100644
> --- gcc/toplev.c
> +++ gcc/toplev.c
> @@ -153,11 +153,6 @@ HOST_WIDE_INT random_seed;
> the support provided depends on the backend.  */
>  rtx stack_limit_rtx;
>
> -/* True if the user has tagged the function with the 'section'
> -   attribute.  */
> -
> -bool user_defined_section_attribute = false;
> -
>  struct target_flag_state default_target_flag_state;
>  #if SWITCHABLE_TARGET
>  struct target_flag_state *this_target_flag_state = 
> &default_target_flag_state;
> diff --git gcc/toplev.h gcc/toplev.h
> index 0290be3..65e38e7 100644
> --- gcc/toplev.h
> +++ gcc/toplev.h
> @@ -53,11 +53,6 @@ extern void target_reinit (void);
>  /* A unique local time stamp, might be zero if none is available.  */
>  extern unsigned local_tick;
>
> -/* True if the user has tagged the function with the 'section'
> -   attribute.  */
> -
> -extern bool user_defined_section_attribute;
> -
>  /* See toplev.c.  */
>  extern int flag_rerun_cse_after_global_opts;
>
> --
>
> On Mon, Aug 11, 2014 at 12:22 PM, Yi Yang  wrote:
>> Thanks for pointing out this!
>>
>> It seems to me that this user_defined_section_attribute variable is
>> useless now and should be removed. It is intended to work in this way:
>>
>> for each function {
>>parse into tree (setting user_defined_section_attribute)
>>do tree passes
>>do RTL passes (using user_defined_section_attribute)
>>resetting (user_defined_section_attribute = false)
>> }
>>
>> But now GCC works this way:
>>
>> for each function {
>>parse into tree (setting user_defined_section_attribute)
>> }
>> do IPA passes
>> for each function {
>>do tree passes
>>do RTL passes (using user_defined_section_attribute)
>>resetting (user_defined_section_attribute = false)
>> }
>>
>> Therefore the first function will use the actual
>> user_defined_section_attribute of the last function, and all other
>> functions will always see user_defined_section_attribute=0.
>>
>> I suggest removing user_defined_section_attribute and simply check
>> !DECL_SECTION_NAME (current_function_decl).
>>
>> On Mon, Aug 11, 2014 at 8:00 AM, Teresa Johnson  wrote:
>>> On Fri, Aug 8, 2014 at 3:22 PM, Yi Yang  wrote:
>>>> Friendly ping.
>>>
>>> Sorry, was OOO.
>>>
>>> The solution of preventing splitting for named sections is good - but
>>> it looks like there is already code that should prevent this. See the
>>> user_defined_section_attribute check here - why is that not set? Looks
>>> like it should be set in handle_section_attribute() in
>>> c-family/c-common.

Re: [GOOGLE] Do not partition the cold/hot sections of a function with "section" attribute

2014-08-12 Thread Teresa Johnson
On Mon, Aug 11, 2014 at 1:44 PM, Yi Yang  wrote:
> I ported this to trunk.
>
> Shall I commit this patch to the Google 4_8/4_9 branches first?

Ok with me.
Teresa

>
> On Mon, Aug 11, 2014 at 12:46 PM, Teresa Johnson  wrote:
>> Ok, thanks. This seems reasonable. Can you send the patch to trunk as well?
>> Teresa
>>
>> On Mon, Aug 11, 2014 at 12:35 PM, Yi Yang  wrote:
>>> Patch v2
>>>
>>> ..
>>>
>>> diff --git gcc/bb-reorder.c gcc/bb-reorder.c
>>> index fa6f62f..a1b3e65 100644
>>> --- gcc/bb-reorder.c
>>> +++ gcc/bb-reorder.c
>>> @@ -95,7 +95,6 @@
>>>  #include "expr.h"
>>>  #include "params.h"
>>>  #include "diagnostic-core.h"
>>> -#include "toplev.h" /* user_defined_section_attribute */
>>>  #include "tree-pass.h"
>>>  #include "df.h"
>>>  #include "bb-reorder.h"
>>> @@ -2555,7 +2554,7 @@ gate_handle_partition_blocks (void)
>>>   we are going to omit the reordering.  */
>>>&& optimize_function_for_speed_p (cfun)
>>>&& !DECL_ONE_ONLY (current_function_decl)
>>> -  && !user_defined_section_attribute);
>>> +  && !DECL_SECTION_NAME (current_function_decl));
>>>  }
>>>
>>>  /* This function is the main 'entrance' for the optimization that
>>> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
>>> index 65c25bf..9923928 100644
>>> --- gcc/c-family/c-common.c
>>> +++ gcc/c-family/c-common.c
>>> @@ -7378,8 +7378,6 @@ handle_section_attribute (tree *node, tree
>>> ARG_UNUSED (name), tree args,
>>>
>>>if (targetm_common.have_named_sections)
>>>  {
>>> -  user_defined_section_attribute = true;
>>> -
>>>if ((TREE_CODE (decl) == FUNCTION_DECL
>>> || TREE_CODE (decl) == VAR_DECL)
>>>&& TREE_CODE (TREE_VALUE (args)) == STRING_CST)
>>> diff --git gcc/final.c gcc/final.c
>>> index 9af0b2b..38c90b2 100644
>>> --- gcc/final.c
>>> +++ gcc/final.c
>>> @@ -4501,8 +4501,6 @@ rest_of_handle_final (void)
>>>
>>>assemble_end_function (current_function_decl, fnname);
>>>
>>> -  user_defined_section_attribute = false;
>>> -
>>>/* Free up reg info memory.  */
>>>free_reg_info ();
>>>
>>> diff --git gcc/toplev.c gcc/toplev.c
>>> index 9b8d313..4c8c965 100644
>>> --- gcc/toplev.c
>>> +++ gcc/toplev.c
>>> @@ -153,11 +153,6 @@ HOST_WIDE_INT random_seed;
>>> the support provided depends on the backend.  */
>>>  rtx stack_limit_rtx;
>>>
>>> -/* True if the user has tagged the function with the 'section'
>>> -   attribute.  */
>>> -
>>> -bool user_defined_section_attribute = false;
>>> -
>>>  struct target_flag_state default_target_flag_state;
>>>  #if SWITCHABLE_TARGET
>>>  struct target_flag_state *this_target_flag_state = 
>>> &default_target_flag_state;
>>> diff --git gcc/toplev.h gcc/toplev.h
>>> index 0290be3..65e38e7 100644
>>> --- gcc/toplev.h
>>> +++ gcc/toplev.h
>>> @@ -53,11 +53,6 @@ extern void target_reinit (void);
>>>  /* A unique local time stamp, might be zero if none is available.  */
>>>  extern unsigned local_tick;
>>>
>>> -/* True if the user has tagged the function with the 'section'
>>> -   attribute.  */
>>> -
>>> -extern bool user_defined_section_attribute;
>>> -
>>>  /* See toplev.c.  */
>>>  extern int flag_rerun_cse_after_global_opts;
>>>
>>> --
>>>
>>> On Mon, Aug 11, 2014 at 12:22 PM, Yi Yang  wrote:
>>>> Thanks for pointing out this!
>>>>
>>>> It seems to me that this user_defined_section_attribute variable is
>>>> useless now and should be removed. It is intended to work in this way:
>>>>
>>>> for each function {
>>>>parse into tree (setting user_defined_section_attribute)
>>>>do tree passes
>>>>do RTL passes (using user_defined_section_attribute)
>>>>resetting (user_defined_section_attribute = false)
>>>> }
>>>>
>>>> But now GCC works this way:
>>>>
>>>> for each function {
>>>>parse into tree (setting user_defined_section_attribute)
>>>> }
>>>>

Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Teresa Johnson
On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang  wrote:
> This bug is caused by my last patch, which did not differentiate
> between explicit section names (via attributes) and implicit section
> names (via -ffunction-section).
>
> This patch fixes that.
>
> --
>
> diff --git gcc/bb-reorder.c gcc/bb-reorder.c
> index 8f8c420..2115b01 100644
> --- gcc/bb-reorder.c
> +++ gcc/bb-reorder.c
> @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
>  we are going to omit the reordering.  */
>   && optimize_function_for_speed_p (cfun)
>   && !DECL_ONE_ONLY (current_function_decl)
> - && !DECL_SECTION_NAME (current_function_decl));
> + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
>  }
>
>  /* This function is the main 'entrance' for the optimization that
> diff --git gcc/tree.h gcc/tree.h
> index 817507f..738675a 100644
> --- gcc/tree.h
> +++ gcc/tree.h
> @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
>  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
>(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p)
>
> +/* Speficy whether the section name was explicitly set with decl_attributes. 
> */
> +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
> +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
> +   !!DECL_SECTION_NAME(NODE))

Personally, I think it is clearer to simply write this as:

#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
(DECL_SECTION_NAME(NODE) != NULL_TREE \
 && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))

Teresa

> +
>  struct GTY(()) tree_decl_with_vis {
>   struct tree_decl_with_rtl common;
>   tree assembler_name;
> --



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Google/4_9] A couple gcov-tool fixes

2014-08-14 Thread Teresa Johnson
Fix a couple problems found during testing.

Backport from trunk (r212694) failed to fixup gcov_read_counter
invocations in google-specific code. Also, forward port
r211800 from google/4_8 to tolerate differences after COMDAT
fixup.

Passes manual testing, ok if passes regression tests?

Thanks,
Teresa

2014-08-14  Teresa Johnson  

* libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter,
Relax the check after COMDAT fixup.
(__gcov_merge_icall_topn): Use gcov_get_counter.

Index: libgcov-merge.c
===
--- libgcov-merge.c (revision 213975)
+++ libgcov-merge.c (working copy)
@@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
   gcc_assert (!(n_counters % 2));
   for (i = 0; i < n_counters; i += 2)
 {
-  gcov_type global_id = gcov_read_counter ();
-  gcov_type call_count = gcov_read_counter ();
+  gcov_type global_id = gcov_get_counter_target ();
+  gcov_type call_count = gcov_get_counter ();

   /* Note that global id counter may never have been set if no calls were
 made from this call-site.  */
@@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
   else if (__gcov_is_gid_insane (global_id))
 global_id = counters[i];

-  gcc_assert (counters[i] == global_id);
+  /* In the case of inconsistency, use the src's target.  */
+  if (counters[i] != global_id)
+fprintf (stderr, "Warning: Inconsistent call targets in"
+ " direct-call profile.\n");
 }
   else if (global_id)
counters[i] = global_id;
@@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi
 }

   /* Skip the number_of_eviction entry.  */
-  gcov_read_counter ();
+  gcov_get_counter ();
   for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2)
 {
   int found = 0;
-  gcov_type global_id = gcov_read_counter ();
-  gcov_type call_count = gcov_read_counter ();
+  gcov_type global_id = gcov_get_counter_target ();
+  gcov_type call_count = gcov_get_counter ();
   for (m = 0; m < j; m += 2)
 {
   if (tmp_array[m] == global_id)


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Google/4_9] A couple gcov-tool fixes

2014-08-14 Thread Teresa Johnson
On Thu, Aug 14, 2014 at 11:36 AM, Xinliang David Li  wrote:
> Ok.
>
> The interfaces of counter reading/getting now becomes confusing. Should it
> be better documented somewhere so that developer knows what is the right one
> to use in a certain context?

I think it is documented in libgcov.h and accesses should be through
the new gcov_get_counter interface,
but these just got missed when it was ported from trunk.

Teresa

>
> David
>
>
>
> On Thu, Aug 14, 2014 at 11:27 AM, Teresa Johnson 
> wrote:
>>
>> Fix a couple problems found during testing.
>>
>> Backport from trunk (r212694) failed to fixup gcov_read_counter
>> invocations in google-specific code. Also, forward port
>> r211800 from google/4_8 to tolerate differences after COMDAT
>> fixup.
>>
>> Passes manual testing, ok if passes regression tests?
>>
>> Thanks,
>> Teresa
>>
>> 2014-08-14  Teresa Johnson  
>>
>> * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter,
>> Relax the check after COMDAT fixup.
>> (__gcov_merge_icall_topn): Use gcov_get_counter.
>>
>> Index: libgcov-merge.c
>> ===
>> --- libgcov-merge.c (revision 213975)
>> +++ libgcov-merge.c (working copy)
>> @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
>>gcc_assert (!(n_counters % 2));
>>for (i = 0; i < n_counters; i += 2)
>>  {
>> -  gcov_type global_id = gcov_read_counter ();
>> -  gcov_type call_count = gcov_read_counter ();
>> +  gcov_type global_id = gcov_get_counter_target ();
>> +  gcov_type call_count = gcov_get_counter ();
>>
>>/* Note that global id counter may never have been set if no calls
>> were
>>  made from this call-site.  */
>> @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
>>else if (__gcov_is_gid_insane (global_id))
>>  global_id = counters[i];
>>
>> -  gcc_assert (counters[i] == global_id);
>> +  /* In the case of inconsistency, use the src's target.  */
>> +  if (counters[i] != global_id)
>> +fprintf (stderr, "Warning: Inconsistent call targets in"
>> + " direct-call profile.\n");
>>  }
>>else if (global_id)
>> counters[i] = global_id;
>> @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi
>>  }
>>
>>/* Skip the number_of_eviction entry.  */
>> -  gcov_read_counter ();
>> +  gcov_get_counter ();
>>for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2)
>>  {
>>int found = 0;
>> -  gcov_type global_id = gcov_read_counter ();
>> -  gcov_type call_count = gcov_read_counter ();
>> +  gcov_type global_id = gcov_get_counter_target ();
>> +  gcov_type call_count = gcov_get_counter ();
>>for (m = 0; m < j; m += 2)
>>  {
>>if (tmp_array[m] == global_id)
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Teresa Johnson
On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang  wrote:
> Patch v2.
>
> Trunk no longer set SECTION_NAME for implicit section names, so this
> probably does not apply to trunk. It's probably not necessary for
> trunk either.
>
> Tested for Google 4.8(albeit unnecessary) and 4.9 branch.
>
> diff --git gcc/bb-reorder.c gcc/bb-reorder.c
> index a1b3e65..b9a829e 100644
> --- gcc/bb-reorder.c
> +++ gcc/bb-reorder.c
> @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void)
>   we are going to omit the reordering.  */
>&& optimize_function_for_speed_p (cfun)
>&& !DECL_ONE_ONLY (current_function_decl)
> -  && !DECL_SECTION_NAME (current_function_decl));
> +  && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
>  }
>
>  /* This function is the main 'entrance' for the optimization that
> diff --git gcc/tree.h gcc/tree.h
> index b656b7b..308eef8 100644
> --- gcc/tree.h
> +++ gcc/tree.h
> @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree);
>  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
>(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p)
>
> +/* Speficy whether the section name was explicitly set with decl_attributes. 
> */

Typo "Specify".

Otherwise looks ok to me.

Thanks,
Teresa

> +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
> +  (DECL_SECTION_NAME(NODE) != NULL_TREE \
> +   && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))
> +
>  extern tree decl_debug_expr_lookup (tree);
>  extern void decl_debug_expr_insert (tree, tree);
>
> --
>
> On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson  wrote:
>> On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang  wrote:
>>> This bug is caused by my last patch, which did not differentiate
>>> between explicit section names (via attributes) and implicit section
>>> names (via -ffunction-section).
>>>
>>> This patch fixes that.
>>>
>>> --
>>>
>>> diff --git gcc/bb-reorder.c gcc/bb-reorder.c
>>> index 8f8c420..2115b01 100644
>>> --- gcc/bb-reorder.c
>>> +++ gcc/bb-reorder.c
>>> @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
>>>  we are going to omit the reordering.  */
>>>   && optimize_function_for_speed_p (cfun)
>>>   && !DECL_ONE_ONLY (current_function_decl)
>>> - && !DECL_SECTION_NAME (current_function_decl));
>>> + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
>>>  }
>>>
>>>  /* This function is the main 'entrance' for the optimization that
>>> diff --git gcc/tree.h gcc/tree.h
>>> index 817507f..738675a 100644
>>> --- gcc/tree.h
>>> +++ gcc/tree.h
>>> @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
>>>  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
>>>(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p)
>>>
>>> +/* Speficy whether the section name was explicitly set with 
>>> decl_attributes. */
>>> +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
>>> +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
>>> +   !!DECL_SECTION_NAME(NODE))
>>
>> Personally, I think it is clearer to simply write this as:
>>
>> #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
>> (DECL_SECTION_NAME(NODE) != NULL_TREE \
>>  && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))
>>
>> Teresa
>>
>>> +
>>>  struct GTY(()) tree_decl_with_vis {
>>>   struct tree_decl_with_rtl common;
>>>   tree assembler_name;
>>> --
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Fix AutoFDO LIPO ICE due to eliminated abstract origin

2014-08-23 Thread Teresa Johnson
This patch ensures we don't prematurely delete an abstract origin node, leading
to creating a new one after LIPO linking when we invoke similar handling in
symtab_remove_unreachable_nodes, which then does not have a resolved node. It
makes the handling of such nodes equivalent to that in
symtab_remove_unreachable_nodes.

Tested with regression tests and internal benchmarks. Ok for google/4_9

2014-08-23  Teresa Johnson  

Google ref b/16731481
* cgraphunit.c (analyze_functions): Don't remove origin node.

Index: cgraphunit.c
===
--- cgraphunit.c(revision 214320)
+++ cgraphunit.c(working copy)
@@ -1051,6 +1051,7 @@ analyze_functions (void)
  struct cgraph_node *origin_node
  = cgraph_get_node (DECL_ABSTRACT_ORIGIN (decl));
  origin_node->used_as_abstract_origin = true;
+ enqueue_node (origin_node);
}
}
      else

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Fix -fopt-info seg fault in AutoFDO LIPO mode

2014-08-23 Thread Teresa Johnson
Fixes seg fault when using -fopt-info with AutoFDO LIPO. Ensure that the
mapping between module name and ident is setup.

Tested with regression tests and internal benchmarks. Ok for google/4_9

2014-08-23  Teresa Johnson  

Google ref b/17124135
* auto-profile.c (read_aux_modules): Record the module name/ident pair.
* coverage.c (record_module_name): Make non-static.
* l-ipo.h (record_module_name): Ditto.

Index: auto-profile.c
===
--- auto-profile.c  (revision 214320)
+++ auto-profile.c  (working copy)
@@ -958,6 +958,7 @@ read_aux_modules (void)
   module_infos = XCNEWVEC (gcov_module_info *, num_aux_modules + 1);
   module_infos[0] = module;
   primary_module_id = module->ident;
+  record_module_name (module->ident, lbasename (in_fnames[0]));
   if (aux_modules == NULL)
 return;
   unsigned curr_module = 1, max_group = PARAM_VALUE (PARAM_MAX_LIPO_GROUP);
@@ -1004,6 +1005,7 @@ read_aux_modules (void)
}
   module_infos[curr_module++] = aux_module;
   add_input_filename (*iter);
+  record_module_name (aux_module->ident, lbasename (*iter));
 }
 }

Index: coverage.c
===
--- coverage.c  (revision 214320)
+++ coverage.c  (working copy)
@@ -658,7 +658,7 @@ typedef struct {

 static vec *mod_names;

-static void
+void
 record_module_name (unsigned int mod_id, const char *name)
 {
   mod_id_to_name_t t;
Index: l-ipo.h
===
--- l-ipo.h (revision 214320)
+++ l-ipo.h (working copy)
@@ -63,6 +63,7 @@ int equivalent_struct_types_for_tbaa (const_tree t
 void lipo_link_and_fixup (void);
 extern void copy_defined_module_set (tree, tree);
 extern bool is_parsing_done_p (void);
+extern void record_module_name (unsigned int, const char *);
 extern const char* get_module_name (unsigned int);

 #endif

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fix AutoFDO LIPO ICE due to eliminated abstract origin

2014-08-23 Thread Teresa Johnson
The ICE is, because we create a new node when querying the node during
symtab_remove_unreachable_nodes after LIPO linking is complete, and
then try to access its resolved node, for which there is none. Trunk
has the same code in these two locations, but the creation of a new
node after deleting the first does not cause any problems.
Teresa

On Sat, Aug 23, 2014 at 11:50 AM, Xinliang David Li  wrote:
> Is it a problem specific to LIPO?
>
> David
>
> On Sat, Aug 23, 2014 at 11:41 AM, Teresa Johnson  wrote:
>> This patch ensures we don't prematurely delete an abstract origin node, 
>> leading
>> to creating a new one after LIPO linking when we invoke similar handling in
>> symtab_remove_unreachable_nodes, which then does not have a resolved node. It
>> makes the handling of such nodes equivalent to that in
>> symtab_remove_unreachable_nodes.
>>
>> Tested with regression tests and internal benchmarks. Ok for google/4_9
>>
>> 2014-08-23  Teresa Johnson  
>>
>> Google ref b/16731481
>> * cgraphunit.c (analyze_functions): Don't remove origin node.
>>
>> Index: cgraphunit.c
>> ===
>> --- cgraphunit.c(revision 214320)
>> +++ cgraphunit.c(working copy)
>> @@ -1051,6 +1051,7 @@ analyze_functions (void)
>>   struct cgraph_node *origin_node
>>   = cgraph_get_node (DECL_ABSTRACT_ORIGIN (decl));
>>   origin_node->used_as_abstract_origin = true;
>> + enqueue_node (origin_node);
>> }
>> }
>>   else
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Fixup varpool references after LIPO linking

2014-08-28 Thread Teresa Johnson
This patch fixes up varpool nodes after LIPO linking as we were doing
for cgraph node references. While, here I made some fixes to the
cgraph fixup as well (mark address taken as appropriate) and removed
old references. The latter exposed an issue with resolved cgraph nodes
getting eliminated when they were only referenced from vtables and the
LIPO linking selected an external copy as the resolved node. Addressed
this by forcing the LIPO linking to prefer the non-external copy.

Passes regression testing, internal benchmark testing in progress. Ok
for google/4_9 if that succeeds?

Teresa

2014-08-28  Teresa Johnson  

Google ref b/17038802.
* l-ipo.c (resolve_cgraph_node): Pick non-external node.
(fixup_reference_list): Fixup varpool references, remove old
references, mark cgraph nodes as address taken as needed.

Index: l-ipo.c
===
--- l-ipo.c (revision 213975)
+++ l-ipo.c (working copy)
@@ -1564,6 +1564,15 @@ resolve_cgraph_node (struct cgraph_sym **slot, str
   (*slot)->rep_decl = decl2;
   return;
 }
+  /* Similarly, pick the non-external symbol, since external
+ symbols may be eliminated by symtab_remove_unreachable_nodes
+ after ipa inlining (see process_references).  */
+  if (DECL_EXTERNAL (decl1) && !DECL_EXTERNAL (decl2))
+{
+  (*slot)->rep_node = node;
+  (*slot)->rep_decl = decl2;
+  return;
+}

   has_prof1 = has_profile_info (decl1);
   bool is_aux1 = cgraph_is_auxiliary (decl1);
@@ -2304,31 +2313,44 @@ fixup_reference_list (struct varpool_node *node)
   int i;
   struct ipa_ref *ref;
   struct ipa_ref_list *list = &node->ref_list;
-  vec new_refered;
+  vec new_refered;
   vec new_refered_type;
-  struct cgraph_node *c;
+  struct symtab_node *sym_node;
   enum ipa_ref_use use_type = IPA_REF_LOAD;

   new_refered.create (10);
   new_refered_type.create (10);
   for (i = 0; ipa_ref_list_reference_iterate (list, i, ref); i++)
 {
-  if (!is_a  (ref->referred))
-continue;
-
-  struct cgraph_node *cnode = ipa_ref_node (ref);
-  struct cgraph_node *r_cnode
-= cgraph_lipo_get_resolved_node (cnode->decl);
-  if (r_cnode != cnode)
+  if (is_a  (ref->referred))
 {
+  struct cgraph_node *cnode = ipa_ref_node (ref);
+  struct cgraph_node *r_cnode
+= cgraph_lipo_get_resolved_node (cnode->decl);
   new_refered.safe_push (r_cnode);
   use_type = ref->use;
   new_refered_type.safe_push ((int) use_type);
+  gcc_assert (use_type != IPA_REF_ADDR
+  || cnode->global.inlined_to
+  || cnode->address_taken);
+  if (use_type == IPA_REF_ADDR)
+cgraph_mark_address_taken_node (r_cnode);
 }
+  else if (is_a  (ref->referred))
+{
+  struct varpool_node *var = ipa_ref_varpool_node (ref);
+  struct varpool_node *r_var = real_varpool_node (var->decl);
+  new_refered.safe_push (r_var);
+  use_type = ref->use;
+  new_refered_type.safe_push ((int) use_type);
+}
+  else
+gcc_assert (false);
 }
-  for (i = 0; new_refered.iterate (i, &c); ++i)
+  ipa_remove_all_references (&node->ref_list);
+  for (i = 0; new_refered.iterate (i, &sym_node); ++i)
 {
-  ipa_record_reference (node, c,
+  ipa_record_reference (node, sym_node,
     (enum ipa_ref_use) new_refered_type[i], NULL);
 }
 }


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fixup varpool references after LIPO linking

2014-08-28 Thread Teresa Johnson
On Thu, Aug 28, 2014 at 1:40 PM, Xinliang David Li  wrote:
>
> Do you know why the previous check is not enough ?
>
> cgraph_can_remove_if_no_direct_calls_and_refs_p (struct cgraph_node *node)

This will return true for the external node, but it also returns true
for the non-external node. The non-external node is a COMDAT, as as
the comments in that routine indicate, COMDATs can be removed even if
they are externally_visible.

Teresa

>
> David
>
>
> On Thu, Aug 28, 2014 at 1:29 PM, Teresa Johnson 
> wrote:
>>
>> This patch fixes up varpool nodes after LIPO linking as we were doing
>> for cgraph node references. While, here I made some fixes to the
>> cgraph fixup as well (mark address taken as appropriate) and removed
>> old references. The latter exposed an issue with resolved cgraph nodes
>> getting eliminated when they were only referenced from vtables and the
>> LIPO linking selected an external copy as the resolved node. Addressed
>> this by forcing the LIPO linking to prefer the non-external copy.
>>
>> Passes regression testing, internal benchmark testing in progress. Ok
>> for google/4_9 if that succeeds?
>>
>> Teresa
>>
>> 2014-08-28  Teresa Johnson  
>>
>> Google ref b/17038802.
>> * l-ipo.c (resolve_cgraph_node): Pick non-external node.
>> (fixup_reference_list): Fixup varpool references, remove old
>> references, mark cgraph nodes as address taken as needed.
>>
>> Index: l-ipo.c
>> ===
>> --- l-ipo.c (revision 213975)
>> +++ l-ipo.c (working copy)
>> @@ -1564,6 +1564,15 @@ resolve_cgraph_node (struct cgraph_sym **slot, str
>>(*slot)->rep_decl = decl2;
>>return;
>>  }
>> +  /* Similarly, pick the non-external symbol, since external
>> + symbols may be eliminated by symtab_remove_unreachable_nodes
>> + after ipa inlining (see process_references).  */
>> +  if (DECL_EXTERNAL (decl1) && !DECL_EXTERNAL (decl2))
>> +{
>> +  (*slot)->rep_node = node;
>> +  (*slot)->rep_decl = decl2;
>> +  return;
>> +}
>>
>>has_prof1 = has_profile_info (decl1);
>>bool is_aux1 = cgraph_is_auxiliary (decl1);
>> @@ -2304,31 +2313,44 @@ fixup_reference_list (struct varpool_node *node)
>>int i;
>>struct ipa_ref *ref;
>>struct ipa_ref_list *list = &node->ref_list;
>> -  vec new_refered;
>> +  vec new_refered;
>>vec new_refered_type;
>> -  struct cgraph_node *c;
>> +  struct symtab_node *sym_node;
>>enum ipa_ref_use use_type = IPA_REF_LOAD;
>>
>>new_refered.create (10);
>>new_refered_type.create (10);
>>for (i = 0; ipa_ref_list_reference_iterate (list, i, ref); i++)
>>  {
>> -  if (!is_a  (ref->referred))
>> -continue;
>> -
>> -  struct cgraph_node *cnode = ipa_ref_node (ref);
>> -  struct cgraph_node *r_cnode
>> -= cgraph_lipo_get_resolved_node (cnode->decl);
>> -  if (r_cnode != cnode)
>> +  if (is_a  (ref->referred))
>>  {
>> +  struct cgraph_node *cnode = ipa_ref_node (ref);
>> +  struct cgraph_node *r_cnode
>> += cgraph_lipo_get_resolved_node (cnode->decl);
>>new_refered.safe_push (r_cnode);
>>use_type = ref->use;
>>new_refered_type.safe_push ((int) use_type);
>> +  gcc_assert (use_type != IPA_REF_ADDR
>> +  || cnode->global.inlined_to
>> +  || cnode->address_taken);
>> +  if (use_type == IPA_REF_ADDR)
>> +cgraph_mark_address_taken_node (r_cnode);
>>  }
>> +  else if (is_a  (ref->referred))
>> +{
>> +  struct varpool_node *var = ipa_ref_varpool_node (ref);
>> +  struct varpool_node *r_var = real_varpool_node (var->decl);
>> +      new_refered.safe_push (r_var);
>> +  use_type = ref->use;
>> +      new_refered_type.safe_push ((int) use_type);
>> +}
>> +  else
>> +gcc_assert (false);
>>  }
>> -  for (i = 0; new_refered.iterate (i, &c); ++i)
>> +  ipa_remove_all_references (&node->ref_list);
>> +  for (i = 0; new_refered.iterate (i, &sym_node); ++i)
>>  {
>> -  ipa_record_reference (node, c,
>> +  ipa_record_reference (node, sym_node,
>>  (enum ipa_ref_use) new_refered_type[i],
>> NULL);
>>  }
>>  }
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-05 Thread Teresa Johnson
Ping.
Thanks, Teresa

On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson  wrote:
> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka  wrote:
>>> Here is the patch updated to use the new parameter from r203830.
>>> Passed bootstrap and regression tests.
>>>
>>> 2013-10-18  Jan Hubicka  
>>> Teresa Johnson  
>>>
>>> * predict.c (handle_missing_profiles): New function.
>>> (counts_to_freqs): Don't overwrite estimated frequencies
>>> when function has no profile counts.
>>> * predict.h (handle_missing_profiles): Declare.
>>> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>>
>>> Index: predict.c
>>> ===
>>> --- predict.c   (revision 203830)
>>> +++ predict.c   (working copy)
>>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>>BITMAP_FREE (tovisit);
>>>  }
>>>
>>
>> You need block comment. It should explain the problem of COMDATs and how 
>> they are handled.
>> It is not an obvious thing.
>
> Done.
>
>>
>>> +void
>>> +handle_missing_profiles (void)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> extra line
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +{
>>> +  struct cgraph_edge *e;
>>> +  gcov_type call_count = 0;
>>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
>> Extra line
>>> +  if (node->count)
>>> +continue;
>>> +  for (e = node->callers; e; e = e->next_caller)
>>> +call_count += e->count;
>> What about checking if the sum is way off even for non-0 counts.
>> I.e. for case where function was inlined to some calls but not to others?  In
>> that case we may want to scale up the counts (with some resonable care for
>> capping)
>
> In this patch I am not changing any counts, so I am leaving this one
> for follow-on work (even for the routines missing counts completely
> like these, we don't apply any counts, just mark them as guessed. I
> have a follow-on patch to send once this goes in that does apply
> counts to these 0-count routines only when we decide to inline as we
> discussed).
>
>>
>> Also I think the code really should propagate - i.e. have simple worklist and
>> look for functions that are COMDAT and are called by other COMDAT functions
>> where we decided to drop the profile.  Having non-trivial chains of comdats 
>> is
>> common thing.
>
> Done.
>
>>
>> What about outputting user visible warning/error on the incosnsistency when
>> no COMDAT function is involved same way as we do for BB profile?
>
> Done. This one caught the fact that we have this situation for "extern
> template" functions as well when I tested on cpu2006. I added in a
> check to ignore those when issuing the warning (they are not emitted
> thus don't get any profile counts).
>
> Updated patch below.
>
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2013-10-30  Teresa Johnson  
>
> * predict.c (drop_profile): New function.
> (handle_missing_profiles): Ditto.
> (counts_to_freqs): Don't overwrite estimated frequencies
> when function has no profile counts.
> * predict.h (handle_missing_profiles): Declare.
> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>
> Index: predict.c
> ===
> --- predict.c   (revision 204213)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>BITMAP_FREE (tovisit);
>  }
>
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +fprintf (dump_file,
> + "Dropping 0 profile for %s/%i. %s based on calls.\n",
> + cgraph_node_name (node), node->order,
> + hot ? "Function is h

Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition

2013-11-08 Thread Teresa Johnson
On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher  wrote:
> On Wed, Apr 24, 2013 at 12:58 AM, Sriraman Tallam wrote:
>> Hi,
>>
>>   This patch generates labels for cold function parts that are split when
>> using the option -freorder-blocks-and-partition.  The cold label name
>> is generated by suffixing ".cold" to the assembler name of the hot
>> function.
>>
>>   This is useful when getting back traces from gdb when the cold function
>> part does get executed.
>
> Can you show how this changes your gdb back traces? Can you not
> already let GDB inform you that it's in another .text section?
>
> I don't like this label idea much. It seems so arbitrary, unless
> you're only interested in knowing when a piece of function from
> another .text section is executed. But I'm personally more interested
> in the whole function, i.e. also in not-so-hot (or even cold) blocks
> that are not in the .text.unlikely section. That means somehow passing
> the profile-use information from the compiler to the debugger.
>
> So what I would really like to have instead, is some kind of DWARF
> annotations for profile information in the function, with some way in
> GDB to inspect and modify the values, and to add break points based on
> execution count threshold.
>
> I haven't really thought this through yet, but I have this dream of
> having the data of a gcno file encoded in DWARF instead, and to have
> profile information included in the FDO-compiled binary to add the
> ability to verify that those parts of the program that you expect to
> be hot are really also considered hot by the compiler. The only way to
> check this now is by inspecting some dumps and that's IMHO not very
> practical.
>
> Ciao!
> Steven

I'd like to revisit this old patch from Sri to generate a label for
the split out cold section, now that all the patches to fix the
failures and insanities from -freorder-blocks-and-partition are in
(and I want to send a patch to enable it by default with fdo). The
problem is that the split code doesn't currently have any label to
identify where it came from, so in the final binary it appears to be
part of whatever non-cold function the linker happens to place it
after. This confuses tools like objdump, profilers and the debugger.

For example, from mcf with -freorder-blocks-and-partition:

main:
.LFB40:
.cfi_startproc
...
jne .L27   < jump to main's text.unlikely
...
.cfi_endproc
.section.text.unlikely
.cfi_startproc
.L27:   <--- start of main's text.unlikely
...

$ objdump -d mcf

00400aa0 :
  ...
  400b1a:   0f 85 1b fb ff ff   jne40063b
   < jump to main's text.unlikely

004004c0 :
  ...
  40063b:   bf 06 8a 49 00  mov$0x498a06,%edi<---
start of main's text.unlikely
  ...
  40065e:   e9 0d 05 00 00  jmpq   400b70 
<--- jump back to main's hot text


Note that objdump thinks we are jumping to/from another function. If
we end up executing the cold section (e.g. due to non-representative
profiles), profiling tools and gdb are going to think we are executing
replace_weaker_arc. With Sri's patch, this becomes much clearer:

00400aa0 :
  ...
  400b1a:   0f 85 1b fb ff ff   jne40063b 

0040063b :
  40063b:   bf 06 8a 49 00  mov$0x498a06,%edi
  ...
  40065e:   e9 0d 05 00 00  jmpq   400b70 


Steven, I think the ideas you describe above about passing profile
information in DWARF and doing some special handling in gdb to use
that seem orthogonal to this.

Given that, is this patch ok for trunk (pending successful re-runs of
regression testing)

Thanks,
Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition

2013-11-08 Thread Teresa Johnson
On Fri, Nov 8, 2013 at 8:14 AM, Steven Bosscher  wrote:
> On Fri, Nov 8, 2013 at 4:45 PM, Teresa Johnson wrote:
>> On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher wrote:
>> I'd like to revisit this old patch from Sri to generate a label for
>> the split out cold section, now that all the patches to fix the
>> failures and insanities from -freorder-blocks-and-partition are in
>> (and I want to send a patch to enable it by default with fdo). The
>> problem is that the split code doesn't currently have any label to
>> identify where it came from, so in the final binary it appears to be
>> part of whatever non-cold function the linker happens to place it
>> after. This confuses tools like objdump, profilers and the debugger.
>
> Surely there are .loc directives in the code to clarify where the code
> came from? So at least the debugger should know what function the code
> belongs to.
>
>> For example, from mcf with -freorder-blocks-and-partition:
>>
>> main:
>> .LFB40:
>> .cfi_startproc
>> ...
>> jne .L27   < jump to main's text.unlikely
>> ...
>> .cfi_endproc
>> .section.text.unlikely
>> .cfi_startproc
>> .L27:   <--- start of main's text.unlikely
>> ...
>>
>> $ objdump -d mcf
>>
>> 00400aa0 :
>>   ...
>>   400b1a:   0f 85 1b fb ff ff   jne40063b
>>< jump to main's text.unlikely
>>
>> 004004c0 :
>>   ...
>>   40063b:   bf 06 8a 49 00  mov$0x498a06,%edi<---
>> start of main's text.unlikely
>>   ...
>>   40065e:   e9 0d 05 00 00  jmpq   400b70 
>> <--- jump back to main's hot text
>>
>>
>> Note that objdump thinks we are jumping to/from another function. If
>> we end up executing the cold section (e.g. due to non-representative
>> profiles), profiling tools and gdb are going to think we are executing
>> replace_weaker_arc. With Sri's patch, this becomes much clearer:
>>
>> 00400aa0 :
>>   ...
>>   400b1a:   0f 85 1b fb ff ff   jne40063b 
>>
>> 0040063b :
>>   40063b:   bf 06 8a 49 00  mov$0x498a06,%edi
>>   ...
>>   40065e:   e9 0d 05 00 00  jmpq   400b70 
>>
>
> Does -ffunction-sections help here?

No. It just moves the cold section to a different spot (so the
function it appears part of is different, but same issue).

>
>
>> Steven, I think the ideas you describe above about passing profile
>> information in DWARF and doing some special handling in gdb to use
>> that seem orthogonal to this.
>
> It's not just about profile information but about what function some
> code belongs to. I just cannot believe that there isn't already
> something in DWARF to mark address ranges as being part of a given
> function.

Cary, any ideas here?

>
>
>> Given that, is this patch ok for trunk (pending successful re-runs of
>> regression testing)
>
>
> IMHO, no. This is conceptually wrong. You create artificial named
> labels in the compiler that might clash with user labels. Is that
> likely to happen? No. But it could and therefore the approach of
> adding functionname.cold labels is wrong.

Is there a format for compiler-defined labels that would not be able
to clash with other user-generated labels?

Teresa

>
> Ciao!
> Steven



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-08 Thread Teresa Johnson
On Tue, Nov 5, 2013 at 6:02 AM, Teresa Johnson  wrote:
> Ping.
> Thanks, Teresa
>

I updated the patch to include the follow-on work to apply call counts
to 0-weight functions using the estimated frequencies when inlining.
This helps avoid 0-weight blocks of inlined code which are
particularly bad for function splitting.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?

Thanks,
Teresa

2013-11-08  Teresa Johnson  
Jan Hubicka  

* predict.c (drop_profile): New function.
(handle_missing_profiles): Ditto.
(counts_to_freqs): Don't overwrite estimated frequencies
when function has no profile counts.
* predict.h (handle_missing_profiles): Declare.
* tree-inline.c (freqs_to_counts): New function.
(copy_cfg_body): Invoke freqs_to_counts as needed.
* tree-profile.c (tree_profiling): Invoke handle_missing_profiles.

Index: predict.c
===
--- predict.c   (revision 204516)
+++ predict.c   (working copy)
@@ -2765,6 +2765,107 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Drop the profile for NODE to guessed, and update its frequency based on
+   whether it is expected to be HOT.  */
+
+static void
+drop_profile (struct cgraph_node *node, bool hot)
+{
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (dump_file)
+fprintf (dump_file,
+ "Dropping 0 profile for %s/%i. %s based on calls.\n",
+ cgraph_node_name (node), node->order,
+ hot ? "Function is hot" : "Function is normal");
+  /* We only expect to miss profiles for functions that are reached
+ via non-zero call edges in cases where the function may have
+ been linked from another module or library (COMDATs and extern
+ templates). See the comments below for handle_missing_profiles.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+warning (0,
+ "Missing counts for called function %s/%i",
+ cgraph_node_name (node), node->order);
+
+  profile_status_for_function (fn)
+  = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency
+  = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+}
+
+/* In the case of COMDAT routines, multiple object files will contain the same
+   function and the linker will select one for the binary. In that case
+   all the other copies from the profile instrument binary will be missing
+   profile counts. Look for cases where this happened, due to non-zero
+   call counts going to 0-count functions, and drop the profile to guessed
+   so that we can use the estimated probabilities and avoid optimizing only
+   for size.
+
+   The other case where the profile may be missing is when the routine
+   is not going to be emitted to the object file, e.g. for "extern template"
+   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
+   all other cases of non-zero calls to 0-count functions.  */
+
+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  vec worklist;
+  worklist.create (64);
+
+  /* See if 0 count function has non-0 count callers.  In this case we
+ lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  struct cgraph_edge *e;
+  gcov_type call_count = 0;
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (node->count)
+continue;
+  for (e = node->callers; e; e = e->next_caller)
+call_count += e->count;
+  if (call_count
+  && fn && fn->cfg
+  && (call_count * unlikely_count_fraction >= profile_info->runs))
+{
+  bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+
+  drop_profile (node, maybe_hot);
+  worklist.safe_push (node);
+}
+}
+
+  /* Propagate the profile dropping to other 0-count COMDATs that are
+ potentially called by COMDATs we already dropped the profile on.  */
+  while (worklist.length () > 0)
+{
+  struct cgraph_edge *e;
+
+  node = worklist.pop ();
+  for (e = node->callees; e; e = e->next_caller)
+{
+  struct cgraph_node *callee = e->callee;
+  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
+
+  if (callee->count > 0)
+continue;
+  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
+  && profile_status_for_function (fn) == PROFILE_READ)
+{
+  /* Since there are no non-0 call counts to this function,
+ we don't know for sur

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Teresa Johnson
On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka  wrote:
>> 2013-11-08  Teresa Johnson  
>> Jan Hubicka  
>>
>> * predict.c (drop_profile): New function.
>> (handle_missing_profiles): Ditto.
>> (counts_to_freqs): Don't overwrite estimated frequencies
>> when function has no profile counts.
>> * predict.h (handle_missing_profiles): Declare.
>> * tree-inline.c (freqs_to_counts): New function.
>> (copy_cfg_body): Invoke freqs_to_counts as needed.
>> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===
>> --- predict.c   (revision 204516)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +fprintf (dump_file,
>> + "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> + cgraph_node_name (node), node->order,
>> + hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> + via non-zero call edges in cases where the function may have
>> + been linked from another module or library (COMDATs and extern
>> + templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +warning (0,
>> + "Missing counts for called function %s/%i",
>> + cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +  = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +  = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the 
>> same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +{
>> +  struct cgraph_edge *e;
>> +  gcov_type call_count = 0;
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (node->count)
>> +continue;
>> +  for (e = node->callers; e; e = e->next_caller)
>> +call_count += e->count;
>> +  if (call_count
>> +  && fn && fn->cfg
>> +  && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +{
>> +  bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +  drop_profile (node, maybe_hot);
>> +  worklist.safe_push (node);
>> +}
>
> Perhaps we should add an note/error on mishandled profile when function is 
> not COMDAT?
> That way we may notice further bugs in this area.

I have a warning like that already in drop_profile(). Is that
sufficient? Also, Steven Bosscher suggested putting that warning under
OPT_Wdisabled_optimization instead of '0', what do you prefer for
that?

>> +}
>> +
>> +  /* Propagate the pro

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Teresa Johnson
On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka  wrote:
>> I have a warning like that already in drop_profile(). Is that
>
> I think it should be warning (or silent) for COMDAT and error/note for
> other functions (depending on flag_profile_correction).
> I guess drop_profile is better place for it indeed.

I don't want to warn for COMDAT since this is currently an expected
issue in that case, so I'm afraid it will be too noisy (but there is a
note emitted to the dump file to track how often this occurs). So
currently the warning is only emitted in cases where we don't expect
it to occur (e.g. non-comdat).

>
>> sufficient? Also, Steven Bosscher suggested putting that warning under
>> OPT_Wdisabled_optimization instead of '0', what do you prefer for
>> that?
>
> It is a bit different case than other disabled optimizations we have (where
> the optimization does not happen because of --param tunable limits), but I
> am fine with both alternatives.
> The warning may end up quite noisy so having way to silence it definitely
> makes sense.

I didn't find any warnings being emitted when running this for spec or
internal benchmarks, so how about instead of a warning, I handle it
like you suggest above: depending on the setting of
flag_profile_correction either error or emit a note to the dump file
under MSG_MISSED_OPTIMIZATION?

>>
>> >> +}
>> >> +
>> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> >> + potentially called by COMDATs we already dropped the profile on.  */
>> >> +  while (worklist.length () > 0)
>> >> +{
>> >> +  struct cgraph_edge *e;
>> >> +
>> >> +  node = worklist.pop ();
>> >> +  for (e = node->callees; e; e = e->next_caller)
>> >> +{
>> >> +  struct cgraph_node *callee = e->callee;
>> >> +  struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> >> +
>> >> +  if (callee->count > 0)
>> >> +continue;
>> >> +  if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> >> +  && profile_status_for_function (fn) == PROFILE_READ)
>> >
>> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on 
>> > profile estimate
>> > to give us false only for really known to be unlikely paths? (i.e. EH 
>> > handling, noreturns
>> > etc.)
>>
>> Ok, let me try this.
>>
>> >> +{
>> >> +  /* Since there are no non-0 call counts to this function,
>> >> + we don't know for sure whether it is hot. Indicate to
>> >> + the drop_profile routine that function should be marked
>> >> + normal, rather than hot.  */
>> >> +  drop_profile (node, false);
>> >> +  worklist.safe_push (callee);
>> >> +}
>> >> +}
>> >> +}
>> >> +  worklist.release ();
>> >
>> > I would add a pointer set so we avoid duplicates in worklist.  This is 
>> > potentially quadratic
>> > for large programs.
>>
>> I'll add a visited_nodes set to keep track of processed nodes so they
>> don't get added to the worklist multiple times.
>
> Perhaps you can also track this by testing profile_status_for_function. Both 
> solutions are fine for me.

Ah - I just realized I am already checking profile_status_for_function
above and adding to the worklist if it is still PROFILE_READ. Since I
call drop_profile before adding to the worklist, we can't end up
adding multiple times. So I don't think there is currently an issue
with this.

Thanks,
Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > OK, with these changes.
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-11 Thread Teresa Johnson
On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka  wrote:
>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka  wrote:
>> >> I have a warning like that already in drop_profile(). Is that
>> >
>> > I think it should be warning (or silent) for COMDAT and error/note for
>> > other functions (depending on flag_profile_correction).
>> > I guess drop_profile is better place for it indeed.
>>
>> I don't want to warn for COMDAT since this is currently an expected
>> issue in that case, so I'm afraid it will be too noisy (but there is a
>> note emitted to the dump file to track how often this occurs). So
>> currently the warning is only emitted in cases where we don't expect
>> it to occur (e.g. non-comdat).
>
> I see, I misread it.
> The non-comdat warning IMO ought go the same way as the other profile 
> confussions.
> I guess something like
> if (!flag_profile_correction)
>   error (...);
> like existing cases in profile.c

Ok, will do (emitting a note to the dump file as in other cases in
profile.c that do this same thing).

>>
>> I didn't find any warnings being emitted when running this for spec or
>> internal benchmarks, so how about instead of a warning, I handle it
>> like you suggest above: depending on the setting of
>> flag_profile_correction either error or emit a note to the dump file
>> under MSG_MISSED_OPTIMIZATION?
>
> Yep, lets just go with error with !flag_profile_correction and being silent
> otherwise.
> If we want to go with warnings, we need to change all the similar places
> in profile.c and elsewhere.
>>
>> Ah - I just realized I am already checking profile_status_for_function
>> above and adding to the worklist if it is still PROFILE_READ. Since I
>> call drop_profile before adding to the worklist, we can't end up
>> adding multiple times. So I don't think there is currently an issue
>> with this.
>
> Yep, indeed.
>
> The patch is OK with the error message as discussed above.

Ok, will do that and fix a couple of minor issues mentioned by Steven
(missing comment and incorrect indentation in one spot). Will retest
just to make sure I didn't miss any warnings which will now be errors.

Thanks,
Teresa

>
> Thanks.
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Teresa
>> >>
>> >> >
>> >> > OK, with these changes.
>> >> >
>> >> > Honza
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition

2013-11-12 Thread Teresa Johnson
On Fri, Nov 8, 2013 at 8:20 AM, Teresa Johnson  wrote:
> On Fri, Nov 8, 2013 at 8:14 AM, Steven Bosscher  wrote:
>> On Fri, Nov 8, 2013 at 4:45 PM, Teresa Johnson wrote:
>>> On Tue, May 14, 2013 at 3:47 PM, Steven Bosscher wrote:
>>> I'd like to revisit this old patch from Sri to generate a label for
>>> the split out cold section, now that all the patches to fix the
>>> failures and insanities from -freorder-blocks-and-partition are in
>>> (and I want to send a patch to enable it by default with fdo). The
>>> problem is that the split code doesn't currently have any label to
>>> identify where it came from, so in the final binary it appears to be
>>> part of whatever non-cold function the linker happens to place it
>>> after. This confuses tools like objdump, profilers and the debugger.
>>
>> Surely there are .loc directives in the code to clarify where the code
>> came from? So at least the debugger should know what function the code
>> belongs to.
>>
>>> For example, from mcf with -freorder-blocks-and-partition:
>>>
>>> main:
>>> .LFB40:
>>> .cfi_startproc
>>> ...
>>> jne .L27   < jump to main's text.unlikely
>>> ...
>>> .cfi_endproc
>>> .section.text.unlikely
>>> .cfi_startproc
>>> .L27:   <--- start of main's text.unlikely
>>> ...
>>>
>>> $ objdump -d mcf
>>>
>>> 00400aa0 :
>>>   ...
>>>   400b1a:   0f 85 1b fb ff ff   jne40063b
>>>< jump to main's text.unlikely
>>>
>>> 004004c0 :
>>>   ...
>>>   40063b:   bf 06 8a 49 00  mov$0x498a06,%edi<---
>>> start of main's text.unlikely
>>>   ...
>>>   40065e:   e9 0d 05 00 00  jmpq   400b70 
>>> <--- jump back to main's hot text
>>>
>>>
>>> Note that objdump thinks we are jumping to/from another function. If
>>> we end up executing the cold section (e.g. due to non-representative
>>> profiles), profiling tools and gdb are going to think we are executing
>>> replace_weaker_arc. With Sri's patch, this becomes much clearer:
>>>
>>> 00400aa0 :
>>>   ...
>>>   400b1a:   0f 85 1b fb ff ff   jne40063b 
>>>
>>> 0040063b :
>>>   40063b:   bf 06 8a 49 00  mov$0x498a06,%edi
>>>   ...
>>>   40065e:   e9 0d 05 00 00  jmpq   400b70 
>>>
>>
>> Does -ffunction-sections help here?
>
> No. It just moves the cold section to a different spot (so the
> function it appears part of is different, but same issue).
>
>>
>>
>>> Steven, I think the ideas you describe above about passing profile
>>> information in DWARF and doing some special handling in gdb to use
>>> that seem orthogonal to this.
>>
>> It's not just about profile information but about what function some
>> code belongs to. I just cannot believe that there isn't already
>> something in DWARF to mark address ranges as being part of a given
>> function.
>
> Cary, any ideas here?

I spoke with Cary a little bit (Cary, feel free to add to this).
Regarding this question and the follow-up email:

> Isn't this something that should be expressed in DWARF with
> DW_AT_ranges? See DWARF4, section 2.17,
>
> Does GCC generate such ranges?

GCC does generate these ranges. However, according to Cary many tools
do not rely on dwarf info for locating the corresponding function
name, they just look at the symbols to identify what function an
address resides in. Nor would we want tools such as objdump and
profilers to rely on dwarf for locating the function names as this
would not work for binaries that were generated without -g options or
had their debug info stripped.

>
>>
>>
>>> Given that, is this patch ok for trunk (pending successful re-runs of
>>> regression testing)
>>
>>
>> IMHO, no. This is conceptually wrong. You create artificial named
>> labels in the compiler that might clash with user labels. Is that
>> likely to happen? No. But it could and therefore the approach of
>> adding functionname.cold labels is wrong.
>
> Is there a format for compiler-defined labels that would not be able
> to clash with other user-generated labels?

My understanding is that the "." in the generated name ensures that it
will not clash with user-generated labels which cannot contain ".". So
this should not be an issue.

Thanks,
Teresa

>
> Teresa
>
>>
>> Ciao!
>> Steven
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-12 Thread Teresa Johnson
On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson  wrote:
> On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka  wrote:
>>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka  wrote:
>>> >> I have a warning like that already in drop_profile(). Is that
>>> >
>>> > I think it should be warning (or silent) for COMDAT and error/note for
>>> > other functions (depending on flag_profile_correction).
>>> > I guess drop_profile is better place for it indeed.
>>>
>>> I don't want to warn for COMDAT since this is currently an expected
>>> issue in that case, so I'm afraid it will be too noisy (but there is a
>>> note emitted to the dump file to track how often this occurs). So
>>> currently the warning is only emitted in cases where we don't expect
>>> it to occur (e.g. non-comdat).
>>
>> I see, I misread it.
>> The non-comdat warning IMO ought go the same way as the other profile 
>> confussions.
>> I guess something like
>> if (!flag_profile_correction)
>>   error (...);
>> like existing cases in profile.c
>
> Ok, will do (emitting a note to the dump file as in other cases in
> profile.c that do this same thing).
>
>>>
>>> I didn't find any warnings being emitted when running this for spec or
>>> internal benchmarks, so how about instead of a warning, I handle it
>>> like you suggest above: depending on the setting of
>>> flag_profile_correction either error or emit a note to the dump file
>>> under MSG_MISSED_OPTIMIZATION?
>>
>> Yep, lets just go with error with !flag_profile_correction and being silent
>> otherwise.
>> If we want to go with warnings, we need to change all the similar places
>> in profile.c and elsewhere.
>>>
>>> Ah - I just realized I am already checking profile_status_for_function
>>> above and adding to the worklist if it is still PROFILE_READ. Since I
>>> call drop_profile before adding to the worklist, we can't end up
>>> adding multiple times. So I don't think there is currently an issue
>>> with this.
>>
>> Yep, indeed.
>>
>> The patch is OK with the error message as discussed above.
>
> Ok, will do that and fix a couple of minor issues mentioned by Steven
> (missing comment and incorrect indentation in one spot). Will retest
> just to make sure I didn't miss any warnings which will now be errors.
>
> Thanks,
> Teresa
>
>>
>> Thanks.
>> Honza
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > Honza
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> >
>>> >> > OK, with these changes.
>>> >> >
>>> >> > Honza

This was committed earlier this morning as r204704. Then I hit the new
error when doing an lto-bootstrap profiledbootstrap for an unrelated
change. I'd like to change this error to a warning for now to avoid
breaking the lto profiledbootstrap while I investigate. I don't think
this really needs to be a hard error at this point.

Running regression tests and lto profiledbootstrap right now. Ok for
trunk if testing succeeds?

2013-11-12  Teresa Johnson  

* predict.c (drop_profile): Error is currently too strict.

Index: predict.c
=======
--- predict.c   (revision 204704)
+++ predict.c   (working copy)
@@ -2792,8 +2792,8 @@ drop_profile (struct cgraph_node *node, bool hot)
  cgraph_node_name (node), node->order);
     }
   else
-error ("Missing counts for called function %s/%i",
-   cgraph_node_name (node), node->order);
+warning (0, "Missing counts for called function %s/%i",
+ cgraph_node_name (node), node->order);
 }

   profile_status_for_function (fn)

Thanks,
Teresa

>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-12 Thread Teresa Johnson
On Tue, Nov 12, 2013 at 11:08 AM, Teresa Johnson  wrote:
> On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson  wrote:
>> On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka  wrote:
>>>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka  wrote:
>>>> >> I have a warning like that already in drop_profile(). Is that
>>>> >
>>>> > I think it should be warning (or silent) for COMDAT and error/note for
>>>> > other functions (depending on flag_profile_correction).
>>>> > I guess drop_profile is better place for it indeed.
>>>>
>>>> I don't want to warn for COMDAT since this is currently an expected
>>>> issue in that case, so I'm afraid it will be too noisy (but there is a
>>>> note emitted to the dump file to track how often this occurs). So
>>>> currently the warning is only emitted in cases where we don't expect
>>>> it to occur (e.g. non-comdat).
>>>
>>> I see, I misread it.
>>> The non-comdat warning IMO ought go the same way as the other profile 
>>> confussions.
>>> I guess something like
>>> if (!flag_profile_correction)
>>>   error (...);
>>> like existing cases in profile.c
>>
>> Ok, will do (emitting a note to the dump file as in other cases in
>> profile.c that do this same thing).
>>
>>>>
>>>> I didn't find any warnings being emitted when running this for spec or
>>>> internal benchmarks, so how about instead of a warning, I handle it
>>>> like you suggest above: depending on the setting of
>>>> flag_profile_correction either error or emit a note to the dump file
>>>> under MSG_MISSED_OPTIMIZATION?
>>>
>>> Yep, lets just go with error with !flag_profile_correction and being silent
>>> otherwise.
>>> If we want to go with warnings, we need to change all the similar places
>>> in profile.c and elsewhere.
>>>>
>>>> Ah - I just realized I am already checking profile_status_for_function
>>>> above and adding to the worklist if it is still PROFILE_READ. Since I
>>>> call drop_profile before adding to the worklist, we can't end up
>>>> adding multiple times. So I don't think there is currently an issue
>>>> with this.
>>>
>>> Yep, indeed.
>>>
>>> The patch is OK with the error message as discussed above.
>>
>> Ok, will do that and fix a couple of minor issues mentioned by Steven
>> (missing comment and incorrect indentation in one spot). Will retest
>> just to make sure I didn't miss any warnings which will now be errors.
>>
>> Thanks,
>> Teresa
>>
>>>
>>> Thanks.
>>> Honza
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> >
>>>> > Honza
>>>> >>
>>>> >> Thanks,
>>>> >> Teresa
>>>> >>
>>>> >> >
>>>> >> > OK, with these changes.
>>>> >> >
>>>> >> > Honza
>
> This was committed earlier this morning as r204704. Then I hit the new
> error when doing an lto-bootstrap profiledbootstrap for an unrelated
> change. I'd like to change this error to a warning for now to avoid
> breaking the lto profiledbootstrap while I investigate. I don't think
> this really needs to be a hard error at this point.

More info on the lto bootstrap failure:

/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
warning: Missing counts for called function pex_child_error.isra.1/73
[enabled by default]
 }

This is an error handling routine that invokes _exit. Looking at the
ipa-profile dump, I can see that all the counts read in for
pex_child_error have the value 0. But there is a non-zero callsite,
that not surprisingly complains of a profile insanity in the bb's
outgoing edge weights, since pex_child_error never returns:

;;   basic block 38, loop depth 1, count 192, freq 5000
;;   Invalid sum of outgoing counts 0, should be 192
...
  pex_child_error.isra.1D.5005 (_92, executable_40(D),
[/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
677] "execv", _91);
;;succ:   3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)

Not sure why the counts were all 0 for this routine though, I wouldn't
think the early exit should prevent us from updating the counts. But
IMO the best thing to do here is to issue a warning, since I don't
want more issues like this to cause compile errors when we handled
them fine before.

Let me kn

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-12 Thread Teresa Johnson
On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka  wrote:
>> More info on the lto bootstrap failure:
>>
>> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
>> warning: Missing counts for called function pex_child_error.isra.1/73
>> [enabled by default]
>>  }
>>
>> This is an error handling routine that invokes _exit. Looking at the
>> ipa-profile dump, I can see that all the counts read in for
>> pex_child_error have the value 0. But there is a non-zero callsite,
>> that not surprisingly complains of a profile insanity in the bb's
>> outgoing edge weights, since pex_child_error never returns:
>>
>> ;;   basic block 38, loop depth 1, count 192, freq 5000
>> ;;   Invalid sum of outgoing counts 0, should be 192
>> ...
>>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
>> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
>> 677] "execv", _91);
>> ;;succ:   3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
>>
>> Not sure why the counts were all 0 for this routine though, I wouldn't
>> think the early exit should prevent us from updating the counts. But
>> IMO the best thing to do here is to issue a warning, since I don't
>> want more issues like this to cause compile errors when we handled
>> them fine before.
>>
>> Let me know if the patch below is ok.
>
> OK, so _exit actually makes us to miss streaming of the path leading to the
> error message?  Handling of vfork is somewhat broken, as I noticed with Martin
> Liska. One problem is that gcov_exit is not clearing the counts.  With vfork
> being used in addition to execvs we stream out the data but the counts stays 
> in
> the oriignal process memory and then are streamed again.  If execve fails and
> leads to _exit I think we can get the miscompare you see.
>
> Does calling gcov_clear within gcov_exit help?  (It is what I have in our 
> tree)

I thought this was already handled by the __gcov_execv wrapper around
execv which invokes __gcov_flush that in turns does a gcov_exit
followed by gcov_clear?

I think the issue is that we register gcov_exit with atexit(), but
_exit does not execute any atexit functions by definition. So that
would explain why the counters from pex_child_error didn't get dumped.
The caller bb invokes execv before invoking pex_child_error:
  execv (executable, to_ptr32 (argv));
  pex_child_error (obj, executable, "execv", errno);
So the counters of the caller bb are dumped (hence the caller bb has
non-zero counts) and cleared, and the pex_child_error does not dump
its counters since it uses _exit instead of exit and therefore doesn't
invoke gcov_exit atexit.

If that is what is going on, then I don't see how adding a call to
gcov_clear to gcov_exit would help. Dealing with situations like this
is why I think it is better to convert this to a warning.

Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] decide edge's hotness when there is profile info

2013-11-13 Thread Teresa Johnson
On Tue, Nov 12, 2013 at 2:35 PM, Jan Hubicka  wrote:
>> On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka  wrote:
>> >> More info on the lto bootstrap failure:
>> >>
>> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
>> >> warning: Missing counts for called function pex_child_error.isra.1/73
>> >> [enabled by default]
>> >>  }
>> >>
>> >> This is an error handling routine that invokes _exit. Looking at the
>> >> ipa-profile dump, I can see that all the counts read in for
>> >> pex_child_error have the value 0. But there is a non-zero callsite,
>> >> that not surprisingly complains of a profile insanity in the bb's
>> >> outgoing edge weights, since pex_child_error never returns:
>> >>
>> >> ;;   basic block 38, loop depth 1, count 192, freq 5000
>> >> ;;   Invalid sum of outgoing counts 0, should be 192
>> >> ...
>> >>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
>> >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
>> >> 677] "execv", _91);
>> >> ;;succ:   3 [100.0%]  
>> >> (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
>> >>
>> >> Not sure why the counts were all 0 for this routine though, I wouldn't
>> >> think the early exit should prevent us from updating the counts. But
>> >> IMO the best thing to do here is to issue a warning, since I don't
>> >> want more issues like this to cause compile errors when we handled
>> >> them fine before.
>> >>
>> >> Let me know if the patch below is ok.
>> >
>> > OK, so _exit actually makes us to miss streaming of the path leading to the
>> > error message?  Handling of vfork is somewhat broken, as I noticed with 
>> > Martin
>> > Liska. One problem is that gcov_exit is not clearing the counts.  With 
>> > vfork
>> > being used in addition to execvs we stream out the data but the counts 
>> > stays in
>> > the oriignal process memory and then are streamed again.  If execve fails 
>> > and
>> > leads to _exit I think we can get the miscompare you see.
>> >
>> > Does calling gcov_clear within gcov_exit help?  (It is what I have in our 
>> > tree)
>>
>> I thought this was already handled by the __gcov_execv wrapper around
>> execv which invokes __gcov_flush that in turns does a gcov_exit
>> followed by gcov_clear?
>>
>> I think the issue is that we register gcov_exit with atexit(), but
>> _exit does not execute any atexit functions by definition. So that
>> would explain why the counters from pex_child_error didn't get dumped.
>> The caller bb invokes execv before invoking pex_child_error:
>>   execv (executable, to_ptr32 (argv));
>>   pex_child_error (obj, executable, "execv", errno);
>> So the counters of the caller bb are dumped (hence the caller bb has
>> non-zero counts) and cleared, and the pex_child_error does not dump
>> its counters since it uses _exit instead of exit and therefore doesn't
>> invoke gcov_exit atexit.
>
> Hmm, I see, the problem here is that execv gets BB splitted (and profile is 
> correct)
> but before we get into ipa_profile the BBs get re-merged and we incorrectly 
> put
> count of 1 to pex_child_error.
>
> I suppose this will always happen when you have function terminating program
> (execve) before other function call.  Perhaps we can warn only when the 
> difference in counts
> is greater than number of train runs?

Ok, that sounds good. Here is the new patch. Is this ok for trunk if
testing (bootstrap regression and lto profiledbootstrap) succeeds?

Thanks,
Teresa

2013-11-13  Teresa Johnson  

* predict.c (drop_profile): Error is currently too strict.
(handle_missing_profiles): Pass call_count to drop_profile.

Index: predict.c
===
--- predict.c   (revision 204704)
+++ predict.c   (working copy)
@@ -2766,12 +2766,17 @@ estimate_loops (void)
 }

 /* Drop the profile for NODE to guessed, and update its frequency based on
-   whether it is expected to be HOT.  */
+   whether it is expected to be hot given the CALL_COUNT.  */

 static void
-drop_profile (struct cgraph_node *node, bool hot)
+drop_profile (struct cgraph_node *node, gcov_type call_count)
 {
   struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+  /* In the case where this was called by another function with a
+ dropped profile, call_count will be 0. Since there are no
+

[PATCH] Fix PR ipa/58862 (overflow in edge_badness computation)

2013-11-13 Thread Teresa Johnson
The following fixes PR ipa/58862, which caused failures in lto
profiledbootstrap and in several spec cpu2006 profile-use builds.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that
it fixed the lto profiledbootstrap and cpu2006 failures. Ok for trunk?

Thanks,
Teresa

2013-11-13  Teresa Johnson  

PR ipa/58862
* ipa-inline.c (edge_badness): Fix overflow.

Index: ipa-inline.c
===
--- ipa-inline.c(revision 204703)
+++ ipa-inline.c(working copy)
@@ -909,7 +909,7 @@ edge_badness (struct cgraph_edge *edge, bool dump)
   /* Capping edge->count to max_count. edge->count can be larger than
 max_count if an inline adds new edges which increase max_count
 after max_count is computed.  */
-  int edge_count = edge->count > max_count ? max_count : edge->count;
+  gcov_type edge_count = edge->count > max_count ? max_count : edge->count;

   sreal_init (&relbenefit_real, relbenefit, 0);
   sreal_init (&growth_real, growth, 0);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition

2013-11-15 Thread Teresa Johnson
When testing with -freorder-blocks-and-partition enabled, I hit a
verification failure in an LTO profiledbootstrap. Edge forwarding
performed when we went into cfg layout mode after bb reordering
(during compgotos) created a situation where a hot block was then
dominated by a cold block and was therefore remarked as cold. Because
bb reorder was complete at that point, it was not moved in the
physical layout, and we incorrectly went in and out of the cold
section multiple times.

The following patch addresses that by fixing the layout when we move
blocks to the cold section after bb reordering is complete.

Tested with an LTO profiledbootstrap with
-freorder-blocks-and-partition enabled. Ok for trunk?

Thanks,
Teresa

2013-11-15  Teresa Johnson  

* cfgrtl.c (fixup_partitions): Reorder blocks if necessary.

Index: cfgrtl.c
===
--- cfgrtl.c(revision 204792)
+++ cfgrtl.c(working copy)
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ggc.h"
 #include "tree-pass.h"
 #include "df.h"
+#include "pointer-set.h"

 /* Holds the interesting leading and trailing notes for the function.
Only applicable if the CFG is in cfglayout mode.  */
@@ -2332,14 +2333,69 @@ fixup_partitions (void)
  forwarding and unreachable block deletion.  */
   vec bbs_to_fix = find_partition_fixes (false);

+  unsigned new_cold_count = bbs_to_fix.length();
+  if (! new_cold_count)
+return;
+
   /* Do the partition fixup after all necessary blocks have been converted to
  cold, so that we only update the region crossings the minimum number of
  places, which can require forcing edges to be non fallthru.  */
+  struct pointer_set_t *bbs_moved_to_cold = pointer_set_create ();
   while (! bbs_to_fix.is_empty ())
 {
   bb = bbs_to_fix.pop ();
   fixup_new_cold_bb (bb);
+  pointer_set_insert (bbs_moved_to_cold, bb);
 }
+
+  /* If this happens after bb reordering is done we need to adjust the layout
+ via the next_bb pointers. Otherwise we will incorrectly have multiple
+ switches in and out of the cold section, which is illegal and will be
+ flagged by verify_flow_info. Currently, the compgotos pass goes back into
+ and out of cfglayout mode after bb reordering is complete. When
+ going into cfglayout mode, cfg optimizations may detect opportunities
+ for edge forwarding, which could leave a block in the hot section
+ dominated by a cold block, which will cause the above code to move
+ into to the cold section. By adjusting the next_bb/prev_bb pointers we
+ ensure that fixup_reorder_chain correctly relinks the blocks on exit
+ from cfglayout mode.  */
+  if (crtl->bb_reorder_complete)
+{
+  /* We should only be doing optimizations that require moving
+ blocks between sections after reordering when we are in cfglayout
+ mode.  */
+  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
+
+  /* Walk through in layout order, rip out each bb that was moved to
+ the cold section and physically move it to the end of the cold
+ section in layout order. Doing it in this order will guarantee
+ that any fallthrough blocks that were both reassigned to cold stay
+ in the right order.  */
+  unsigned moved_bbs = 0;
+  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)
+{
+  if (BB_PARTITION (bb) != BB_COLD_PARTITION)
+continue;
+
+  /* If this was already a cold bb (it is not in the set
+ bbs_moved_to_cold), then we have hit the original cold
+ section blocks and we are done.  */
+  if (!pointer_set_contains (bbs_moved_to_cold, bb))
+{
+  /* Make sure we processed all the newly cold bbs.  */
+  gcc_assert (moved_bbs == new_cold_count);
+  break;
+}
+
+  /* Remove bb from current layout position.  */
+  unlink_block (bb);
+  /* Move bb to the end of the cold section.  */
+  link_block (bb, EXIT_BLOCK_PTR->prev_bb);
+
+  moved_bbs++;
+}
+}
+pointer_set_destroy (bbs_moved_to_cold);
 }

 /* Verify, in the basic block chain, that there is at most one switch

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition

2013-11-16 Thread Teresa Johnson
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka  wrote:
>> When testing with -freorder-blocks-and-partition enabled, I hit a
>> verification failure in an LTO profiledbootstrap. Edge forwarding
>> performed when we went into cfg layout mode after bb reordering
>> (during compgotos) created a situation where a hot block was then
>> dominated by a cold block and was therefore remarked as cold. Because
>> bb reorder was complete at that point, it was not moved in the
>> physical layout, and we incorrectly went in and out of the cold
>> section multiple times.
>>
>> The following patch addresses that by fixing the layout when we move
>> blocks to the cold section after bb reordering is complete.
>>
>> Tested with an LTO profiledbootstrap with
>> -freorder-blocks-and-partition enabled. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-15  Teresa Johnson  
>>
>> * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.
>
> computed_gotos just unfactors unified blocks that we use to avoid CFGs with
> O(n^2) edges. This is mostly to avoid problems with nonlinearity of other 
> passes
> and to reduce the quadratic memory use case to one function at a time.
>
> I wonder if it won't be cleaner to simply unfactor those just before 
> pass_reorder_blocks.
>
> Computed gotos are used e.g. in libjava interpreter to optimize the tight 
> interpretting
> loop.  I think those cases would benefit from having at least 
> scheduling/reordering
> and alignments done right.
>
> Of course it depends on how bad the compile time implications are (I think in 
> addition
> to libjava, there was a lucier's testcase that made us to go for this trick) ,
> but I would prefer it over ading yet another hack into cfgrtl...
> We also may just avoid cfglayout cleanup_cfg while doing computed gotos...

Note I haven't done an extensive check to see if compgotos is the only
phase that goes back into cfglayout mode after bb reordering is done,
that's just the one that hit this. Eventually it might be good to
prevent going into cfglayout mode after bb reordering.

For now we could either fix up the layout as I am doing here. Or as
you suggest, prevent some cleanup/cfg optimization after bb reordering
is done. I thought about preventing the forwarding optimization after
bb reordering when splitting was on initially, but didn't want
enabling -freorder-blocks-and-partition to unnecessarily prevent
optimization. The reordering seemed reasonably straightforward so I
went with that solution in this patch.

Let me know if you'd rather have the solution of preventing the
forwarding (or maybe all all of try_optimize_cfg to be safe) under
-freorder-blocks-and-partition after bb reordering.

Thanks,
Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition

2013-11-18 Thread Teresa Johnson
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka  wrote:
>> When testing with -freorder-blocks-and-partition enabled, I hit a
>> verification failure in an LTO profiledbootstrap. Edge forwarding
>> performed when we went into cfg layout mode after bb reordering
>> (during compgotos) created a situation where a hot block was then
>> dominated by a cold block and was therefore remarked as cold. Because
>> bb reorder was complete at that point, it was not moved in the
>> physical layout, and we incorrectly went in and out of the cold
>> section multiple times.
>>
>> The following patch addresses that by fixing the layout when we move
>> blocks to the cold section after bb reordering is complete.
>>
>> Tested with an LTO profiledbootstrap with
>> -freorder-blocks-and-partition enabled. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-15  Teresa Johnson  
>>
>> * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.
>
> computed_gotos just unfactors unified blocks that we use to avoid CFGs with
> O(n^2) edges. This is mostly to avoid problems with nonlinearity of other 
> passes
> and to reduce the quadratic memory use case to one function at a time.
>
> I wonder if it won't be cleaner to simply unfactor those just before 
> pass_reorder_blocks.
>
> Computed gotos are used e.g. in libjava interpreter to optimize the tight 
> interpretting
> loop.  I think those cases would benefit from having at least 
> scheduling/reordering
> and alignments done right.
>
> Of course it depends on how bad the compile time implications are (I think in 
> addition
> to libjava, there was a lucier's testcase that made us to go for this trick) ,
> but I would prefer it over ading yet another hack into cfgrtl...
> We also may just avoid cfglayout cleanup_cfg while doing computed gotos...

I am testing a new patch right now that simply moves compgotos to just
before bb reordering, and ads an assert to cfg_layout_initialize to
detect when we attempt that after bb reordering. I looked at the other
places that go into cfg layout and I think compgotos is currently the
only one after bb reordering.

>From a bb layout perspective it seems like it would be beneficial to
do compgotos before layout. Was the current position just to try to
reduce compile time by keeping the block unified as long as possible?
For your comment "I think those cases would benefit from having at
least scheduling/reordering and alignments done right." do you mean
that it would be better from a code quality perspective for those to
have compgotos done earlier before those passes? That seems to make
sense to me as well.

I'm doing an lto profiledbootstrap with the change right now. Is there
other testing that should be done for this change? Can you point me at
lucier's testcase that you refer to above? I found that PR15242 was
the bug that inspired adding compgotos, but it is a small test case so
I'm not sure what I will learn from trying that other than whether
compgotos still kicks in as expected.

Thanks,
Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition

2013-11-18 Thread Teresa Johnson
On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher  wrote:
> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
>> From a bb layout perspective it seems like it would be beneficial to
>> do compgotos before layout. Was the current position just to try to
>> reduce compile time by keeping the block unified as long as possible?
>
> It was more a hack that got out of hand. Apparently it hurt some
> interpreters (branch prediction!) when the unified computed goto is
> not "unfactored". There was a PR for this, and the unfactoring code I
> added only fixed part of the problem.
>
>
>> For your comment "I think those cases would benefit from having at
>> least scheduling/reordering and alignments done right." do you mean
>> that it would be better from a code quality perspective for those to
>> have compgotos done earlier before those passes? That seems to make
>> sense to me as well.
>
> Theoretically: Yes, perhaps. In practice there isn't much to gain.
> Unfactoring before bb-reorder is probably the most helpful thing, to
> get better results for basic block alignment and placement. But
> scheduling punts on computed gotos (or explodes in some non-linear
> bottleneck).
>
> What used to help is profile-based branch speculation, i.e.
>
> if (*target_addr == most_frequent_target_addr)
>   goto most_frequent_target_add;
> else
>   goto *target_addr;
>
> But I'm not sure if our value profiling based optimizations still have
> this case.
>
>
>> I'm doing an lto profiledbootstrap with the change right now. Is there
>> other testing that should be done for this change? Can you point me at
>> lucier's testcase that you refer to above? I found that PR15242 was
>> the bug that inspired adding compgotos, but it is a small test case so
>> I'm not sure what I will learn from trying that other than whether
>> compgotos still kicks in as expected.
>
> ISTR it's http://gcc.gnu.org/PR26854

Ok, I have confirmed that moving compgotos earlier, to just before bb
reordering, passes an LTO profiledbootstrap with
-freorder-blocks-and-partition forced on (with my prior patch to fixup
the layout removed). I also added an assert to ensure we aren't going
into cfglayout mode after bb reordering is complete. Currently I am
running a normal bootstrap and regression test without
-freorder-blocks-and-partition force enabled.

I confirmed that we still apply compgotos and successfully unfactor
the computed goto in the testcase from PR15242.

I also tried the two test cases in PR26854 (all.i and compiler.i) both
with -O1 -fschedule-insns2 (which shouldn't actually be affected since
compgotos/bbro not on) and with -O3 -fschedule-insns, as described in
the bug report. I compared the time and mem reports before and after
my change to the compgotos pass placement and there is no significant
difference in the time or memory used from bb reordering or other
downstream passes.

Here is the patch I am testing with a normal bootstrap and regression
test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming
that passes?

Thanks,
Teresa

2013-11-18  Teresa Johnson  

* gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
try to go into cfglayout after bb reordering.
* gcc/passes.def: Move compgotos before bb reordering
since it goes into cfglayout.

Index: gcc/cfgrtl.c
===
--- gcc/cfgrtl.c(revision 204977)
+++ gcc/cfgrtl.c(working copy)
@@ -4204,6 +4204,8 @@ cfg_layout_initialize (unsigned int flags)
   rtx x;
   basic_block bb;

+  gcc_assert (!crtl->bb_reorder_complete);
+
   initialize_original_copy_tables ();

   cfg_layout_rtl_register_cfg_hooks ();
Index: gcc/passes.def
===
--- gcc/passes.def  (revision 204977)
+++ gcc/passes.def  (working copy)
@@ -387,6 +387,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_regrename);
  NEXT_PASS (pass_cprop_hardreg);
  NEXT_PASS (pass_fast_rtl_dce);
+ NEXT_PASS (pass_duplicate_computed_gotos);
  NEXT_PASS (pass_reorder_blocks);
  NEXT_PASS (pass_branch_target_load_optimize2);
  NEXT_PASS (pass_leaf_regs);
@@ -398,7 +399,6 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_stack_regs_run);
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_compute_alignments);
- NEXT_PASS (pass_duplicate_computed_gotos);
  NEXT_PASS (pass_variable_tracking);
  NEXT_PASS (pass_free_cfg);
  NEXT_PASS (pass_machine_reorg);


>
> Ciao!
> Steven



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition

2013-11-18 Thread Teresa Johnson
On Mon, Nov 18, 2013 at 12:23 PM, Jeff Law  wrote:
> On 11/18/13 12:33, Teresa Johnson wrote:
>>
>> On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher 
>> wrote:
>>>
>>> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
>>>>
>>>>  From a bb layout perspective it seems like it would be beneficial to
>>>> do compgotos before layout. Was the current position just to try to
>>>> reduce compile time by keeping the block unified as long as possible?
>>>
>>>
>>> It was more a hack that got out of hand. Apparently it hurt some
>>> interpreters (branch prediction!) when the unified computed goto is
>>> not "unfactored". There was a PR for this, and the unfactoring code I
>>> added only fixed part of the problem.
>>>
>>>
>>>> For your comment "I think those cases would benefit from having at
>>>> least scheduling/reordering and alignments done right." do you mean
>>>> that it would be better from a code quality perspective for those to
>>>> have compgotos done earlier before those passes? That seems to make
>>>> sense to me as well.
>>>
>>>
>>> Theoretically: Yes, perhaps. In practice there isn't much to gain.
>>> Unfactoring before bb-reorder is probably the most helpful thing, to
>>> get better results for basic block alignment and placement. But
>>> scheduling punts on computed gotos (or explodes in some non-linear
>>> bottleneck).
>>>
>>> What used to help is profile-based branch speculation, i.e.
>>>
>>> if (*target_addr == most_frequent_target_addr)
>>>goto most_frequent_target_add;
>>> else
>>>goto *target_addr;
>>>
>>> But I'm not sure if our value profiling based optimizations still have
>>> this case.
>>>
>>>
>>>> I'm doing an lto profiledbootstrap with the change right now. Is there
>>>> other testing that should be done for this change? Can you point me at
>>>> lucier's testcase that you refer to above? I found that PR15242 was
>>>> the bug that inspired adding compgotos, but it is a small test case so
>>>> I'm not sure what I will learn from trying that other than whether
>>>> compgotos still kicks in as expected.
>>>
>>>
>>> ISTR it's http://gcc.gnu.org/PR26854
>>
>>
>> Ok, I have confirmed that moving compgotos earlier, to just before bb
>> reordering, passes an LTO profiledbootstrap with
>> -freorder-blocks-and-partition forced on (with my prior patch to fixup
>> the layout removed). I also added an assert to ensure we aren't going
>> into cfglayout mode after bb reordering is complete. Currently I am
>> running a normal bootstrap and regression test without
>> -freorder-blocks-and-partition force enabled.
>>
>> I confirmed that we still apply compgotos and successfully unfactor
>> the computed goto in the testcase from PR15242.
>>
>> I also tried the two test cases in PR26854 (all.i and compiler.i) both
>> with -O1 -fschedule-insns2 (which shouldn't actually be affected since
>> compgotos/bbro not on) and with -O3 -fschedule-insns, as described in
>> the bug report. I compared the time and mem reports before and after
>> my change to the compgotos pass placement and there is no significant
>> difference in the time or memory used from bb reordering or other
>> downstream passes.
>>
>> Here is the patch I am testing with a normal bootstrap and regression
>> test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming
>> that passes?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-18  Teresa Johnson  
>>
>>  * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
>>  try to go into cfglayout after bb reordering.
>>  * gcc/passes.def: Move compgotos before bb reordering
>>  since it goes into cfglayout.
>
> This is fine once the bootstrap/regression testing passes.

The bootstrap/regression test passed. Added a comment as Honza
suggested and committed as r204985.

Thanks,
Teresa

>
> Thanks,
> jeff



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH i386] Enable -freorder-blocks-and-partition

2013-11-19 Thread Teresa Johnson
This patch enables -freorder-blocks-and-partition by default for x86
at -O2 and up. It is showing some modest gains in cpu2006 performance
with profile feedback and -O2 on an Intel Westmere system. Specifically,
I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
(1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.

Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
configured with --enable-languages=all,obj-c++ and tested for both
32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".

It would be good to enable this for additional targets as a follow on,
but it needs more testing for both correctness and performance on those
other targets (i.e for correctness because I see a number of places
in other config/*/*.c files that do some special handling under this
option for different targets or simply disable it, so I am not sure
how well-tested it is under different architectural constraints).

Ok for trunk?

Thanks,
Teresa

2013-11-19  Teresa Johnson  

* common/config/i386/i386-common.c: Enable
-freorder-blocks-and-partition at -O2 and up for x86.
* opts.c (finish_options): Only warn if -freorder-blocks-and-
partition was set on command line.

Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 205001)
+++ common/config/i386/i386-common.c(working copy)
@@ -789,6 +789,8 @@ static const struct default_options ix86_option_op
   {
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+/* Enable function splitting at -O2 and higher.  */
+{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 },
 /* Turn off -fschedule-insns by default.  It tends to make the
problem with not enough registers even worse.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
Index: opts.c
===
--- opts.c  (revision 205001)
+++ opts.c  (working copy)
@@ -737,9 +737,10 @@ finish_options (struct gcc_options *opts, struct g
   && opts->x_flag_reorder_blocks_and_partition
   && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
 {
-  inform (loc,
- "-freorder-blocks-and-partition does not work "
- "with exceptions on this architecture");
+  if (opts_set->x_flag_reorder_blocks_and_partition)
+inform (loc,
+"-freorder-blocks-and-partition does not work "
+"with exceptions on this architecture");
   opts->x_flag_reorder_blocks_and_partition = 0;
   opts->x_flag_reorder_blocks = 1;
 }
@@ -752,9 +753,10 @@ finish_options (struct gcc_options *opts, struct g
   && opts->x_flag_reorder_blocks_and_partition
   && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
 {
-  inform (loc,
- "-freorder-blocks-and-partition does not support "
- "unwind info on this architecture");
+  if (opts_set->x_flag_reorder_blocks_and_partition)
+inform (loc,
+"-freorder-blocks-and-partition does not support "
+"unwind info on this architecture");
   opts->x_flag_reorder_blocks_and_partition = 0;
   opts->x_flag_reorder_blocks = 1;
 }
@@ -769,9 +771,10 @@ finish_options (struct gcc_options *opts, struct g
  && targetm_common.unwind_tables_default
  && (ui_except == UI_SJLJ || ui_except >= UI_TARGET
 {
-  inform (loc,
- "-freorder-blocks-and-partition does not work "
- "on this architecture");
+  if (opts_set->x_flag_reorder_blocks_and_partition)
+inform (loc,
+"-freorder-blocks-and-partition does not work "
+"on this architecture");
   opts->x_flag_reorder_blocks_and_partition = 0;
   opts->x_flag_reorder_blocks = 1;
 }


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-11-19 Thread Teresa Johnson
On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka  wrote:
> Martin,
> can you, please, generate the updated systemtap with
> -freorder-blocks-and-partition enabled?
>
> I am in favour of enabling this - it is usefull pass and it is pointless ot
> have passes that are not enabled by default.
> Is there reason why this would not work on other ELF target? Is it working
> with Darwin and Windows?

I don't know how to test these (I don't see any machines listed in the
gcc compile farm of those types). For Windows, I assume you mean
MinGW, which should be enabled as it is under i386. Should I disable
it there and for Darwin?

>
>> This patch enables -freorder-blocks-and-partition by default for x86
>> at -O2 and up. It is showing some modest gains in cpu2006 performance
>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
>
> This actually sounds very good ;)
>
> Lets see how the systemtap graphs goes.  If we will end up with problem
> of too many accesses to cold section, I would suggest making cold section
> subdivided into .unlikely and .unlikely.part (we could have better name)
> with the second consisting only of unlikely parts of hot&normal functions.
>
> This should reduce the problems we are seeing with mistakely identifying
> code to be cold because of roundoff errors (and it probably makes sense
> in general, too).
> We will however need to update gold and ld for that.

Note that I don't think this would help much unless the linker is
changed to move the cold split section close to the hot section. There
is probably some fine-tuning we could do eventually in the linker
under -ffunction-sections without putting the split portions in a
separate section. I.e. clump the split parts together within unlikely.
But hopefully this can all be done later on as follow-on work to boost
the performance further.

>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
>> configured with --enable-languages=all,obj-c++ and tested for both
>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
>>
>> It would be good to enable this for additional targets as a follow on,
>> but it needs more testing for both correctness and performance on those
>> other targets (i.e for correctness because I see a number of places
>> in other config/*/*.c files that do some special handling under this
>> option for different targets or simply disable it, so I am not sure
>> how well-tested it is under different architectural constraints).
>>
>> Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-19  Teresa Johnson  
>>
>> * common/config/i386/i386-common.c: Enable
>> -freorder-blocks-and-partition at -O2 and up for x86.
>> * opts.c (finish_options): Only warn if -freorder-blocks-and-
>> partition was set on command line.
>
> You probably mis doc/invoke.texi update.
> Thank you for working on this!

Yes, thanks. Here is the patch with the invoke.texi update.

Teresa


2013-11-19  Teresa Johnson  

* common/config/i386/i386-common.c: Enable
-freorder-blocks-and-partition at -O2 and up for x86.
* doc/invoke.texi: Update -freorder-blocks-and-partition default.
* opts.c (finish_options): Only warn if -freorder-blocks-and-
partition was set on command line.

Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 205001)
+++ common/config/i386/i386-common.c(working copy)
@@ -789,6 +789,8 @@ static const struct default_options ix86_option_op
   {
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+/* Enable function splitting at -O2 and higher.  */
+{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 },
 /* Turn off -fschedule-insns by default.  It tends to make the
problem with not enough registers even worse.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 205001)
+++ doc/invoke.texi (working copy)
@@ -8172,6 +8172,8 @@ exception handling, for linkonce sections, for fun
 section attribute and on any architecture that does not support named
 sections.

+Enabled for x86 at levels @option{-O2}, @option{-O3}.
+
 @item -freorder-functions
 @opindex freorder-functions
 Reorder

Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-11-19 Thread Teresa Johnson
On Tue, Nov 19, 2013 at 1:00 PM, Andi Kleen  wrote:
> Teresa Johnson  writes:
>
>> This patch enables -freorder-blocks-and-partition by default for x86
>> at -O2 and up. It is showing some modest gains in cpu2006 performance
>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
>
> One thing that worries me is what this will do to profilers.
>
> I had to hack some assembler code using out of line sections
> to able to handle the libunwind based perf dwarf unwinder.
>
> My understanding is that these out of line sections can be
> only described properly in dwarf3, and there's still some
> dwarf2 based unwinder code around.
>
> So this may cause problems with profiling and debugging.
>
> It's probably still a good idea, just may need some extra
> care.
>
> -Andi

Sri has approval for a patch that should address this by giving the
split cold sections a label. It should go in today as well:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02143.html

Thanks,
Teresa
>
> --
> a...@linux.intel.com -- Speaking for myself only



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-11-19 Thread Teresa Johnson
On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law  wrote:
> On 11/19/13 10:24, Teresa Johnson wrote:
>>
>> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka  wrote:
>>>
>>> Martin,
>>> can you, please, generate the updated systemtap with
>>> -freorder-blocks-and-partition enabled?
>>>
>>> I am in favour of enabling this - it is usefull pass and it is pointless
>>> ot
>>> have passes that are not enabled by default.
>>> Is there reason why this would not work on other ELF target? Is it
>>> working
>>> with Darwin and Windows?
>>
>>
>> I don't know how to test these (I don't see any machines listed in the
>> gcc compile farm of those types). For Windows, I assume you mean
>> MinGW, which should be enabled as it is under i386. Should I disable
>> it there and for Darwin?
>>
>>>
>>>> This patch enables -freorder-blocks-and-partition by default for x86
>>>> at -O2 and up. It is showing some modest gains in cpu2006 performance
>>>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
>>>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
>>>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
>>>
>>>
>>> This actually sounds very good ;)
>>>
>>> Lets see how the systemtap graphs goes.  If we will end up with problem
>>> of too many accesses to cold section, I would suggest making cold section
>>> subdivided into .unlikely and .unlikely.part (we could have better name)
>>> with the second consisting only of unlikely parts of hot&normal
>>> functions.
>>>
>>> This should reduce the problems we are seeing with mistakely identifying
>>> code to be cold because of roundoff errors (and it probably makes sense
>>> in general, too).
>>> We will however need to update gold and ld for that.
>>
>>
>> Note that I don't think this would help much unless the linker is
>> changed to move the cold split section close to the hot section. There
>> is probably some fine-tuning we could do eventually in the linker
>> under -ffunction-sections without putting the split portions in a
>> separate section. I.e. clump the split parts together within unlikely.
>> But hopefully this can all be done later on as follow-on work to boost
>> the performance further.
>>
>>>>
>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
>>>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
>>>> configured with --enable-languages=all,obj-c++ and tested for both
>>>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
>>>>
>>>> It would be good to enable this for additional targets as a follow on,
>>>> but it needs more testing for both correctness and performance on those
>>>> other targets (i.e for correctness because I see a number of places
>>>> in other config/*/*.c files that do some special handling under this
>>>> option for different targets or simply disable it, so I am not sure
>>>> how well-tested it is under different architectural constraints).
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2013-11-19  Teresa Johnson  
>>>>
>>>>  * common/config/i386/i386-common.c: Enable
>>>>  -freorder-blocks-and-partition at -O2 and up for x86.
>>>>  * opts.c (finish_options): Only warn if -freorder-blocks-and-
>>>>  partition was set on command line.
>>>
>>>
>>> You probably mis doc/invoke.texi update.
>>> Thank you for working on this!
>>
>>
>> Yes, thanks. Here is the patch with the invoke.texi update.
>>
>> Teresa
>>
>>
>> 2013-11-19  Teresa Johnson  
>>
>>  * common/config/i386/i386-common.c: Enable
>>  -freorder-blocks-and-partition at -O2 and up for x86.
>>  * doc/invoke.texi: Update -freorder-blocks-and-partition default.
>>  * opts.c (finish_options): Only warn if -freorder-blocks-and-
>>  partition was set on command line.
>
> This is fine.  Glad to see we're getting some good benefits from this code.

Ok, thanks. It is in as r205058.

>
> FWIW, I would expect the PA to show benefits from this work -- HP showed
> good results for block positioning and procedure splitting eons ago.  A
> secondary effect you would expect to see would be an increase in the number
> of long branch stubs (static count), but a decrease in the number of those
> stubs that actually execute.  Measuring those effects would obviously be
> out-of-scope.
>
> I totally understand that the PA is irrelevant these days, but seeing good
> results on another architecture would go a long way to answering the
> question "should this be enabled by default across the board?"

Yep, we saw benefits for IPF on hp-ux as well. It would be great to
eventually enable this across the board and only selectively disable.
I know there were people interested in trying this with ARM, that
would be a good relevant arch to try this with now to see if it can be
enabled more widely.

Teresa

>
> jeff
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR target/59233 (ICE on *-apple-darwin* with -m32 exposed by -freorder-blocks-and-partition)

2013-11-21 Thread Teresa Johnson
On Thu, Nov 21, 2013 at 3:31 PM, Steven Bosscher  wrote:
> On Thu, Nov 21, 2013 at 8:57 PM, Teresa Johnson wrote:
>> There are two problems I am fixing here (see
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59233 for full analysis).
>>
>> The first is the original ICE in crossjump optimization, which was
>> exposed by enabling -freorder-blocks-and-partition which was
>> conservatively preventing some upstream block combining optimizations.
>> The issue is that the block ended in a NOTE_INSN_DELETED, which
>> old_insns_match_p could not handle. I am fixing this by passing
>> old_insns_match_p the last instruction in the block that it knows how
>> to handle.
>>
>> The second issue this exposed was that we were unnecessarily marking
>> landing pad branches EDGE_PRESERVE since
>> flag_reorder_blocks_and_partition was on, even though this was -Os and
>> we will gate partitioning off.
>
> So we keep an edge to a landing pad... Why is this a problem?

It just seems silly to limit optimization unnecessarily when we can
deduce that we won't actually attempt partitioning.

>
>
>> * bb-reorder.c (do_partition_blocks): New function.
>> (gate_handle_partition_blocks): Call do_partition_blocks.
>> * bb-reorder.h (do_partition_blocks): Declare.
>> * except.c (dw2_build_landing_pads): Call do_partition_blocks.
>
> Exporting this gate function from bb-reorder.c shouldn't be necessary.
> Better fix this at the root, in except.c.

How would we do that? I didn't want to clone the logic that determines
whether we will partition, so I thought outlining the checks from the
gate function and exporting them would be cleaner. But I am open to
suggestions.

>
>
>> * cfgcleanup.c (outgoing_edges_match): Walk up past note instructions
>>     not understood by old_insns_match_p.
>
> This part is OK.

Ok, let me get this part in first, since it will address the ICE.

Thanks,
Teresa

>
> Ciao!
> Steven



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Fix PR target/59233 (ICE on *-apple-darwin* with -m32 exposed by -freorder-blocks-and-partition)

2013-11-21 Thread Teresa Johnson
There are two problems I am fixing here (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59233 for full analysis).

The first is the original ICE in crossjump optimization, which was
exposed by enabling -freorder-blocks-and-partition which was
conservatively preventing some upstream block combining optimizations.
The issue is that the block ended in a NOTE_INSN_DELETED, which
old_insns_match_p could not handle. I am fixing this by passing
old_insns_match_p the last instruction in the block that it knows how
to handle.

The second issue this exposed was that we were unnecessarily marking
landing pad branches EDGE_PRESERVE since
flag_reorder_blocks_and_partition was on, even though this was -Os and
we will gate partitioning off. Added a method for checking whether we
will really invoke partitioning.

Fixes the original problem (confirmed via a cross-compiler for
x86_64-apple-darwin10) and was bootstrapped and tested on
x86-64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2013-11-21  Teresa Johnson  

* bb-reorder.c (do_partition_blocks): New function.
(gate_handle_partition_blocks): Call do_partition_blocks.
* bb-reorder.h (do_partition_blocks): Declare.
* cfgcleanup.c (outgoing_edges_match): Walk up past note instructions
not understood by old_insns_match_p.
* except.c (dw2_build_landing_pads): Call do_partition_blocks.

Index: bb-reorder.c
===
--- bb-reorder.c(revision 205063)
+++ bb-reorder.c(working copy)
@@ -2532,8 +2532,13 @@ make_pass_duplicate_computed_gotos (gcc::context *
   return new pass_duplicate_computed_gotos (ctxt);
 }

-static bool
-gate_handle_partition_blocks (void)
+/* This function will return true if hot/cold partitioning will
+   be performed, and is declared non-static so that other passes
+   (namely landing pad expansion in except.c) can more accurately
+   determine whether they need to assume that partitioning will occur.  */
+
+bool
+do_partition_blocks (void)
 {
   /* The optimization to partition hot/cold basic blocks into separate
  sections of the .o file does not work well with linkonce or with
@@ -2548,6 +2553,12 @@ make_pass_duplicate_computed_gotos (gcc::context *
  && !user_defined_section_attribute);
 }

+static bool
+gate_handle_partition_blocks (void)
+{
+  return do_partition_blocks ();
+}
+
 /* This function is the main 'entrance' for the optimization that
partitions hot and cold basic blocks into separate sections of the
.o file (to improve performance and cache locality).  Ideally it
Index: bb-reorder.h
===
--- bb-reorder.h(revision 205063)
+++ bb-reorder.h(working copy)
@@ -37,4 +37,5 @@ extern int get_uncond_jump_length (void);

 extern void insert_section_boundary_note (void);

+extern bool do_partition_blocks (void);
 #endif
Index: cfgcleanup.c
===
--- cfgcleanup.c(revision 205063)
+++ cfgcleanup.c(working copy)
@@ -1743,12 +1743,20 @@ outgoing_edges_match (int mode, basic_block bb1, b
}
 }

+  /* Find the last non-debug non-note instruction in each bb, except
+ stop when we see the NOTE_INSN_BASIC_BLOCK, as old_insns_match_p
+ handles that case specially. old_insns_match_p does not handle
+ other types of instruction notes.  */
   rtx last1 = BB_END (bb1);
   rtx last2 = BB_END (bb2);
-  if (DEBUG_INSN_P (last1))
-last1 = prev_nondebug_insn (last1);
-  if (DEBUG_INSN_P (last2))
-last2 = prev_nondebug_insn (last2);
+  while (!NOTE_INSN_BASIC_BLOCK_P (last1) &&
+ (DEBUG_INSN_P (last1) || NOTE_P (last1)))
+last1 = PREV_INSN (last1);
+  while (!NOTE_INSN_BASIC_BLOCK_P (last2) &&
+ (DEBUG_INSN_P (last2) || NOTE_P (last2)))
+last2 = PREV_INSN (last2);
+  gcc_assert (last1 && last2);
+
   /* First ensure that the instructions match.  There may be many outgoing
  edges so this test is generally cheaper.  */
   if (old_insns_match_p (mode, last1, last2) != dir_both)
Index: except.c
===
--- except.c(revision 205063)
+++ except.c(working copy)
@@ -125,6 +125,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "except.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "output.h"
 #include "dwarf2asm.h"
 #include "dwarf2out.h"
@@ -1014,7 +1015,7 @@ dw2_build_landing_pads (void)
  new landing pads later, which means that we need to hold on to
  the post-landing-pad block.  Prevent it from being merged away.
  We'll remove this bit after partitioning.  */
-  if (flag_reorder_blocks_and_partition)
+  if (do_partition_blocks ())
     e_flags |= EDGE_P

[Google] Increase max-lipo-mem default to 4000000

2013-11-24 Thread Teresa Johnson
The current default is 300, but increasing to 4M gives better
performance on internal applications and benchmarks. Passes regression
testing. Ok for google/4_8?

Thanks,
Teresa

2013-11-24  Teresa Johnson  

* params.def (PARAM_MAX_LIPO_MEMORY): Increase default to 4M.

Index: params.def
===
--- params.def  (revision 205234)
+++ params.def  (working copy)
@@ -972,7 +972,7 @@ DEFPARAM (PARAM_MAX_LIPO_GROUP,
 DEFPARAM (PARAM_MAX_LIPO_MEMORY,
  "max-lipo-mem",
  "don't import aux files if memory consumption exceeds this value",
- 300, 0, 0)
+ 400, 0, 0)

 /* In LIPO profile-gen, use this parameter to record the cutoff value used at
profile collection runtime.  */

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Handle missing BINFO for LIPO

2014-10-07 Thread Teresa Johnson
We may have missing BINFO on a type if that type is a builtin, since
in LIPO mode we will reset builtin types to their original tree nodes
before parsing subsequent modules. Handle incomplete information by
returning false so we won't put an entry in the type inheritance graph
for optimization.

Passes regression tests. Ok for google branches?

Teresa

2014-10-07  Teresa Johnson  

Google ref b/16511102.
* ipa-devirt.c (polymorphic_type_binfo_p): Handle missing BINFO.

Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 215830)
+++ ipa-devirt.c(working copy)
@@ -177,7 +177,10 @@ static inline bool
 polymorphic_type_binfo_p (tree binfo)
 {
   /* See if BINFO's type has an virtual table associtated with it.  */
-  return BINFO_VTABLE (TYPE_BINFO (BINFO_TYPE (binfo)));
+  tree type_binfo = TYPE_BINFO (BINFO_TYPE (binfo));
+  if (!type_binfo)
+return false;
+  return BINFO_VTABLE (type_binfo);
 }

 /* One Definition Rule hashtable helpers.  */


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Fix PR bootstrap/63432 in jump threading

2014-10-07 Thread Teresa Johnson
This patch addresses PR bootstrap/63432 which was an insanity in the
probabilities created during jump threading. This was caused by threading
multiple times along the same path leading to the second jump thread path being
corrupted, which in turn caused the profile update code to fail. There
was code in mark_threaded_blocks that intended to catch and suppress
these cases of threading multiple times along the same path, but it
was sensitive to the order in which the paths were discovered and recorded.
This patch changes the detection to do two passes and removes the ordering
sensitivity.

Also, while fixing this I realized that the previous method of checking
the entry BB's profile count was not an accurate way to determine whether
the function has any non-zero profile counts. Created a new routine
to walk the path and see if it has all zero profile counts and estimated
frequencies.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also did an LTO
profiledbootstrap.

Ok for trunk?

Thanks,
Teresa

2014-10-07  Teresa Johnson  

PR bootstrap/63432.
* tree-ssa-threadupdate.c (estimated_freqs_path): New function.
(ssa_fix_duplicate_block_edges): Invoke it.
(mark_threaded_blocks): Make two passes to avoid ordering dependences.

Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 215830)
+++ tree-ssa-threadupdate.c (working copy)
@@ -959,6 +959,43 @@ update_joiner_offpath_counts (edge epath, basic_bl
 }


+/* Check if the paths through RD all have estimated frequencies but zero
+   profile counts.  This is more accurate than checking the entry block
+   for a zero profile count, since profile insanities sometimes creep in.  */
+
+static bool
+estimated_freqs_path (struct redirection_data *rd)
+{
+  edge e = rd->incoming_edges->e;
+  vec *path = THREAD_PATH (e);
+  edge ein;
+  edge_iterator ei;
+  bool non_zero_freq = false;
+  FOR_EACH_EDGE (ein, ei, e->dest->preds)
+{
+  if (ein->count)
+return false;
+  non_zero_freq |= ein->src->frequency != 0;
+}
+
+  for (unsigned int i = 1; i < path->length (); i++)
+{
+  edge epath = (*path)[i]->e;
+  if (epath->src->count)
+return false;
+  non_zero_freq |= epath->src->frequency != 0;
+  edge esucc;
+  FOR_EACH_EDGE (esucc, ei, epath->src->succs)
+{
+  if (esucc->count)
+return false;
+  non_zero_freq |= esucc->src->frequency != 0;
+}
+}
+  return non_zero_freq;
+}
+
+
 /* Invoked for routines that have guessed frequencies and no profile
counts to record the block and edge frequencies for paths through RD
in the profile count fields of those blocks and edges.  This is because
@@ -1058,9 +1095,11 @@ ssa_fix_duplicate_block_edges (struct redirection_
  data we first take a snapshot of the existing block and edge frequencies
  by copying them into the empty profile count fields.  These counts are
  then used to do the incremental updates, and cleared at the end of this
- routine.  */
+ routine.  If the function is marked as having a profile, we still check
+ to see if the paths through RD are using estimated frequencies because
+ the routine had zero profile counts.  */
   bool do_freqs_to_counts = (profile_status_for_fn (cfun) != PROFILE_READ
- || !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
+ || estimated_freqs_path (rd));
   if (do_freqs_to_counts)
 freqs_to_counts_path (rd);

@@ -2077,35 +2116,52 @@ mark_threaded_blocks (bitmap threaded_blocks)

   /* Now iterate again, converting cases where we want to thread
  through a joiner block, but only if no other edge on the path
- already has a jump thread attached to it.  */
+ already has a jump thread attached to it.  We do this in two passes,
+ to avoid situations where the order in the paths vec can hide overlapping
+ threads (the path is recorded on the incoming edge, so we would miss
+ cases where the second path starts at a downstream edge on the same
+ path).  First record all joiner paths, deleting any in the unexpected
+ case where there is already a path for that incoming edge.  */
   for (i = 0; i < paths.length (); i++)
 {
   vec *path = paths[i];

   if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+{
+ /* Attach the path to the starting edge if none is yet recorded.  */
+  if ((*path)[0]->e->aux == NULL)
+(*path)[0]->e->aux = path;
+ else if (dump_file && (dump_flags & TDF_DETAILS))
+   dump_jump_thread_path (dump_file, *path, false);
+}
+}
+  /* Second, look for paths that have any other jump thread attached to
+ them, and either finish converting them or cancel them.  */
+  for (i =

[Google] Put time profiling under param, disable by default

2014-10-12 Thread Teresa Johnson
Guard the new time profiling under a parameter, off by default for Google
since it isn't used and the extra instrumentation was adding bloat.

Enabled time profiling for two test cases that explicitly test it.

For func_reorder_gold_plugin_1.C the change in instrumentation had
an effect on the ordering of the callgraph group in the linker plugin
dump. The test case already appeared to try to catch different orders,
but missed the new one, so I added a leading ".*" to catch it (already
had a trailing ".*").

old: # Callgraph group : main _Z3barv _Z3foov
new: # Callgraph group : _Z3foov _Z3barv main

Passes regression tests. Ok for google/4_9?

2014-10-12  Teresa Johnson  

Google ref b/17945455.
gcc/
* params.def (PARAM_PROFILE_VALUES_TIME): New parameter.
* value-prof.c (gimple_find_values_to_profile): Use new param.

gcc/testsuite:
* gcc.dg/tree-prof/time-profiler-1.c: Use new parameter.
* gcc.dg/tree-prof/time-profiler-2.c: Ditto.
* g++.dg/tree-prof/func_reorder_gold_plugin_1.C: Fix dumper scan.

Index: params.def
===
--- params.def  (revision 215830)
+++ params.def  (working copy)
@@ -1101,6 +1101,11 @@ DEFPARAM (PARAM_PROFILE_GENERATE_SAMPLING_PERIOD,
  "sampling rate with -fprofile-generate-sampling",
  100, 0, 20)

+DEFPARAM (PARAM_PROFILE_VALUES_TIME,
+ "profile-values-time",
+ "Enable time profiling when value profiling",
+ 0, 0, 1)
+
 DEFPARAM (PARAM_COVERAGE_CALLBACK,
  "coverage-callback",
  "callback a user-define function when for arc counter increments.",
Index: value-prof.c
===
--- value-prof.c(revision 215830)
+++ value-prof.c(working copy)
@@ -2355,7 +2355,10 @@ gimple_find_values_to_profile (histogram_values *v
 for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
   gimple_values_to_profile (gsi_stmt (gsi), values);

-  values->safe_push (gimple_alloc_histogram_value (cfun,
HIST_TYPE_TIME_PROFILE, 0, 0));
+  if (PARAM_VALUE (PARAM_PROFILE_VALUES_TIME))
+values->safe_push (gimple_alloc_histogram_value (cfun,
+ HIST_TYPE_TIME_PROFILE,
+ 0, 0));

   FOR_EACH_VEC_ELT (*values, i, hist)
 {
Index: testsuite/gcc.dg/tree-prof/time-profiler-1.c
===
--- testsuite/gcc.dg/tree-prof/time-profiler-1.c(revision 215830)
+++ testsuite/gcc.dg/tree-prof/time-profiler-1.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-ipa-profile" } */
+/* { dg-options "-O2 -fdump-ipa-profile --param=profile-values-time=1" } */

 __attribute__ ((noinline))
 int foo()
Index: testsuite/gcc.dg/tree-prof/time-profiler-2.c
===
--- testsuite/gcc.dg/tree-prof/time-profiler-2.c(revision 215830)
+++ testsuite/gcc.dg/tree-prof/time-profiler-2.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-ipa-profile" } */
+/* { dg-options "-O2 -fdump-ipa-profile --param=profile-values-time=1" } */

 #include 

Index: testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C
===
--- testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C
(revision 215830)
+++ testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_1.C (working copy)
@@ -38,7 +38,7 @@ int main ()
 /* { dg-final-use { scan-assembler "\.string \"Weight 1000 1000\"" } }  */
 /* { dg-final-use { scan-assembler "\.string \"Weight 1001 1001\"" } }  */
 /* Check if main is next to foo or bar  */
-/* { dg-final-use { scan-file linker.dump "Callgraph group :
*\(_Z3foov main|main _Z3foov|_Z3barv main|main _Z3barv\).*\n" } }  */
+/* { dg-final-use { scan-file linker.dump "Callgraph group :
*\.*(_Z3foov main|main _Z3foov|_Z3barv main|main _Z3barv\).*\n" } }
*/
 /* { dg-final-use { scan-file linker.dump ".text\..*\._Z9notcalledv
entry count = 0 computed = 0 max count = 0" } }  */
 /* { dg-final-use { scan-file linker.dump "Moving .* section\\(s\\)
to new segment" } }  */
 /* { dg-final-use { cleanup-saved-temps } }  */

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Better tolerance of incoming profile insanities in jump threading

2014-10-14 Thread Teresa Johnson
The below patch fixes the overflow detection when recomputing
probabilities after jump threading, in case of incoming profile
insanities. It detects more cases where the computation will overflow
not only the max probability but the max int and possibly wrap around.

LTO profilebootstrapped and tested on x86_64-unknown-linux-gnu.

Ok for trunk?

Thanks,
Teresa


2014-10-14  Teresa Johnson  

PR bootstrap/63432
* tree-ssa-threadupdate.c (recompute_probabilities): Better
overflow checking.

Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 216150)
+++ tree-ssa-threadupdate.c (working copy)
@@ -871,21 +871,23 @@ recompute_probabilities (basic_block bb)
   edge_iterator ei;
   FOR_EACH_EDGE (esucc, ei, bb->succs)
 {
-  if (bb->count)
+  if (!bb->count)
+continue;
+
+  /* Prevent overflow computation due to insane profiles.  */
+  if (esucc->count < bb->count)
 esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
  bb->count);
-  if (esucc->probability > REG_BR_PROB_BASE)
-{
- /* Can happen with missing/guessed probabilities, since we
-may determine that more is flowing along duplicated
-path than joiner succ probabilities allowed.
-Counts and freqs will be insane after jump threading,
-at least make sure probability is sane or we will
-get a flow verification error.
-Not much we can do to make counts/freqs sane without
-redoing the profile estimation.  */
- esucc->probability = REG_BR_PROB_BASE;
-   }
+  else
+/* Can happen with missing/guessed probabilities, since we
+   may determine that more is flowing along duplicated
+   path than joiner succ probabilities allowed.
+   Counts and freqs will be insane after jump threading,
+   at least make sure probability is sane or we will
+   get a flow verification error.
+   Not much we can do to make counts/freqs sane without
+   redoing the profile estimation.  */
+esucc->probability = REG_BR_PROB_BASE;
 }
 }

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Treat artificial from aux modules as non-external

2014-10-16 Thread Teresa Johnson
The following patch fixes a LIPO issue with a virtual implicitly
instantiated destructor from an aux module that was needed when we
devirtualized a call from the primary module, but it was not available
(not promoted to global) in aux module since it is DECL_ARTIFICIAL.

Fixes compilation issue, passes regression tests. Ok for google/4_9?

Thanks,
Teresa

2014-10-16  Teresa Johnson  

Google ref b/17971995.
* l-ipo.c (cgraph_is_aux_decl_external): Artificial functions may
not be available in primary module.

Index: l-ipo.c
===
--- l-ipo.c (revision 216286)
+++ l-ipo.c (working copy)
@@ -1140,8 +1140,11 @@ cgraph_is_aux_decl_external (struct cgraph_node *n

   /* Comdat or weak functions in aux modules are not external --
  there is no guarantee that the definitition will be emitted
- in the primary compilation of this auxiliary module.  */
-  if (DECL_COMDAT (decl) || DECL_WEAK (decl))
+ in the primary compilation of this auxiliary module.
+ Functions marked artificial (e.g. an implicitly instantiated virtual
+ destructor) are also not guaranteed to be available in the primary module,
+ as they are not promoted by process_module_scope_static_func.  */
+  if (DECL_COMDAT (decl) || DECL_WEAK (decl) || DECL_ARTIFICIAL (decl))
 return false;

   /* virtual functions won't be deleted in the primary module.  */

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Teresa Johnson
Disabling devirtualization reduces code size, both for instrumentation (because
many more virtual functions are kept longer and therefore instrumented) and for
normal optimization.

Patch attached. Passes internal testing and regression tests. Ok for google/4_9?

Thanks,
Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Disabling devirtualization reduces code size, both for instrumentation (because
many more virtual functions are kept longer and therefore instrumented) and for
normal optimization.

Passes internal testing and regression tests. Ok for google/4_9?

2014-10-18  Teresa Johnson  

gcc/:
Google ref b/17945455
* opts.c (finish_options): Disable -fdevirtualize

gcc/testsuite/:
Google ref b/17945455
* testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test.
* testsuite/g++.dg/ipa/devirt-11.C: Ditto.
* testsuite/g++.dg/ipa/devirt-12.C: Ditto.
* testsuite/g++.dg/ipa/devirt-13.C: Ditto.
* testsuite/g++.dg/ipa/devirt-14.C: Ditto.
* testsuite/g++.dg/ipa/devirt-15.C: Ditto.
* testsuite/g++.dg/ipa/devirt-16.C: Ditto.
* testsuite/g++.dg/ipa/devirt-17.C: Ditto.
* testsuite/g++.dg/ipa/devirt-18.C: Ditto.
* testsuite/g++.dg/ipa/devirt-19.C: Ditto.
* testsuite/g++.dg/ipa/devirt-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-20.C: Ditto.
* testsuite/g++.dg/ipa/devirt-21.C: Ditto.
* testsuite/g++.dg/ipa/devirt-22.C: Ditto.
* testsuite/g++.dg/ipa/devirt-23.C: Ditto.
* testsuite/g++.dg/ipa/devirt-24.C: Ditto.
* testsuite/g++.dg/ipa/devirt-25.C: Ditto.
* testsuite/g++.dg/ipa/devirt-26.C: Ditto.
* testsuite/g++.dg/ipa/devirt-27.C: Ditto.
* testsuite/g++.dg/ipa/devirt-28.C: Ditto.
* testsuite/g++.dg/ipa/devirt-29.C: Ditto.
* testsuite/g++.dg/ipa/devirt-2.C: Ditto.
* testsuite/g++.dg/ipa/devirt-30.C: Ditto.
* testsuite/g++.dg/ipa/devirt-31.C: Ditto.
* testsuite/g++.dg/ipa/devirt-39.C: Ditto.
* testsuite/g++.dg/ipa/devirt-3.C: Ditto.
* testsuite/g++.dg/ipa/devirt-4.C: Ditto.
* testsuite/g++.dg/ipa/devirt-5.C: Ditto.
* testsuite/g++.dg/ipa/devirt-6.C: Ditto.
* testsuite/g++.dg/ipa/devirt-7.C: Ditto.
* testsuite/g++.dg/ipa/devirt-9.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-2.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-3.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-4.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-5.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-6.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-7.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-8.C: Ditto.
* testsuite/g++.dg/ipa/devirt-d-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-g-1.C: Ditto.
* testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto.
* testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-1.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-2.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-3.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-4.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-5.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-7.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-8.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-9.C: Ditto.
* testsuite/g++.dg/ipa/pr60600.C: Ditto.
* testsuite/g++.dg/ipa/pr60640-4.C: Ditto.
* testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto.
* testsuite/g++.dg/opt/devirt1.C: Ditto.
* testsuite/g++.dg/opt/devirt2.C: Ditto.
* testsuite/g++.dg/opt/devirt3.C: Ditto.

Index: opts.c
===
--- opts.c  (revision 216286)
+++ opts.c  (working copy)
@@ -485,7 +485,6 @@ static const struct default_options default_option
 { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
-{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 },
Index: testsuite/g++.dg/ipa/devirt-10.C
===
--- testsuite/g++.dg/ipa/devirt-10.C(revision 216286)
+++ testsuite/g++.dg/ipa/devirt-10.C(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining" } */
+/* { dg-options "-O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining 
-fdevirtualize" } */
 class wxPaintEvent {  };
 struct wxDCBase
 { 
Index: testsuite/g++.dg/ipa/devirt-11.C
===
--- testsuite/g++.dg/ipa/devirt-11.C

[GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

2014-10-18 Thread Teresa Johnson
Increasing the number of early inliner iterations from 1 to 2 enables more
indirect calls to be promoted/inlined before instrumentation. This in turn
reduces the instrumentation overhead, particularly for more expensive indirect
call topn profiling.

Passes internal testing and regression tests. Ok for google/4_9?

2014-10-18  Teresa Johnson  

Google ref b/17934523
* opts.c (finish_options): Increase max-early-inliner-iterations to 2
for profile-gen and profile-use builds.

Index: opts.c
===
--- opts.c  (revision 216286)
+++ opts.c  (working copy)
@@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
 opts->x_param_values, opts_set->x_param_values);
 }

+  if (opts->x_profile_arc_flag
+  || opts->x_flag_branch_probabilities)
+{
+  maybe_set_param_value
+   (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
+opts->x_param_values, opts_set->x_param_values);
+}
+
   if (!(opts->x_flag_auto_profile
 || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
 {


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

2014-10-18 Thread Teresa Johnson
On Sat, Oct 18, 2014 at 4:26 PM, Jan Hubicka  wrote:
>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka  wrote:
>> >> The difference in instrumentation runtime is huge -- as topn profiler
>> >> is pretty expensive to run.
>> >>
>> >> With FDO, it is probably better to make early inlining more aggressive
>> >> in order to get more context sensitive profiling.
>> >
>> > I agree with that, I just would like to understand where increasing the 
>> > iterations
>> > helps and if we can handle it without iterating (because Richi originally 
>> > requested to
>> > drop the iteration for correcness issues)
>> > Do you have some examples?
>>
>> We can do FDO experiment by shutting down einline. (Note that
>> increasing iteration to 2 did not actually improve performance with
>> our benchmarks).
>
> I would be more interested in case where increasing iteration to 2 actually
> improves train run perfomrance. (einline was originally invented to make
> profiling useable on tramp3d ;)
> It seems to me that the cases handled by iteration are rather rare, so I am
> suprised you get important benefit from these. Perhaps we miss something
> obvious here.

The specific case was actually a call to upper_bound in
bits/stl_algo.h with a specialized compare function. In the more
recent versions of upper_bound, the call to the comparator was
outlined into __upper_bound. With only one iteration of early
inlining, we were inlining __upper_bound into upper_bound and into the
caller. But the indirect call to the comparator was not promoted until
the fre2 pass, so it didn't get early inlined. With 2 iterations of
early inlining, enough optimization is apparently done between
iterations to propagate the actual target and promote the indirect
call after we inline __upper_bound and upper_bound that it is inlined
in the second iteration.

Thanks,
Teresa

>
> Honza
>>
>> David
>>
>> > Honza
>> >>
>> >> David
>> >>
>> >> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka  wrote:
>> >> >> Increasing the number of early inliner iterations from 1 to 2 enables 
>> >> >> more
>> >> >> indirect calls to be promoted/inlined before instrumentation. This in 
>> >> >> turn
>> >> >> reduces the instrumentation overhead, particularly for more expensive 
>> >> >> indirect
>> >> >> call topn profiling.
>> >> >
>> >> > How much difference you get here? One posibility would be also to run 
>> >> > specialized
>> >> > ipa-cp before profile instrumentation.
>> >> >
>> >> > Honza
>> >> >>
>> >> >> Passes internal testing and regression tests. Ok for google/4_9?
>> >> >>
>> >> >> 2014-10-18  Teresa Johnson  
>> >> >>
>> >> >> Google ref b/17934523
>> >> >> * opts.c (finish_options): Increase 
>> >> >> max-early-inliner-iterations to 2
>> >> >> for profile-gen and profile-use builds.
>> >> >>
>> >> >> Index: opts.c
>> >> >> ===
>> >> >> --- opts.c  (revision 216286)
>> >> >> +++ opts.c  (working copy)
>> >> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>> >> >>  opts->x_param_values, opts_set->x_param_values);
>> >> >>  }
>> >> >>
>> >> >> +  if (opts->x_profile_arc_flag
>> >> >> +  || opts->x_flag_branch_probabilities)
>> >> >> +{
>> >> >> +  maybe_set_param_value
>> >> >> +   (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>> >> >> +opts->x_param_values, opts_set->x_param_values);
>> >> >> +}
>> >> >> +
>> >> >>if (!(opts->x_flag_auto_profile
>> >> >>  || (opts->x_profile_arc_flag || 
>> >> >> opts->x_flag_branch_probabilities)))
>> >> >>  {
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 
>> >> >> 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Teresa Johnson
Hi Honza,

As David says, we will do some more experiments with the change you
suggest and speculative devirtualization, but we needed to make this
change in part to get an internal release out. One of the issues was a
recent change to cp/decl2.c to make virtual function decls needed
under flag_devirtualize.

Thanks!
Teresa

On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka  wrote:
>>
>> The virtual functions will be emitted in some modules, right? If they
>> are hot, they will be included with the auxiliary modules. Note that
>> LIPO indirect call profile will point to the comdat copy that is
>> picked by the linker in the instrumentation build, so it guarantees to
>> exist.
>
> If you have COMDAT virtual inline function that is used by module FOO and
> LIPO's indirect inlining will work out that it goes to comdat copy of module 
> BAR
> (that won the merging).  It will make C++ parser to also parse BAR to get
> the function, but callgarph builder will ignore it, too.
> (this is new logic in analyze_function that with -fno-devirtualize will just
> prevent all virtual functions not directly reachable to appear in symbol 
> table)
>
> I am surprised you hit the size limits with 4.9 only - for quite some time
> we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
> first
> that works harder to drop them early (because I hit the problem with LTO
> where they artifically bloat the size of LTO object files)
>
> Honza
>>
>> David
>>
>>
>> >
>> > Honza
>> >>
>> >> David
>> >>
>> >>
>> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka  wrote:
>> >> >> Disabling devirtualization reduces code size, both for instrumentation 
>> >> >> (because
>> >> >> many more virtual functions are kept longer and therefore 
>> >> >> instrumented) and for
>> >> >> normal optimization.
>> >> >
>> >> > OK, with profile instrumentation (that you seem to try to minimize) i 
>> >> > can see
>> >> > how you get noticeably more counters because virtual functions are kept 
>> >> > longer.
>> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual 
>> >> > functions
>> >> > than earlier compilers).
>> >> >
>> >> > Instead of disabling -fdevirtualize completely (that will get you more 
>> >> > indirect
>> >> > calls and thus more topn profiling) you may consider just hacking
>> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets as
>> >> > reachable. (see the conditional on before_inlining_p).
>> >> >
>> >> > Of course this will get you less devirtualization (but with LTO the 
>> >> > difference
>> >> > should not be big - perhaps I could make switch for that for mainline) 
>> >> > and less
>> >> > accurate profiles when you get speculative devirtualization via topn.
>> >> >
>> >> > I would be very interested to see how much difference this makes.
>> >> >
>> >> > Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Fix LIPO resolved node reference fixup

2014-10-24 Thread Teresa Johnson
This patch makes a fix to the reference fixups performed after LIPO
node resolution, to better handle the case where we are updating the
base address of a reference.

Fixes google benchmark and passes regression tests. Ok for google/4_9?

Thanks,
Teresa

2014-10-24  Teresa Johnson  

Google ref b/18110567.
* cgraphbuild.c (get_base_address_expr): New function.
(fixup_ref): Update the op expression for new base address.

Index: cgraphbuild.c
===
--- cgraphbuild.c   (revision 216667)
+++ cgraphbuild.c   (working copy)
@@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
   pointer_set_destroy (visited_nodes);
 }

+/* Similar to get_base_address but returns the ADDR_EXPR pointing
+   to the base address corresponding to T.  */
+
+static tree
+get_base_address_expr (tree t)
+{
+  while (handled_component_p (t))
+t = TREE_OPERAND (t, 0);
+
+  if ((TREE_CODE (t) == MEM_REF
+   || TREE_CODE (t) == TARGET_MEM_REF)
+  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
+return TREE_OPERAND (t, 0);
+
+  return NULL_TREE;
+}
+
 /* Update any function decl references in base ADDR of operand OP to refer to
the resolved node.  */

 static bool
 fixup_ref (gimple, tree addr, tree op)
 {
+  tree orig_addr = addr;
   addr = get_base_address (addr);
+  /* If the address was changed, update the operand OP to be the
+ ADDR_EXPR pointing to the new base address.  */
+  if (orig_addr != addr)
+op = get_base_address_expr (orig_addr);
   if (addr && TREE_CODE (addr) == FUNCTION_DECL)
 {
   gcc_assert (TREE_CODE (op) == ADDR_EXPR);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fix LIPO resolved node reference fixup

2014-10-24 Thread Teresa Johnson
On Fri, Oct 24, 2014 at 10:55 AM, Xinliang David Li  wrote:
> When orgin_addr == addr, is there a guarantee that this assert:
>
>  gcc_assert (TREE_OPERAND (op,0) == addr);
>
> is always true?

It should be, that is the assumption of the code that we are trying to
enforce with the assert.

Teresa

>
> David
>
>
>
> On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson  wrote:
>> This patch makes a fix to the reference fixups performed after LIPO
>> node resolution, to better handle the case where we are updating the
>> base address of a reference.
>>
>> Fixes google benchmark and passes regression tests. Ok for google/4_9?
>>
>> Thanks,
>> Teresa
>>
>> 2014-10-24  Teresa Johnson  
>>
>> Google ref b/18110567.
>> * cgraphbuild.c (get_base_address_expr): New function.
>> (fixup_ref): Update the op expression for new base address.
>>
>> Index: cgraphbuild.c
>> ===
>> --- cgraphbuild.c   (revision 216667)
>> +++ cgraphbuild.c   (working copy)
>> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
>>pointer_set_destroy (visited_nodes);
>>  }
>>
>> +/* Similar to get_base_address but returns the ADDR_EXPR pointing
>> +   to the base address corresponding to T.  */
>> +
>> +static tree
>> +get_base_address_expr (tree t)
>> +{
>> +  while (handled_component_p (t))
>> +t = TREE_OPERAND (t, 0);
>> +
>> +  if ((TREE_CODE (t) == MEM_REF
>> +   || TREE_CODE (t) == TARGET_MEM_REF)
>> +  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
>> +return TREE_OPERAND (t, 0);
>> +
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Update any function decl references in base ADDR of operand OP to refer 
>> to
>> the resolved node.  */
>>
>>  static bool
>>  fixup_ref (gimple, tree addr, tree op)
>>  {
>> +  tree orig_addr = addr;
>>addr = get_base_address (addr);
>> +  /* If the address was changed, update the operand OP to be the
>> + ADDR_EXPR pointing to the new base address.  */
>> +  if (orig_addr != addr)
>> +op = get_base_address_expr (orig_addr);
>>if (addr && TREE_CODE (addr) == FUNCTION_DECL)
>>  {
>>gcc_assert (TREE_CODE (op) == ADDR_EXPR);
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fix LIPO resolved node reference fixup

2014-10-27 Thread Teresa Johnson
Here is the new patch that walks op looking for the reference to addr.

Passes internal benchmarks and regression tests. Ok for google/4_9?

Thanks,
Teresa

2014-10-27  Teresa Johnson  

Google ref b/18110567.
* cgraphbuild.c (fixup_all_refs_1): New function.
(fixup_all_refs): Ditto.
(fixup_ref): Walk tree to find and replace ref.

Index: cgraphbuild.c
===
--- cgraphbuild.c   (revision 216735)
+++ cgraphbuild.c   (working copy)
@@ -665,6 +665,45 @@ record_references_in_initializer (tree decl, bool
   pointer_set_destroy (visited_nodes);
 }

+typedef struct _fixup_decl_info {
+  tree orig_decl;
+  tree new_decl;
+} fixup_decl_info;
+
+/* Check the tree at TP to see if it contains the original decl stored in
+   DATA and if so replace it with the new decl. If original decl is
+   found set WALK_SUBTREES to 0 so the subtree under TP is not traversed.
+   Returns the updated parent tree T or NULL if no update performed.  */
+
+static tree
+fixup_all_refs_1 (tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+  fixup_decl_info *info = (fixup_decl_info *) data;
+
+  /* The original function decl is always the first tree operand.  */
+  if (TREE_OPERAND (t,0) == info->orig_decl)
+{
+  TREE_OPERAND (t,0) = info->new_decl;
+  *walk_subtrees = 0;
+  return t;
+}
+  return NULL_TREE;
+}
+
+/* Walk the whole tree rooted at TP and invoke fixup_all_refs_1 to
+   replace any references to the original decl with the new decl
+   stored in INFO.  */
+
+static inline void
+fixup_all_refs (tree *tp, fixup_decl_info *info)
+{
+  tree t = walk_tree (tp, fixup_all_refs_1, info, NULL);
+  /* This is invoked when we found the original decl, so we expect
+ to have replaced a reference.  */
+  gcc_assert (t != NULL_TREE);
+}
+
 /* Update any function decl references in base ADDR of operand OP to refer to
the resolved node.  */

@@ -674,13 +713,16 @@ fixup_ref (gimple, tree addr, tree op)
   addr = get_base_address (addr);
   if (addr && TREE_CODE (addr) == FUNCTION_DECL)
 {
-  gcc_assert (TREE_CODE (op) == ADDR_EXPR);
-  gcc_assert (TREE_OPERAND (op,0) == addr);
   struct cgraph_node *real_callee;
   real_callee = cgraph_lipo_get_resolved_node (addr);
   if (addr == real_callee->decl)
 return false;
-  TREE_OPERAND (op,0) = real_callee->decl;
+  /* We need to locate and update the tree operand within OP
+ that contains ADDR and update it to the real callee's decl.  */
+  fixup_decl_info info;
+  info.orig_decl = addr;
+  info.new_decl = real_callee->decl;
+  fixup_all_refs (&op, &info);
 }
   return false;
 }


On Fri, Oct 24, 2014 at 11:46 AM, Teresa Johnson  wrote:
> On Fri, Oct 24, 2014 at 10:55 AM, Xinliang David Li  
> wrote:
>> When orgin_addr == addr, is there a guarantee that this assert:
>>
>>  gcc_assert (TREE_OPERAND (op,0) == addr);
>>
>> is always true?
>
> It should be, that is the assumption of the code that we are trying to
> enforce with the assert.
>
> Teresa
>
>>
>> David
>>
>>
>>
>> On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson  
>> wrote:
>>> This patch makes a fix to the reference fixups performed after LIPO
>>> node resolution, to better handle the case where we are updating the
>>> base address of a reference.
>>>
>>> Fixes google benchmark and passes regression tests. Ok for google/4_9?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2014-10-24  Teresa Johnson  
>>>
>>> Google ref b/18110567.
>>> * cgraphbuild.c (get_base_address_expr): New function.
>>> (fixup_ref): Update the op expression for new base address.
>>>
>>> Index: cgraphbuild.c
>>> ===
>>> --- cgraphbuild.c   (revision 216667)
>>> +++ cgraphbuild.c   (working copy)
>>> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
>>>pointer_set_destroy (visited_nodes);
>>>  }
>>>
>>> +/* Similar to get_base_address but returns the ADDR_EXPR pointing
>>> +   to the base address corresponding to T.  */
>>> +
>>> +static tree
>>> +get_base_address_expr (tree t)
>>> +{
>>> +  while (handled_component_p (t))
>>> +t = TREE_OPERAND (t, 0);
>>> +
>>> +  if ((TREE_CODE (t) == MEM_REF
>>> +   || TREE_CODE (t) == TARGET_MEM_REF)
>>> +  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
>>> +return TREE_OPERAND (t, 0);
>>> +
>>> +  return NULL_TREE;
>>> 

Re: [PATCH] rebuild frequency after vrp

2014-06-02 Thread Teresa Johnson
On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen  wrote:
> Just tried with Teresa's patch, the ICE in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved.

I will take a look to see why this wasn't fixed by my profile redesign.

Teresa

>
> Dehao
>
> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law  wrote:
>> On 06/02/14 10:17, Dehao Chen wrote:
>>>
>>> We need to rebuild frequency after vrp, otherwise the following code
>>> in tree-ssa-threadupdate.c will make the frequency larger than
>>> upper-bound.
>>>
>>>/* Excessive jump threading may make frequencies large enough
>>> so
>>>   the computation overflows.  */
>>>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>>>  rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>>>
>>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
>>
>> Can you try this with Teresa's revamping of the jump threading frequency
>> updates?  It's still in my queue of things to review.
>>
>> Fixing this stuff in the updater would be better than rebuilding the
>> frequencies, IMHO.
>>
>> Jeff
>>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] rebuild frequency after vrp

2014-06-02 Thread Teresa Johnson
On Mon, Jun 2, 2014 at 11:04 AM, Teresa Johnson  wrote:
> On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen  wrote:
>> Just tried with Teresa's patch, the ICE in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved.
>
> I will take a look to see why this wasn't fixed by my profile redesign.

Turns out this is a case of garbage-in, garbage-out. Jump threading is
producing a block with a frequency >10K, but it is directly due to a
block with an invalid frequency after the upstream ccp2 pass. After
ccp2:

;;   basic block 3, loop depth 1, count 0, freq 9100, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE)
;;pred:   10 [91.0%]  (TRUE_VALUE,EXECUTABLE)
...
;;succ:   5 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;4 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 1, count 0, freq 6825, maybe hot
;;   Invalid sum of incoming frequencies 4550, should be 6825
;;prev block 3, next block 5, flags: (NEW, REACHABLE)
;;pred:   3 [50.0%]  (FALSE_VALUE,EXECUTABLE)
;; , discriminator 4
;;succ:   5 [100.0%]  (FALLTHRU,EXECUTABLE)


Jump threading introduced BB 13, with predecessors 3 and 4 shown above:

;;   basic block 13, loop depth 1, count 0, freq 11375, maybe hot
;;   Invalid sum of outgoing probabilities 40.0%
;;prev block 12, next block 1, flags: (NEW, REACHABLE)
;;pred:   4 [100.0%]  (FALLTHRU,EXECUTABLE)
;;3 [50.0%]  (TRUE_VALUE,EXECUTABLE)

11375 = 9100/2 + 6825

If bb 4 had the correct frequency of 4550 (9100/2), this block would
have gotten the correct count of 9100. BB 13's successor has the
correct frequency of 9100.

Teresa

>
> Teresa
>
>>
>> Dehao
>>
>> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law  wrote:
>>> On 06/02/14 10:17, Dehao Chen wrote:
>>>>
>>>> We need to rebuild frequency after vrp, otherwise the following code
>>>> in tree-ssa-threadupdate.c will make the frequency larger than
>>>> upper-bound.
>>>>
>>>>/* Excessive jump threading may make frequencies large enough
>>>> so
>>>>   the computation overflows.  */
>>>>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>>>>  rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>>>>
>>>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
>>>
>>> Can you try this with Teresa's revamping of the jump threading frequency
>>> updates?  It's still in my queue of things to review.
>>>
>>> Fixing this stuff in the updater would be better than rebuilding the
>>> frequencies, IMHO.
>>>
>>> Jeff
>>>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Fix ICE due to memory corruption in SRA

2014-06-02 Thread Teresa Johnson
This patch fixes an ICE due to memory corruption discovered while building a
large application with FDO and LIPO on the google branch. I don't have a small
reproducer, but the same code appears on trunk, and I believe it could also
silently result in incorrect code generation.

The problem occurs if SRA is applied on a recursive call. In this case,
the redirect_callers vec below contains the recursive edge from node->node.
When rebuild_cgraph_edges is invoked, it will free the callee edges of node,
including the one recorded in redirect_callers. In the case I looked at,
after rebuilding the cgraph edges for node, the address recorded in
redirect_callers now pointed to a different cgraph edge, and we later
got an ICE because the (incorrect) callee that we tried to modify had
the wrong number of arguments.

To fix, I simply moved the collection of caller edges to after the cgraph
edge rebuilding.

Google ref b/15383777.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2014-06-02  Teresa Johnson  

* tree-sra.c (modify_function): Record caller nodes after rebuild.

Index: tree-sra.c
===
--- tree-sra.c  (revision 211139)
+++ tree-sra.c  (working copy)
@@ -4925,12 +4925,15 @@ modify_function (struct cgraph_node *node, ipa_par
 {
   struct cgraph_node *new_node;
   bool cfg_changed;
-  vec redirect_callers = collect_callers_of_node (node);

   rebuild_cgraph_edges ();
   free_dominance_info (CDI_DOMINATORS);
   pop_cfun ();

+  /* This must be done after rebuilding cgraph edges for node above.
+ Otherwise any recursive calls to node that are recorded in
+ redirect_callers will be corrupted.  */
+  vec redirect_callers = collect_callers_of_node (node);
   new_node = cgraph_function_versioning (node, redirect_callers,
 NULL,
 NULL, false, NULL, NULL, "isra");
-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Google/4_8] Fix testsuite/gcc.dg/ipa/ipa-sra-6.c

2014-06-05 Thread Teresa Johnson
I just committed the following patch to google/4_8 as 211279. This
fixes a test failure with my backport of tree-sra fix r211180 from
trunk.

It turns out that the bug I fixed affected this test case, but because
the dumping format is slightly different between google/4_8 and both
google/4_9 and trunk, it didn't show up as a test failure for
either google/4_9 or trunk.

Essentially, after my fix, both in google/4_8 and google/4_9 (and
presumably trunk), I am getting more output in the eipa_sra dump
output. Looks like we in fact were previously not properly updating a
recursive call due to this same issue that I was fixing, although it
didn't manifest as an ICE like in the case I fixed. With the fix, we
now properly update a recursive call being optimized by SRA. The test
case is expecting to see exactly one occurrence of "foo " in the dump
output. In google/4_8, one of the new dump lines matches because it
looks like:

Adjusting call (0 -> 2) foo -> foo.isra.0

In google/4_9 and trunk, the additional dump lines don't match because
the node's order number is being printed after the name:

Adjusting call foo/0 -> foo.isra.0/2


2014-06-05  Teresa Johnson  

* testsuite/gcc.dg/ipa/ipa-sra-6.c: Update to handle
recent tree-sra.c fix.

Index: testsuite/gcc.dg/ipa/ipa-sra-6.c
===
--- testsuite/gcc.dg/ipa/ipa-sra-6.c (revision 210862)
+++ testsuite/gcc.dg/ipa/ipa-sra-6.c (working copy)
@@ -30,5 +30,5 @@ int main (int argc, char *argv[])
   return foo (&cow, 0);
 }

-/* { dg-final { scan-tree-dump-times "foo " 1 "eipa_sra"  } } */
+/* { dg-final { scan-tree-dump-times "foo " 2 "eipa_sra"  } } */
 /* { dg-final { cleanup-tree-dump "eipa_sra" } } */


Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Google/4_8] Reduce memory overhead of LIPO COMDAT fixups

2014-06-05 Thread Teresa Johnson
(cc'ing a few additional people to help with review as David is out
and I'm not sure Rong is available)

This patch greatly reduces the memory overhead of the new COMDAT fixup
analysis, by changing the second level hash tables to linked lists.
I found that almost none of the second level hash tables contained more
than one entry, but each hash table required a lot of memory overhead.

I also now query the fixup type before adding the checksums to the
pointer sets during callgraph building, which would have enabled a workaround
for the memory issue.

Tested with regression tests and internal tests (see ref below for details).

Google ref b/15415042.

2014-06-05  Teresa Johnson  

* dyn-ipa.c (struct lineno_checksum_alias): Replaced pointer set.
(struct checksum_alias_info): Enabled linked list.
(cfg_checksum_get_key): Removed.
(find_cfg_checksum): New function.
(cfg_checksum_insert): Operate on linked list.
(checksum_set_insert): Ditto.
(gcov_build_callgraph): Allow disabling checksum insertion.
(gcov_find_new_ic_target): Operate on linked list.
(gcov_fixup_counters_checksum): Ditto.
(gcov_fixup_counters_lineno): Ditto.
(__gcov_compute_module_groups): Compute fixup type earlier.

Index: dyn-ipa.c
===
--- dyn-ipa.c   (revision 211288)
+++ dyn-ipa.c   (working copy)
@@ -79,16 +79,15 @@ struct dyn_cgraph
   unsigned num_nodes_executed;
   /* used by new algorithm  */
   struct modu_node *modu_nodes;
-  /* Set indexed by lineno_checksum, returns another dyn_pointer_set*,
- indexed by cfg_checksum.  That returns a checksum_alias_info struct.  */
+  /* Set indexed by lineno_checksum, returns a linked list of
+ checksum_alias_info structs.  */
   struct dyn_pointer_set *lineno_pointer_sets;
 };

 /* Struct holding information for functions with the same lineno_checksum.  */
 struct lineno_checksum_alias
 {
-  /* Set indexed by cfg_checksum, holding a checksum_alias_info struct.  */
-  struct dyn_pointer_set *cfg_pointer_set;
+  struct checksum_alias_info *cfg_checksum_list;
   unsigned lineno_checksum;
 };

@@ -96,6 +95,7 @@ struct lineno_checksum_alias
checksums.  */
 struct checksum_alias_info
 {
+  struct checksum_alias_info *next_cfg_checksum;
   struct checksum_alias *alias_list;
   unsigned cfg_checksum;
 };
@@ -205,6 +205,7 @@ pointer_set_create (unsigned (*get_key) (const voi
 static struct dyn_cgraph the_dyn_call_graph;
 static int total_zero_count = 0;
 static int total_insane_count = 0;
+static int fixup_type = 0;

 enum GROUPING_ALGORITHM
 {
@@ -374,14 +375,6 @@ lineno_checksum_get_key (const void *p)
   return ((const struct lineno_checksum_alias *) p)->lineno_checksum;
 }

-/* The cfg_checksum value in P is the key for a cfg_pointer_set.  */
-
-static inline unsigned
-cfg_checksum_get_key (const void *p)
-{
-  return ((const struct checksum_alias_info *) p)->cfg_checksum;
-}
-
 /* Create a new checksum_alias struct for function with GUID, FI_PTR,
and ZERO_COUNTS flag.  Prepends to list NEXT and returns new struct.  */

@@ -398,28 +391,44 @@ new_checksum_alias (gcov_type guid, const struct g
   return alias;
 }

-/* Insert a new checksum_alias struct into pointer set P for function with
+/* Locate the checksum_alias_info in LIST that matches CFG_CHECKSUM.  */
+
+static struct checksum_alias_info *
+find_cfg_checksum (struct checksum_alias_info *list, unsigned cfg_checksum)
+{
+  for (; list; list = list->next_cfg_checksum)
+{
+  if (list->cfg_checksum == cfg_checksum)
+return list;
+}
+  return NULL;
+}
+
+/* Insert a new checksum_alias struct into LIST for function with
CFG_CHECKSUM and associated GUID, FI_PTR, and ZERO_COUNTS flag.  */

-static void
-cfg_checksum_set_insert (struct dyn_pointer_set *p, unsigned cfg_checksum,
- gcov_type guid, const struct gcov_fn_info *fi_ptr,
- int zero_counts)
+static struct checksum_alias_info *
+cfg_checksum_insert (unsigned cfg_checksum, gcov_type guid,
+ const struct gcov_fn_info *fi_ptr, int zero_counts,
+ struct checksum_alias_info *list)
 {
-  struct checksum_alias_info **m = (struct checksum_alias_info **)
-pointer_set_find_or_insert (p, cfg_checksum);
-  if (*m)
+  struct checksum_alias_info *alias_info;
+  alias_info = find_cfg_checksum (list, cfg_checksum);
+  if (alias_info)
 {
-  gcc_assert ((*m)->alias_list);
-  (*m)->alias_list = new_checksum_alias (guid, fi_ptr, zero_counts,
- (*m)->alias_list);
+  gcc_assert (alias_info->alias_list);
+  alias_info->alias_list = new_checksum_alias (guid, fi_ptr, zero_counts,
+   alias_info->alias_list);
+  return list;
 }
   else
 {
-  *m = XNEW (struct checksu

[Google] Fix AFDO early inline ICEs due to DFE

2014-06-12 Thread Teresa Johnson
These two patches fix multiple ICE that occurred due to DFE being
recently enabled after AutoFDO LIPO linking.

Passes regression and internal testing. Ok for Google/4_8?

Teresa

2014-06-12  Teresa Johnson  
Dehao Chen  

Google ref b/15521327.

* cgraphclones.c (cgraph_clone_edge): Use resolved node.
* l-ipo.c (resolve_cgraph_node): Resolve to non-removable node.

Index: cgraphclones.c
===
--- cgraphclones.c  (revision 211386)
+++ cgraphclones.c  (working copy)
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "lto-streamer.h"
 #include "except.h"
+#include "l-ipo.h"

 /* Create clone of E in the node N represented by CALL_EXPR the callgraph.  */
 struct cgraph_edge *
@@ -118,7 +119,11 @@ cgraph_clone_edge (struct cgraph_edge *e, struct c

   if (call_stmt && (decl = gimple_call_fndecl (call_stmt)))
{
- struct cgraph_node *callee = cgraph_get_node (decl);
+  struct cgraph_node *callee;
+  if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done)
+callee = cgraph_lipo_get_resolved_node (decl);
+  else
+callee = cgraph_get_node (decl);
  gcc_checking_assert (callee);
  new_edge = cgraph_create_edge (n, callee, call_stmt, count, freq);
}
Index: l-ipo.c
===
--- l-ipo.c (revision 211386)
+++ l-ipo.c (working copy)
@@ -1542,6 +1542,18 @@ resolve_cgraph_node (struct cgraph_sym **slot, str
   gcc_assert (decl1_defined);
   add_define_module (*slot, decl2);

+  /* Pick the node that cannot be removed, to avoid a situation
+ where we remove the resolved node and later try to access
+ it for the remaining non-removable copy.  E.g. one may be
+ extern and the other weak, only the extern copy can be removed.  */
+  if (cgraph_can_remove_if_no_direct_calls_and_refs_p ((*slot)->rep_node)
+  && !cgraph_can_remove_if_no_direct_calls_and_refs_p (node))
+{
+  (*slot)->rep_node = node;
+  (*slot)->rep_decl = decl2;
+  return;
+}
+
   has_prof1 = has_profile_info (decl1);
   bool is_aux1 = cgraph_is_auxiliary (decl1);
   bool is_aux2 = cgraph_is_auxiliary (decl2);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Fix -femit-function-names in LIPO profile-use mode

2014-06-24 Thread Teresa Johnson
Emit the proper module name in LIPO profile-use mode.

Passes regression tests, ok for google branches?

Thanks,
Teresa

2014-06-24  Teresa Johnson  

* coverage.c (emit_function_name): Emit module name in LIPO mode.

Index: coverage.c
===
--- coverage.c  (revision 211893)
+++ coverage.c  (working copy)
@@ -1882,7 +1882,9 @@ void
 emit_function_name (void)
 {
   fprintf (stderr, "Module %s FuncId %u Name %s\n",
-   main_input_file_name,
+   (L_IPO_COMP_MODE
+? get_module_name (FUNC_DECL_MODULE_ID (cfun))
+: main_input_file_name),
FUNC_DECL_FUNC_ID (cfun),
IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (current_function_decl)));
 }
-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Add missing -fdump-* options

2014-06-25 Thread Teresa Johnson
On Tue, May 13, 2014 at 8:19 AM, Xinliang David Li  wrote:
> On Tue, May 13, 2014 at 1:39 AM, Richard Biener
>  wrote:
>> On Fri, May 9, 2014 at 5:54 PM, Teresa Johnson  wrote:
>>> I discovered that the support for the documented -fdump-* options
>>> "optimized", "missed", "note" and "optall" was missing. Added that and
>>> fixed a minor typo in the documentation.
>>>
>>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>>
>> I'm not sure they were intented for user-consumption.  ISTR they
>> are just an implementation detail exposed by -fopt-info-X (which is
>> where they are documented).
>>
>> The typo fix is ok, also adding a comment before the dump flags
>> definition to the above fact.
>>
>> David, do I remember correctly?
>
> I remember we talked about content filtering dump flags. Things like
>
> -fdump-xxx-ir  <-- dump IR only
> -fdump-xxx-transformation <-- optimization note
> -fdump-xxx-debug <-- other debug traces
>
> Other than that, now I think 'details' and 'all' seem redundant.
> 'verbose' flag/modifier can achieve the same effect depending on the
> context.
>
> -fdump-xxx-ir-verbose <-- dump IR, and turn on IR modifiers such as
> vops, lineno, etc
> -fdump-xxx-transforamtion-verbose <-- dump transformations + missed
> optimizations + notes
> -fdump-xxx-debug-verbose <-- turn on detailed trace.

The above proposal seems fine to me as a longer-term direction, but
also seems somewhat orthogonal to the issue my patch is trying to
solve in the short term, namely inconsistent documentation and
behavior:

1) "optimized", "missed", "note" and "optall" are documented as being
sub-options for -fdump-tree-* in doc/invoke.texi, but not implemented.
2) "optimized", "missed", "note" and "optall" are however enabled via
-fdump-tree-all

Could we at least fix these issues in the short term, as it doesn't
affect the documented behavior (but rather adds the documented
behavior)?

Thanks,
Teresa

>
> thanks,
>
> David
>
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2014-05-09  Teresa Johnson  
>>>
>>> * doc/invoke.texi: Fix typo.
>>> * dumpfile.c: Add support for documented -fdump-* options
>>> optimized/missed/note/optall.
>>>
>>> Index: doc/invoke.texi
>>> ===
>>> --- doc/invoke.texi (revision 210157)
>>> +++ doc/invoke.texi (working copy)
>>> @@ -6278,7 +6278,7 @@ passes).
>>>  @item missed
>>>  Enable showing missed optimization information (only available in certain
>>>  passes).
>>> -@item notes
>>> +@item note
>>>  Enable other detailed optimization information (only available in
>>>  certain passes).
>>>  @item =@var{filename}
>>> Index: dumpfile.c
>>> ===
>>> --- dumpfile.c  (revision 210157)
>>> +++ dumpfile.c  (working copy)
>>> @@ -107,6 +107,10 @@ static const struct dump_option_value_info dump_op
>>>{"nouid", TDF_NOUID},
>>>{"enumerate_locals", TDF_ENUMERATE_LOCALS},
>>>{"scev", TDF_SCEV},
>>> +  {"optimized", MSG_OPTIMIZED_LOCATIONS},
>>> +  {"missed", MSG_MISSED_OPTIMIZATION},
>>> +  {"note", MSG_NOTE},
>>> +  {"optall", MSG_ALL},
>>>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA
>>> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
>>> | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)},
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Improve -fdump-tree-all efficiency

2014-06-26 Thread Teresa Johnson
The following patch fixes a big inefficiency when using -fdump-tree-all
for large source files. I found that when using this option the
compile time became unreasonably slow, and I traced this to the fact
that dump_begin/dump_end are called around every function/class that
are dumped via -fdump-tree-original and -fdump-class-hierarchy.
I fixed this by opening the .original and .class dumps once before
invoking the parser and closing them once after.

For a file containing ~7000 each of functions and classes, the real time
measured for the compile is:
  no dumping: 8s
  -fdump-tree-all, no patch: 10m30s
  -fdump-tree-all, my patch: 1m21s

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2014-06-26  Teresa Johnson  

* c-family/c-common.h (get_dump_info): Declare.
* c-family/c-gimplify.c (c_genericize): Use saved dump files.
* c-family/c-opts.c (c_common_parse_file): Begin and end dumps
once around parsing invocation.
(get_dump_info): New function.
* cp/class.c (dump_class_hierarchy): Use saved dump files.
(dump_vtable): Ditto.
(dump_vtt): Ditto.

Index: c-family/c-common.h
===
--- c-family/c-common.h (revision 211980)
+++ c-family/c-common.h (working copy)
@@ -835,6 +835,7 @@ extern bool c_common_post_options (const char **);
 extern bool c_common_init (void);
 extern void c_common_finish (void);
 extern void c_common_parse_file (void);
+extern FILE *get_dump_info (int, int *);
 extern alias_set_type c_common_get_alias_set (tree);
 extern void c_register_builtin_type (tree, const char*);
 extern bool c_promoting_integer_type_p (const_tree);
Index: c-family/c-gimplify.c
===
--- c-family/c-gimplify.c   (revision 211980)
+++ c-family/c-gimplify.c   (working copy)
@@ -123,7 +123,7 @@ c_genericize (tree fndecl)
 }

   /* Dump the C-specific tree IR.  */
-  dump_orig = dump_begin (TDI_original, &local_dump_flags);
+  dump_orig = get_dump_info (TDI_original, &local_dump_flags);
   if (dump_orig)
 {
   fprintf (dump_orig, "\n;; Function %s",
@@ -140,8 +140,6 @@ c_genericize (tree fndecl)
   else
print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
   fprintf (dump_orig, "\n");
-
-  dump_end (TDI_original, dump_orig);
 }

   /* Dump all nested functions now.  */
Index: c-family/c-opts.c
===
--- c-family/c-opts.c   (revision 211980)
+++ c-family/c-opts.c   (working copy)
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
   TARGET_FLT_EVAL_METHOD_NON_DEFAULT and
   TARGET_OPTF.  */
 #include "tm_p.h"  /* For C_COMMON_OVERRIDE_OPTIONS.  */
+#include "dumpfile.h"

 #ifndef DOLLARS_IN_IDENTIFIERS
 # define DOLLARS_IN_IDENTIFIERS true
@@ -102,6 +103,12 @@ static size_t deferred_count;
 /* Number of deferred options scanned for -include.  */
 static size_t include_cursor;

+/* Dump files/flags to use during parsing.  */
+static FILE *original_dump_file = NULL;
+static int original_dump_flags;
+static FILE *class_dump_file = NULL;
+static int class_dump_flags;
+
 /* Whether any standard preincluded header has been preincluded.  */
 static bool done_preinclude;

@@ -1088,6 +1095,10 @@ c_common_parse_file (void)
   for (;;)
 {
   c_finish_options ();
+  /* Open the dump files to use for the original and class dump output
+ here, to be used during parsing for the current file.  */
+  original_dump_file = dump_begin (TDI_original, &original_dump_flags);
+  class_dump_file = dump_begin (TDI_class, &class_dump_flags);
   pch_init ();
   push_file_scope ();
   c_parse_file ();
@@ -1101,6 +1112,16 @@ c_common_parse_file (void)
   cpp_clear_file_cache (parse_in);
   this_input_filename
= cpp_read_main_file (parse_in, in_fnames[i]);
+  if (original_dump_file)
+{
+  dump_end (TDI_original, original_dump_file);
+  original_dump_file = NULL;
+}
+  if (class_dump_file)
+{
+  dump_end (TDI_class, class_dump_file);
+  class_dump_file = NULL;
+}
   /* If an input file is missing, abandon further compilation.
 cpplib has issued a diagnostic.  */
   if (!this_input_filename)
@@ -1108,6 +1129,23 @@ c_common_parse_file (void)
 }
 }

+/* Returns the appropriate dump file for PHASE to dump with FLAGS.  */
+FILE *
+get_dump_info (int phase, int *flags)
+{
+  gcc_assert (phase == TDI_original || phase == TDI_class);
+  if (phase == TDI_original)
+{
+  *flags = original_dump_flags;
+  return original_dump_file;
+}
+  else
+{
+  *flags = class_dump_flags;
+  return class_dump_file;
+}
+}
+
 

Re: [PATCH] Add missing -fdump-* options

2014-06-26 Thread Teresa Johnson
On Thu, Jun 26, 2014 at 12:40 AM, Richard Biener
 wrote:
> On Wed, Jun 25, 2014 at 4:21 PM, Teresa Johnson  wrote:
>> On Tue, May 13, 2014 at 8:19 AM, Xinliang David Li  
>> wrote:
>>> On Tue, May 13, 2014 at 1:39 AM, Richard Biener
>>>  wrote:
>>>> On Fri, May 9, 2014 at 5:54 PM, Teresa Johnson  
>>>> wrote:
>>>>> I discovered that the support for the documented -fdump-* options
>>>>> "optimized", "missed", "note" and "optall" was missing. Added that and
>>>>> fixed a minor typo in the documentation.
>>>>>
>>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>>>>
>>>> I'm not sure they were intented for user-consumption.  ISTR they
>>>> are just an implementation detail exposed by -fopt-info-X (which is
>>>> where they are documented).
>>>>
>>>> The typo fix is ok, also adding a comment before the dump flags
>>>> definition to the above fact.
>>>>
>>>> David, do I remember correctly?
>>>
>>> I remember we talked about content filtering dump flags. Things like
>>>
>>> -fdump-xxx-ir  <-- dump IR only
>>> -fdump-xxx-transformation <-- optimization note
>>> -fdump-xxx-debug <-- other debug traces
>>>
>>> Other than that, now I think 'details' and 'all' seem redundant.
>>> 'verbose' flag/modifier can achieve the same effect depending on the
>>> context.
>>>
>>> -fdump-xxx-ir-verbose <-- dump IR, and turn on IR modifiers such as
>>> vops, lineno, etc
>>> -fdump-xxx-transforamtion-verbose <-- dump transformations + missed
>>> optimizations + notes
>>> -fdump-xxx-debug-verbose <-- turn on detailed trace.
>>
>> The above proposal seems fine to me as a longer-term direction, but
>> also seems somewhat orthogonal to the issue my patch is trying to
>> solve in the short term, namely inconsistent documentation and
>> behavior:
>>
>> 1) "optimized", "missed", "note" and "optall" are documented as being
>> sub-options for -fdump-tree-* in doc/invoke.texi, but not implemented.
>> 2) "optimized", "missed", "note" and "optall" are however enabled via
>> -fdump-tree-all
>>
>> Could we at least fix these issues in the short term, as it doesn't
>> affect the documented behavior (but rather adds the documented
>> behavior)?
>
> Sure.
>
> Richard.

Thanks, retested and committed as r212040.
Teresa

>
>> Thanks,
>> Teresa
>>
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>> 2014-05-09  Teresa Johnson  
>>>>>
>>>>> * doc/invoke.texi: Fix typo.
>>>>> * dumpfile.c: Add support for documented -fdump-* options
>>>>> optimized/missed/note/optall.
>>>>>
>>>>> Index: doc/invoke.texi
>>>>> ===
>>>>> --- doc/invoke.texi (revision 210157)
>>>>> +++ doc/invoke.texi (working copy)
>>>>> @@ -6278,7 +6278,7 @@ passes).
>>>>>  @item missed
>>>>>  Enable showing missed optimization information (only available in certain
>>>>>  passes).
>>>>> -@item notes
>>>>> +@item note
>>>>>  Enable other detailed optimization information (only available in
>>>>>  certain passes).
>>>>>  @item =@var{filename}
>>>>> Index: dumpfile.c
>>>>> ===
>>>>> --- dumpfile.c  (revision 210157)
>>>>> +++ dumpfile.c  (working copy)
>>>>> @@ -107,6 +107,10 @@ static const struct dump_option_value_info dump_op
>>>>>{"nouid", TDF_NOUID},
>>>>>{"enumerate_locals", TDF_ENUMERATE_LOCALS},
>>>>>{"scev", TDF_SCEV},
>>>>> +  {"optimized", MSG_OPTIMIZED_LOCATIONS},
>>>>> +  {"missed", MSG_MISSED_OPTIMIZATION},
>>>>> +  {"note", MSG_NOTE},
>>>>> +  {"optall", MSG_ALL},
>>>>>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | 
>>>>> TDF_IPA
>>>>> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
>>>>> | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | 
>>>>> TDF_SCEV)},
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] LIPO resolved node handling for inline clone references

2014-07-01 Thread Teresa Johnson
This patch addresses an issue with looking up the resolved node
rebuilding cgraph references during clone materialization. It is
possible that in ipa-cp cases the resolved node was eliminated during
inlining and indirect call promotion, but the clone still exists as a
function call argument in an inline virtual clone. The clones would be
removed later when we actually do the inline transformation, but in
the meantime during clone materialization the cgraph references are
rebuilt, and the temporary reference on the call still exists in the
IR. Handle this by skipping such references in inline clones.

Passes regression tests. Ok for google/4_9?

Teresa

2014-07-01  Teresa Johnson  

Google ref b/15411384.
* cgraphbuild.c (mark_address): Skip resolved node lookup
in inline copies.

Index: cgraphbuild.c
===
--- cgraphbuild.c   (revision 211957)
+++ cgraphbuild.c   (working copy)
@@ -389,8 +389,28 @@ mark_address (gimple stmt, tree addr, tree, void *
   addr = get_base_address (addr);
   if (TREE_CODE (addr) == FUNCTION_DECL)
 {
+  /* Before possibly cloning the node in cgraph_get_create_node,
+ save the current cgraph node for addr.  */
+  struct cgraph_node *first_clone = cgraph_get_node (addr);
   struct cgraph_node *node = cgraph_get_create_node (addr);
-  if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done)
+  /* In LIPO mode we use the resolved node. However, there is
+ a possibility that it may not exist at this point. This
+ can happen in cases of ipa-cp, where this is a reference
+ that will eventually go away during inline_transform when we
+ invoke cgraph_redirect_edge_call_stmt_to_callee to rewrite
+ the call_stmt and skip some arguments. It is possible
+ that earlier during inline_call the references to the original
+ non-cloned resolved node were all eliminated, and it was removed.
+ However, virtual clones may stick around until inline_transform,
+ due to references in other virtual clones, at which point they
+ will all be removed. In between inline_call and inline_transform,
+ however, we will materialize clones which would rebuild references
+ and end up here upon still seeing the reference on the call.
+ Handle this by skipping the resolved node lookup when the first
+ clone was marked global.inlined_to (i.e. it is a virtual clone,
+ the original is gone).  */
+  if (L_IPO_COMP_MODE && cgraph_pre_profiling_inlining_done
+  && first_clone && !first_clone->global.inlined_to)
 node = cgraph_lipo_get_resolved_node (addr);

   cgraph_mark_address_taken_node (node);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate

2014-07-02 Thread Teresa Johnson
The following patch adds support for a new libgcov interface,
__gcov_profiling_enabled, that can be used by applications to
determine whether a binary has been instrumented for test coverage or
profile optimization.

Passes regression tests and manual testing with different options. Ok
for google branches?

Thanks,
Teresa

2014-07-02  Teresa Johnson  

Google ref b/15378201.
* gcc/tree-profile.c (gcov_test_coverage_decl): Declare.
(tree_init_instrumentation): Initialize gcov_test_coverage_decl.
* libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define.
* libgcc/libgcov-interface.c (__gcov_profiling_enabled): New function.

Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c  (revision 212044)
+++ gcc/tree-profile.c  (working copy)
@@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL
 /* extern gcov_unsigned_t __gcov_profile_prefix  */
 static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE;

+/* extern gcov_unsigned_t __gcov_test_coverage  */
+static tree GTY(()) gcov_test_coverage_decl = NULL_TREE;
+
 /* extern gcov_unsigned_t __gcov_sampling_period  */
 static GTY(()) tree gcov_sampling_period_decl = NULL_TREE;

@@ -498,6 +501,27 @@ tree_init_instrumentation (void)
   DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr;
   varpool_finalize_decl (gcov_profile_prefix_decl);
 }
+
+  if (!gcov_test_coverage_decl)
+{
+  /* Initialize __gcov_test_coverage to 1 if -ftest-coverage
+ specified, 0 otherwise. Used by libgcov to determine whether
+ a binary was instrumented for coverage or profile optimization.  */
+  gcov_test_coverage_decl = build_decl (
+  UNKNOWN_LOCATION,
+  VAR_DECL,
+  get_identifier ("__gcov_test_coverage"),
+  get_gcov_unsigned_t ());
+  TREE_PUBLIC (gcov_test_coverage_decl) = 1;
+  DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1;
+  DECL_COMDAT_GROUP (gcov_test_coverage_decl)
+  = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl);
+  TREE_STATIC (gcov_test_coverage_decl) = 1;
+  DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst (
+  get_gcov_unsigned_t (),
+  flag_test_coverage ? 1 : 0);
+  varpool_finalize_decl (gcov_test_coverage_decl);
+}
 }

 /* Initialization function for FDO sampling.  */
Index: libgcc/libgcov-driver.c
===
--- libgcc/libgcov-driver.c (revision 212044)
+++ libgcc/libgcov-driver.c (working copy)
@@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void)
 char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled;
 extern void __gcov_flush (void);
 char *(*__gcov_dummy_ref6)(void) = &__gcov_flush;
+extern unsigned int __gcov_profiling_enabled (void);
+char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_enabled;

 /* Default callback function for profile instrumentation callback.  */
 extern void __coverage_callback (gcov_type, int);
Index: libgcc/libgcov-interface.c
===
--- libgcc/libgcov-interface.c  (revision 212044)
+++ libgcc/libgcov-interface.c  (working copy)
@@ -116,6 +116,20 @@ __gcov_dump (void)
   set_gcov_dump_complete ();
 }

+/* Emitted in coverage.c.  */
+extern gcov_unsigned_t __gcov_test_coverage;
+
+unsigned int __gcov_profiling_enabled (void);
+
+/* Function that can be called from application to distinguish binaries
+   instrumented from coverage fro those instrumented for profiling
+   (e.g. -fprofile-generate).  */
+
+unsigned int __gcov_profiling_enabled (void)
+{
+  return !__gcov_test_coverage;
+}
+
 #endif /* L_gcov_dump */

 #ifdef L_gcov_sampling


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate

2014-07-02 Thread Teresa Johnson
On Wed, Jul 2, 2014 at 1:15 PM, Xinliang David Li  wrote:
> Should the interface be something like:
>
> int __gcov_profiling_for_test_coverage(void)?

I was equating the term "profiling" with -fprofile-generate, as
opposed to -ftest-coverage instrumentation. But I can change it to
this if you think that is clearer.

Teresa

>
> David
>
>
> On Wed, Jul 2, 2014 at 12:55 PM, Teresa Johnson  wrote:
>> The following patch adds support for a new libgcov interface,
>> __gcov_profiling_enabled, that can be used by applications to
>> determine whether a binary has been instrumented for test coverage or
>> profile optimization.
>>
>> Passes regression tests and manual testing with different options. Ok
>> for google branches?
>>
>> Thanks,
>> Teresa
>>
>> 2014-07-02  Teresa Johnson  
>>
>> Google ref b/15378201.
>> * gcc/tree-profile.c (gcov_test_coverage_decl): Declare.
>> (tree_init_instrumentation): Initialize gcov_test_coverage_decl.
>> * libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define.
>> * libgcc/libgcov-interface.c (__gcov_profiling_enabled): New 
>> function.
>>
>> Index: gcc/tree-profile.c
>> ===
>> --- gcc/tree-profile.c  (revision 212044)
>> +++ gcc/tree-profile.c  (working copy)
>> @@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL
>>  /* extern gcov_unsigned_t __gcov_profile_prefix  */
>>  static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE;
>>
>> +/* extern gcov_unsigned_t __gcov_test_coverage  */
>> +static tree GTY(()) gcov_test_coverage_decl = NULL_TREE;
>> +
>>  /* extern gcov_unsigned_t __gcov_sampling_period  */
>>  static GTY(()) tree gcov_sampling_period_decl = NULL_TREE;
>>
>> @@ -498,6 +501,27 @@ tree_init_instrumentation (void)
>>DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr;
>>varpool_finalize_decl (gcov_profile_prefix_decl);
>>  }
>> +
>> +  if (!gcov_test_coverage_decl)
>> +{
>> +  /* Initialize __gcov_test_coverage to 1 if -ftest-coverage
>> + specified, 0 otherwise. Used by libgcov to determine whether
>> + a binary was instrumented for coverage or profile optimization.  */
>> +  gcov_test_coverage_decl = build_decl (
>> +  UNKNOWN_LOCATION,
>> +  VAR_DECL,
>> +  get_identifier ("__gcov_test_coverage"),
>> +  get_gcov_unsigned_t ());
>> +  TREE_PUBLIC (gcov_test_coverage_decl) = 1;
>> +  DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1;
>> +  DECL_COMDAT_GROUP (gcov_test_coverage_decl)
>> +  = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl);
>> +  TREE_STATIC (gcov_test_coverage_decl) = 1;
>> +  DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst (
>> +  get_gcov_unsigned_t (),
>> +  flag_test_coverage ? 1 : 0);
>> +  varpool_finalize_decl (gcov_test_coverage_decl);
>> +}
>>  }
>>
>>  /* Initialization function for FDO sampling.  */
>> Index: libgcc/libgcov-driver.c
>> ===
>> --- libgcc/libgcov-driver.c (revision 212044)
>> +++ libgcc/libgcov-driver.c (working copy)
>> @@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void)
>>  char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled;
>>  extern void __gcov_flush (void);
>>  char *(*__gcov_dummy_ref6)(void) = &__gcov_flush;
>> +extern unsigned int __gcov_profiling_enabled (void);
>> +char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_enabled;
>>
>>  /* Default callback function for profile instrumentation callback.  */
>>  extern void __coverage_callback (gcov_type, int);
>> Index: libgcc/libgcov-interface.c
>> ===
>> --- libgcc/libgcov-interface.c  (revision 212044)
>> +++ libgcc/libgcov-interface.c  (working copy)
>> @@ -116,6 +116,20 @@ __gcov_dump (void)
>>set_gcov_dump_complete ();
>>  }
>>
>> +/* Emitted in coverage.c.  */
>> +extern gcov_unsigned_t __gcov_test_coverage;
>> +
>> +unsigned int __gcov_profiling_enabled (void);
>> +
>> +/* Function that can be called from application to distinguish binaries
>> +   instrumented from coverage fro those instrumented for profiling
>> +   (e.g. -fprofile-generate).  */
>> +
>> +unsigned int __gcov_profiling_enabled (void)
>> +{
>> +  return !__gcov_test_coverage;
>> +}
>> +
>>  #endif /* L_gcov_dump */
>>
>>  #ifdef L_gcov_sampling
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Define libgcov interface for distinguishing -ftest-coverage from -fprofile-generate

2014-07-02 Thread Teresa Johnson
New patch below. Retested. Ok for google branches?

Thanks,
Teresa

2014-07-02  Teresa Johnson  

Google ref b/15378201.
* gcc/tree-profile.c (gcov_test_coverage_decl): Declare.
(tree_init_instrumentation): Initialize gcov_test_coverage_decl.
* libgcc/libgcov-driver.c (__gcov_dummy_ref7): Define.
* libgcc/libgcov-interface.c (__gcov_profiling_for_test_coverage): New
function.

Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c  (revision 212044)
+++ gcc/tree-profile.c  (working copy)
@@ -164,6 +164,9 @@ static GTY(()) tree gcov_sample_counter_decl = NUL
 /* extern gcov_unsigned_t __gcov_profile_prefix  */
 static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE;

+/* extern gcov_unsigned_t __gcov_test_coverage  */
+static tree GTY(()) gcov_test_coverage_decl = NULL_TREE;
+
 /* extern gcov_unsigned_t __gcov_sampling_period  */
 static GTY(()) tree gcov_sampling_period_decl = NULL_TREE;

@@ -498,6 +501,27 @@ tree_init_instrumentation (void)
   DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr;
   varpool_finalize_decl (gcov_profile_prefix_decl);
 }
+
+  if (!gcov_test_coverage_decl)
+{
+  /* Initialize __gcov_test_coverage to 1 if -ftest-coverage
+ specified, 0 otherwise. Used by libgcov to determine whether
+ a binary was instrumented for coverage or profile optimization.  */
+  gcov_test_coverage_decl = build_decl (
+  UNKNOWN_LOCATION,
+  VAR_DECL,
+  get_identifier ("__gcov_test_coverage"),
+  get_gcov_unsigned_t ());
+  TREE_PUBLIC (gcov_test_coverage_decl) = 1;
+  DECL_ARTIFICIAL (gcov_test_coverage_decl) = 1;
+  DECL_COMDAT_GROUP (gcov_test_coverage_decl)
+  = DECL_ASSEMBLER_NAME (gcov_test_coverage_decl);
+  TREE_STATIC (gcov_test_coverage_decl) = 1;
+  DECL_INITIAL (gcov_test_coverage_decl) = build_int_cst (
+  get_gcov_unsigned_t (),
+  flag_test_coverage ? 1 : 0);
+  varpool_finalize_decl (gcov_test_coverage_decl);
+}
 }

 /* Initialization function for FDO sampling.  */
Index: libgcc/libgcov-driver.c
===
--- libgcc/libgcov-driver.c (revision 212044)
+++ libgcc/libgcov-driver.c (working copy)
@@ -79,6 +79,8 @@ extern unsigned int __gcov_sampling_enabled (void)
 char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled;
 extern void __gcov_flush (void);
 char *(*__gcov_dummy_ref6)(void) = &__gcov_flush;
+extern unsigned int __gcov_profiling_for_test_coverage (void);
+char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_for_test_coverage;

 /* Default callback function for profile instrumentation callback.  */
 extern void __coverage_callback (gcov_type, int);
Index: libgcc/libgcov-interface.c
===
--- libgcc/libgcov-interface.c  (revision 212044)
+++ libgcc/libgcov-interface.c  (working copy)
@@ -116,6 +116,20 @@ __gcov_dump (void)
   set_gcov_dump_complete ();
 }

+/* Emitted in coverage.c.  */
+extern gcov_unsigned_t __gcov_test_coverage;
+
+unsigned int __gcov_profiling_for_test_coverage (void);
+
+/* Function that can be called from application to distinguish binaries
+   instrumented for coverage from those instrumented for profile
+   optimization (e.g. -fprofile-generate).  */
+
+unsigned int __gcov_profiling_for_test_coverage (void)
+{
+  return __gcov_test_coverage;
+}
+
 #endif /* L_gcov_dump */

 #ifdef L_gcov_sampling

On Wed, Jul 2, 2014 at 1:25 PM, Xinliang David Li  wrote:
> The reason is that test coverage is using the same underlying
> profiling. Returning false when calling '__gcov_profiling_enabled())
> in coverage mode is a little misleading.
>
> David
>
> On Wed, Jul 2, 2014 at 1:22 PM, Teresa Johnson  wrote:
>> On Wed, Jul 2, 2014 at 1:15 PM, Xinliang David Li  wrote:
>>> Should the interface be something like:
>>>
>>> int __gcov_profiling_for_test_coverage(void)?
>>
>> I was equating the term "profiling" with -fprofile-generate, as
>> opposed to -ftest-coverage instrumentation. But I can change it to
>> this if you think that is clearer.
>>
>> Teresa
>>
>>>
>>> David
>>>
>>>
>>> On Wed, Jul 2, 2014 at 12:55 PM, Teresa Johnson  
>>> wrote:
>>>> The following patch adds support for a new libgcov interface,
>>>> __gcov_profiling_enabled, that can be used by applications to
>>>> determine whether a binary has been instrumented for test coverage or
>>>> profile optimization.
>>>>
>>>> Passes regression tests and manual testing with different options. Ok
>>>> for google branches?
>>>>
>>>> Thanks,
>>>

Re: [PATCH] Handle more COMDAT profiling issues

2014-07-07 Thread Teresa Johnson
Finally had some time to pick this up again. See responses and
questions below. But it looks like you are commenting on a stale
version of the patch. See the update I sent in March:

https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01176.html


On Sun, May 25, 2014 at 10:43 PM, Jan Hubicka  wrote:
>> This patch attempts to address the lost profile issue for COMDATs in
>> more circumstances, exposed by function splitting.
>>
>> My earlier patch handled the case where the comdat had 0 counts since
>> the linker kept the copy in a different module. In that case we
>> prevent the guessed frequencies on 0-count functions from being
>> dropped by counts_to_freq, and then later mark any reached via
>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>> comdat is inlined the call count is propagated to the callee blocks
>> using the guessed probabilities.
>>
>> However, in this case, there was a comdat that had a very small
>> non-zero count, that was being inlined to a much hotter callsite. This
>> could happen when there was a copy that was ipa-inlined
>> in the profile gen compile, so the copy in that module gets some
>> non-zero counts from the ipa inlined instance, but when the out of
>> line copy was eliminated by the linker (selected from a different
>> module). In this case the inliner was scaling the bb counts up quite a
>> lot when inlining. The problem is that you most likely can't trust
>> that the 0 count bbs in such a case are really not executed by the
>> callsite it is being into, since the counts are very small and
>> correspond to a different callsite. In some internal C++ applications
>> I am seeing that we execute out of the split cold portion of COMDATs
>> for this reason.
>
> With all the problems we have with COMDATs, it still seems to me that perhaps
> best approach would be to merge them in runtime, as we discussed some time ago
> and incrementally handle the problem how to get callstack sensitive profiles.

I recently committed some functionality to a google branch to do some
COMDAT profile merging at runtime, but that was to handle all-zero
count copies and operates on top of the LIPO runtime infrastructure.
We don't do any fixup in non-zero count COMDATs to avoid losing
context-sensitive profiles.

>>
>> This problem is more complicated to address than the 0-count instance,
>> because we need the cgraph to determine which functions to drop the
>> profile on, and at that point the estimated frequencies have already
>> been overwritten by counts_to_freqs. To handle this broader case, I
>> have changed the drop_profile routine to simply re-estimate the
>> probabilities/frequencies (and translate these into counts scaled from
>> the incoming call edge counts). This unfortunately necessitates
>> rebuilding the cgraph, to propagate the new synthesized counts and
>> avoid checking failures downstream. But it will only be rebuilt if we
>> dropped any profiles. With this solution, some of the older approach
>> can be removed (e.g. propagating counts using the guessed
>> probabilities during inlining).
>>
>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>> Also tested on
>> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
>>
>> Thanks,
>> Teresa
>
> 2014-02-12  Teresa Johnson  
>
> * graphite.c (graphite_finalize): Pass new parameter.
> * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
> * predict.c (tree_estimate_probability): New parameter.
> (tree_estimate_probability_worker): Renamed from
> tree_estimate_probability_driver.
> (tree_reestimate_probability): New function.
> (tree_estimate_probability_driver): Invoke
> tree_estimate_probability_worker.
> (freqs_to_counts): Move here from tree-inline.c.
> (drop_profile): Re-estimated profiles when dropping counts.
> (handle_missing_profiles): Drop for some non-zero functions as well,
> get counts from bbs to support invocation before cgraph rebuild.
> (counts_to_freqs): Remove code obviated by reestimation.
> * predict.h (tree_estimate_probability): Update declaration.
> * tree-inline.c (freqs_to_counts): Move to predict.c.
> (copy_cfg_body): Remove code obviated by reestimation.
> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
> before cgraph rebuild.
>
> Index: params.def
> ===
> --- params.def  (revision 207436)
> +++ params.def  (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_B

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
Here is the new patch. Bootstrapped and tested on
x86_64-unknown-linux-gnu. OK for trunk?

Thanks,
Teresa

2014-11-13

gcc:
PR tree-optimization/63841
* tree.c (initializer_zerop): A constructor with no elements
does not zero initialize.

gcc/testsuite:
PR tree-optimization/63841
* g++.dg/tree-ssa/pr63841.C: New test.

Index: tree.c
===
--- tree.c  (revision 217190)
+++ tree.c  (working copy)
@@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
   {
unsigned HOST_WIDE_INT idx;

+if (TREE_CLOBBER_P (init))
+  return false;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
  if (!initializer_zerop (elt))
return false;
Index: testsuite/g++.dg/tree-ssa/pr63841.C
===
--- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+std::string __attribute__ ((noinline)) comp_test_write() {
+  std::string data;
+
+  for (int i = 0; i < 2; ++i) {
+char b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+std::string __attribute__ ((noinline)) comp_test_write_good() {
+  std::string data;
+
+  char b;
+  for (int i = 0; i < 2; ++i) {
+b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+int main() {
+  std::string good = comp_test_write_good();
+  printf("expected: %hx\n", *(short*)good.c_str());
+
+  std::string bad = comp_test_write();
+  printf("got: %hx\n", *(short*)bad.c_str());
+
+  return good != bad;
+}

On Wed, Nov 12, 2014 at 9:40 PM, Andrew Pinski  wrote:
> On Wed, Nov 12, 2014 at 9:38 PM, Teresa Johnson  wrote:
>> On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski  wrote:
>>> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson  
>>> wrote:
>>>> Added testcase. Here is the new patch:
>>>>
>>>> 2014-11-12
>>>>
>>>> gcc:
>>>> PR tree-optimization/63841
>>>> * tree.c (initializer_zerop): A constructor with no elements
>>>> does not zero initialize.
>>>
>>> Actually an empty constructor does zero initialize.  A clobber does not.
>>
>> Ok, thanks, I wasn't sure.
>>
>>>
>>> The check you want is TREE_CLOBBER_P instead.
>>
>> So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would
>> return false, right? I will make that change and retest.
>
> Yes that is correct.  Note TREE_CLOBBER_P already checks if it is an
> constructor.
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Teresa
>>
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>>
>>>> gcc/testsuite:
>>>> * g++.dg/tree-ssa/pr63841.C: New test.
>>>>
>>>> Index: tree.c
>>>> ===
>>>> --- tree.c  (revision 217190)
>>>> +++ tree.c  (working copy)
>>>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>>>{
>>>> unsigned HOST_WIDE_INT idx;
>>>>
>>>> +if (!CONSTRUCTOR_NELTS (init))
>>>> +  return false;
>>>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>>>   if (!initializer_zerop (elt))
>>>> return false;
>>>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>>>> ===
>>>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>>>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>>>> @@ -0,0 +1,38 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +std::string __attribute__ ((noinline)) comp_test_write() {
>>>> +  std::string data;
>>>> +
>>>> +  for (int i = 0; i < 2; ++i) {
>>>> +char b = 1 >> (i * 8);
>>>> +data.append(&b, 1);
>>>> +  }
>>>> +
>>>> +  return data;
>>>> +}
>>>> +
>>>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>>>> +  std::string data;
>>>> +
>>>> +  char b;
>>>> +  for (int i = 0; i < 2; ++i) {

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 6:32 AM, Richard Biener
 wrote:
> On Thu, Nov 13, 2014 at 3:32 PM, Richard Biener
>  wrote:
>> On Thu, Nov 13, 2014 at 3:20 PM, Teresa Johnson  wrote:
>>> Here is the new patch. Bootstrapped and tested on
>>> x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Ok for trunk and branches.
>
> Err - please fix the changelog entry wording to "A clobber does
> not zero inintiaize"

Thanks for catching that - copied from the original patch.

Will commit to trunk now then to gcc-4_9 after regression testing with
the branch.

Thanks,
Teresa

>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2014-11-13
>>>
>>> gcc:
>>> PR tree-optimization/63841
>>> * tree.c (initializer_zerop): A constructor with no elements
>>> does not zero initialize.
>>>
>>> gcc/testsuite:
>>> PR tree-optimization/63841
>>> * g++.dg/tree-ssa/pr63841.C: New test.
>>>
>>> Index: tree.c
>>> ===
>>> --- tree.c  (revision 217190)
>>> +++ tree.c  (working copy)
>>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>>{
>>> unsigned HOST_WIDE_INT idx;
>>>
>>> +if (TREE_CLOBBER_P (init))
>>> +  return false;
>>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>>   if (!initializer_zerop (elt))
>>> return false;
>>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>>> ===
>>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +std::string __attribute__ ((noinline)) comp_test_write() {
>>> +  std::string data;
>>> +
>>> +  for (int i = 0; i < 2; ++i) {
>>> +char b = 1 >> (i * 8);
>>> +data.append(&b, 1);
>>> +  }
>>> +
>>> +  return data;
>>> +}
>>> +
>>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>>> +  std::string data;
>>> +
>>> +  char b;
>>> +  for (int i = 0; i < 2; ++i) {
>>> +b = 1 >> (i * 8);
>>> +data.append(&b, 1);
>>> +  }
>>> +
>>> +  return data;
>>> +}
>>> +
>>> +int main() {
>>> +  std::string good = comp_test_write_good();
>>> +  printf("expected: %hx\n", *(short*)good.c_str());
>>> +
>>> +  std::string bad = comp_test_write();
>>> +  printf("got: %hx\n", *(short*)bad.c_str());
>>> +
>>> +  return good != bad;
>>> +}
>>>
>>> On Wed, Nov 12, 2014 at 9:40 PM, Andrew Pinski  wrote:
>>>> On Wed, Nov 12, 2014 at 9:38 PM, Teresa Johnson  
>>>> wrote:
>>>>> On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski  wrote:
>>>>>> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson  
>>>>>> wrote:
>>>>>>> Added testcase. Here is the new patch:
>>>>>>>
>>>>>>> 2014-11-12
>>>>>>>
>>>>>>> gcc:
>>>>>>> PR tree-optimization/63841
>>>>>>> * tree.c (initializer_zerop): A constructor with no elements
>>>>>>> does not zero initialize.
>>>>>>
>>>>>> Actually an empty constructor does zero initialize.  A clobber does not.
>>>>>
>>>>> Ok, thanks, I wasn't sure.
>>>>>
>>>>>>
>>>>>> The check you want is TREE_CLOBBER_P instead.
>>>>>
>>>>> So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would
>>>>> return false, right? I will make that change and retest.
>>>>
>>>> Yes that is correct.  Note TREE_CLOBBER_P already checks if it is an
>>>> constructor.
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
&

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 6:36 AM, Jakub Jelinek  wrote:
> On Thu, Nov 13, 2014 at 06:20:16AM -0800, Teresa Johnson wrote:
>> Here is the new patch. Bootstrapped and tested on
>> x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2014-11-13
>>
>> gcc:
>> PR tree-optimization/63841
>> * tree.c (initializer_zerop): A constructor with no elements
>> does not zero initialize.
>>
>> gcc/testsuite:
>> PR tree-optimization/63841
>> * g++.dg/tree-ssa/pr63841.C: New test.
>>
>> Index: tree.c
>> ===
>> --- tree.c  (revision 217190)
>> +++ tree.c  (working copy)
>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>{
>> unsigned HOST_WIDE_INT idx;
>>
>> +if (TREE_CLOBBER_P (init))
>> +  return false;
>
> Wrong formatting.

Sorry, not sure I understand why? My mailer does tend to eat spaces,
but it is indented the correct amount and I think has the right spaces
within the line.

>
> Also, while this perhaps is useful, I'd say the right fix is that 
> strlen_optimize_stmt
> should just ignore gimple_clobber_p (stmt) statements (well, call
> the maybe_invalidate at the end for them).
> So
> -  else if (is_gimple_assign (stmt))
> +  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
> in strlen_optimize_stmt.

Ok, I have held off on my commit for now. I considered fixing this in
tree-ssa-strlen, but thought this was a potentially larger problem
with initializer_zerop, where it shouldn't be considering a clobber to
be a zero init and might be affecting other callers as well.

If we make the change to initializer_zerop is it still useful to
change tree-strlen?

Teresa

>
> Jakub



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek  wrote:
> On Thu, Nov 13, 2014 at 06:46:23AM -0800, Teresa Johnson wrote:
>> >> --- tree.c  (revision 217190)
>> >> +++ tree.c  (working copy)
>> >> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>> >>{
>> >> unsigned HOST_WIDE_INT idx;
>> >>
>> >> +if (TREE_CLOBBER_P (init))
>> >> +  return false;
>> >
>> > Wrong formatting.
>>
>> Sorry, not sure I understand why? My mailer does tend to eat spaces,
>> but it is indented the correct amount and I think has the right spaces
>> within the line.
>
> I meant that you are using spaces instead of tabs and the unsigned line
> above it being one char off suggested to me visually it has the tab in there
> (apparently it is eaten too).  Use MUA that doesn't eat it, or convince your
> employer that tabs are important in e-mails? ;)
>
> If you commit tabs in there, it is fine with me.

Changed the spaces to tabs to match the surrounding lines and
committed to trunk (r217505).

>
>> Ok, I have held off on my commit for now. I considered fixing this in
>> tree-ssa-strlen, but thought this was a potentially larger problem
>> with initializer_zerop, where it shouldn't be considering a clobber to
>> be a zero init and might be affecting other callers as well.
>
> As I said, I think your tree.c change is useful, but perhaps too risky
> for the release branches, because we don't know what all it will affect?
>
>> If we make the change to initializer_zerop is it still useful to
>> change tree-strlen?
>
> I think it is better to be clear on it that right now we want clobbers
> to clobber the memory for strlen pass purposes, maybe in the future we want
> to perform some optimizations for them as Richard suggested in the PR
> (though, extending the lifetime in that case by removing the clobber
> shouldn't be done unconditionally (i.e. limited to when we'd actually want
> to benefit from the known length) and assuming we are close to the clobber
> (i.e. it is fine to extend it a little bit, but not extend the lifetime
> across multi-megabyte function e.g.).
> And for release branches I'd really prefer tree-ssa-strlen.c change.

Ok, I started testing the initializer_zerop change on the 4_9 branch,
will also test the strlen fix and send that patch for review here when
it completes.

Thanks,
Teresa

>
> Jakub



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 7:39 AM, Teresa Johnson  wrote:
> On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek  wrote:
>> And for release branches I'd really prefer tree-ssa-strlen.c change.
>
> Ok, I started testing the initializer_zerop change on the 4_9 branch,
> will also test the strlen fix and send that patch for review here when
> it completes.

Here is the more conservative patch for 4_9. Bootstrapped and tested
on x86_64-unknown-linux-gnu. Ok for gcc-4_9 branch?

Thanks,
Teresa

2014-11-13  Teresa Johnson  

gcc:
PR tree-optimization/63841
* tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers.

2014-11-13  Teresa Johnson  

gcc/testsuite:
PR tree-optimization/63841
* testsuite/g++.dg/tree-ssa/pr63841.C: New test.

Index: tree-ssa-strlen.c
===
--- tree-ssa-strlen.c   (revision 217503)
+++ tree-ssa-strlen.c   (working copy)
@@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
break;
  }
 }
-  else if (is_gimple_assign (stmt))
+  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
 {
   tree lhs = gimple_assign_lhs (stmt);

Index: testsuite/g++.dg/tree-ssa/pr63841.C
===
--- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+std::string __attribute__ ((noinline)) comp_test_write() {
+  std::string data;
+
+  for (int i = 0; i < 2; ++i) {
+char b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+std::string __attribute__ ((noinline)) comp_test_write_good() {
+  std::string data;
+
+  char b;
+  for (int i = 0; i < 2; ++i) {
+b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+int main() {
+  std::string good = comp_test_write_good();
+  printf("expected: %hx\n", *(short*)good.c_str());
+
+  std::string bad = comp_test_write();
+  printf("got: %hx\n", *(short*)bad.c_str());
+
+  return good != bad;
+}



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 12:55 PM, Jakub Jelinek  wrote:
> On Thu, Nov 13, 2014 at 12:51:05PM -0800, Teresa Johnson wrote:
>> On Thu, Nov 13, 2014 at 7:39 AM, Teresa Johnson  wrote:
>> > On Thu, Nov 13, 2014 at 7:12 AM, Jakub Jelinek  wrote:
>> >> And for release branches I'd really prefer tree-ssa-strlen.c change.
>> >
>> > Ok, I started testing the initializer_zerop change on the 4_9 branch,
>> > will also test the strlen fix and send that patch for review here when
>> > it completes.
>>
>> Here is the more conservative patch for 4_9. Bootstrapped and tested
>> on x86_64-unknown-linux-gnu. Ok for gcc-4_9 branch?
>
> Ok, thanks.  But I have a comment regarding the test below:
>
>> 2014-11-13  Teresa Johnson  
>>
>> gcc:
>> PR tree-optimization/63841
>> * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers.
>>
>> 2014-11-13  Teresa Johnson  
>>
>> gcc/testsuite:
>> PR tree-optimization/63841
>> * testsuite/g++.dg/tree-ssa/pr63841.C: New test.
>>
>> Index: tree-ssa-strlen.c
>> ===
>> --- tree-ssa-strlen.c   (revision 217503)
>> +++ tree-ssa-strlen.c   (working copy)
>> @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
>> break;
>>   }
>>  }
>> -  else if (is_gimple_assign (stmt))
>> +  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
>>  {
>>tree lhs = gimple_assign_lhs (stmt);
>>
>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>> ===
>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>> @@ -0,0 +1,38 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include 
>> +#include 
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write() {
>> +  std::string data;
>> +
>> +  for (int i = 0; i < 2; ++i) {
>> +char b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>> +  std::string data;
>> +
>> +  char b;
>> +  for (int i = 0; i < 2; ++i) {
>> +b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +int main() {
>> +  std::string good = comp_test_write_good();
>> +  printf("expected: %hx\n", *(short*)good.c_str());
>> +
>> +  std::string bad = comp_test_write();
>> +  printf("got: %hx\n", *(short*)bad.c_str());
>
> Supposedly the printfs should have been removed and the #include 
> isn't needed then either.  No need to clutter the test output and log files.
> On the other side, tests should abort (); or __builtin_abort (); on failure,
> not just return non-zero.

Ok, sounds good. Here is the new patch. Confirmed that the test case
passes with the fix, aborts without it.

If this looks good I will also update the trunk version of the test.

Thanks,
Teresa

2014-11-13  Teresa Johnson  

gcc:
PR tree-optimization/63841
* tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers.

2014-11-13  Teresa Johnson  

gcc/testsuite:
PR tree-optimization/63841
* testsuite/g++.dg/tree-ssa/pr63841.C: New test.

Index: tree-ssa-strlen.c
===
--- tree-ssa-strlen.c   (revision 217503)
+++ tree-ssa-strlen.c   (working copy)
@@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
break;
  }
 }
-  else if (is_gimple_assign (stmt))
+  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
 {
   tree lhs = gimple_assign_lhs (stmt);

Index: testsuite/g++.dg/tree-ssa/pr63841.C
===
--- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+
+std::string __attribute__ ((noinline)) comp_test_write() {
+  std::string data;
+
+  for (int i = 0; i < 2; ++i) {
+char b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+std::string __attribute__ ((noinline)) comp_test_write_good() {
+  std::string data;
+
+  char b;
+  for (int i = 0; i < 2; ++i) {
+b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+int main() {
+  std::string good = comp_test_write_good();
+  std::string bad = comp_test_write();
+
+  if (good != bad)
+__builtin_abort ();
+}

>
>> +
>> +  return good != bad;
>> +}
>
> Jakub



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-13 Thread Teresa Johnson
On Thu, Nov 13, 2014 at 1:45 PM, Jakub Jelinek  wrote:
> On Thu, Nov 13, 2014 at 01:33:07PM -0800, Teresa Johnson wrote:
>> > Supposedly the printfs should have been removed and the #include 
>> > isn't needed then either.  No need to clutter the test output and log 
>> > files.
>> > On the other side, tests should abort (); or __builtin_abort (); on 
>> > failure,
>> > not just return non-zero.
>>
>> Ok, sounds good. Here is the new patch. Confirmed that the test case
>> passes with the fix, aborts without it.
>>
>> If this looks good I will also update the trunk version of the test.
>
> LGTM, thanks.
>
> Please commit also the tree-ssa-strlen.c change to the trunk, it will not
> hurt.

Ok. It is on gcc-4_9 now, will commit both to trunk as soon as
regression testing completes.

Thanks,
Teresa

>
>> 2014-11-13  Teresa Johnson  
>>
>> gcc:
>>     PR tree-optimization/63841
>> * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers.
>>
>> 2014-11-13  Teresa Johnson  
>>
>> gcc/testsuite:
>> PR tree-optimization/63841
>> * testsuite/g++.dg/tree-ssa/pr63841.C: New test.
>>
>> Index: tree-ssa-strlen.c
>> ===
>> --- tree-ssa-strlen.c   (revision 217503)
>> +++ tree-ssa-strlen.c   (working copy)
>> @@ -1856,7 +1856,7 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
>> break;
>>   }
>>  }
>> -  else if (is_gimple_assign (stmt))
>> +  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
>>  {
>>tree lhs = gimple_assign_lhs (stmt);
>>
>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>> ===
>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>> @@ -0,0 +1,35 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include 
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write() {
>> +  std::string data;
>> +
>> +  for (int i = 0; i < 2; ++i) {
>> +char b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>> +  std::string data;
>> +
>> +  char b;
>> +  for (int i = 0; i < 2; ++i) {
>> +b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +int main() {
>> +  std::string good = comp_test_write_good();
>> +  std::string bad = comp_test_write();
>> +
>> +  if (good != bad)
>> +__builtin_abort ();
>> +}
>
> Jakub



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-19 Thread Teresa Johnson
On Wed, Mar 18, 2015 at 9:25 PM, Xinliang David Li  wrote:
> get_local_tls_init_fn can be called from other contexts other than
> 'handle_tls_init' -- is the added new assertion safe ?

In fact there is still an issue here, for file static TLS vars. Will
work on a fix and send the original test case (forgot to include in
first patch) and the new one with the fixed patch.

Teresa

>
> David
>
> On Wed, Mar 18, 2015 at 9:18 PM, Teresa Johnson 
> wrote:
>>
>> Ensure TLS variable wrapper and init functions are recorded at
>> the module scope upon creation so that they are cleared when
>> popping the module scope in between modules in LIPO mode.
>>
>> Also, do not generate a local TLS init function (__tls_init)
>> for aux functions, only in the primary modules.
>>
>> Passes regression tests. Ok for Google branches?
>> Teresa
>>
>> 2015-03-18  Teresa Johnson  
>>
>> Google ref b/19618364.
>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module.
>> (get_tls_init_fn): Record new global decl.
>> (get_tls_wrapper_fn): Ditto.
>> (handle_tls_init): Skip aux modules.
>>
>> Index: cp/decl2.c
>> ===
>> --- cp/decl2.c  (revision 221425)
>> +++ cp/decl2.c  (working copy)
>> @@ -3036,6 +3036,9 @@ get_local_tls_init_fn (void)
>>DECL_ARTIFICIAL (fn) = true;
>>mark_used (fn);
>>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
>> +  /* In LIPO mode we should not generate local init functions for
>> + the aux modules (see handle_tls_init).  */
>> +  gcc_assert (!L_IPO_IS_AUXILIARY_MODULE);
>>  }
>>return fn;
>>  }
>> @@ -3100,6 +3103,11 @@ get_tls_init_fn (tree var)
>>DECL_BEFRIENDING_CLASSES (fn) = var;
>>
>>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
>> +  /* In LIPO mode make sure we record the new global value so that it
>> + is cleared before parsing the next aux module.  */
>> +  if (L_IPO_COMP_MODE && !is_parsing_done_p ())
>> +add_decl_to_current_module_scope (fn,
>> +  NAMESPACE_LEVEL
>> (global_namespace));
>>  }
>>return fn;
>>  }
>> @@ -3157,6 +3165,11 @@ get_tls_wrapper_fn (tree var)
>>DECL_BEFRIENDING_CLASSES (fn) = var;
>>
>>SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
>> +  /* In LIPO mode make sure we record the new global value so that it
>> + is cleared before parsing the next aux module.  */
>> +  if (L_IPO_COMP_MODE && !is_parsing_done_p ())
>> +add_decl_to_current_module_scope (fn,
>> +  NAMESPACE_LEVEL
>> (global_namespace));
>>  }
>>return fn;
>>  }
>> @@ -4179,6 +4192,14 @@ clear_decl_external (struct cgraph_node *node, voi
>>  static void
>>  handle_tls_init (void)
>>  {
>> +  /* In LIPO mode we should not generate local init functions for
>> + aux modules. The wrapper will reference the variable's init function
>> + that is defined in its own primary module. Otherwise there is
>> + a name conflict between the primary and aux __tls_init functions,
>> + and difficulty ensuring each TLS variable is initialized exactly
>> once.  */
>> +  if (L_IPO_IS_AUXILIARY_MODULE)
>> +return;
>> +
>>tree vars = prune_vars_needing_no_initialization (&tls_aggregates);
>>if (vars == NULL_TREE)
>>  return;
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-19 Thread Teresa Johnson
New patch below. Passes regression tests plus internal application build.

2015-03-19  Teresa Johnson  

gcc/
Google ref b/19618364.
* cp/decl2.c (get_local_tls_init_fn): Assert on aux module.
(get_tls_init_fn): Promote non-public init functions if necessary
in LIPO mode, record new global at module scope.
(get_tls_wrapper_fn): Record new global at module scope.
(handle_tls_init): Skip in aux module, setup alias in exported
primary module.

testsuite/
Google ref b/19618364.
* testsuite/g++.dg/tree-prof/lipo/tls.h: New test.
* testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto.

Index: cp/decl2.c
===
--- cp/decl2.c  (revision 221425)
+++ cp/decl2.c  (working copy)
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "gcov-io.h"

 extern cpp_reader *parse_in;

@@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var)
 static tree
 get_local_tls_init_fn (void)
 {
+  /* In LIPO mode we should not generate local init functions for
+ the aux modules (see handle_tls_init).  */
+  gcc_assert (!L_IPO_IS_AUXILIARY_MODULE);
   tree sname = get_identifier ("__tls_init");
   tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
   if (!fn)
@@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var)
   if (!flag_extern_tls_init && DECL_EXTERNAL (var))
 return NULL_TREE;

+  /* In LIPO mode we should not generate local init functions for
+ aux modules. The wrapper will reference the variable's init function
+ that is defined in its own primary module. Otherwise there is
+ a name conflict between the primary and aux __tls_init functions,
+ and difficulty ensuring each TLS variable is initialized exactly once.
+ Therefore, if this is an aux module or an exported primary module, we
+ need to ensure that the init function is available externally by
+ promoting it if it is not already public. This is similar to the
+ handling in promote_static_var_func, but we do it as the init function
+ is created to avoid the need to detect and properly promote this
+ artificial decl later on.  */
+  bool promote = L_IPO_IS_AUXILIARY_MODULE ||
+  (L_IPO_COMP_MODE && PRIMARY_MODULE_EXPORTED);
+
 #ifdef ASM_OUTPUT_DEF
   /* If the variable is internal, or if we can't generate aliases,
- call the local init function directly.  */
-  if (!TREE_PUBLIC (var))
+ call the local init function directly.  Don't do this if we
+ are in LIPO mode an may need to export the init function.  Note
+ that get_local_tls_init_fn will assert if it is called on an aux
+ module.  */
+  if (!TREE_PUBLIC (var) && !promote)
 #endif
 return get_local_tls_init_fn ();

   tree sname = mangle_tls_init_fn (var);
+  if (promote)
+{
+  const char *name = IDENTIFIER_POINTER (sname);
+  char *assembler_name = (char*) alloca (strlen (name) + 30);
+  sprintf (assembler_name, "%s.cmo.%u", name, current_module_id);
+  sname = get_identifier (assembler_name);
+}
+
   tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
   if (!fn)
 {
@@ -3072,7 +3101,18 @@ get_tls_init_fn (tree var)
build_function_type (void_type_node,
 void_list_node));
   SET_DECL_LANGUAGE (fn, lang_c);
-  TREE_PUBLIC (fn) = TREE_PUBLIC (var);
+  if (promote)
+{
+  TREE_PUBLIC (fn) = 1;
+  DECL_VISIBILITY (fn) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (fn) = 1;
+}
+  else
+{
+  TREE_PUBLIC (fn) = TREE_PUBLIC (var);
+  DECL_VISIBILITY (fn) = DECL_VISIBILITY (var);
+  DECL_VISIBILITY_SPECIFIED (fn) = DECL_VISIBILITY_SPECIFIED (var);
+}
   DECL_ARTIFICIAL (fn) = true;
   DECL_COMDAT (fn) = DECL_COMDAT (var);
   DECL_EXTERNAL (fn) = DECL_EXTERNAL (var);
@@ -3091,8 +3131,6 @@ get_tls_init_fn (tree var)
  else
DECL_WEAK (fn) = DECL_WEAK (var);
}
-  DECL_VISIBILITY (fn) = DECL_VISIBILITY (var);
-  DECL_VISIBILITY_SPECIFIED (fn) = DECL_VISIBILITY_SPECIFIED (var);
   DECL_DLLIMPORT_P (fn) = DECL_DLLIMPORT_P (var);
   DECL_IGNORED_P (fn) = 1;
   mark_used (fn);
@@ -3100,6 +3138,11 @@ get_tls_init_fn (tree var)
   DECL_BEFRIENDING_CLASSES (fn) = var;

   SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+  /* In LIPO mode make sure we record the new global value so that it
+ is cleared before parsing the next aux module.  */

Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-19 Thread Teresa Johnson
On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li  wrote:
> does generate_tls_wrapper also need to be suppressed for aux module?

No, we generate the wrapper in the aux module and use it to access the
TLS variable and invoke it's init function (which is defined in the
variable's own module). Presumably we could generate that only in the
original module and promote it if necessary, but generating it in the
aux module does allow it to be inlined. This is essentially how the
TLS accesses are handled for extern TLS variables defined elsewhere
(wrapper generated in referring module).

Teresa

>
> David
>
> On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson  wrote:
>> New patch below. Passes regression tests plus internal application build.
>>
>> 2015-03-19  Teresa Johnson  
>>
>> gcc/
>> Google ref b/19618364.
>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module.
>> (get_tls_init_fn): Promote non-public init functions if necessary
>> in LIPO mode, record new global at module scope.
>> (get_tls_wrapper_fn): Record new global at module scope.
>> (handle_tls_init): Skip in aux module, setup alias in exported
>> primary module.
>>
>> testsuite/
>> Google ref b/19618364.
>> * testsuite/g++.dg/tree-prof/lipo/tls.h: New test.
>> * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto.
>> * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto.
>> * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto.
>> * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto.
>> * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto.
>>
>> Index: cp/decl2.c
>> ===
>> --- cp/decl2.c  (revision 221425)
>> +++ cp/decl2.c  (working copy)
>> @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "c-family/c-ada-spec.h"
>>  #include "asan.h"
>> +#include "gcov-io.h"
>>
>>  extern cpp_reader *parse_in;
>>
>> @@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var)
>>  static tree
>>  get_local_tls_init_fn (void)
>>  {
>> +  /* In LIPO mode we should not generate local init functions for
>> + the aux modules (see handle_tls_init).  */
>> +  gcc_assert (!L_IPO_IS_AUXILIARY_MODULE);
>>tree sname = get_identifier ("__tls_init");
>>tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
>>if (!fn)
>> @@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var)
>>if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>>  return NULL_TREE;
>>
>> +  /* In LIPO mode we should not generate local init functions for
>> + aux modules. The wrapper will reference the variable's init function
>> + that is defined in its own primary module. Otherwise there is
>> + a name conflict between the primary and aux __tls_init functions,
>> + and difficulty ensuring each TLS variable is initialized exactly once.
>> + Therefore, if this is an aux module or an exported primary module, we
>> + need to ensure that the init function is available externally by
>> + promoting it if it is not already public. This is similar to the
>> + handling in promote_static_var_func, but we do it as the init function
>> + is created to avoid the need to detect and properly promote this
>> + artificial decl later on.  */
>> +  bool promote = L_IPO_IS_AUXILIARY_MODULE ||
>> +  (L_IPO_COMP_MODE && PRIMARY_MODULE_EXPORTED);
>> +
>>  #ifdef ASM_OUTPUT_DEF
>>/* If the variable is internal, or if we can't generate aliases,
>> - call the local init function directly.  */
>> -  if (!TREE_PUBLIC (var))
>> + call the local init function directly.  Don't do this if we
>> + are in LIPO mode an may need to export the init function.  Note
>> + that get_local_tls_init_fn will assert if it is called on an aux
>> + module.  */
>> +  if (!TREE_PUBLIC (var) && !promote)
>>  #endif
>>  return get_local_tls_init_fn ();
>>
>>tree sname = mangle_tls_init_fn (var);
>> +  if (promote)
>> +{
>> +  const char *name = IDENTIFIER_POINTER (sname);
>> +  char *assembler_name = (char*) alloca (strlen (name) + 30);
>> +  sprintf (assembler_name, "%s.cmo.%u", name, current_module_id);
>> +  sname = get_identifier (assembler_name);
>> +}
>> +
>>tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
>

Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-19 Thread Teresa Johnson
On Thu, Mar 19, 2015 at 8:00 PM, Xinliang David Li  wrote:
> ok -- then there is an over assertion in get_local_tls_init_fn. The
> method can be called for aux module -- the decl created will also be
> promoted.

No, it won't ever be called for an aux module. Both callsites are
guarded (handle_tls_init has an early return at the top, and the
callsite is guarded in get_tls_iniit_fn).

>
> Also can we rely on late promotion (special case of artificial
> function __tls_init? This can avoid duplicated logic here.

We don't need to promote __tls_init, but rather the init functions for
each TLS variable (which are each aliased to __tls_init). I thought
about both approaches, but promoting it as it was created seemed more
straightforward. I can give that approach a try though if you prefer
it (special casing DECL_ARTIFICIAL functions that start with _ZTH).
Incidentally they are not marked static, so that would also need to be
done to get them promoted.

Teresa

>
> David
>
> On Thu, Mar 19, 2015 at 5:51 PM, Teresa Johnson  wrote:
>> On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li  
>> wrote:
>>> does generate_tls_wrapper also need to be suppressed for aux module?
>>
>> No, we generate the wrapper in the aux module and use it to access the
>> TLS variable and invoke it's init function (which is defined in the
>> variable's own module). Presumably we could generate that only in the
>> original module and promote it if necessary, but generating it in the
>> aux module does allow it to be inlined. This is essentially how the
>> TLS accesses are handled for extern TLS variables defined elsewhere
>> (wrapper generated in referring module).
>>
>> Teresa
>>
>>>
>>> David
>>>
>>> On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson  
>>> wrote:
>>>> New patch below. Passes regression tests plus internal application build.
>>>>
>>>> 2015-03-19  Teresa Johnson  
>>>>
>>>> gcc/
>>>> Google ref b/19618364.
>>>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module.
>>>> (get_tls_init_fn): Promote non-public init functions if necessary
>>>> in LIPO mode, record new global at module scope.
>>>> (get_tls_wrapper_fn): Record new global at module scope.
>>>> (handle_tls_init): Skip in aux module, setup alias in exported
>>>> primary module.
>>>>
>>>> testsuite/
>>>> Google ref b/19618364.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls.h: New test.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto.
>>>> * testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto.
>>>>
>>>> Index: cp/decl2.c
>>>> ===
>>>> --- cp/decl2.c  (revision 221425)
>>>> +++ cp/decl2.c  (working copy)
>>>> @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "langhooks.h"
>>>>  #include "c-family/c-ada-spec.h"
>>>>  #include "asan.h"
>>>> +#include "gcov-io.h"
>>>>
>>>>  extern cpp_reader *parse_in;
>>>>
>>>> @@ -3024,6 +3025,9 @@ var_needs_tls_wrapper (tree var)
>>>>  static tree
>>>>  get_local_tls_init_fn (void)
>>>>  {
>>>> +  /* In LIPO mode we should not generate local init functions for
>>>> + the aux modules (see handle_tls_init).  */
>>>> +  gcc_assert (!L_IPO_IS_AUXILIARY_MODULE);
>>>>tree sname = get_identifier ("__tls_init");
>>>>tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
>>>>if (!fn)
>>>> @@ -3057,14 +3061,39 @@ get_tls_init_fn (tree var)
>>>>if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>>>>  return NULL_TREE;
>>>>
>>>> +  /* In LIPO mode we should not generate local init functions for
>>>> + aux modules. The wrapper will reference the variable's init function
>>>> + that is defined in its own primary module. Otherwise there is
>>>> + a name conflict between the primary and aux __tls_init functions,
>>>> + and difficulty ensuring each TLS variable is initialized e

Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-20 Thread Teresa Johnson
On Thu, Mar 19, 2015 at 10:38 PM, Xinliang David Li  wrote:
> On Thu, Mar 19, 2015 at 9:57 PM, Teresa Johnson  wrote:
>> On Thu, Mar 19, 2015 at 8:00 PM, Xinliang David Li  
>> wrote:
>>> ok -- then there is an over assertion in get_local_tls_init_fn. The
>>> method can be called for aux module -- the decl created will also be
>>> promoted.
>>
>> No, it won't ever be called for an aux module. Both callsites are
>> guarded (handle_tls_init has an early return at the top, and the
>> callsite is guarded in get_tls_iniit_fn).
>>
>>>
>>> Also can we rely on late promotion (special case of artificial
>>> function __tls_init? This can avoid duplicated logic here.
>>
>> We don't need to promote __tls_init, but rather the init functions for
>> each TLS variable (which are each aliased to __tls_init).
>
>> I thought
>> about both approaches, but promoting it as it was created seemed more
>> straightforward. I can give that approach a try though if you prefer
>> it (special casing DECL_ARTIFICIAL functions that start with _ZTH).
>> Incidentally they are not marked static, so that would also need to be
>> done to get them promoted.
>
> For public tls symbols in aux module, tls wrappers will reference
> _ZTHxx init functions which are aliases to __tls_init. If __tls_init
> is not created for aux modules, what symbol will those _ZTHxx
> funcitons aliased to?
>
> Is it better just let _tls_init function to be created as usual, but
> let the standard promotion kick in later? The ZTHxx functions can
> still be marked as alias to the promoted __tls_init?

So there are 3 function decls being generated for TLS variables:

1) A single __tls_init function per module
This initializes *all* TLS variables in the module, guarded by a
single __tls_guard variable to ensure it is only done once.

2) One _ZTHxx init function per TLS variable
Each of these is aliased to the module's __tls_init

3) One _ZTWxx wrapper function per TLS variable
This is called in place of accesses to that TLS variable. It invokes
the _ZTHxx init function for the variable to initialize it if
necessary.

I am concerned about generating the aux module __tls_init functions
and then promoting them as they contain initializations for all TLS
variables from the aux module - generating it each time the module is
included as an aux module seems inefficient at best.

What happens in the case where the TLS variable is declared extern and
accessed by a separate module (and essentially is what happens with my
patch when it is file static, promoted and accessed via an aux module)
is this: The _ZTHxx init functions are aliased to __tls_init in the
module containing the definition, and are externally-visible (after
promoting in the file-static exported primary module case). The
external module containing the reference (or the primary module that
imported the defining module as an aux module in the LIPO case)
generates a _ZTWxx wrapper for the TLS variable, which references its
_ZTHxx init function. This ends up resolved to the exported/promoted
_ZTHxx symbol in the defining module, which in turn is aliased to the
defining module's __tls_init function.

Teresa

>
> David
>
>
>
>
>>
>> Teresa
>>
>>>
>>> David
>>>
>>> On Thu, Mar 19, 2015 at 5:51 PM, Teresa Johnson  
>>> wrote:
>>>> On Thu, Mar 19, 2015 at 4:45 PM, Xinliang David Li  
>>>> wrote:
>>>>> does generate_tls_wrapper also need to be suppressed for aux module?
>>>>
>>>> No, we generate the wrapper in the aux module and use it to access the
>>>> TLS variable and invoke it's init function (which is defined in the
>>>> variable's own module). Presumably we could generate that only in the
>>>> original module and promote it if necessary, but generating it in the
>>>> aux module does allow it to be inlined. This is essentially how the
>>>> TLS accesses are handled for extern TLS variables defined elsewhere
>>>> (wrapper generated in referring module).
>>>>
>>>> Teresa
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Thu, Mar 19, 2015 at 4:11 PM, Teresa Johnson  
>>>>> wrote:
>>>>>> New patch below. Passes regression tests plus internal application build.
>>>>>>
>>>>>> 2015-03-19  Teresa Johnson  
>>>>>>
>>>>>> gcc/
>>>>>> Google ref b/19618364.
>>>>>> * cp/decl2.c (get_local_tls_init_fn): Assert on aux module.
>>>>>> (get_tls

Re: [GOOGLE] Fixes for TLS wrapper and init functions in LIPO mode

2015-03-20 Thread Teresa Johnson
After offline discussions with David, I changed to the approach of
generating __tls_init for the aux module, but then modifying the LIPO
code to allow it to be statically promoted and treated as an aux
function as a special case of an artificial decl. That works well. New
patch below. Fixed the tests to validate rather than print the
expected output.

Ok for google branches?

2015-03-20  Teresa Johnson  

gcc/
Google ref b/19618364.
* cp/decl2.c (get_local_tls_init_fn): Mark static,
record new global at module scope.
(get_tls_init_fn): Record new global at module scope.
(get_tls_wrapper_fn): Ditto.
(handle_tls_init): Suppress alias in aux module.
* l-ipo.c (decl_artificial_nopromote): New function.
(cgraph_is_aux_decl_external): Call new function.
(process_module_scope_static_func): Ditto.

testsuite/
Google ref b/19618364.
* testsuite/g++.dg/tree-prof/lipo/tls.h: New test.
* testsuite/g++.dg/tree-prof/lipo/tls2.h: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls2_0.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls2_1.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls_0.C: Ditto.
* testsuite/g++.dg/tree-prof/lipo/tls_1.C: Ditto.

Index: cp/decl2.c
===
--- cp/decl2.c  (revision 221425)
+++ cp/decl2.c  (working copy)
@@ -3033,9 +3033,15 @@ get_local_tls_init_fn (void)
  void_list_node));
   SET_DECL_LANGUAGE (fn, lang_c);
   TREE_PUBLIC (fn) = false;
+  TREE_STATIC (fn) = true;
   DECL_ARTIFICIAL (fn) = true;
   mark_used (fn);
   SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+  /* In LIPO mode make sure we record the new global value so that it
+ is cleared before parsing the next aux module.  */
+  if (L_IPO_COMP_MODE && !is_parsing_done_p ())
+add_decl_to_current_module_scope (fn,
+  NAMESPACE_LEVEL (global_namespace));
 }
   return fn;
 }
@@ -3100,6 +3106,11 @@ get_tls_init_fn (tree var)
   DECL_BEFRIENDING_CLASSES (fn) = var;

   SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+  /* In LIPO mode make sure we record the new global value so that it
+ is cleared before parsing the next aux module.  */
+  if (L_IPO_COMP_MODE && !is_parsing_done_p ())
+add_decl_to_current_module_scope (fn,
+  NAMESPACE_LEVEL (global_namespace));
 }
   return fn;
 }
@@ -3157,6 +3168,11 @@ get_tls_wrapper_fn (tree var)
   DECL_BEFRIENDING_CLASSES (fn) = var;

   SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+  /* In LIPO mode make sure we record the new global value so that it
+ is cleared before parsing the next aux module.  */
+  if (L_IPO_COMP_MODE && !is_parsing_done_p ())
+add_decl_to_current_module_scope (fn,
+  NAMESPACE_LEVEL (global_namespace));
 }
   return fn;
 }
@@ -4213,8 +4229,12 @@ handle_tls_init (void)
   one_static_initialization_or_destruction (var, init, true);

 #ifdef ASM_OUTPUT_DEF
-  /* Output init aliases even with -fno-extern-tls-init.  */
-  if (TREE_PUBLIC (var))
+  /* Output init aliases even with -fno-extern-tls-init.  Don't emit
+ aliases in LIPO aux modules, since the corresponding __tls_init
+ will be static promoted and deleted, so the variable's tls init
+ function will be resolved by its own primary module.  An alias
+ would prevent the promoted aux __tls_init from being deleted.  */
+  if (TREE_PUBLIC (var) && !L_IPO_IS_AUXILIARY_MODULE)
{
   tree single_init_fn = get_tls_init_fn (var);
  if (single_init_fn == NULL_TREE)
Index: l-ipo.c
===
--- l-ipo.c (revision 221425)
+++ l-ipo.c (working copy)
@@ -1120,6 +1120,27 @@ cgraph_unify_type_alias_sets (void)
   htab_delete (type_hash_tab);
 }

+/* Return true if DECL is an artificial function that we do not want
+   to promote and which may not be available in the primary module.
+   The sole exception is currently __tls_init.  */
+
+static bool
+decl_artificial_nopromote (tree decl)
+{
+  if (!DECL_ARTIFICIAL (decl))
+return false;
+
+  /* Handle the __tls_init function specially as we do want to promote it and
+ allow the aux module to be resolved to the version in the primary module.
+ We check if it is prefixed by __tls_init to catch it after promotion
+ as well from cgraph_is_aux_decl_external.  */
+  if (!strncmp (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)), "__tls_init",
+10))
+return false;
+
+  return true;
+}
+
 /* Return true if NODE->decl from an auxiliary module has external
definition (and therefor

Re: [GOOGLE] Only save command-line options for LIPO

2015-07-08 Thread Teresa Johnson
Resending in plain text mode.

On Wed, Jul 8, 2015 at 11:34 AM, Teresa Johnson  wrote:
> This patch avoids saving optimization pragma or function level attributes in
> the list of command line options saved in the LIPO module info structure.
> Passes regression and internal testing. Ok for google/4_9?
>
> Thanks,
> Teresa
>
> 2015-07-08  Teresa Johnson  
>
> * gcc/c-family/c-common.c (parse_optimize_options): New parameter.
> * gcc/opts-global.c (read_cmdline_options): Use new parameter to
> guard saving of LIPO command line options.
> (decode_options): New parameter.
> * gcc/opts.h (extern void decode_options): Ditto.
> * gcc/toplev.c (toplev_main): Ditto.
>
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c (revision 225521)
> +++ gcc/c-family/c-common.c (working copy)
> @@ -9200,7 +9200,7 @@ parse_optimize_options (tree args, bool attr_p)
> &decoded_options_count);
>decode_options (&global_options, &global_options_set,
>   decoded_options, decoded_options_count,
> - input_location, global_dc);
> + input_location, global_dc, false);
>
>targetm.override_options_after_change();
>
> Index: gcc/opts-global.c
> ===
> --- gcc/opts-global.c   (revision 225521)
> +++ gcc/opts-global.c   (working copy)
> @@ -299,7 +299,9 @@ lipo_save_cl_args (struct cl_decoded_option *decod
> the results of processing DECODED_OPTIONS and DECODED_OPTIONS_COUNT
> in OPTS and OPTS_SET and using DC for diagnostic state.  LANG_MASK
> contains has a single bit set representing the current language.
> -   HANDLERS describes what functions to call for the options.  */
> +   HANDLERS describes what functions to call for the options.
> +   If COMMAND_LINE is true, this is being invoked for file level command
> +   line options, otherwise for an optimize pragma or function attribute.
> */
>
>  static void
>  read_cmdline_options (struct gcc_options *opts, struct gcc_options
> *opts_set,
> @@ -308,7 +310,8 @@ read_cmdline_options (struct gcc_options *opts, st
>   location_t loc,
>   unsigned int lang_mask,
>   const struct cl_option_handlers *handlers,
> - diagnostic_context *dc)
> + diagnostic_context *dc,
> +  bool command_line)
>  {
>unsigned int i;
>int force_multi_module = 0;
> @@ -341,7 +344,8 @@ read_cmdline_options (struct gcc_options *opts, st
>read_cmdline_option (opts, opts_set,
>decoded_options + i, loc, lang_mask, handlers,
>dc);
> -  lipo_save_cl_args (decoded_options + i);
> +  if (command_line)
> +lipo_save_cl_args (decoded_options + i);
>  }
>  }
>
> @@ -393,12 +397,14 @@ set_default_handlers (struct cl_option_handlers *h
>  /* Parse command line options and set default flag values.  Do minimal
> options processing.  The decoded options are in *DECODED_OPTIONS
> and *DECODED_OPTIONS_COUNT; settings go in OPTS, OPTS_SET and DC;
> -   the options are located at LOC.  */
> +   the options are located at LOC.  If COMMAND_LINE is true, this is
> +   being invoked for file level command line options, otherwise for
> +   an optimize pragma or function attribute.  */
>  void
>  decode_options (struct gcc_options *opts, struct gcc_options *opts_set,
> struct cl_decoded_option *decoded_options,
> unsigned int decoded_options_count,
> -   location_t loc, diagnostic_context *dc)
> +   location_t loc, diagnostic_context *dc, bool command_line)
>  {
>struct cl_option_handlers handlers;
>
> @@ -415,7 +421,7 @@ decode_options (struct gcc_options *opts, struct g
>read_cmdline_options (opts, opts_set,
> decoded_options, decoded_options_count,
> loc, lang_mask,
> -   &handlers, dc);
> +   &handlers, dc, command_line);
>
>finish_options (opts, opts_set, loc);
>  }
> Index: gcc/opts.h
> ===
> --- gcc/opts.h  (revision 225521)
> +++ gcc/opts.h  (working copy)
> @@ -344,7 +344,8 @@ extern void decode_options (struct gcc_options *op
> struct cl_decoded_option *decoded_options,
> unsigned int decoded_options_count,
>

Re: [PATCH]Partially fix PR61529, bound basic block frequency

2014-10-29 Thread Teresa Johnson
Hi Renlin,

Are the incoming edge counts or probabilities insane in this case? I
guess the patch is ok if we need to do this to handle those incoming
insanitiles. But I can't approve patches myself.

However, this is a fix to code (r215739) committed after the ICE in
the original bug report and in comment 2 were reported, so I wonder if
it is just hiding the original problem. Originally this was reported
to be due to r210538 - ccing Dehao who was the author of that patch.
Dehao, did you get a chance to look at this bug and see why your
change triggered it? It is possible that Dehao's patch simply
amplified an even further upstream profile insanity, but it would be
good to confirm.

Thanks!
Teresa

On Wed, Oct 29, 2014 at 2:26 AM, Renlin Li  wrote:
> Hi all,
>
> This is a simple patch to fix ICE in comment 2 of PR61529:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61529
>
> Bound checking code is added to make sure the frequency is within legal
> range.
>
> As far as I have observed, r215830 patch fixes the glibc building ICE. And
> this patch should fix the ICE while building the sample code in comment 2
> using aarch64-none-elf toolchain. Until now, all the ICEs reported in this
> bug ticket should be fixed.
>
> x86_64-unknown-linux-gnu bootstrap and regression test have been done, no
> new issue.
> aarch64-none-elf toolchain has been test on the model. No new regression.
>
> Is this Okay for trunk?
>
> gcc/ChangeLog:
>
> 2014-10-29  Renlin Li  
>  PR middle-end/61529
> * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH]Partially fix PR61529, bound basic block frequency

2014-11-06 Thread Teresa Johnson
On Thu, Nov 6, 2014 at 7:09 AM, Renlin Li  wrote:
>
> Hi Jeff,
>
> Test case has been added. With the patch, both x86_64-unknown-linux-gnu and
> aarch64-none-elf compile the test case successfully.
>
> Okay to commit?
>
>
> On 04/11/14 21:59, Jeff Law wrote:
>>
>> On 11/03/14 08:29, Renlin Li wrote:
>>>
>>> On 29/10/14 12:42, Teresa Johnson wrote:
>>>>
>>>> Hi Renlin,
>>>>
>>>> Are the incoming edge counts or probabilities insane in this case? I
>>>> guess the patch is ok if we need to do this to handle those incoming
>>>> insanitiles. But I can't approve patches myself.
>>>
>>> Not really, it's just a little bigger than the limit.
>>>
>>> For this particular test case, ABC is a threaded path.
>>> B is the fallthrough basic block of A, D is a basic block split from B
>>> (used to be a self loop). A, B and D have roughly the same frequency (
>>> 8281, 9100, 8281).
>>> When calculating the path_in_freq, frequencies from AB and DB edges are
>>> accumulated, and the final result is large than BB_FREQ_MAX.
>>>
>>>
>>> A
>>> 100% |
>>> |  9%
>>> -->B-->C
>>> | |
>>> |100%| 91%
>>> | |
>>> D

The frequencies look insane given these probabilities. If most of the
execution stays in the loop then B should have a much higher frequency
than A.

>>>
>>>
>>>
>>> There are 2 suspicious points:
>>> 1, The BD edge is not correctly guessed at the profile stage. However,
>>> anyway it's heuristic, so I don't think, it's here the problem starts.
>>> 2, The BD edge is not eliminated before jump threading. But the jump
>>> threading pass will analysis the condition jump statement in B block (In
>>> this particular case, the BD probability should be zero), and makes the
>>> decision to thread it.
>>>
>>> Later in the dom pass, the BD edge is indeed removed.
>>
>> Can you add a testcase please?  With a testcase, this patch is OK for
>> the trunk.
>>
>> jeff
>>
>
> x86_64-unknown-linux-gnu bootstrap and regression test have been done, no
> new issue.
> aarch64-none-elf toolchain has been test on the model. No new regression.
>
> gcc/ChangeLog:
>
> 2014-11-06  Renlin Li  
> PR middle-end/61529
> * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq.

Please add a comment that this is needed due to insane incoming frequencies.

>
> gcc/testsuite/ChangeLog:
>
> 2014-11-06  Renlin Li  
> PR middle-end/61529
> * gcc.dg/pr61529.c: New.

The 'b' variable is uninitialized. Also, 'd' and 'a' may end up
uninitialized depending on the initial value of 'b'. Please initialize
these.

Thanks,
Teresa


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH]Partially fix PR61529, bound basic block frequency

2014-11-06 Thread Teresa Johnson
Thanks for fixing the test case. Can you also add the comment I
suggested to the source change?

> Please add a comment that this is needed due to insane incoming frequencies.

Thanks,
Teresa

On Thu, Nov 6, 2014 at 9:53 AM, Renlin Li  wrote:
> Hi Teresa,
>
> Thank you for the suggestion, updated!
>
>> Please add a comment that this is needed due to insane incoming
>> frequencies.
>>
>> The 'b' variable is uninitialized. Also, 'd' and 'a' may end up
>> uninitialized depending on the initial value of 'b'. Please initialize
>> these.
>>
>
> Test case has been added. With the patch, both x86_64-unknown-linux-gnu and
> aarch64-none-elf compile the test case successfully.
>
> x86_64-unknown-linux-gnu bootstrap and regression test have been done, no
> new issue.
> aarch64-none-elf toolchain has been test on the model. No new regression.
>
> gcc/ChangeLog:
>
> 2014-11-06  Renlin Li  
> PR middle-end/61529
> * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq.
>  This is needed due to insane incoming frequencies.
>
> gcc/testsuite/ChangeLog:
>
> 2014-11-06  Renlin Li  
> PR middle-end/61529
> * gcc.dg/pr61529.c: New.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PING][PATCH]Partially fix PR61529, bound basic block frequency

2014-11-10 Thread Teresa Johnson
Hi Renlin,

Looks like Jeff already approved it:

>Can you add a testcase please?  With a testcase, this patch is OK for the 
>trunk.

Teresa

On Mon, Nov 10, 2014 at 8:59 AM, Renlin Li  wrote:
> On 06/11/14 18:07, Renlin Li wrote:
>>
>> On 06/11/14 17:59, Teresa Johnson wrote:
>>>
>>> Thanks for fixing the test case. Can you also add the comment I
>>> suggested to the source change?
>>>
>>>> Please add a comment that this is needed due to insane incoming
>>>> frequencies.
>>>>
>>
>> Sorry, I mistakenly add it to the ChangeLog. Should be correct now.
>>
>>
>> x86_64-unknown-linux-gnu bootstrap and regression test have been done, no
>> new issue.
>> aarch64-none-elf toolchain has been test on the model. No new regression.
>>
>> gcc/ChangeLog:
>>
>> 2014-11-06  Renlin Li 
>> PR middle-end/61529
>> * tree-ssa-threadupdate.c (compute_path_counts): Bound path_in_freq.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2014-11-06  Renlin Li 
>> PR middle-end/61529
>> * gcc.dg/pr61529.c: New.
>
>
> Hi,
>
> Can anybody help to review or approve this?
>
> Kind regards,
> Renlin Li
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Teresa Johnson
This patch fixes an issue where tree-strlen was incorrectly removing a
store of 0 into a string because it thought a prior CLOBBER (which is
an empty constructor with no elements) was zero-initializing the
string.

Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

Thanks,
Teresa

2014-11-12

PR tree-optimization/63841
* tree.c (initializer_zerop): A constructor with no elements
does not zero initialize.

Index: tree.c
===
--- tree.c  (revision 217190)
+++ tree.c  (working copy)
@@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
   {
unsigned HOST_WIDE_INT idx;

+if (!CONSTRUCTOR_NELTS (init))
+  return false;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
  if (!initializer_zerop (elt))
return false;


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Teresa Johnson
Added testcase. Here is the new patch:

2014-11-12

gcc:
PR tree-optimization/63841
* tree.c (initializer_zerop): A constructor with no elements
does not zero initialize.

gcc/testsuite:
* g++.dg/tree-ssa/pr63841.C: New test.

Index: tree.c
===
--- tree.c  (revision 217190)
+++ tree.c  (working copy)
@@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
   {
unsigned HOST_WIDE_INT idx;

+if (!CONSTRUCTOR_NELTS (init))
+  return false;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
  if (!initializer_zerop (elt))
return false;
Index: testsuite/g++.dg/tree-ssa/pr63841.C
===
--- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+std::string __attribute__ ((noinline)) comp_test_write() {
+  std::string data;
+
+  for (int i = 0; i < 2; ++i) {
+char b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+std::string __attribute__ ((noinline)) comp_test_write_good() {
+  std::string data;
+
+  char b;
+  for (int i = 0; i < 2; ++i) {
+b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+int main() {
+  std::string good = comp_test_write_good();
+  printf("expected: %hx\n", *(short*)good.c_str());
+
+  std::string bad = comp_test_write();
+  printf("got: %hx\n", *(short*)bad.c_str());
+
+  return good != bad;
+}

On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li  wrote:
> missing test case?
>
> David
>
> On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  wrote:
>> This patch fixes an issue where tree-strlen was incorrectly removing a
>> store of 0 into a string because it thought a prior CLOBBER (which is
>> an empty constructor with no elements) was zero-initializing the
>> string.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2014-11-12
>>
>> PR tree-optimization/63841
>> * tree.c (initializer_zerop): A constructor with no elements
>> does not zero initialize.
>>
>> Index: tree.c
>> ===
>> --- tree.c  (revision 217190)
>> +++ tree.c  (working copy)
>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>{
>> unsigned HOST_WIDE_INT idx;
>>
>> +    if (!CONSTRUCTOR_NELTS (init))
>> +  return false;
>>     FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>   if (!initializer_zerop (elt))
>> return false;
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


<    1   2   3   4   5   6   >