http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051



--- Comment #26 from Teresa Johnson <tejohnson at google dot com> 2012-11-15 22:42:12 UTC ---

On Thu, Nov 15, 2012 at 6:33 AM, Teresa Johnson <tejohn...@google.com> wrote:
> On Thu, Nov 15, 2012 at 2:56 AM, hubicka at ucw dot cz
> <gcc-bugzi...@gcc.gnu.org> wrote:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051
>>
>> --- Comment #24 from Jan Hubicka <hubicka at ucw dot cz> 2012-11-15 10:56:53 UTC ---
>> > Note though that this is not an assert. It just emits a message to
>> > stderr. Do you think a better error message is appropriate? I'm not
>> > sure "some data files may have been removed" is an accurate
>> > description of the issue. Perhaps something like "Profile data file
>> > mismatch may indicate corrupt profile data"?
>>
>> Well, we should figure out why sum_all starts to diverge.  If we had
>> problems mixing cc1 and cc1plus executions, we should get mismatches in
>> the number of counters.
>
> Right, it doesn't appear to be different executables, since the number of
> counters is identical. I'll instrument it and see if I can figure out why
> they diverge.
>
>> What happens after the miscompare?
>
> A flag is set so that the error is emitted at most once per merge, and then
> we continue on with the merge and ignore it. Basically, we save the first
> merged program summary (from the first object file's gcda we merge into),
> and then, for each additional object file whose counters are merged, we
> compare the resulting program summary against the saved one, but only if
> its number of runs matches that of the saved summary. A mismatch could
> happen if the gcda files are walked in a different order during updates
> (i.e. the gcov_list is in a different order for different processes of
> the same executable), but I am not sure if that can happen.

It appears that this is what is happening, and I think it makes sense
that it can.

We're essentially doing this:

  /* Now merge each file.  */
  for (gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
    {
        // Open existing gcda file for gi_ptr
        // Find program summary corresponding to this executable -> save in prg
        // Merge execution counts for each function
        // Merge program summary
        //      - If this is the first merged file for this execution,
        //        save merged summary in all_prg
        //      - Otherwise if #runs the same in prg and all_prg,
        //        print error message if prg != all_prg.
        // Write merged gcda
    }
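
A minimal C sketch of the check in the loop above, assuming a simplified summary that carries only the fields discussed in this thread. `struct prg_summary`, `struct merge_state` and `check_summary` are hypothetical names, not libgcov's actual types or functions, and the message text is a placeholder.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a gcov program summary.  */
struct prg_summary
{
  unsigned runs;       /* number of program runs merged so far */
  unsigned num;        /* number of counters */
  long long sum_all;   /* sum of all counter values */
};

/* State kept across one walk of the gcov_list at process exit.  */
struct merge_state
{
  bool have_all_prg;           /* set after the first gcda file is merged */
  struct prg_summary all_prg;  /* merged summary saved from that first file */
  bool warned;                 /* emit the message at most once per merge */
};

/* Call after merging the program summary PRG into one gcda file.
   Returns true if PRG disagrees with the saved summary.  */
static bool
check_summary (struct merge_state *st, const struct prg_summary *prg)
{
  if (!st->have_all_prg)
    {
      /* First file merged in this walk: remember its summary.  */
      st->all_prg = *prg;
      st->have_all_prg = true;
      return false;
    }
  /* Only compared when both files have seen the same number of runs.  */
  if (prg->runs != st->all_prg.runs)
    return false;
  if (prg->num == st->all_prg.num && prg->sum_all == st->all_prg.sum_all)
    return false;
  if (!st->warned)
    {
      /* Placeholder text; the real libgcov message differs.  */
      fprintf (stderr, "profiling: program summary mismatch\n");
      st->warned = true;
    }
  return true;
}
```

Because all_prg is saved from whichever gcda file happens to be merged first, this comparison is only meaningful if every process merges its run into every gcda file in the same relative order.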



I found that in a couple of cases we printed the error message for
libcpp/directives.gcda, where the saved all_prg summary was from
gcc/gcc.gcda.

I then instrumented the code so that each time we merge into one of
these two gcda files, it emits the pid and ppid, the number of runs, the
number of counters and the merged sum_all. Comparing all the merges into
these two gcda files, I see that most of the time they proceed in the
same order. In a few cases, however, the order differs, producing a
different sum_all for the same number of runs; afterwards things go back
to normal and sum_all matches again. For example, here is one place where
things briefly get out of order, resulting in one of the error messages
being printed:

...
pid 28432 ppid 28429 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 254 num 13193 sum_all 17058327
pid 28437 ppid 28365 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 255 num 13193 sum_all 17064832
pid 28439 ppid 28367 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 256 num 13193 sum_all 17071340
pid 28440 ppid 28436 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 257 num 13193 sum_all 17177525
...

vs.

...
pid 28432 ppid 28429 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 254 num 13193 sum_all 17058327
pid 28439 ppid 28367 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 255 num 13193 sum_all 17064835
pid 28437 ppid 28365 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 256 num 13193 sum_all 17071340
pid 28440 ppid 28436 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 257 num 13193 sum_all 17177525
...

Notice that the middle two pids are flipped, so sum_all differs after
run 255 (17064832 vs. 17064835) and is the same again after run 256.
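
The divergence is only transient because sum_all is accumulated by addition. Assume each run adds the same increment to the program summary in every gcda file it merges into. Under that assumption the increments can be read off the gcc.gcda log: 6505 for pid 28437, 6508 for pid 28439 and 106185 for pid 28440. Replaying them in both orders reproduces both logs. The sketch below does that replay; `replay_sums` is a hypothetical helper, not libgcov code.

```c
#include <stddef.h>

/* Starting from BASE, add each per-run increment in DELTAS (in merge
   order) and store the running sum_all after each merge in OUT.
   Assumes sum_all is a plain running sum of per-run contributions.  */
static void
replay_sums (long long base, const long long *deltas, size_t n,
             long long *out)
{
  long long sum = base;
  for (size_t i = 0; i < n; i++)
    {
      sum += deltas[i];
      out[i] = sum;
    }
}
```

For example, replaying {6505, 6508, 106185} from 17058327 yields 17064832, 17071340, 17177525 (the gcc.gcda log). Replaying {6508, 6505, 106185} yields 17064835, 17071340, 17177525 (the directives.gcda log). The totals differ only at run 255.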



I believe this could happen if pids 28437 and 28439 finished nearly
simultaneously and both waited for the lock on gcc.gcda, which 28437
acquired first; then, by some luck of timing, both attempted to open
directives.gcda at around the same time, and 28439 happened to win the
lock in the fcntl loop first.



I believe it is also possible for object files to be in different
orders in the gcov_list in different processes, since they are added
to the head of that list in __gcov_init, which, according to its header
comment, is invoked when an object file's global constructors run.
And for C++ at least, the order of initialization across translation
units is unspecified. That could also cause sum_all to go temporarily
out of sync between the gcda files of different object files.
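
The ordering effect of adding to the head of a list can be illustrated with a minimal singly linked list. `struct obj_info` and `register_obj` are hypothetical stand-ins for the gcov_list entries and __gcov_init. The sketch assumes only that each registration pushes onto the head, as described above. The walk order is then the reverse of the registration order. So if registration order differed between processes, they would walk the gcda files in different orders.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-object-file gcov info record.  */
struct obj_info
{
  const char *filename;
  struct obj_info *next;
};

/* Push INFO onto the head of *LIST, as __gcov_init is described as
   doing for each object file.  */
static void
register_obj (struct obj_info **list, struct obj_info *info)
{
  info->next = *list;
  *list = info;
}

/* Copy up to MAX filenames into OUT in walk (head-to-tail) order,
   i.e. the order the merge loop would visit them.  Returns the count.  */
static size_t
walk_order (const struct obj_info *list, const char **out, size_t max)
{
  size_t n = 0;
  for (; list != NULL && n < max; list = list->next)
    out[n++] = list->filename;
  return n;
}
```

Registering gcc.gcda's object first and directives.gcda's second makes the walk visit directives.gcda first. The opposite registration order reverses the walk.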



Overall, I think it makes sense to remove this check altogether. Would
you agree? I'm testing a patch to remove it right now.

Teresa



>
> Teresa
>
>> Honza
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413






