On 08/18/2016 06:06 PM, Richard Biener wrote:
> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <ja...@redhat.com> wrote:
>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>> The best proxy we have for that is -pthread.
>>>>
>>>> Is it slower, most definitely, but odds are we're giving folks
>>>> garbage data otherwise, which in many ways is even worse.
>>>
>>> It will likely be catastrophically slower in some cases. 
>>>
>>> Catastrophically as in too slow to be usable.
>>>
>>> An atomic instruction is a lot more expensive than a single increment. Also
>>> they sometimes are really slow depending on the state of the machine.
>>
>> Can't we just have thread-local copies of all the counters (perhaps using
>> __thread pointer as base) and just atomically merge at thread termination?
> 
> I suggested that as well but of course it'll have its own class of issues 
> (short lived threads, so we need to somehow re-use counters from terminated 
> threads, large number of threads and thus using too much memory for the 
> counters)
> 
> Richard.

Hello.

I've put that approach on my TODO list; let's see whether it is doable in a
reasonable amount of time.
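
To make the thread-local idea concrete, here is a minimal sketch of what it could look like in plain C. This is only an illustration under assumptions, not libgcov code: the names (global_counters, local_counters, merge_local_counters, run_demo) and the sizes are hypothetical, and the merge is called by hand at the end of the worker instead of from a real thread-termination hook.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_COUNTERS 4       /* hypothetical number of profile counters */
#define N_THREADS  8
#define N_ITERS    100000  /* increments performed by each thread */

/* Shared counters: what would eventually be written to the profile. */
static int64_t global_counters[N_COUNTERS];

/* Per-thread private copy: the hot path uses plain, non-atomic increments. */
static __thread int64_t local_counters[N_COUNTERS];

/* Atomically add this thread's counts to the shared counters, then reset
   the local copy so the storage could be reused. */
static void merge_local_counters(void)
{
    for (int i = 0; i < N_COUNTERS; i++) {
        __atomic_fetch_add(&global_counters[i], local_counters[i],
                           __ATOMIC_RELAXED);
        local_counters[i] = 0;
    }
}

static void *worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < N_ITERS; n++)
        local_counters[n % N_COUNTERS]++;  /* stands in for an instrumented edge */
    merge_local_counters();                /* stands in for a thread-exit hook */
    return NULL;
}

/* Run all workers and return the sum of the shared counters. */
static int64_t run_demo(void)
{
    pthread_t threads[N_THREADS];
    int64_t total = 0;

    for (int i = 0; i < N_COUNTERS; i++)
        global_counters[i] = 0;
    for (int t = 0; t < N_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);
    for (int i = 0; i < N_COUNTERS; i++)
        total += global_counters[i];
    return total;
}
```

Calling run_demo() from main (built with -pthread) should return N_THREADS * N_ITERS, since no increment is lost: each thread pays for N_COUNTERS atomic operations at merge time rather than one per increment. The sketch does not address the concerns raised above (short-lived threads, memory use with many threads).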

I've just finished some measurements to illustrate the slow-down of the
-fprofile-update=atomic approach.
All numbers are run times for: no profile, -fprofile-generate, -fprofile-generate
-fprofile-update=atomic
c-ray benchmark (utilizing 8 threads, -O3): 1.7s, 15.5s, 38.1s
unrar (utilizing 8 threads, -O3): 3.6s, 11.6s, 38s
tramp3d (1 thread, -O3): 18.0s, 46.6s, 168s

So with atomic updates the run time is roughly 2.5x to 3.6x that of plain
-fprofile-generate (about 3x on average). I don't have much experience with
default option selection, but these numbers can probably help.
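
For reference, the per-benchmark ratios can be recomputed from the timings above with a few lines of C. This is just arithmetic on the numbers in this mail; the struct and function names are made up for the example.

```c
#include <stdio.h>

/* Run times in seconds, copied from the measurements in this mail. */
struct timing {
    const char *name;
    double no_profile;   /* no instrumentation */
    double generate;     /* -fprofile-generate */
    double atomic;       /* -fprofile-generate -fprofile-update=atomic */
};

static const struct timing timings[] = {
    { "c-ray",    1.7, 15.5,  38.1 },
    { "unrar",    3.6, 11.6,  38.0 },
    { "tramp3d", 18.0, 46.6, 168.0 },
};

/* Slow-down of atomic updates relative to plain -fprofile-generate. */
static double atomic_vs_generate(const struct timing *t)
{
    return t->atomic / t->generate;
}

/* Slow-down of atomic updates relative to the uninstrumented build. */
static double atomic_vs_baseline(const struct timing *t)
{
    return t->atomic / t->no_profile;
}

/* Print both ratios for every benchmark. */
static void print_ratios(void)
{
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
        printf("%-8s atomic/generate = %.2fx, atomic/no-profile = %.2fx\n",
               timings[i].name,
               atomic_vs_generate(&timings[i]),
               atomic_vs_baseline(&timings[i]));
}
```

Relative to -fprofile-generate this gives about 2.5x for c-ray, 3.3x for unrar and 3.6x for tramp3d.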

Thoughts?
Martin

> 
>>      Jakub
> 
> 
