On 08/18/2016 06:06 PM, Richard Biener wrote:
> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <ja...@redhat.com> wrote:
>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>> The best proxy we have for that is -pthread.
>>>>
>>>> Is it slower, most definitely, but odds are we're giving folks
>>>> garbage data otherwise, which in many ways is even worse.
>>>
>>> It will likely be catastrophically slower in some cases.
>>>
>>> Catastrophically as in too slow to be usable.
>>>
>>> An atomic instruction is a lot more expensive than a single increment. Also
>>> they sometimes are really slow depending on the state of the machine.
>>
>> Can't we just have thread-local copies of all the counters (perhaps using
>> __thread pointer as base) and just atomically merge at thread termination?
>
> I suggested that as well but of course it'll have its own class of issues
> (short lived threads, so we need to somehow re-use counters from terminated
> threads, large number of threads and thus using too much memory for the
> counters)
>
> Richard.

Hello.

I've put that approach on my TODO list; let's see whether it is doable in a
reasonable amount of time.

I've just finished some measurements to illustrate the slow-down of the
-fprofile-update=atomic approach. The numbers in each row are, in order:
no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic

c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5, 38.1s
unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
tramp3d (1 thread, -O3): 18.0, 46.6, 168s

So the atomic variant takes roughly 3x as long as plain -fprofile-generate
(about 2.5x to 3.6x in these runs). I don't have much experience with
choosing default options, but these numbers can probably help.

Thoughts?
Martin

>
>> Jakub
>
>
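
For reference, the thread-local counter idea quoted above could look roughly like the sketch below. This is only an illustration, not how gcov instrumentation is implemented: NUM_COUNTERS, merge_local_counters, worker and run_workers are hypothetical names, the merge is called explicitly at the end of the thread function rather than from a real thread-exit hook, and __thread is the GCC extension mentioned in the thread. It does not address the concerns raised about short-lived threads or memory use with many threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_COUNTERS 4   /* hypothetical: number of instrumented counters */
#define MAX_THREADS  64  /* hypothetical upper bound for this sketch */

/* Shared totals, only touched when a thread merges its private copy. */
static _Atomic uint64_t global_counters[NUM_COUNTERS];

/* Per-thread private counters: plain, non-atomic increments on the hot path. */
static __thread uint64_t local_counters[NUM_COUNTERS];

/* Merge this thread's counters into the shared totals, one atomic add each. */
static void merge_local_counters(void)
{
    for (int i = 0; i < NUM_COUNTERS; i++) {
        atomic_fetch_add_explicit(&global_counters[i], local_counters[i],
                                  memory_order_relaxed);
        local_counters[i] = 0;
    }
}

/* Hypothetical worker standing in for instrumented code. */
static void *worker(void *arg)
{
    uint64_t iterations = *(const uint64_t *)arg;
    for (uint64_t n = 0; n < iterations; n++)
        local_counters[n % NUM_COUNTERS]++;   /* cheap, thread-private update */
    merge_local_counters();                   /* stands in for the merge at thread termination */
    return NULL;
}

/* Run up to MAX_THREADS workers and return the merged value of one counter. */
static uint64_t run_workers(int nthreads, uint64_t iterations, int index)
{
    pthread_t threads[MAX_THREADS];
    int created = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < NUM_COUNTERS; i++)
        atomic_store(&global_counters[i], 0);

    /* Only join the threads that were actually created. */
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&threads[created], NULL, worker, &iterations) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);

    return atomic_load(&global_counters[index]);
}
```

The hot path is a plain increment of a thread-private counter, and each thread pays one atomic add per counter when it finishes, so the number of atomic operations depends on the number of threads rather than on how often each counter is hit. Build with -pthread.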