https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744

--- Comment #28 from torvald at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #20)
> (In reply to Gleb Natapov from comment #19)
> > (In reply to Jakub Jelinek from comment #18)
> > > (In reply to Gleb Natapov from comment #16)
> > > > Can you suggest an alternative to libgcc patch? Use other TLS model?
> > > > Allocate per thread storage dynamically somehow?
> > > 
> > > If we want to use TLS (which I hope we don't), then e.g. a single __thread
> > > pointer with some registered destructor that would free it on process exit
> > > could do the job, and on the first exception it would try to allocate
> > > memory for the cache and other stuff and use that (otherwise, if memory
> > > allocation fails, just take a lock and be non-scalable).
> > >
> > I see that sjlj uses __gthread_setspecific/__gthread_getspecific. Can we do
> > the same here?
> 
> Can? Yes.  Want?  Nope.  It is worse than TLS.
> 
> > > Another alternative, perhaps much better, if Torvald is going to improve
> > > rwlocks sufficiently, would be to use rwlock to guard writes to the cache
> > > etc. too, and perhaps somewhat enlarge the cache (either statically, or
> > > allow extending it through allocation).
> > > I'd expect that usually these apps that use exceptions too much only
> > > care about a couple of shared libraries, so writes to the cache ought
> > > to be rare.
> > >
> > As I said in my previous reply, I tested the new rwlock and in the
> > congested case it still slows down the system significantly. That is not
> > the implementation's fault; the CPU just does not like locked instructions
> > much. Not having a lock will be significantly better.
> 
> You still need at least one lock, the array of locks is definitely a bad
> idea.

I don't think it is necessarily a bad idea -- spreading out the data used for
synchronization to avoid cache conflicts when there is an asymmetrical workload
is a commonly used technique.  But we need to find a balance between compute
and space overhead that works for us and in this case.  Also, spreading out the
data should ideally happen in a NUMA-aware way, so there may be more space
overhead if we consider that too.
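
To make the trade-off concrete, here is a minimal sketch of the general
technique, not the patch under discussion.  Readers take one of several
cache-line-padded locks, and the rare writer takes all of them.  The names
(striped_*), the slot count, and the 64-byte line size are assumptions made
for illustration only; plain pthread mutexes stand in for whatever primitive
would really be used, and NUMA placement is not addressed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size */
#define NSLOTS     16   /* assumed slot count: the space/contention knob */

/* One lock per cache line, so readers using different slots do not
   bounce the same line between CPUs. */
struct padded_lock {
    alignas(CACHE_LINE) pthread_mutex_t m;
};

_Static_assert(sizeof(struct padded_lock) % CACHE_LINE == 0,
               "each slot must occupy whole cache lines");

static struct padded_lock slots[NSLOTS];

static void striped_init(void)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        int rc = pthread_mutex_init(&slots[i].m, NULL);
        assert(rc == 0);
        (void) rc;
    }
}

/* Reader: lock a single slot chosen from a caller-supplied hint
   (for example a thread or CPU number).  Returns the slot taken. */
static size_t striped_read_lock(uintptr_t hint)
{
    size_t slot = (size_t) (hint % NSLOTS);
    int rc = pthread_mutex_lock(&slots[slot].m);
    assert(rc == 0);
    (void) rc;
    return slot;
}

static void striped_read_unlock(size_t slot)
{
    int rc = pthread_mutex_unlock(&slots[slot].m);
    assert(rc == 0);
    (void) rc;
}

/* Writer: lock every slot, always in ascending order so that two
   writers cannot deadlock against each other. */
static void striped_write_lock(void)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        int rc = pthread_mutex_lock(&slots[i].m);
        assert(rc == 0);
        (void) rc;
    }
}

static void striped_write_unlock(void)
{
    for (size_t i = NSLOTS; i-- > 0; ) {
        int rc = pthread_mutex_unlock(&slots[i].m);
        assert(rc == 0);
        (void) rc;
    }
}
```

With these assumed values the array costs NSLOTS * CACHE_LINE = 1024 bytes,
and a writer pays NSLOTS lock acquisitions.  That is the compute and space
overhead to be balanced: more slots mean fewer reader collisions but a
slower write path and a larger footprint.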

I've been thinking about using more scalable ways of locking in glibc which
would use per-thread data, which could be useful for more than one lock.  So if
we had that, we could just use that to deal with the exception scalability
problem, without having space overhead just for exceptions.
