* Mike Galbraith <efa...@gmx.de> wrote:

> On Sun, 2012-09-16 at 12:57 -0700, Linus Torvalds wrote: 
> > On Sat, Sep 15, 2012 at 9:35 PM, Mike Galbraith <efa...@gmx.de> wrote:
> > >
> > > Oh, while I'm thinking about it, there's another scenario that could
> > > cause the select_idle_sibling() change to affect pgbench on largeish
> > > packages, but it boils down to preemption odds as well.
> > 
> > So here's a possible suggestion..
> > 
> > Let's assume that the scheduler code to find the next idle CPU on the
> > package is actually a good idea, and we shouldn't mess with the idea.
> 
> We should definitely mess with the idea, as it causes some problems.
> 
> > But at the same time, it's clearly an *expensive* idea, 
> > which is why you introduced the "only test a single CPU 
> > buddy" approach instead. But that didn't work, and you can 
> > come up with multiple reasons why it wouldn't work. Plus, 
> > quite fundamentally, it's rather understandable that "try to 
> > find an idle CPU on the same package" really would be a good 
> > idea, right?
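
(For readers following along: the contrast is roughly the
following - a simplified sketch, not the actual
select_idle_sibling() code; the per-cpu buddy variable and
helper names are illustrative only.)

    /*
     * Full-package scan: walk every CPU sharing the last-level
     * cache and take the first idle one.  O(cpus per package)
     * work on every wakeup - the "expensive idea".
     */
    static int select_idle_cpu_scan(int target, const struct cpumask *llc_mask)
    {
            int cpu;

            for_each_cpu(cpu, llc_mask) {
                    if (idle_cpu(cpu))
                            return cpu;
            }
            return target;          /* nothing idle, stay put */
    }

    /*
     * Buddy approach: each CPU nominates one partner, so a wakeup
     * probes exactly one other CPU instead of the whole package.
     * O(1), but it can miss idle CPUs the full scan would find.
     */
    DEFINE_PER_CPU(int, idle_buddy);        /* illustrative */

    static int select_idle_cpu_buddy(int target)
    {
            int buddy = per_cpu(idle_buddy, target);

            if (idle_cpu(buddy))
                    return buddy;
            return target;
    }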
> 
> I would argue that it did work: it shut down the primary
> source of pain, which I believe is not the traversal cost
> but the bouncing.
> 
>     4 socket 40 core + SMT Westmere box, single 30 sec tbench runs,
>     higher is better:
>
>     clients       1      2      4      8     16     32     64    128
>     ................................................................
>     pre          30     41    118    645   3769   6214  12233  14312
>     post        299    603   1211   2418   4697   6847  11606  14557

That's a very tempting speedup for a simpler and more
fundamental workload than postgresql's somewhat weird
user-space spinlocks, which burn CPU time spinning
instead of blocking on a futex.

IIRC mysql does this properly and outperforms postgresql
on this benchmark, in an apples-to-apples configuration?
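
To make that distinction concrete, here is a minimal user-space
sketch of the two waiting strategies (illustrative only - not
pgbench's or mysql's actual locking code, and a real futex lock
needs a contended-state protocol to avoid lost wakeups; see
Drepper's "Futexes Are Tricky"):

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* User-space spinlock: burns CPU until the holder releases. */
    static void spin_lock_wait(atomic_int *lock)
    {
            while (atomic_exchange(lock, 1))
                    ;       /* spin, wasting cycles the holder could use */
    }

    /* Futex-backed lock: spin briefly, then sleep in the kernel. */
    static void futex_lock_wait(atomic_int *lock)
    {
            int spins = 100;        /* illustrative bound */

            while (atomic_exchange(lock, 1)) {
                    if (--spins > 0)
                            continue;
                    /*
                     * Sleep until *lock changes from 1; this frees
                     * the CPU.  The matching unlock must store 0
                     * and call FUTEX_WAKE.
                     */
                    syscall(SYS_futex, lock, FUTEX_WAIT, 1, NULL, NULL, 0);
                    spins = 100;
            }
    }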

> 10x at 1 pair shouldn't be traversal; the whole box is
> otherwise idle. We'll do a lot more (ever more futile)
> traversal as load increases, but at the same time our futile
> attempts fail more often, so we shoot ourselves in the
> foot less frequently.
> 
> The downside is (or appears to be) that I also shut down some
> odd-case preemption salvation, salvation that only large
> packages receive.
> 
> The problem as I see it is that we're making light tasks _too_
> mobile, turning an optimization into a pessimization for them.
> For longer-running tasks this mobility within a large package
> isn't such a big deal, but for fast movers it hurts a lot.
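
One way to read that is that the scan should be gated on task
behaviour. A purely hypothetical sketch of such a gate (the
avg_runtime field and the comparison are made up for
illustration, not from any posted patch):

    /*
     * Hypothetical: only pay for the idle-package scan when the
     * task runs long enough to amortize the migration (cache
     * refill) cost; fast movers stay near their previous CPU.
     */
    static bool worth_scanning(struct task_struct *p, u64 migration_cost_ns)
    {
            u64 avg_run_ns = p->se.avg_runtime;     /* hypothetical field */

            return avg_run_ns > migration_cost_ns;
    }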

There's not enough time to resolve this for v3.6, so I agree
with the revert - would you be willing to post a v2 of your
original patch? I really think we want your tbench speedups;
quite a few real-world messaging applications use tbench-like
scheduling patterns.

Thanks,

        Ingo