On Fri, Nov 06, 2020 at 10:58:58AM +0800, Li, Aubrey wrote:

> > 
> >     -- workload D, newly added syscall workload, performance drop with cs_on:
> >     +----------------------+------+-------------------------------+
> >     |                      | **   | will-it-scale  * 192          |
> >     |                      |      | (pipe based context_switch)   |
> >     +======================+======+===============================+
> >     | cgroup               | **   | cg_will-it-scale              |
> >     +----------------------+------+-------------------------------+
> >     | record_item          | **   | threads_avg                   |
> >     +----------------------+------+-------------------------------+
> >     | coresched_normalized | **   | 0.2                           |
> >     +----------------------+------+-------------------------------+
> >     | default_normalized   | **   | 1                             |
> >     +----------------------+------+-------------------------------+
> >     | smtoff_normalized    | **   | 0.89                          |
> >     +----------------------+------+-------------------------------+
> 
> will-it-scale may be a very extreme case. The story here is:
> - On one sibling, a reader/writer blocks and the CPU tries to schedule
>   another reader/writer in.
> - On the other sibling, the CPU tries to wake that reader/writer up.
> 
> Both CPUs end up acquiring rq->__lock.
> 
> With coresched off, these are two different locks; lock stat (1-second
> delta) below:
> 
> &rq->__lock:
>   con-bounces:             210
>   contentions:             210
>   waittime-min:           0.10
>   waittime-max:           3.04
>   waittime-total:       180.87
>   waittime-avg:           0.86
>   acq-bounces:             797
>   acquisitions:       79165021
>   holdtime-min:           0.03
>   holdtime-max:          20.69
>   holdtime-total:  60650198.34
>   holdtime-avg:           0.77
> 
> But with coresched on, they are actually one and the same lock; lock
> stat (1-second delta) below:
> 
> &rq->__lock:
>   con-bounces:         6479459
>   contentions:         6484857
>   waittime-min:           0.05
>   waittime-max:         216.46
>   waittime-total:  60829776.85
>   waittime-avg:           9.38
>   acq-bounces:         8346319
>   acquisitions:       15399739
>   holdtime-min:           0.03
>   holdtime-max:          95.56
>   holdtime-total:  81119515.38
>   holdtime-avg:           5.27
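> 
> That's because the series selects the rq lock per core: when core
> scheduling is enabled, rq_lockp() returns the core-wide lock instead
> of the per-CPU one. Roughly like below (simplified sketch; the exact
> code may differ between versions of the series):
> 
>     /*
>      * With core scheduling enabled, every rq on a core resolves to
>      * the same raw_spinlock, so both siblings serialize on it.
>      */
>     static inline raw_spinlock_t *rq_lockp(struct rq *rq)
>     {
>             if (sched_core_enabled(rq))
>                     return &rq->core->__lock;
> 
>             return &rq->__lock;
>     }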
> 
> This characteristic of core scheduling may degrade the performance of
> similar workloads that context-switch frequently.
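> 
> To illustrate the access pattern, here is a minimal pipe ping-pong in
> the spirit of will-it-scale's context_switch test (a simplified sketch
> only; the real benchmark differs in details):
> 
>     /*
>      * Two tasks ping-pong one byte over a pair of pipes.  Each
>      * iteration blocks one task in read() while its partner wakes it
>      * with write(), so the pair context-switches continuously.
>      * Runs until killed.
>      */
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <unistd.h>
> 
>     int main(void)
>     {
>             int p1[2], p2[2];
>             char c = 0;
> 
>             if (pipe(p1) || pipe(p2)) {
>                     perror("pipe");
>                     exit(1);
>             }
> 
>             if (fork() == 0) {
>                     /* child: wait for parent's byte, then wake parent */
>                     for (;;) {
>                             if (read(p1[0], &c, 1) != 1 ||
>                                 write(p2[1], &c, 1) != 1)
>                                     exit(1);
>                     }
>             }
> 
>             /* parent: wake child, then wait for child's byte */
>             for (;;) {
>                     if (write(p1[1], &c, 1) != 1 ||
>                         read(p2[0], &c, 1) != 1)
>                             exit(1);
>             }
>     }
> 
> When the two tasks land on the two siblings of one core, every
> iteration is a wakeup from one sibling into the other's runqueue,
> which is exactly the rq->__lock pattern above.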

When core sched is off, is SMT off as well? From the table above, it
seems to be. So even with core sched off, there would be a single lock
per physical CPU core (assuming SMT is also off), right? Or did I miss
something?

thanks,

 - Joel
