On 2020/11/7 1:54, Joel Fernandes wrote:
> On Fri, Nov 06, 2020 at 10:58:58AM +0800, Li, Aubrey wrote:
> 
>>>
>>>     -- workload D, new added syscall workload, performance drop in cs_on:
>>>     +----------------------+------+-------------------------------+
>>>     |                      | **   | will-it-scale  * 192          |
>>>     |                      |      | (pipe based context_switch)   |
>>>     +======================+======+===============================+
>>>     | cgroup               | **   | cg_will-it-scale              |
>>>     +----------------------+------+-------------------------------+
>>>     | record_item          | **   | threads_avg                   |
>>>     +----------------------+------+-------------------------------+
>>>     | coresched_normalized | **   | 0.2                           |
>>>     +----------------------+------+-------------------------------+
>>>     | default_normalized   | **   | 1                             |
>>>     +----------------------+------+-------------------------------+
>>>     | smtoff_normalized    | **   | 0.89                          |
>>>     +----------------------+------+-------------------------------+
>>
>> will-it-scale may be a very extreme case. The story here is:
>> - On one sibling, the reader/writer blocks and the CPU tries to schedule
>> another reader/writer in.
>> - The other sibling tries to wake the reader/writer back up.
>>
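>> The pattern above is essentially a pipe-based ping-pong between two
>> tasks; a minimal sketch of the loop (not the actual will-it-scale
>> source; error handling omitted):
>>
>> #include <unistd.h>
>> #include <stdlib.h>
>>
>> int main(void)
>> {
>> 	int p1[2], p2[2];
>> 	char c = 0;
>>
>> 	if (pipe(p1) || pipe(p2))
>> 		exit(1);
>>
>> 	if (fork() == 0) {
>> 		for (;;) {
>> 			read(p1[0], &c, 1);	/* block: schedule out */
>> 			write(p2[1], &c, 1);	/* wake the parent */
>> 		}
>> 	}
>>
>> 	for (;;) {
>> 		write(p1[1], &c, 1);	/* wake the child */
>> 		read(p2[0], &c, 1);	/* block: schedule out */
>> 	}
>> }
>>
>> Every iteration blocks one task and wakes the other, so both siblings
>> context-switch constantly.
>>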
>> Both CPUs are acquiring rq->__lock.
>>
>> When coresched is off, these are two different locks; lock stat (1-second
>> delta) below:
>>
>> &rq->__lock (SMT on, coresched off, 1s delta):
>>   con-bounces:     210          contentions:   210
>>   waittime-min:    0.10         waittime-max:  3.04
>>   waittime-total:  180.87       waittime-avg:  0.86
>>   acq-bounces:     797          acquisitions:  79165021
>>   holdtime-min:    0.03         holdtime-max:  20.69
>>   holdtime-total:  60650198.34  holdtime-avg:  0.77
>>
>> But when coresched is on, they are actually the same lock; lock stat
>> (1-second delta) below:
>>
>> &rq->__lock (SMT on, coresched on, 1s delta):
>>   con-bounces:     6479459      contentions:   6484857
>>   waittime-min:    0.05         waittime-max:  216.46
>>   waittime-total:  60829776.85  waittime-avg:  9.38
>>   acq-bounces:     8346319      acquisitions:  15399739
>>   holdtime-min:    0.03         holdtime-max:  95.56
>>   holdtime-total:  81119515.38  holdtime-avg:  5.27
>>
>> This property of core scheduling may degrade the performance of similar
>> workloads that context-switch frequently.
> 
> When core sched is off, is SMT off as well? From the above table, it seems to
> be. So even for core sched off, there will be a single lock per physical CPU
> core (assuming SMT is also off) right? Or did I miss something?
> 

The table includes 3 cases:
- default:      SMT on,  coresched off
- coresched:    SMT on,  coresched on
- smtoff:       SMT off, coresched off

I was comparing the default (coresched off & SMT on) case with the
coresched (coresched on & SMT on) case.
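
The contention difference between these two cases comes from the lock
indirection that coresched introduces: with core scheduling enabled,
both siblings resolve to a single per-core lock. Roughly (a sketch of
the rq_lockp() helper from the series; approximate, see the posted
patches for the exact code):

static inline raw_spinlock_t *rq_lockp(struct rq *rq)
{
	if (sched_core_enabled(rq))
		return &rq->core->__lock;	/* one lock per core */

	return &rq->__lock;			/* one lock per CPU */
}

So with coresched on, the reader's CPU and the writer's CPU serialize
on the same lock for every schedule() and wakeup.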

If SMT is off, the reader and writer run on different cores and take
different rq->__lock instances, so the lock contention is not that
serious:

&rq->__lock (SMT off, coresched off, 1s delta):
  con-bounces:     60           contentions:   60
  waittime-min:    0.11         waittime-max:  1.92
  waittime-total:  41.33        waittime-avg:  0.69
  acq-bounces:     127          acquisitions:  67184172
  holdtime-min:    0.03         holdtime-max:  22.95
  holdtime-total:  33160428.37  holdtime-avg:  0.49
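
In case it helps to reproduce these numbers: lock stat comes from
/proc/lock_stat, and one way to take such a 1-second delta is to clear
the statistics, sleep, and dump the file (assumes a kernel with
CONFIG_LOCK_STAT and lock statistics enabled via
/proc/sys/kernel/lock_stat; a sketch, not the script actually used):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	FILE *f = fopen("/proc/lock_stat", "w");
	char buf[4096];
	size_t n;

	if (!f)
		return 1;
	fputs("0\n", f);	/* writing 0 clears the statistics */
	fclose(f);

	sleep(1);		/* accumulate a 1-second window */

	f = fopen("/proc/lock_stat", "r");
	if (!f)
		return 1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}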

Does this address your concern?

Thanks,
-Aubrey
