On 06/28/2017 05:19 PM, Jens Axboe wrote:
> On 06/28/2017 04:07 PM, Brian King wrote:
>> On 06/28/2017 04:59 PM, Jens Axboe wrote:
>>> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>>>> On 06/28/2017 03:12 PM, Brian King wrote:
>>>>> -static inline int part_in_flight(struct hd_struct *part)
>>>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>>>>  {
>>>>> - return atomic_read(&part->in_flight[0]) + 
>>>>> atomic_read(&part->in_flight[1]);
>>>>> + return part_stat_read(part, in_flight[0]) + part_stat_read(part, 
>>>>> in_flight[1]);
>>>>
>>>> One obvious improvement would be to not do this twice, but only have to
>>>> loop once. Instead of making this an array, make it a structure with a
>>>> read and write count.
>>>>
>>>> It still doesn't really fix the issue of someone running on a kernel
>>>> with a ton of possible CPUs configured. But it does reduce the overhead
>>>> by 50%.
>>>
>>> Or something as simple as this:
>>>
>>> #define part_stat_read_double(part, field1, field2)                 \
>>> ({                                                                  \
>>>     typeof((part)->dkstats->field1) res = 0;                        \
>>>     unsigned int _cpu;                                              \
>>>     for_each_possible_cpu(_cpu) {                                   \
>>>             res += per_cpu_ptr((part)->dkstats, _cpu)->field1;      \
>>>             res += per_cpu_ptr((part)->dkstats, _cpu)->field2;      \
>>>     }                                                               \
>>>     res;                                                            \
>>> })
>>>
>>> static inline unsigned long part_in_flight(struct hd_struct *part)
>>> {
>>>     return part_stat_read_double(part, in_flight[0], in_flight[1]);
>>> }
>>>
>>
>> I'll give this a try and also see about running some more exhaustive
>> runs to see if there are any cases where we go backwards in performance.
>>
>> I'll also run with partitions and see how that impacts this.
> 
> And do something nuts, like setting NR_CPUS to 512 or whatever. What do
> distros ship with?

Both RHEL and SLES set NR_CPUS=2048 for the Power architecture. I can easily
switch the SMT mode of the machine I used for this from 4 to 8 to have a total
of 160 online logical CPUs and see how that affects the performance. I'll
see if I can find a larger machine as well.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Reply via email to