On 18/09/25 12:35 AM, David Hildenbrand wrote:
> On 17.09.25 20:51, Kyle Meyer wrote:
>> On Wed, Sep 17, 2025 at 09:02:55AM +0200, David Hildenbrand wrote:
>>>
>>>>> +
>>>>> + 0 - Enable soft offline
>>>>> + 1 - Disable soft offline for HugeTLB pages
>>>>> +
>>>>> +Supported values::
>>>>> +
>>>>> + 0 - Soft offline is disabled
>>>>> + 1 - Soft offline is enabled
>>>>> + 3 - Soft offline is enabled (disabled for HugeTLB pages)
>>>>
>>>> This looks very ad hoc even though existing behavior is preserved.
>>>>
>>>> - Are HugeTLB pages the only page type to be considered?
>>>> - How are the remaining bits going to be used later?
>>>>
>>>
>>> What I proposed (that could be better documented here) is that all other
>>> bits except the first one will be a disable mask when bit 0 is set.
>>>
>>> 2 - ... but yet disabled for hugetlb
>>> 4 - ... but yet disabled for $WHATEVER
>>> 8 - ... but yet disabled for $WHATEVERELSE
>>>
>>>> Also, without a roadmap for how the bits will be used, isn't changing
>>>> a procfs interface (ABI) a bit problematic?
>>>
>>> For now, setting it to any value other than 0 or 1 fails, if I read
>>> set_enable_soft_offline() correctly?
>>
>> Yes, -EINVAL will be returned.
>>
>>> So there should not be any problem, or which scenario do you have in mind?
>>
>> Here's an alternative approach.
>>
>> Do not modify the existing sysctl parameter:
>>
>> /proc/sys/vm/enable_soft_offline
>>
>> 0 - Soft offline is disabled
>> 1 - Soft offline is enabled
>>
>> Instead, introduce a new sysctl parameter:
>>
>> /proc/sys/vm/enable_soft_offline_hugetlb
>>
>> 0 - Soft offline is disabled for HugeTLB pages
>> 1 - Soft offline is enabled for HugeTLB pages
>>
>> and note in documentation that this setting only takes effect if
>> enable_soft_offline is enabled.
>>
>> Anshuman (and David), would you prefer this?
>
> Hmm, at least I don't particularly like that. For each new exception we would
> create a new file, and the file has weird semantics such that it has no
> meaning when enable_soft_offline=0.

I agree with David here. Adding a new procfs file just to disable soft
offline for one particular page type does not really make sense. It
would extend the ABI unnecessarily without adding much benefit.
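
For reference, the single-file layout David describes above (bit 0 enables
soft offline, the remaining bits form a disable mask) could be sketched
roughly as below. This is only an illustration, not code from the patch:
the macro and function names are hypothetical, and rejecting a disable bit
without bit 0 (e.g. the value 2) is an assumption based on the "Supported
values" list quoted above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not taken from the patch. */
#define SOFT_OFFLINE_ENABLED		(1u << 0) /* bit 0: master enable */
#define SOFT_OFFLINE_SKIP_HUGETLB	(1u << 1) /* bit 1: ... but disabled for HugeTLB */
#define SOFT_OFFLINE_KNOWN_BITS \
	(SOFT_OFFLINE_ENABLED | SOFT_OFFLINE_SKIP_HUGETLB)

/* Validate a value written to the sysctl: 0 on success, -EINVAL otherwise. */
static int soft_offline_validate(unsigned int val)
{
	/* Unassigned bits stay rejected so they can be given a meaning later. */
	if (val & ~SOFT_OFFLINE_KNOWN_BITS)
		return -EINVAL;
	/* Assumption: disable-mask bits are only accepted together with bit 0. */
	if (val != 0 && !(val & SOFT_OFFLINE_ENABLED))
		return -EINVAL;
	return 0;
}

/* Decide whether a page may be soft offlined under the given setting. */
static bool soft_offline_allowed(unsigned int val, bool is_hugetlb)
{
	if (!(val & SOFT_OFFLINE_ENABLED))
		return false;
	if (is_hugetlb && (val & SOFT_OFFLINE_SKIP_HUGETLB))
		return false;
	return true;
}
```

With this shape, 0 and 1 keep their current meaning, 3 enables soft offline
except for HugeTLB pages, and any value using an unassigned bit still gets
-EINVAL, so a later exception only needs a new bit rather than a new file.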