On 8 Jun 2026, at 16:25, Gregory Price wrote:

> On Mon, Jun 08, 2026 at 03:52:20PM -0400, Zi Yan wrote:
>> On 8 Jun 2026, at 15:43, Gregory Price wrote:
>>
>>> On Mon, Jun 08, 2026 at 08:39:10PM +0200, David Hildenbrand (Arm) wrote:
>>>> On 6/8/26 17:27, Zi Yan wrote:
>>>>> On 8 Jun 2026, at 7:08, David Hildenbrand (Arm) wrote:
>>>>>
>>>>> Or should we defer zeroing after a page is returned from allocator? So 
>>>>> that
>>>>> user_addr does not need to be passed through irrelevant allocation APIs.
>>>>> Something like:
>>>>>
>>>>> alloc_page_wrapper(gfp, order, user_addr)
>>>>> {
>>>>>   page = alloc_pages();
>>>>>   if (gfp & __GFP_ZERO)
>>>>>           clear_page(page);
>>>>> }
>>>>>
>>>>
>>>> Not really sure what's best here. I think we'd want to limit the lifting 
>>>> to some
>>>> internal API, so it cannot easily be messed up by random kernel code 
>>>> calling
>>>> into the wrong API and not getting pages cleared.
>>>>
>>>
>>> We're a bit in circles on this.  We discussed explicit interfaces a few
>>> months back and the trade off was:
>>>
>>> a) add user_addr to the existing API and cause churn
>>>
>>> or
>>>
>>> b) add special interface like above
>>>    increase the buddy surface
>>>    leaves open the ability for users to get it wrong easily
>>>
>>> If we forget VMs for a moment and break this step out separately, the
>>> core question is whether page_alloc.c is the right place to be calling
>>> the folio_user_zero() or whatever it is.
>>
>> page_alloc.c calling folio_user_zero() is fine, but my question is
>> whether we should do the zeroing inside post_alloc_hook(), part of
>> allocation.
>>
>> What I propose is to lift __GFP_ZERO up as much as possible,
>> so that most of allocation code does not need to care about it.
>> We do the zeroing right before the page is returned to callers.
>>
>
> essentially we end up with something like
>
> alloc_frozen_...(..., gfp)
> {
>   folio = whatever(..., gfp);
>   if (gfp & __GFP_ZERO)
>     folio_zero(folio, -1); /* don't do cache flush part */
> }
>
> alloc_frozen_user_...(..., gfp, user_addr)
> {
>   folio = whatever(..., gfp);
>   if (gfp & __GFP_ZERO)
>     folio_zero(folio, user_addr); /* do cache flush part */
> }
>
> The downside of this is obvious: it's easy for developers to get this
> wrong and call the non-user interface for user-bound allocations and
> miss the cache flush (that is only needed on some archs).
>
> Not saying that's a deal breaker, but it's something to chew on.

I agree that misuse can cause trouble. But if we do the churn approach,
what prevents developer from doing alloc_frozen(..., user_addr = -1)
and using the returned page for userspace? It is possible the allocated
page can be exported to userspace later.

BTW, that cache flush thing is fragile even today, you probably can
do alloc_page() + vm_insert() to get a page without doing proper flush
and export it to userspace. There seems to be no mechanism to
prevent that.

Best Regards,
Yan, Zi

Reply via email to