Re: [PATCH v2] dma-buf: handle testing kthreads creation failure

2024-05-22 Thread T.J. Mercier
On Wed, May 22, 2024 at 11:14 AM Fedor Pchelkin  wrote:
>
> kthread creation may possibly fail inside race_signal_callback(). In
> such a case stop the already started threads, put the already taken
> references to them and return with an error code.
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: 2989f6451084 ("dma-buf: Add selftests for dma-fence")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Fedor Pchelkin 
Reviewed-by: T.J. Mercier 
> ---
> v2: use kthread_stop_put() to actually put the last reference as
> T.J. Mercier noticed;
> link to v1: 
> https://lore.kernel.org/lkml/20240522122326.696928-1-pchel...@ispras.ru/
>
>  drivers/dma-buf/st-dma-fence.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
> index b7c6f7ea9e0c..6a1bfcd0cc21 100644
> --- a/drivers/dma-buf/st-dma-fence.c
> +++ b/drivers/dma-buf/st-dma-fence.c
> @@ -540,6 +540,12 @@ static int race_signal_callback(void *arg)
> t[i].before = pass;
> t[i].task = kthread_run(thread_signal_callback, &t[i],
> "dma-fence:%d", i);
> +   if (IS_ERR(t[i].task)) {
> +   ret = PTR_ERR(t[i].task);
> +   while (--i >= 0)
> +   kthread_stop_put(t[i].task);
> +   return ret;
> +   }
> get_task_struct(t[i].task);
> }
>
> --
> 2.39.2
>


Re: [PATCH] dma-buf: handle testing kthreads creation failure

2024-05-22 Thread T.J. Mercier
On Wed, May 22, 2024 at 5:24 AM Fedor Pchelkin  wrote:
>
> kthread creation may possibly fail inside race_signal_callback(). In
> such a case stop the already started threads and return with an error code.
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: 2989f6451084 ("dma-buf: Add selftests for dma-fence")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Fedor Pchelkin 
> ---
>  drivers/dma-buf/st-dma-fence.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
> index b7c6f7ea9e0c..ab1ec4631578 100644
> --- a/drivers/dma-buf/st-dma-fence.c
> +++ b/drivers/dma-buf/st-dma-fence.c
> @@ -540,6 +540,12 @@ static int race_signal_callback(void *arg)
> t[i].before = pass;
> t[i].task = kthread_run(thread_signal_callback, &t[i],
> "dma-fence:%d", i);
> +   if (IS_ERR(t[i].task)) {
> +   ret = PTR_ERR(t[i].task);
> +   while (--i >= 0)
> +   kthread_stop(t[i].task);

This looks like it needs to be kthread_stop_put since get_task_struct
was called for previous successful kthread_run calls.

> +   return ret;
> +   }
> get_task_struct(t[i].task);
> }
>
> --
> 2.39.2
>


Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue

2024-05-07 Thread T.J. Mercier
On Tue, May 7, 2024 at 7:04 AM Christian König  wrote:
>
> On 07.05.24 at 15:39, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 12:10:07PM +0200, Christian König wrote:
> >> On 06.05.24 at 21:04, T.J. Mercier wrote:
> >>> On Mon, May 6, 2024 at 2:30 AM Charan Teja Kalla
> >>>  wrote:
> >>>> Hi TJ,
> >>>>
> >>>> Seems I have got answers from [1], where it is agreed that epoll() is
> >>>> the source of the issue.
> >>>>
> >>>> Thanks a lot for the discussion.
> >>>>
> >>>> [1] https://lore.kernel.org/lkml/2d631f0615918...@google.com/
> >>>>
> >>>> Thanks
> >>>> Charan
> >>> Oh man, quite a set of threads on this over the weekend. Thanks for the link.
> >> Yeah and it also has some interesting side conclusion: We should probably
> >> tell people to stop using DMA-buf with epoll.
> >>
> >> The background is that the mutex approach epoll uses to make files
> >> disappear from the interest list on close results in the fact that each
> >> file can only be part of a single epoll at a time.
> >>
> >> Now since DMA-buf is built around the idea that we share the buffer
> >> representation as a file between processes, it means that only one
> >> process at a time can use epoll with each DMA-buf.
> >>
> >> So for example if a window manager uses epoll everything is fine. If a
> >> client is using epoll everything is fine as well. But if *both* use
> >> epoll at the same time it won't work.
> >>
> >> This can lead to rather funny and hard to debug combinations of
> >> failures and I think we need to document this limitation and explicitly
> >> point it out.
> > Ok, I tested this with a small C program, and you're mixing things up.
> > Here's what I got
> >
> > - You cannot add a file twice to the same epoll file/fd. So that part is
> >correct, and also my understanding from reading the kernel code.
> >
> > - You can add the same file to two different epoll file instances. Which
> >means it's totally fine to use epoll on a dma_buf in different processes
> >like both in the compositor and in clients.
>
> Ah! Then I misunderstood that comment in the discussion. Thanks for
> clarifying that.
>
> >
> > - Substantially more entertaining, you can nest epoll instances, and e.g.
> >add a 2nd epoll file as an event to the first one. That way you can add
> >the same file to both epoll fds, and so end up with the same file
> >essentially being added twice to the top-level epoll file. So even
> >within one application there's no real issue when e.g. different
> >userspace drivers all want to use epoll on the same fd, because you can
> >just throw in another level of epoll and it's fine again and you won't
> >get an EEXISTS on EPOLL_CTL_ADD.
> >
> >But I also don't think we have this issue right now anywhere, since it's
> >kinda a general epoll issue that happens with any duplicated file.
>
> I actually have been telling people to (ab)use the epoll behavior to
> check if two file descriptors point to the same underlying file when
> KCMP isn't available.
>
> Some environments (Android?) disable KCMP because they see it as
> security problem.
>
Didn't know about this so I checked. Our kernel has CONFIG_KCMP, but
seccomp does look like it's blocking kcmp for apps.
https://cs.android.com/android/platform/superproject/main/+/main:bionic/libc/SYSCALLS.TXT


Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue

2024-05-06 Thread T.J. Mercier
On Mon, May 6, 2024 at 2:30 AM Charan Teja Kalla
 wrote:
>
> Hi TJ,
>
> Seems I have got answers from [1], where it is agreed that epoll() is
> the source of the issue.
>
> Thanks a lot for the discussion.
>
> [1] https://lore.kernel.org/lkml/2d631f0615918...@google.com/
>
> Thanks
> Charan

Oh man, quite a set of threads on this over the weekend. Thanks for the link.


Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue

2024-05-03 Thread T.J. Mercier
On Fri, May 3, 2024 at 6:40 AM Charan Teja Kalla
 wrote:
>
> Thanks Christian/TJ for the inputs!!
>
> On 4/18/2024 12:16 PM, Christian König wrote:
> > As far as I can see the EPOLL holds a reference to the files it
> > contains. So it is perfectly valid to add the file descriptor to EPOLL
> > and then close it.
> >
> > In this case the file won't be closed, but be kept alive by it's
> > reference in the EPOLL file descriptor.
>
> I am not seeing that adding an 'fd' into the epoll monitoring list will
> increase its refcount. Confirmed by writing a testcase that just does
> open + EPOLL_CTL_ADD and then looks at the /proc/../fdinfo of the epoll fd
> (with file_count() info added to the output):
> # cat /proc/136/fdinfo/3
> pos:0
> flags:  02
> mnt_id: 14
> ino:1041
> tfd:4 events:   19 data:4  pos:0 ino:81 sdev:5
> refcount: 1  <-- The fd added to the epoll monitoring list is still at 1
> (same as the open file refcount)
>
> From the code too, I don't see that a file added to the epoll monitoring
> list will hold an extra refcount except momentarily (it increases the
> refcount of the target file, adds it to the rbtree of the epoll context,
> and then decreases the file refcount):
> do_epoll_ctl():
> f = fdget(epfd);
> tf = fdget(fd);
> epoll_mutex_lock(&ep->mtx)
> EPOLL_CTL_ADD:
> ep_insert(ep, epds, tf.file, fd, full_check); // Added to the epoll
> monitoring rb tree list as epitem.
> mutex_unlock(&ep->mtx);
> fdput(tf);
> fdput(f);
>
>
> Not sure if I am completely mistaken about what you're saying here.
>
> > The fs layer which calls dma_buf_poll() should make sure that the file
> > can't go away until the function returns.
> >
> > Then inside dma_buf_poll() we add another reference to the file while
> > installing the callback:
> >
> > /* Paired with fput in dma_buf_poll_cb */
> > get_file(dmabuf->file); No, exactly that can't
> > happen either.
> >
>
> I am not quite comfortable with epoll internals but I think the below race
> is possible, where "the 'file' passed to dma_buf_poll() is proper but
> ->f_count == 0, which indicates that a parallel freeing is
> happening". So, we should check the file->f_count value before taking
> the refcount.
>
> (An 'fd' registered for the epoll monitoring list is maintained as an
> 'epitem' (epi) in the rbtree of the 'eventpoll' (ep), called the epoll context.)
>
> epoll_wait()                            __fput() (as file->f_count=0)
> ---------------------------------------------------------------------
> a) ep_poll_callback():
>  Is the waitqueue function
>called when signaled on the
>wait_queue_head_t of the registered
>poll() function.
>
>1) It links the 'epi' to the ready list
>   of 'ep':
>if (!ep_is_linked(epi))
>  list_add_tail_lockless(&epi->rdllink,
> &ep->rdllist)
>
>2) wake_up(&ep->wq);
> Wake up the process waiting
> on epoll_wait() that endup
> in ep_poll.
>
>
> b) ep_poll():
> 1) while (1) {
> eavail = ep_events_available(ep);
> (checks ep->rdlist)
> ep_send_events(ep);
> (notify the events to user)
> }
> (epoll_wait() calling process gets
>  woken up from a.2 and process the
>  events raised added to rdlist in a.1)
>
>2) ep_send_events():
> mutex_lock(&ep->mtx);
> (** The sync lock is taken **)
> list_for_each_entry_safe(epi, tmp,
> &txlist, rdllink) {
> list_del_init(&epi->rdllink);
> revents = ep_item_poll(epi, &pt, 1)
> (vfs_poll()-->...f_op->poll(=dma_buf_poll)
> }
> mutex_unlock(&ep->mtx)
> (**release the lock.**)
>
> (As part of procession of events,
>  each epitem is removed from rdlist
>  and call the f_op->poll() of a file
>  which will endup in dma_buf_poll())
>
>3) dma_buf_poll():
> get_file(dmabuf->file);
> (** f_count changed from 0->1 **)
> dma_buf_poll_add_cb(resv, true, dcb):
> (registers dma_buf_poll_cb() against fence)
>
>
> c) eventpoll_release_file():
>mutex_lock(&ep->mtx);
>__ep_remove(ep, epi, true):
>mutex_unlock(&ep->mtx);
>   (__ep_remove() will remove the
>'epi' from rbtree and if present
>from rdlist as well)
>
> d) file_free(file), free the 'file'.
>
> e) dma_buf_poll_cb:
>  /* Paired with get_file in dma_buf_poll */
>  fput(dmabuf->file);
>  (f_count changed from 1->0, where
>   we try to free the 'file' again
>   which is UAF/double free).
>
>
>
> In the above race, if c) gets called first, then the 'epi' is removed
> from both the rbtree and 'rdllink' under the ep->mtx lock, thus b.2 doesn't end up

Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-11 Thread T.J. Mercier
On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065...@vivo.com> wrote:
>
>
> On 2024/4/10 0:37, T.J. Mercier wrote:
> >
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
> >>
> >> On 2024/4/8 15:58, Christian König wrote:
> >>> On 07.04.24 at 09:50, Rong Qianfeng wrote:
> >>>> [SNIP]
> >>>>> On 13.11.21 at 07:22, Jianqun Xu wrote:
> >>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>>>> offset and len.
> >>>>> You have not given a use case for this so it is a bit hard to
> >>>>> review. And from the existing use cases I don't see why this should
> >>>>> be necessary.
> >>>>>
> >>>>> Even worse from the existing backend implementation I don't even see
> >>>>> how drivers should be able to fulfill this semantics.
> >>>>>
> >>>>> Please explain further,
> >>>>> Christian.
> >>>> Here is a practical case:
> >>>> The user space can allocate a large chunk of dma-buf for
> >>>> self-management, used as a shared memory pool.
> >>>> Small dma-buf can be allocated from this shared memory pool and
> >>>> released back to it after use, thus improving the speed of dma-buf
> >>>> allocation and release.
> >>>> Additionally, custom functionalities such as memory statistics and
> >>>> boundary checking can be implemented in the user space.
> >>>> Of course, the above-mentioned functionalities require the
> >>>> implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >> From the test results, the execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker. That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377
> >
> Thank you for the reminder.
>
> The page pool of the system heap can meet the needs of most scenarios,
> but the camera shooting scenario is quite special.
>
> Why does the camera shooting scenario need to have a dma-buf memory pool
> in userspace?
>
> (1) The memory demand is extremely high and time requirements are
> stringent.

Preallocating for this makes sense to me, whether it is individual
buffers or one large one.

> (2) The memory in the page pool(system heap) is easily reclaimed or used
> by other apps.

Yeah, by design I guess. I have seen an implementation where the page
pool is proactively refilled after it has been empty for some time
which would help in scenarios where the pool is often reclaimed and
low/empty. You could add that in a vendor heap.

> (3) High concurrent allocation and release (with deferred-free) lead to
> high memory usage peaks.

Hopefully this is not every frame? I saw enough complaints about the
deferred free of pool pages that it hasn't been carried forward since
Android 13, so this should be less of a problem on recent kernels.

> Additionally, after the camera exits, the shared memory pool can be
> released, with minimal impact.

Why do you care about the difference here? In one case it's when the
buffer refcount goes to 0 and memory is freed immediately, and in the
other it's the next time reclaim runs the shrinker.


I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
without it being in the upstream kernel. I don't think we can get that
without an in-kernel user of dma_buf_begin_cpu_access_partial first,
even though your use case wouldn't rely on that in-kernel usage. :\ So
if you want to add this to phones for your camera app, then I think
your best option is to add a vendor driver which implements this IOCTL
and calls the dma_buf_begin_cpu_access_partial functions which are
already exported.

Best,
T.J.


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-10 Thread T.J. Mercier
On Wed, Apr 10, 2024 at 7:22 AM Christian König
 wrote:
>
> On 09.04.24 at 18:37, T.J. Mercier wrote:
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
> >>
> >> On 2024/4/8 15:58, Christian König wrote:
> >>> On 07.04.24 at 09:50, Rong Qianfeng wrote:
> >>>> [SNIP]
> >>>>> On 13.11.21 at 07:22, Jianqun Xu wrote:
> >>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>>>> offset and len.
> >>>>> You have not given a use case for this so it is a bit hard to
> >>>>> review. And from the existing use cases I don't see why this should
> >>>>> be necessary.
> >>>>>
> >>>>> Even worse from the existing backend implementation I don't even see
> >>>>> how drivers should be able to fulfill this semantics.
> >>>>>
> >>>>> Please explain further,
> >>>>> Christian.
> >>>> Here is a practical case:
> >>>> The user space can allocate a large chunk of dma-buf for
> >>>> self-management, used as a shared memory pool.
> >>>> Small dma-buf can be allocated from this shared memory pool and
> >>>> released back to it after use, thus improving the speed of dma-buf
> >>>> allocation and release.
> >>>> Additionally, custom functionalities such as memory statistics and
> >>>> boundary checking can be implemented in the user space.
> >>>> Of course, the above-mentioned functionalities require the
> >>>> implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >> From the test results, the execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker.
>
> Well, it should be obvious but I'm going to repeat it here: Submitting
> kernel patches for out of tree code is a rather *extreme* no-go.
>
Sorry I think my comment led to some confusion. I wasn't suggesting
you should take the patch; it's clearly incomplete. I was pointing out
another option to Rong. It appears Rong is creating a single oversized
dma-buf, and subdividing it in userspace to avoid multiple allocations
for multiple buffers. That would lead to a need for a partial sync of
the buffer from userspace. Instead of that, using a heap with a page
pool gets you kind of the same thing with a lot less headache in
userspace, and no need for partial sync. So I wanted to share that
option, and that can go in just Android if necessary without this
patch.

> That in kernel code *must* have an in kernel user is a number one rule.
>
> When somebody violates this rule he is pretty much disqualifying himself as
> a reliable source of patches since maintainers then have to assume that
> this person tries to submit code which doesn't have a justification to
> be upstream.
>
> So while this actually looks useful from the technical side as long as
> nobody does an implementation based on an upstream driver I absolutely
> have to reject it from the organizational side.
>
> Regards,
> Christian.
>
> >   That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377
> >
> >
> >> To be honest, I didn't understand your concern "...how drivers should be
> >> able to fulfill this semantics." Can you please help explain it in more
> >> detail?
> >>
> >> Thanks,
> >>
> >> Rong Qianfeng.
> >>
> >>>> Thanks
> >>>> Rong Qianfeng.
>


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-09 Thread T.J. Mercier
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
>
>
> On 2024/4/8 15:58, Christian König wrote:
> > On 07.04.24 at 09:50, Rong Qianfeng wrote:
> >> [SNIP]
> >>> On 13.11.21 at 07:22, Jianqun Xu wrote:
>  Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
>  offset and len.
> >>>
> >>> You have not given a use case for this so it is a bit hard to
> >>> review. And from the existing use cases I don't see why this should
> >>> be necessary.
> >>>
> >>> Even worse from the existing backend implementation I don't even see
> >>> how drivers should be able to fulfill this semantics.
> >>>
> >>> Please explain further,
> >>> Christian.
> >> Here is a practical case:
> >> The user space can allocate a large chunk of dma-buf for
> >> self-management, used as a shared memory pool.
> >> Small dma-buf can be allocated from this shared memory pool and
> >> released back to it after use, thus improving the speed of dma-buf
> >> allocation and release.
> >> Additionally, custom functionalities such as memory statistics and
> >> boundary checking can be implemented in the user space.
> >> Of course, the above-mentioned functionalities require the
> >> implementation of a partial cache sync interface.
> >
> > Well that is obvious, but where is the code doing that?
> >
> > You can't send out code without an actual user of it. That will
> > obviously be rejected.
> >
> > Regards,
> > Christian.
>
> In fact, we have already used the user-level dma-buf memory pool in the
> camera shooting scenario on the phone.
>
> From the test results, the execution time of the photo shooting
> algorithm has been reduced from 3.8s to 3s.
>
For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377


> To be honest, I didn't understand your concern "...how drivers should be
> able to fulfill this semantics." Can you please help explain it in more
> detail?
>
> Thanks,
>
> Rong Qianfeng.
>
> >
> >>
> >> Thanks
> >> Rong Qianfeng.
> >
>


Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue

2024-04-02 Thread T.J. Mercier
On Tue, Apr 2, 2024 at 1:08 AM Christian König  wrote:
>
> On 02.04.24 at 08:49, zhiguojiang wrote:
> >> As far as I can see that's not because of the DMA-buf code, but
> >> because you are somehow using this interface incorrectly.
> >>
> >> When dma_buf_poll() is called it is mandatory for the caller to hold
> >> a reference to the file descriptor on which the poll operation is
> >> executed.
> >>
> >> So adding code like "if (!file_count(file))" in the beginning of
> >> dma_buf_poll() is certainly broken.
> >>
> >> My best guess is that you have some unbalanced
> >> dma_buf_get()/dma_buf_put() somewhere instead.
> >>
> >>
> > Hi Christian,
> >
> > The kernel dma_buf_poll() code should not cause system crashes due to
> > user-mode usage logic issues?
>
> What user mode logical issues are you talking about? Closing a file
> while polling on it is perfectly valid.
>
> dma_buf_poll() is called by the filesystem layer and it's the filesystem
> layer which should make sure that a file can't be closed while polling
> for an event.
>
> If that doesn't work then you have stumbled over a massive bug in the fs
> layer. And I have some doubts that this is actually the case.
>
> What is more likely is that some driver messes up the reference count
> and because of this you see an UAF.
>
> Anyway as far as I can see the DMA-buf code is correct regarding this.
>
> Regards,
> Christian.

I tried to reproduce this problem but I couldn't. What I see is a ref
get taken when poll is first called. So subsequently closing the fd in
userspace while it's being polled doesn't take the refcount all the
way to 0. That happens when dma_buf_poll_cb fires, either due to an
event or when the fd is closed upon timeout.

I don't really see how this could be triggered from userspace so I am
also suspicious of dma_buf_get/put.


Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue

2024-03-29 Thread T.J. Mercier
On Tue, Mar 26, 2024 at 7:29 PM Zhiguo Jiang  wrote:
>
> The issue is a UAF issue of the dmabuf file fd. Through debugging, we found
> that the dmabuf file fd is added to the epoll event listener list, and
> when it is released, it is not removed from the epoll list, which leads
> to the UAF (Use-After-Free) issue.
>
> The UAF issue can be solved by checking the dmabuf file->f_count value and
> skipping the poll operation for the closed dmabuf file in
> dma_buf_poll(). We have tested this patch multiple times and
> have not reproduced the UAF issue.
>

Hi Zhiguo,

What is the most recent kernel version you've seen the bug on?

You are closing the dmabuf fd from another thread while it is still
part of the epoll interest list?

Thanks,
T.J.


Re: [PATCH] dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS

2024-03-28 Thread T.J. Mercier
On Thu, Mar 28, 2024 at 7:53 AM Tvrtko Ursulin  wrote:
>
> From: Tvrtko Ursulin 
>
> There is no point in compiling in the list and mutex operations which are
> only used from the dma-buf debugfs code, if debugfs is not compiled in.
>
> Put the code in question behind some kconfig guards and so save some text
> and maybe even a pointer per object at runtime when not enabled.
>
> Signed-off-by: Tvrtko Ursulin 

Reviewed-by: T.J. Mercier 


Re: [PATCH RESEND drm-misc 4/4] dma-buf: heaps: make dma_heap_class constant

2024-03-05 Thread T.J. Mercier
On Tue, Mar 5, 2024 at 10:02 AM Ricardo B. Marliere
 wrote:
>
> On  5 Mar 09:07, T.J. Mercier wrote:
> >
> > Reviewed-by: T.J. Mercier 
> >
> > Is this really a resend? I don't see anything on lore and I can't
> > recall seeing this patch in my inbox before.
>
> Hi T.J. thanks for reviewing!
>
> I'm sorry about that, I sent the series only to Greg before but I
> thought I had Cc'ed the lists as well. Then I realized it was sent
> publicly only once. Double mistake :(
>
> Best regards,
> -   Ricardo.

Cheers, glad I don't have to try to rework my email filters. :)


Re: [PATCH] dma-buf: Add syntax highlighting to code listings in the document

2024-03-05 Thread T.J. Mercier
On Thu, Jan 18, 2024 at 7:33 PM Tommy Chiang  wrote:
>
> This patch tries to improve the display of the code listing
> on The Linux Kernel documentation website for dma-buf [1] .
>
> Originally, it appears that it was attempting to escape
> the '*' character, but it looks like that's not necessary (now),
> so we are seeing something like '\*' on the website.
>
> This patch removes these unnecessary backslashes and adds syntax
> highlighting to improve the readability of the code listing.
>
> [1] https://docs.kernel.org/driver-api/dma-buf.html
>
> Signed-off-by: Tommy Chiang 
> ---
>  drivers/dma-buf/dma-buf.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 8fe5aa67b167..e083a0ab06d7 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -1282,10 +1282,12 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>   *   vmap interface is introduced. Note that on very old 32-bit architectures
>   *   vmalloc space might be limited and result in vmap calls failing.
>   *
> - *   Interfaces::
> + *   Interfaces:
>   *
> - *  void \*dma_buf_vmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
> - *  void dma_buf_vunmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
> + *   .. code-block:: c
> + *
> + * void *dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
> + * void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)
>   *
>   *   The vmap call can fail if there is no vmap support in the exporter, or if
>   *   it runs out of vmalloc space. Note that the dma-buf layer keeps a reference
> @@ -1342,10 +1344,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>   *   enough, since adding interfaces to intercept pagefaults and allow pte
>   *   shootdowns would increase the complexity quite a bit.
>   *
> - *   Interface::
> + *   Interface:
> + *
> + *   .. code-block:: c
>   *
> - *  int dma_buf_mmap(struct dma_buf \*, struct vm_area_struct \*,
> - *unsigned long);
> + * int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long);
>   *
>   *   If the importing subsystem simply provides a special-purpose mmap call to
>   *   set up a mapping in userspace, calling do_mmap with &dma_buf.file will
> --
> 2.43.0.381.gb435a96ce8-goog

Reviewed-by: T.J. Mercier 

The code block highlighting is nice.


Re: [PATCH RESEND drm-misc 4/4] dma-buf: heaps: make dma_heap_class constant

2024-03-05 Thread T.J. Mercier
On Tue, Mar 5, 2024 at 3:34 AM Ricardo B. Marliere  wrote:
>
> Since commit 43a7206b0963 ("driver core: class: make class_register() take
> a const *"), the driver core allows for struct class to be in read-only
> memory, so move the dma_heap_class structure to be declared at build time
> placing it into read-only memory, instead of having to be dynamically
> allocated at boot time.
>
> Cc: Greg Kroah-Hartman 
> Suggested-by: Greg Kroah-Hartman 
> Signed-off-by: Ricardo B. Marliere 
> ---
>  drivers/dma-buf/dma-heap.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index 84ae708fafe7..bcca6a2bbce8 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -43,10 +43,18 @@ struct dma_heap {
> struct cdev heap_cdev;
>  };
>
> +static char *dma_heap_devnode(const struct device *dev, umode_t *mode)
> +{
> +   return kasprintf(GFP_KERNEL, "dma_heap/%s", dev_name(dev));
> +}
> +
>  static LIST_HEAD(heap_list);
>  static DEFINE_MUTEX(heap_list_lock);
>  static dev_t dma_heap_devt;
> -static struct class *dma_heap_class;
> +static struct class dma_heap_class = {
> +   .name = DEVNAME,
> +   .devnode = dma_heap_devnode,
> +};
>  static DEFINE_XARRAY_ALLOC(dma_heap_minors);
>
>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> @@ -261,7 +269,7 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> goto err1;
> }
>
> -   dev_ret = device_create(dma_heap_class,
> +   dev_ret = device_create(&dma_heap_class,
> NULL,
> heap->heap_devt,
> NULL,
> @@ -291,7 +299,7 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> return heap;
>
>  err3:
> -   device_destroy(dma_heap_class, heap->heap_devt);
> +   device_destroy(&dma_heap_class, heap->heap_devt);
>  err2:
> cdev_del(&heap->heap_cdev);
>  err1:
> @@ -301,11 +309,6 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> return err_ret;
>  }
>
> -static char *dma_heap_devnode(const struct device *dev, umode_t *mode)
> -{
> -   return kasprintf(GFP_KERNEL, "dma_heap/%s", dev_name(dev));
> -}
> -
>  static int dma_heap_init(void)
>  {
> int ret;
> @@ -314,12 +317,11 @@ static int dma_heap_init(void)
> if (ret)
> return ret;
>
> -   dma_heap_class = class_create(DEVNAME);
> -   if (IS_ERR(dma_heap_class)) {
> +   ret = class_register(&dma_heap_class);
> +   if (ret) {
>     unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS);
> -   return PTR_ERR(dma_heap_class);
> +   return ret;
> }
> -   dma_heap_class->devnode = dma_heap_devnode;
>
> return 0;
>  }
>
> --
> 2.43.0

Reviewed-by: T.J. Mercier 

Is this really a resend? I don't see anything on lore and I can't
recall seeing this patch in my inbox before.


Re: [PATCH] dma-buf: heaps: Don't track CMA dma-buf pages under RssFile

2024-01-18 Thread T.J. Mercier
On Thu, Jan 18, 2024 at 6:49 AM Daniel Vetter  wrote:
>
> On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> > On 17.01.24 at 19:11, T.J. Mercier wrote:
> > > DMA buffers allocated from the CMA dma-buf heap get counted under
> > > RssFile for processes that map them and trigger page faults. In
> > > addition to the incorrect accounting reported to userspace, reclaim
> > > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> > >
> > > The system dma-buf heap does not suffer from this issue since
> > > remap_pfn_range is used during the mmap of the buffer, which also sets
> > > VM_PFNMAP on the VMA.
> >
> > Mhm, not an issue with this patch but Daniel wanted to add a check for
> > VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
> >
> > I don't fully remember the discussion but for some reason that was never
> > committed. We should probably try that again.
>
> Iirc the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
> mapping, at least on absolutely all architectures. That's why I defacto
> dropped that idea, but it would indeed be really great if we could
> resurrect it.

I actually had it in my head that it was a BUG_ON check for VM_PFNMAP
in dma_buf_mmap and it was merged, so I was surprised to discover that
it wasn't set for these CMA buffers.

> Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
> exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
> digging.

Looking back at the patch, the CI email at the end of the thread lists
a bunch of now-broken links to DMESG-WARN test failures I assume
pointed at a large chunk of them.

https://lore.kernel.org/all/166919750173.15575.2864736980735346...@emeril.freedesktop.org/

> >
> > > [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> > >
> > > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > > Signed-off-by: T.J. Mercier
> >
> > Acked-by: Christian König 

Thanks Christian.


[PATCH] dma-buf: heaps: Don't track CMA dma-buf pages under RssFile

2024-01-17 Thread T.J. Mercier
DMA buffers allocated from the CMA dma-buf heap get counted under
RssFile for processes that map them and trigger page faults. In
addition to the incorrect accounting reported to userspace, reclaim
behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.

The system dma-buf heap does not suffer from this issue since
remap_pfn_range is used during the mmap of the buffer, which also sets
VM_PFNMAP on the VMA.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede

Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
Signed-off-by: T.J. Mercier 
---
 drivers/dma-buf/heaps/cma_heap.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index ee899f8e6721..4a63567e93ba 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
if (vmf->pgoff > buffer->pagecount)
return VM_FAULT_SIGBUS;
 
-   vmf->page = buffer->pages[vmf->pgoff];
-   get_page(vmf->page);
-
-   return 0;
> +   return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
 }
 
 static const struct vm_operations_struct dma_heap_vm_ops = {
@@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
return -EINVAL;
 
+   vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
vma->vm_ops = &dma_heap_vm_ops;
vma->vm_private_data = buffer;
 
-- 
2.43.0.381.gb435a96ce8-goog



Re: [PATCH] dma-buf: Replace strlcpy() with strscpy()

2023-11-17 Thread T.J. Mercier
On Thu, Nov 16, 2023 at 11:14 AM Kees Cook  wrote:
>
> strlcpy() reads the entire source buffer first. This read may exceed
> the destination size limit. This is both inefficient and can lead
> to linear read overflows if a source string is not NUL-terminated[1].
> Additionally, it returns the size of the source string, not the
> resulting size of the destination string. In an effort to remove strlcpy()
> completely[2], replace strlcpy() here with strscpy().
>
> Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1]
> Link: https://github.com/KSPP/linux/issues/89 [2]
> Cc: Sumit Semwal 
> Cc: "Christian König" 
> Cc: Azeem Shaikh 
> Cc: linux-me...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Kees Cook 

Reviewed-by: T.J. Mercier 

strscpy returns -E2BIG when it truncates / force null-terminates which
would provide the wrong argument for dynamic_dname, but
dma_buf_set_name{_user} makes sure we have a null-terminated string of
the appropriate maximum size in dmabuf->name.


Re: [RFC PATCH 2/2] vc4: introduce DMA-BUF heap

2023-11-09 Thread T.J. Mercier
On Wed, Nov 8, 2023 at 11:46 PM Simon Ser  wrote:
>
> +int vc4_dma_heap_create(struct vc4_dev *vc4)
> +{
> +   struct dma_heap_export_info exp_info;
> +   struct dma_heap *heap;
> +
> +   exp_info.name = "vc4"; /* TODO: allow multiple? */
> +   exp_info.ops = &vc4_dma_heap_ops;
> +   exp_info.priv = vc4; /* TODO: unregister when unloading */
> +

So unregistering a heap isn't currently possible, but we're trying to
enable that here:
https://lore.kernel.org/all/20231106120423.23364-7-yunfei.d...@mediatek.com/


Re: [PATCH] dma-buf: heaps: Fix off by one in cma_heap_vm_fault()

2023-10-03 Thread T.J. Mercier
On Tue, Oct 3, 2023 at 1:30 AM Dan Carpenter  wrote:
>
> On Mon, Oct 02, 2023 at 10:16:24AM -0700, T.J. Mercier wrote:
> > On Mon, Oct 2, 2023 at 12:04 AM Dan Carpenter  
> > wrote:
> > >
> > > The buffer->pages[] has "buffer->pagecount" elements so this > comparison
> > > has to be changed to >= to avoid reading beyond the end of the array.
> > > The buffer->pages[] array is allocated in cma_heap_allocate().
> > >
> > > Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the 
> > > cma_heap implementation")
> > > Signed-off-by: Dan Carpenter 
> > > ---
> > >  drivers/dma-buf/heaps/cma_heap.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> > > index ee899f8e6721..bea7e574f916 100644
> > > --- a/drivers/dma-buf/heaps/cma_heap.c
> > > +++ b/drivers/dma-buf/heaps/cma_heap.c
> > > @@ -165,7 +165,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
> > > struct vm_area_struct *vma = vmf->vma;
> > > struct cma_heap_buffer *buffer = vma->vm_private_data;
> > >
> > > -   if (vmf->pgoff > buffer->pagecount)
> > > +   if (vmf->pgoff >= buffer->pagecount)
> > > return VM_FAULT_SIGBUS;
> > >
> > Hi Dan,
> >
> > Your fix looks correct to me, but I'm curious if you observed this
> > problem on a device? The mmap in dma-buf.c looks like it prevents
> > creating a mapping that is too large, and I think an access beyond the
> > VMA should segfault before reaching here.
>
> This is from static analysis and not from testing.  You could be correct
> that this bug can't affect real life.
>
> regards,
> dan carpenter

Ok, thanks Dan.

Reviewed-by: T.J. Mercier 


Re: [PATCH] dma-buf: heaps: Fix off by one in cma_heap_vm_fault()

2023-10-02 Thread T.J. Mercier
On Mon, Oct 2, 2023 at 12:04 AM Dan Carpenter  wrote:
>
> The buffer->pages[] has "buffer->pagecount" elements so this > comparison
> has to be changed to >= to avoid reading beyond the end of the array.
> The buffer->pages[] array is allocated in cma_heap_allocate().
>
> Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the 
> cma_heap implementation")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/dma-buf/heaps/cma_heap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> index ee899f8e6721..bea7e574f916 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -165,7 +165,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
> struct vm_area_struct *vma = vmf->vma;
> struct cma_heap_buffer *buffer = vma->vm_private_data;
>
> -   if (vmf->pgoff > buffer->pagecount)
> +   if (vmf->pgoff >= buffer->pagecount)
> return VM_FAULT_SIGBUS;
>
Hi Dan,

Your fix looks correct to me, but I'm curious if you observed this
problem on a device? The mmap in dma-buf.c looks like it prevents
creating a mapping that is too large, and I think an access beyond the
VMA should segfault before reaching here.

Thanks,
T.J.


Re: [PATCH 2/9] dma-heap: Add proper kref handling on dma-buf heaps

2023-09-22 Thread T.J. Mercier
On Mon, Sep 11, 2023 at 2:49 AM Christian König
 wrote:
>
> > On 11.09.23 at 04:30, Yong Wu wrote:
> > From: John Stultz 
> >
> > Add proper refcounting on the dma_heap structure.
> > While existing heaps are built-in, we may eventually
> > have heaps loaded from modules, and we'll need to be
> > able to properly handle the references to the heaps
> >
> > Also moves minor tracking into the heap structure so
> > we can properly free things.
>
> This is completely unnecessary, see below.
>
> >
> > Signed-off-by: John Stultz 
> > Signed-off-by: T.J. Mercier 
> > Signed-off-by: Yong Wu 
> > [Yong: Just add comment for "minor" and "refcount"]
> > ---
> >   drivers/dma-buf/dma-heap.c | 38 ++
> >   include/linux/dma-heap.h   |  6 ++
> >   2 files changed, 40 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 51030f6c9d6e..dcc0e38c61fa 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -11,6 +11,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -30,6 +31,8 @@
> >* @heap_devt:  heap device node
> >* @list:   list head connecting to list of heaps
> >* @heap_cdev:  heap char device
> > + * @minor:   heap device node minor number
> > > + * @refcount: reference counter for this heap device
> >*
> >* Represents a heap of memory from which buffers can be made.
> >*/
> > @@ -40,6 +43,8 @@ struct dma_heap {
> >   dev_t heap_devt;
> >   struct list_head list;
> >   struct cdev heap_cdev;
> > + int minor;
> > + struct kref refcount;
> >   };
> >
> >   static LIST_HEAD(heap_list);
> > @@ -205,7 +210,6 @@ struct dma_heap *dma_heap_add(const struct 
> > dma_heap_export_info *exp_info)
> >   {
> >   struct dma_heap *heap, *h, *err_ret;
> >   struct device *dev_ret;
> > - unsigned int minor;
> >   int ret;
> >
> >   if (!exp_info->name || !strcmp(exp_info->name, "")) {
> > @@ -222,12 +226,13 @@ struct dma_heap *dma_heap_add(const struct 
> > dma_heap_export_info *exp_info)
> >   if (!heap)
> >   return ERR_PTR(-ENOMEM);
> >
> > > + kref_init(&heap->refcount);
> >   heap->name = exp_info->name;
> >   heap->ops = exp_info->ops;
> >   heap->priv = exp_info->priv;
> >
> >   /* Find unused minor number */
> > > - ret = xa_alloc(&dma_heap_minors, &minor, heap,
> > > + ret = xa_alloc(&dma_heap_minors, &heap->minor, heap,
> >  XA_LIMIT(0, NUM_HEAP_MINORS - 1), GFP_KERNEL);
> >   if (ret < 0) {
> >   pr_err("dma_heap: Unable to get minor number for heap\n");
> > @@ -236,7 +241,7 @@ struct dma_heap *dma_heap_add(const struct 
> > dma_heap_export_info *exp_info)
> >   }
> >
> >   /* Create device */
> > - heap->heap_devt = MKDEV(MAJOR(dma_heap_devt), minor);
> > + heap->heap_devt = MKDEV(MAJOR(dma_heap_devt), heap->minor);
> >
> > >   cdev_init(&heap->heap_cdev, &dma_heap_fops);
> > >   ret = cdev_add(&heap->heap_cdev, heap->heap_devt, 1);
> > @@ -280,12 +285,37 @@ struct dma_heap *dma_heap_add(const struct 
> > dma_heap_export_info *exp_info)
> >   err2:
> > >   cdev_del(&heap->heap_cdev);
> >   err1:
> > > - xa_erase(&dma_heap_minors, minor);
> > > + xa_erase(&dma_heap_minors, heap->minor);
> >   err0:
> >   kfree(heap);
> >   return err_ret;
> >   }
> >
> > +static void dma_heap_release(struct kref *ref)
> > +{
> > + struct dma_heap *heap = container_of(ref, struct dma_heap, refcount);
> > +
> > + /* Note, we already holding the heap_list_lock here */
> > > + list_del(&heap->list);
> > +
> > + device_destroy(dma_heap_class, heap->heap_devt);
> > > + cdev_del(&heap->heap_cdev);
> > > + xa_erase(&dma_heap_minors, heap->minor);
>
> You can just use MINOR(heap->heap_devt) here instead.
>
Got it, thanks.

> > +
> > + kfree(heap);
> > +}
> > +
> > +void dma_heap_put(struct dma_heap *h)
> > +{
> > + /*
> > +  * Take the heap_list_lock now to avoid racing with code
> > +  * scanning the list and then taking a kref.
> > 

Re: [PATCH 1/9] dma-buf: heaps: Deduplicate docs and adopt common format

2023-09-11 Thread T.J. Mercier
On Mon, Sep 11, 2023 at 2:36 AM Christian König
 wrote:
>
> > On 11.09.23 at 04:30, Yong Wu wrote:
> > From: "T.J. Mercier" 
> >
> > The docs for dma_heap_get_name were incorrect, and since they were
> > duplicated in the implementation file they were wrong there too.
> >
> > The docs formatting was inconsistent so I tried to make it more
> > consistent across functions since I'm already in here doing cleanup.
> >
> > Remove multiple unused includes.
> >
> > Signed-off-by: T.J. Mercier 
> > Signed-off-by: Yong Wu 
> > [Yong: Just add a comment for "priv" to mute build warning]
> > ---
> >   drivers/dma-buf/dma-heap.c | 29 +++--
> >   include/linux/dma-heap.h   | 11 +--
> >   2 files changed, 12 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 84ae708fafe7..51030f6c9d6e 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,17 +7,15 @@
> >*/
> >
> >   #include 
> > -#include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> > -#include 
> >   #include 
> > -#include 
> >   #include 
> > -#include 
> >   #include 
> > -#include 
> > +#include 
> > +#include 
> >   #include 
> >
> >   #define DEVNAME "dma_heap"
> > @@ -28,9 +26,10 @@
> >* struct dma_heap - represents a dmabuf heap in the system
> >* @name:   used for debugging/device-node name
> >* @ops:ops struct for this heap
> > > - * @heap_devt heap device node
> > > - * @list list head connecting to list of heaps
> > > - * @heap_cdev heap char device
> > > + * @priv: private data for this heap
> > > + * @heap_devt:   heap device node
> > > + * @list: list head connecting to list of heaps
> > + * @heap_cdev:   heap char device
> >*
> >* Represents a heap of memory from which buffers can be made.
> >*/
> > @@ -192,25 +191,11 @@ static const struct file_operations dma_heap_fops = {
> >   #endif
> >   };
> >
> > -/**
> > - * dma_heap_get_drvdata() - get per-subdriver data for the heap
> > - * @heap: DMA-Heap to retrieve private data for
> > - *
> > - * Returns:
> > - * The per-subdriver data for the heap.
> > - */
>
> Kernel documentation is usually kept on the implementation and not the
> definition.
>
> So I strongly suggest to remove the documentation from the header
> instead and if there is any additional information in there add it here.
>
> Regards,
> Christian.
>
Ok thanks for looking. I'll move all the function docs over to the
implementation.


Re: [RFC v5 00/17] DRM cgroup controller with scheduling control and memory stats

2023-07-20 Thread T.J. Mercier
On Thu, Jul 20, 2023 at 3:55 AM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 19/07/2023 21:31, T.J. Mercier wrote:
> > On Wed, Jul 12, 2023 at 4:47 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>drm.memory.stat
> >>  A nested file containing cumulative memory statistics for the 
> >> whole
> >>  sub-hierarchy, broken down into separate GPUs and separate memory
> >>  regions supported by the latter.
> >>
> >>  For example::
> >>
> >>$ cat drm.memory.stat
> >card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
> >card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
> >>
> >>  Card designation corresponds to the DRM device names and multiple 
> >> line
> >>  entries can be present per card.
> >>
> >>  Memory region names should be expected to be driver specific with 
> >> the
> >>  exception of 'system' which is standardised and applicable for 
> >> GPUs
> >>  which can operate on system memory buffers.
> >>
> >>  Sub-keys 'resident' and 'purgeable' are optional.
> >>
> >>  Per category region usage is reported in bytes.
> >>
> >>   * Feedback from people interested in drm.active_us and drm.memory.stat is
> >> required to understand the use cases and their usefulness (of the 
> >> fields).
> >>
> >> Memory stats are something which was easy to add to my series, since I 
> >> was
> >> already working on the fdinfo memory stats patches, but the question 
> >> is how
> >> useful it is.
> >>
> > Hi Tvrtko,
> >
> > I think this style of driver-defined categories for reporting of
> > memory could potentially allow us to eliminate the GPU memory tracking
> > tracepoint used on Android (gpu_mem_total). This would involve reading
> > drm.memory.stat at the root cgroup (I see it's currently disabled on
>
> I can put it available under root too, don't think there is any
> technical reason to not have it. In fact, now that I look at it again,
> memory.stat is present on root so that would align with my general
> guideline to keep the two as similar as possible.
>
> > the root), which means traversing the whole cgroup tree under the
> > cgroup lock to generate the values on-demand. This would be done
> > rarely, but I still wonder what the cost of that would turn out to be.
>
> Yeah that's ugly. I could eliminate cgroup_lock by being a bit smarter.
> Just didn't think it worth it for the RFC.
>
> Basically to account memory stats for any sub-tree I need the equivalent
> one struct drm_memory_stats per DRM device present in the hierarchy. So
> I could pre-allocate a few and restart if run out of spares, or
> something. They are really small so pre-allocating a good number, based
> on past state or something, should be good enough. Or even total
> number of DRM devices in a system as a pessimistic and safe option for
> most reasonable deployments.
>
> > The drm_memory_stats categories in the output don't seem like a big
> > value-add for this use-case, but no real objection to them being
>
> You mean the fact there are different categories is not a value add for
> your use case because you would only use one?
>
Exactly, I guess that'd be just "private" (or pick another one) for
the driver-defined "regions" where
shared/private/resident/purgeable/active aren't really applicable.
That doesn't seem like a big problem to me since you already need an
understanding of what a driver-defined region means. It's just adding
a requirement to understand what fields are used, and a driver can
document that in the same place as the region itself. That does mean
performing arithmetic on values from different drivers might not make
sense. But this is just my perspective from trying to fit the
gpu_mem_total tracepoint here. I think we could probably change the
way drivers that use it report memory to fit closer into the
drm_memory_stats categories.

> The idea was to align 1:1 with DRM memory stats fdinfo and somewhat
> emulate how memory.stat also offers a breakdown.
>
> > there. I know it's called the DRM cgroup controller, but it'd be nice
> > if there were a way to make the mem tracking part work for any driver
> > that wishes to participate as many of our devices don't use a DRM
> > driver. But making that work doesn't look like it would fit very
>
> Ah that would be a cha

Re: [RFC v5 00/17] DRM cgroup controller with scheduling control and memory stats

2023-07-19 Thread T.J. Mercier
On Wed, Jul 12, 2023 at 4:47 AM Tvrtko Ursulin
 wrote:
>
>   drm.memory.stat
> A nested file containing cumulative memory statistics for the whole
> sub-hierarchy, broken down into separate GPUs and separate memory
> regions supported by the latter.
>
> For example::
>
>   $ cat drm.memory.stat
>   card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
>   card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
>
> Card designation corresponds to the DRM device names and multiple line
> entries can be present per card.
>
> Memory region names should be expected to be driver specific with the
> exception of 'system' which is standardised and applicable for GPUs
> which can operate on system memory buffers.
>
> Sub-keys 'resident' and 'purgeable' are optional.
>
> Per category region usage is reported in bytes.
>
>  * Feedback from people interested in drm.active_us and drm.memory.stat is
>required to understand the use cases and their usefulness (of the fields).
>
>Memory stats are something which was easy to add to my series, since I was
>already working on the fdinfo memory stats patches, but the question is how
>useful it is.
>
Hi Tvrtko,

I think this style of driver-defined categories for reporting of
memory could potentially allow us to eliminate the GPU memory tracking
tracepoint used on Android (gpu_mem_total). This would involve reading
drm.memory.stat at the root cgroup (I see it's currently disabled on
the root), which means traversing the whole cgroup tree under the
cgroup lock to generate the values on-demand. This would be done
rarely, but I still wonder what the cost of that would turn out to be.
The drm_memory_stats categories in the output don't seem like a big
value-add for this use-case, but no real objection to them being
there. I know it's called the DRM cgroup controller, but it'd be nice
if there were a way to make the mem tracking part work for any driver
that wishes to participate as many of our devices don't use a DRM
driver. But making that work doesn't look like it would fit very
cleanly into this controller, so I'll just shut up now.

Thanks!
-T.J.


Re: [PATCH] MAINTAINERS: Remove Laura Abbott from DMA-BUF HEAPS FRAMEWORK

2023-06-30 Thread T.J. Mercier
On Fri, Jun 30, 2023 at 9:21 AM John Stultz  wrote:
>
> Laura's email address has not been valid for quite a while now,
> so wanted to clean up the reviewer list here.
>
> I reached out to Laura who said it made sense to drop her from
> the list, so this patch does that.
>
> I do want to recognize Laura's long time contribution to this
> area and her previous ION maintainership, as this couldn't
> have gone upstream without her prior efforts. Many thanks!
>
> Cc: Laura Abbott 
> Cc: T.J. Mercier 
> Cc: Sumit Semwal 
> Cc: Benjamin Gaignard 
> Cc: Brian Starkey 
> Cc: John Stultz 
> Cc: linux-me...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-...@lists.linaro.org
> Cc: kernel-t...@android.com
> Acked-by: Laura Abbott 
> Signed-off-by: John Stultz 

Reviewed-by: T.J. Mercier 


Re: [PATCH] MAINTAINERS: Remove Liam Mark from DMA-BUF HEAPS FRAMEWORK

2023-06-28 Thread T.J. Mercier
On Wed, Jun 28, 2023 at 1:39 PM John Stultz  wrote:
>
> On Wed, Jun 28, 2023 at 11:05 AM Jeffrey Hugo  wrote:
> >
> > @codeaurora.org email addresses are no longer valid and will bounce.
> >
> > I reached out to Liam about updating his entry under DMA-BUF HEAPS
> > FRAMEWORK with an @codeaurora.org address.  His response:
> >
> > "I am not a maintainer anymore, that should be removed."
> >
> > Liam currently does not have an email address that can be used to remove
> > this entry, so I offered to submit a cleanup on his behalf with Liam's
> > consent.
> >
> > Signed-off-by: Jeffrey Hugo 
> > ---
> >  MAINTAINERS | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 76b53bafc03c..1781eb0a8dda 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6168,7 +6168,6 @@ F:kernel/dma/
> >  DMA-BUF HEAPS FRAMEWORK
> >  M: Sumit Semwal 
> >  R: Benjamin Gaignard 
> > -R: Liam Mark 
> >  R: Laura Abbott 
> >  R: Brian Starkey 
> >  R: John Stultz 
> > --
>
> Acked-by: John Stultz 
>
> Though probably worth adding TJ as a reviewer?

Yes please!


Re: [PATCH v4 2/6] dma-buf/heaps: Don't assert held reservation lock for dma-buf mmapping

2023-06-21 Thread T.J. Mercier
On Wed, Jun 21, 2023 at 11:16 AM Dmitry Osipenko
 wrote:
>
> Hi,
>
> On 6/21/23 20:21, T.J. Mercier wrote:
> > On Mon, May 29, 2023 at 3:46 PM Dmitry Osipenko
> >  wrote:
> >>
> >> Don't assert held dma-buf reservation lock on memory mapping of exported
> >> buffer.
> >>
> >> We're going to change dma-buf mmap() locking policy such that exporters
> >> will have to handle the lock. The previous locking policy caused deadlock
> >> problem for DRM drivers in a case of self-imported dma-bufs once these
> >> drivers are moved to use reservation lock universally. The problem
> >> solved by moving the lock down to exporters. This patch prepares dma-buf
> >> heaps for the locking policy update.
> >>
> > Hi Dmitry,
> >
> > I see that in patch 6 of this series calls to
> > dma_resv_lock/dma_resv_unlock have been added to the
> > drm_gem_shmem_helper functions and some exporters. But I'm curious why
> > no dma_resv_lock/dma_resv_unlock calls were added to these two dma-buf
> > heap exporters for mmap?
>
> DMA-buf heaps are exporters, drm_gem_shmem_helper is importer. Locking
> rules are different for importers and exporters.
>
> DMA-heaps use own locking, they can be moved to resv lock in the future.
>
> DMA-heaps don't protect internal data in theirs mmap() implementations,
> nothing to protect there.
>
Thanks.


Re: [PATCH v4 2/6] dma-buf/heaps: Don't assert held reservation lock for dma-buf mmapping

2023-06-21 Thread T.J. Mercier
On Mon, May 29, 2023 at 3:46 PM Dmitry Osipenko
 wrote:
>
> Don't assert held dma-buf reservation lock on memory mapping of exported
> buffer.
>
> We're going to change dma-buf mmap() locking policy such that exporters
> will have to handle the lock. The previous locking policy caused deadlock
> problem for DRM drivers in a case of self-imported dma-bufs once these
> drivers are moved to use reservation lock universally. The problem
> solved by moving the lock down to exporters. This patch prepares dma-buf
> heaps for the locking policy update.
>
Hi Dmitry,

I see that in patch 6 of this series calls to
dma_resv_lock/dma_resv_unlock have been added to the
drm_gem_shmem_helper functions and some exporters. But I'm curious why
no dma_resv_lock/dma_resv_unlock calls were added to these two dma-buf
heap exporters for mmap?

Thanks,
T.J.

> Reviewed-by: Emil Velikov 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/dma-buf/heaps/cma_heap.c| 3 ---
>  drivers/dma-buf/heaps/system_heap.c | 3 ---
>  2 files changed, 6 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> index 1131fb943992..28fb04eccdd0 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -13,7 +13,6 @@
>  #include 
>  #include 
>  #include 
> -#include <linux/dma-resv.h>
>  #include 
>  #include 
>  #include 
> @@ -183,8 +182,6 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>  {
> struct cma_heap_buffer *buffer = dmabuf->priv;
>
> -   dma_resv_assert_held(dmabuf->resv);
> -
> if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> return -EINVAL;
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index e8bd10e60998..fcf836ba9c1f 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -13,7 +13,6 @@
>  #include 
>  #include 
>  #include 
> -#include <linux/dma-resv.h>
>  #include 
>  #include 
>  #include 
> @@ -202,8 +201,6 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> struct sg_page_iter piter;
> int ret;
>
> -   dma_resv_assert_held(dmabuf->resv);
> -
> for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
> struct page *page = sg_page_iter_page(&piter);
>
> --
> 2.40.1
>


Re: [LSF/MM/BPF proposal]: Physr discussion

2023-02-28 Thread T.J. Mercier
On Sat, Jan 21, 2023 at 7:03 AM Jason Gunthorpe  wrote:
>
> I would like to have a session at LSF to talk about Matthew's
> physr discussion starter:
>
>  https://lore.kernel.org/linux-mm/ydykweu0htv8m...@casper.infradead.org/
>
> I have become interested in this with some immediacy because of
> IOMMUFD and this other discussion with Christoph:
>
>  
> https://lore.kernel.org/kvm/4-v2-472615b3877e+28f7-vfio_dma_buf_...@nvidia.com/
>
> Which results in, more or less, we have no way to do P2P DMA
> operations without struct page - and from the RDMA side solving this
> well at the DMA API means advancing at least some part of the physr
> idea.
>
> So - my objective is to enable to DMA API to "DMA map" something that
> is not a scatterlist, may or may not contain struct pages, but can
> still contain P2P DMA data. From there I would move RDMA MR's to use
> this new API, modify DMABUF to export it, complete the above VFIO
> series, and finally, use all of this to add back P2P support to VFIO
> when working with IOMMUFD by allowing IOMMUFD to obtain a safe
> reference to the VFIO memory using DMABUF. From there we'd want to see
> pin_user_pages optimized, and that also will need some discussion how
> best to structure it.
>
> I also have several ideas on how something like physr can optimize the
> iommu driver ops when working with dma-iommu.c and IOMMUFD.
>
> I've been working on an implementation and hope to have something
> draft to show on the lists in a few weeks. It is pretty clear there
> are several interesting decisions to make that I think will benefit
> from a live discussion.
>
> Providing a kernel-wide alternative to scatterlist is something that
> has general interest across all the driver subsystems. I've started to
> view the general problem rather like xarray where the main focus is to
> create the appropriate abstraction and then go about transforming
> users to take advantage of the cleaner abstraction. scatterlist
> suffers here because it has an incredibly leaky API, a huge number of
> (often sketchy driver) users, and has historically been very difficult
> to improve.
>
> The session would quickly go over the current state of whatever the
> mailing list discussion evolves into and an open discussion around the
> different ideas.
>
> Thanks,
> Jason
>

Hi, I'm interested in participating in this discussion!


Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-02 Thread T.J. Mercier
On Wed, Feb 1, 2023 at 6:52 AM Tvrtko Ursulin
 wrote:
>
>
> On 01/02/2023 14:23, Tvrtko Ursulin wrote:
> >
> > On 01/02/2023 01:49, T.J. Mercier wrote:
> >> On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 25/01/2023 20:04, T.J. Mercier wrote:
> >>>> On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >>>>  wrote:
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On 25/01/2023 11:52, Michal Hocko wrote:
> >>>>>> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >>>>>>> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>>>>>>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >>>>>>>>> When a buffer is exported to userspace, use memcg to attribute the
> >>>>>>>>> buffer to the allocating cgroup until all buffer references are
> >>>>>>>>> released.
> >>>>>>>>
> >>>>>>>> Is there any reason why this memory cannot be charged during the
> >>>>>>>> allocation (__GFP_ACCOUNT used)?
> >>>>>>>> Also you do charge and account the memory but underlying pages
> >>>>>>>> do not
> >>>>>>>> know about their memcg (this is normally done with commit_charge
> >>>>>>>> for
> >>>>>>>> user mapped pages). This would become a problem if the memory is
> >>>>>>>> migrated for example.
> >>>>>>>
> >>>>>>> I don't think this is movable memory.
> >>>>>>>
> >>>>>>>> This also means that you have to maintain memcg
> >>>>>>>> reference outside of the memcg proper which is not really nice
> >>>>>>>> either.
> >>>>>>>> This mimicks tcp kmem limit implementation which I really have
> >>>>>>>> to say I
> >>>>>>>> am not a great fan of and this pattern shouldn't be copied.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think we should keep the discussion on technical merits instead of
> >>>>>>> personal perference. To me using skmem like interface is totally
> >>>>>>> fine
> >>>>>>> but the pros/cons need to be very explicit and the clear reasons to
> >>>>>>> select that option should be included.
> >>>>>>
> >>>>>> I do agree with that. I didn't want sound to be personal wrt tcp kmem
> >>>>>> accounting but the overall code maintenance cost is higher because
> >>>>>> of how tcp take on accounting differs from anything else in the memcg
> >>>>>> proper. I would prefer to not grow another example like that.
> >>>>>>
> >>>>>>> To me there are two options:
> >>>>>>>
> >>>>>>> 1. Using skmem like interface as this patch series:
> >>>>>>>
> >>>>>>> The main pros of this option is that it is very simple. Let me
> >>>>>>> list down
> >>>>>>> the cons of this approach:
> >>>>>>>
> >>>>>>> a. There is time window between the actual memory allocation/free
> >>>>>>> and
> >>>>>>> the charge and uncharge and [un]charge happen when the whole
> >>>>>>> memory is
> >>>>>>> allocated or freed. I think for the charge path that might not be
> >>>>>>> a big
> >>>>>>> issue but on the uncharge, this can cause issues. The application
> >>>>>>> and
> >>>>>>> the potential shrinkers have freed some of this dmabuf memory but
> >>>>>>> until
> >>>>>>> the whole dmabuf is freed, the memcg uncharge will not happen.
> >>>>>>> This can
> >>>>>>> consequences on reclaim and oom behavior of the application.
> >>>>>>>
> >>>>>>> b. Due to the usage model i.e. a central daemon allocating the
> >>>>>>> dmabuf
> >>>>>>> memory upfront, there is a 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-02 Thread T.J. Mercier
On Wed, Feb 1, 2023 at 6:23 AM Tvrtko Ursulin
 wrote:
>
>
> On 01/02/2023 01:49, T.J. Mercier wrote:
> > On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 25/01/2023 20:04, T.J. Mercier wrote:
> >>> On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 25/01/2023 11:52, Michal Hocko wrote:
> >>>>> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >>>>>> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>>>>>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >>>>>>>> When a buffer is exported to userspace, use memcg to attribute the
> >>>>>>>> buffer to the allocating cgroup until all buffer references are
> >>>>>>>> released.
> >>>>>>>
> >>>>>>> Is there any reason why this memory cannot be charged during the
> >>>>>>> allocation (__GFP_ACCOUNT used)?
> >>>>>>> Also you do charge and account the memory but underlying pages do not
> >>>>>>> know about their memcg (this is normally done with commit_charge for
> >>>>>>> user mapped pages). This would become a problem if the memory is
> >>>>>>> migrated for example.
> >>>>>>
> >>>>>> I don't think this is movable memory.
> >>>>>>
> >>>>>>> This also means that you have to maintain memcg
> >>>>>>> reference outside of the memcg proper which is not really nice either.
> >>>>>>> This mimicks tcp kmem limit implementation which I really have to say 
> >>>>>>> I
> >>>>>>> am not a great fan of and this pattern shouldn't be coppied.
> >>>>>>>
> >>>>>>
> >>>>>> I think we should keep the discussion on technical merits instead of
> >>>>>> personal perference. To me using skmem like interface is totally fine
> >>>>>> but the pros/cons need to be very explicit and the clear reasons to
> >>>>>> select that option should be included.
> >>>>>
> >>>>> I do agree with that. I didn't want sound to be personal wrt tcp kmem
> >>>>> accounting but the overall code maintenance cost is higher because
> >>>>> of how tcp take on accounting differs from anything else in the memcg
> >>>>> proper. I would prefer to not grow another example like that.
> >>>>>
> >>>>>> To me there are two options:
> >>>>>>
> >>>>>> 1. Using skmem like interface as this patch series:
> >>>>>>
> >>>>>> The main pros of this option is that it is very simple. Let me list 
> >>>>>> down
> >>>>>> the cons of this approach:
> >>>>>>
> >>>>>> a. There is time window between the actual memory allocation/free and
> >>>>>> the charge and uncharge and [un]charge happen when the whole memory is
> >>>>>> allocated or freed. I think for the charge path that might not be a big
> >>>>>> issue but on the uncharge, this can cause issues. The application and
> >>>>>> the potential shrinkers have freed some of this dmabuf memory but until
> >>>>>> the whole dmabuf is freed, the memcg uncharge will not happen. This can
> >>>>>> consequences on reclaim and oom behavior of the application.
> >>>>>>
> >>>>>> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> >>>>>> memory upfront, there is a requirement to have a memcg charge transfer
> >>>>>> functionality to transfer the charge from the central daemon to the
> >>>>>> client applications. This does introduce complexity and avenues of 
> >>>>>> weird
> >>>>>> reclaim and oom behavior.
> >>>>>>
> >>>>>>
> >>>>>> 2. Allocate and charge the memory on page fault by actual user
> >>>>>>
> >>>>>> In this approach, the memory is not allocated upfront by the central
> >>>>>> daemon but rather on the page fault by the client application an

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-31 Thread T.J. Mercier
On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:
>
>
> On 25/01/2023 20:04, T.J. Mercier wrote:
> > On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> Hi,
> >>
> >> On 25/01/2023 11:52, Michal Hocko wrote:
> >>> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >>>> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>>>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >>>>>> When a buffer is exported to userspace, use memcg to attribute the
> >>>>>> buffer to the allocating cgroup until all buffer references are
> >>>>>> released.
> >>>>>
> >>>>> Is there any reason why this memory cannot be charged during the
> >>>>> allocation (__GFP_ACCOUNT used)?
> >>>>> Also you do charge and account the memory but underlying pages do not
> >>>>> know about their memcg (this is normally done with commit_charge for
> >>>>> user mapped pages). This would become a problem if the memory is
> >>>>> migrated for example.
> >>>>
> >>>> I don't think this is movable memory.
> >>>>
> >>>>> This also means that you have to maintain memcg
> >>>>> reference outside of the memcg proper which is not really nice either.
> >>>>> This mimicks tcp kmem limit implementation which I really have to say I
> >>>>> am not a great fan of and this pattern shouldn't be coppied.
> >>>>>
> >>>>
> >>>> I think we should keep the discussion on technical merits instead of
> >>>> personal perference. To me using skmem like interface is totally fine
> >>>> but the pros/cons need to be very explicit and the clear reasons to
> >>>> select that option should be included.
> >>>
> >>> I do agree with that. I didn't want sound to be personal wrt tcp kmem
> >>> accounting but the overall code maintenance cost is higher because
> >>> of how tcp take on accounting differs from anything else in the memcg
> >>> proper. I would prefer to not grow another example like that.
> >>>
> >>>> To me there are two options:
> >>>>
> >>>> 1. Using skmem like interface as this patch series:
> >>>>
> >>>> The main pros of this option is that it is very simple. Let me list down
> >>>> the cons of this approach:
> >>>>
> >>>> a. There is time window between the actual memory allocation/free and
> >>>> the charge and uncharge and [un]charge happen when the whole memory is
> >>>> allocated or freed. I think for the charge path that might not be a big
> >>>> issue but on the uncharge, this can cause issues. The application and
> >>>> the potential shrinkers have freed some of this dmabuf memory but until
> >>>> the whole dmabuf is freed, the memcg uncharge will not happen. This can
> >>>> consequences on reclaim and oom behavior of the application.
> >>>>
> >>>> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> >>>> memory upfront, there is a requirement to have a memcg charge transfer
> >>>> functionality to transfer the charge from the central daemon to the
> >>>> client applications. This does introduce complexity and avenues of weird
> >>>> reclaim and oom behavior.
> >>>>
> >>>>
> >>>> 2. Allocate and charge the memory on page fault by actual user
> >>>>
> >>>> In this approach, the memory is not allocated upfront by the central
> >>>> daemon but rather on the page fault by the client application and the
> >>>> memcg charge happen at the same time.
> >>>>
> >>>> The only cons I can think of is this approach is more involved and may
> >>>> need some clever tricks to track the page on the free patch i.e. we to
> >>>> decrement the dmabuf memcg stat on free path. Maybe a page flag.
> >>>>
> >>>> The pros of this approach is there is no need have a charge transfer
> >>>> functionality and the charge/uncharge being closely tied to the actual
> >>>> memory allocation and free.
> >>>>
> >>>> Personally I would prefer the second approach but I don't want to just
> >>>> block this work if th

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread T.J. Mercier
On Wed, Jan 25, 2023 at 4:05 AM Michal Hocko  wrote:
>
> On Tue 24-01-23 10:55:21, T.J. Mercier wrote:
> > On Tue, Jan 24, 2023 at 7:00 AM Michal Hocko  wrote:
> > >
> > > On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > > > When a buffer is exported to userspace, use memcg to attribute the
> > > > buffer to the allocating cgroup until all buffer references are
> > > > released.
> > >
> > > Is there any reason why this memory cannot be charged during the
> > > allocation (__GFP_ACCOUNT used)?
> >
> > My main motivation was to keep code changes away from exporters and
> > implement the accounting in one common spot for all of them. This is a
> > bit of a carryover from a previous approach [1] where there was some
> > objection to pushing off this work onto exporters and forcing them to
> > adapt, but __GFP_ACCOUNT does seem like a smaller burden than before
> > at least initially. However in order to support charge transfer
> > between cgroups with __GFP_ACCOUNT we'd need to be able to get at the
> > pages backing dmabuf objects, and the exporters are the ones with that
> > access. Meaning I think we'd have to add some additional dma_buf_ops
> > to achieve that, which was the objection from [1].
> >
> > [1] 
> > https://lore.kernel.org/lkml/5cc27a05-8131-ce9b-dea1-5c75e9942...@amd.com/
> >
> > >
> > > Also you do charge and account the memory but underlying pages do not
> > > know about their memcg (this is normally done with commit_charge for
> > > user mapped pages). This would become a problem if the memory is
> > > migrated for example.
> >
> > Hmm, what problem do you see in this situation? If the backing pages
> > are to be migrated that requires the cooperation of the exporter,
> > which currently has no influence on how the cgroup charging is done
> > and that seems fine. (Unless you mean migrating the charge across
> > cgroups? In which case that's the next patch.)
>
> My main concern was that page migration could lose the external tracking
> without some additional steps on the dmabuf front.
>
I see, yes that would be true if an exporter moves data around between
system memory and VRAM for example. (I think TTM does this sort of
thing, but not sure if that's actually within a single dma buffer.)
VRAM feels like it maybe doesn't belong in memcg, yet it would still
be charged there under this series right now. I don't really see a way
around this except to involve the exporters directly in the accounting
(or don't attempt to distinguish between types of memory).

> > > This also means that you have to maintain memcg
> > > reference outside of the memcg proper which is not really nice either.
> > > This mimicks tcp kmem limit implementation which I really have to say I
> > > am not a great fan of and this pattern shouldn't be coppied.
> > >
> > Ah, what can I say. This way looked simple to me. I think otherwise
> > we're back to making all exporters do more stuff for the accounting.
> >
> > > Also you are not really saying anything about the oom behavior. With
> > > this implementation the kernel will try to reclaim the memory and even
> > > trigger the memcg oom killer if the request size is <= 8 pages. Is this
> > > a desirable behavior?
> >
> > It will try to reclaim some memory, but not the dmabuf pages right?
> > Not *yet* anyway. This behavior sounds expected to me.
>
> Yes, we have discussed that shrinkers will follow up later which is
> fine. The question is how much reclaim actually makes sense at this
> stage. Charging interface usually copes with sizes resulting from
> allocation requests (so usually 1<<order pages). A batch charge like implemented here could easily be 100s of MBs and it is
> much harder to define reclaim targets for. At least that is something
> the memcg charging hasn't really considered yet.  Maybe the existing
> try_charge implementation can cope with that just fine but it would be
> really great to have the expected behavior described.
>
> E.g. should be memcg OOM killer be invoked? Should reclaim really target
> regular memory at all costs or just a lightweight memory reclaim is
> preferred (is the dmabuf charge failure an expensive operation wrt.
> memory refault due to reclaim).

Ah, in my experience very large individual buffers like that are rare.
Cumulative system-wide usage might reach 100s of megs or more spread
across many buffers. On my phone the majority of buffer sizes are 4
pages or less, but there are a few that reach into the tens of megs.
But now I see your point. I still think that where a memcg limit is
exceeded and we can't re

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread T.J. Mercier
On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 25/01/2023 11:52, Michal Hocko wrote:
> > On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >>>> When a buffer is exported to userspace, use memcg to attribute the
> >>>> buffer to the allocating cgroup until all buffer references are
> >>>> released.
> >>>
> >>> Is there any reason why this memory cannot be charged during the
> >>> allocation (__GFP_ACCOUNT used)?
> >>> Also you do charge and account the memory but underlying pages do not
> >>> know about their memcg (this is normally done with commit_charge for
> >>> user mapped pages). This would become a problem if the memory is
> >>> migrated for example.
> >>
> >> I don't think this is movable memory.
> >>
> >>> This also means that you have to maintain memcg
> >>> reference outside of the memcg proper which is not really nice either.
> >>> This mimicks tcp kmem limit implementation which I really have to say I
> >>> am not a great fan of and this pattern shouldn't be coppied.
> >>>
> >>
> >> I think we should keep the discussion on technical merits instead of
> >> personal perference. To me using skmem like interface is totally fine
> >> but the pros/cons need to be very explicit and the clear reasons to
> >> select that option should be included.
> >
> > I do agree with that. I didn't want sound to be personal wrt tcp kmem
> > accounting but the overall code maintenance cost is higher because
> > of how tcp take on accounting differs from anything else in the memcg
> > proper. I would prefer to not grow another example like that.
> >
> >> To me there are two options:
> >>
> >> 1. Using skmem like interface as this patch series:
> >>
> >> The main pros of this option is that it is very simple. Let me list down
> >> the cons of this approach:
> >>
> >> a. There is time window between the actual memory allocation/free and
> >> the charge and uncharge and [un]charge happen when the whole memory is
> >> allocated or freed. I think for the charge path that might not be a big
> >> issue but on the uncharge, this can cause issues. The application and
> >> the potential shrinkers have freed some of this dmabuf memory but until
> >> the whole dmabuf is freed, the memcg uncharge will not happen. This can
> >> consequences on reclaim and oom behavior of the application.
> >>
> >> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> >> memory upfront, there is a requirement to have a memcg charge transfer
> >> functionality to transfer the charge from the central daemon to the
> >> client applications. This does introduce complexity and avenues of weird
> >> reclaim and oom behavior.
> >>
> >>
> >> 2. Allocate and charge the memory on page fault by actual user
> >>
> >> In this approach, the memory is not allocated upfront by the central
> >> daemon but rather on the page fault by the client application and the
> >> memcg charge happen at the same time.
> >>
> >> The only cons I can think of is this approach is more involved and may
> >> need some clever tricks to track the page on the free patch i.e. we to
> >> decrement the dmabuf memcg stat on free path. Maybe a page flag.
> >>
> >> The pros of this approach is there is no need have a charge transfer
> >> functionality and the charge/uncharge being closely tied to the actual
> >> memory allocation and free.
> >>
> >> Personally I would prefer the second approach but I don't want to just
> >> block this work if the dmabuf folks are ok with the cons mentioned of
> >> the first approach.
> >
> > I am not familiar with dmabuf internals to judge complexity on their end
> > but I fully agree that charge-when-used is much more easier to reason
> > about and it should have less subtle surprises.
>
> Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
> am missing something, but in principle yes, I agree that the 2nd option
> (charge the user, not exporter) should be preferred. Thing being that at
> export time there may not be any backing store allocated, plus if the
> series is restricting the charge transfer to just Android clients then
> it seems it has the poten

Re: DMA-heap driver hints

2023-01-24 Thread T.J. Mercier
> > registry of constraint types, similar to the Linux kernel's
> > drm_fourcc.h for modifiers, or the Khronos github repository for
> > Vulkan vendor IDs. If the definition needs to be used by the kernel,
> > the kernel is the logical repository for that role IMHO.
> >
> > In our 2020 discussion, there was some debate over whether the kernel
> > should expose and/or consume constraints directly, or whether it's
> > sufficient to expose lower-level mechanisms from the kernel and keep
> > the translation of constraints to the correct mechanism in userspace.
> > There are pros/cons to both. I don't know that we bottomed out on that
> > part of the discussion, and it could be the right answer is some
> > combination of the two, as suggested below.
> >
> >>> What I want to do is to separate the problem. The kernel only provides
> >>> the information where to allocate from, figuring out the details like
> >>> how many bytes, which format, plane layout etc.. is still the job of
> >>> userspace.
> >>
> >> Even with UVC, where to allocate memory from will depend on the use
> >> case. If the consumer is a device that doesn't support non-contiguous
> >> DMA, the system heap won't work.
> >>
> >> Actually, could you explain why UVC works better with the system heap ?
> >> I'm looking at videobuf2 as an importer, and it doesn't call the dmabuf
> >> as far as I can tell, so cache management provided by the exporter seems
> >> to be bypassed in any case.
> >>
> >>> What we do have is compatibility between heaps. E.g. a CMA heap is
> >>> usually compatible with the system heap or might even be a subset of
> >>> another CMA heap. But I wanted to add that as next step to the heaps
> >>> framework itself.
> >>>
> >>>> - Devices could have different constraints based on particular
> >>>> configurations. For instance, a device may require specific memory
> >>>> layout for multi-planar YUV formats only (as in allocating the
> >>>> Y and C
> >>>> planes of NV12 from different memory banks). A dynamic API may
> >>>> thus be
> >>>> needed (but may also be very painful to use from userspace).
> >>>
> >>> Uff, good to know. But I'm not sure how to expose stuff like that.
> >>
> >> Let's see if James has anything to share with us :-) With a bit of luck
> >> we won't have to start from scratch.
> >
> > Well, this is the hard example we keep using as a measure of success
> > for whatever we come up with. I don't know that someone ever sat down
> > and tried to express this in the mechanism Simon and I proposed in
> > 2020, but allowing the expression of something that complex was
> > certainly our goal. How to resolve it down to an allocation mechanism,
> > I believe, was further than we got, but we weren't that well versed in
> > DMA heaps back then, or at least I wasn't.
> >
> >>>>> What's still missing is certainly matching userspace for this since I
> >>>>> wanted to discuss the initial kernel approach first.
> >>>>
> >>>> https://git.libcamera.org/libcamera/libcamera.git/ would be a good
> >>>> place
> >>>> to prototype userspace support :-)
> >>>
> >>> Thanks for the pointer and the review,
> >>
> >> By the way, side question, does anyone know what the status of dma heaps
> >> support is in major distributions ? On my Gentoo box,
> >> /dev/dma_heap/system is 0600 root:root. That's easy to change for a
> >> developer, but not friendly to end-users. If we want to move forward
> >> with dma heaps as standard multimedia allocators (and I would really
> >> like to see that happening), we have to make sure they can be used.
> >
> > We seem to have reached a world where display (primary nodes) are
> > carefully guarded, and some mildly trusted group of folks (video) can
> > access render nodes, but then there's some separate group generally
> > for camera/video/v4l and whatnot from what I've seen (I don't survey
> > this stuff that often. I live in my developer bubble). I'm curious
> > whether the right direction is a broader group that encompasses all of
> > render nodes, multimedia, and heaps, or if a more segmented design is
> > right. The latter is probably the better design from first principles,
> > but might lead to headaches if the permissions diverge.
>
> The main argument is that this memory is not pro

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-24 Thread T.J. Mercier
On Tue, Jan 24, 2023 at 7:00 AM Michal Hocko  wrote:
>
> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
>
> Is there any reason why this memory cannot be charged during the
> allocation (__GFP_ACCOUNT used)?

My main motivation was to keep code changes away from exporters and
implement the accounting in one common spot for all of them. This is a
bit of a carryover from a previous approach [1] where there was some
objection to pushing off this work onto exporters and forcing them to
adapt, but __GFP_ACCOUNT does seem like a smaller burden than before
at least initially. However in order to support charge transfer
between cgroups with __GFP_ACCOUNT we'd need to be able to get at the
pages backing dmabuf objects, and the exporters are the ones with that
access. Meaning I think we'd have to add some additional dma_buf_ops
to achieve that, which was the objection from [1].

[1] https://lore.kernel.org/lkml/5cc27a05-8131-ce9b-dea1-5c75e9942...@amd.com/

>
> Also you do charge and account the memory but underlying pages do not
> know about their memcg (this is normally done with commit_charge for
> user mapped pages). This would become a problem if the memory is
> migrated for example.

Hmm, what problem do you see in this situation? If the backing pages
are to be migrated that requires the cooperation of the exporter,
which currently has no influence on how the cgroup charging is done
and that seems fine. (Unless you mean migrating the charge across
cgroups? In which case that's the next patch.)

> This also means that you have to maintain memcg
> reference outside of the memcg proper which is not really nice either.
> This mimicks tcp kmem limit implementation which I really have to say I
> am not a great fan of and this pattern shouldn't be coppied.
>
Ah, what can I say. This way looked simple to me. I think otherwise
we're back to making all exporters do more stuff for the accounting.

> Also you are not really saying anything about the oom behavior. With
> this implementation the kernel will try to reclaim the memory and even
> trigger the memcg oom killer if the request size is <= 8 pages. Is this
> a desirable behavior?

It will try to reclaim some memory, but not the dmabuf pages right?
Not *yet* anyway. This behavior sounds expected to me. I would only
expect it to be surprising for cgroups making heavy use of dmabufs
(that weren't accounted before) *and* with hard limits already very
close to actual usage. I remember Johannes mentioning that what counts
under memcg use is already a bit of a moving target.

> --
> Michal Hocko
> SUSE Labs


[PATCH v2 2/4] dmabuf: Add cgroup charge transfer function

2023-01-23 Thread T.J. Mercier
The dma_buf_transfer_charge function provides a way for processes to
transfer charge of a buffer to a different cgroup. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory, and drops all references to the allocated memory.

Signed-off-by: T.J. Mercier 
---
 drivers/dma-buf/dma-buf.c  | 56 ++
 include/linux/dma-buf.h|  1 +
 include/linux/memcontrol.h |  5 
 3 files changed, 62 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index a6a8cb5cb32d..ac3d02a7ecf8 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -11,6 +11,7 @@
  * refining of this idea.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -1626,6 +1627,61 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
 
+/**
+ * dma_buf_transfer_charge - Change the cgroup to which the provided dma_buf is charged.
+ * @dmabuf_file:   [in] file for buffer whose charge will be migrated to a different cgroup
+ * @target: [in] the task_struct of the destination process for the cgroup charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_transfer_charge(struct file *dmabuf_file, struct task_struct *target)
+{
+   struct mem_cgroup *current_cg, *target_cg;
+   struct dma_buf *dmabuf;
+   unsigned int nr_pages;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_MEMCG))
+   return 0;
+
+   if (WARN_ON(!dmabuf_file) || WARN_ON(!target))
+   return -EINVAL;
+
+   if (!is_dma_buf_file(dmabuf_file))
+   return -EBADF;
+   dmabuf = dmabuf_file->private_data;
+
+   nr_pages = PAGE_ALIGN(dmabuf->size) / PAGE_SIZE;
+   current_cg = mem_cgroup_from_task(current);
+   target_cg = get_mem_cgroup_from_mm(target->mm);
+
+   if (current_cg == target_cg)
+   goto skip_transfer;
+
+   if (!mem_cgroup_charge_dmabuf(target_cg, nr_pages, GFP_KERNEL)) {
+   ret = -ENOMEM;
+   goto skip_transfer;
+   }
+
+   if (cmpxchg(&dmabuf->memcg, current_cg, target_cg) != current_cg) {
+   /* Only the current owner can transfer the charge */
+   ret = -EPERM;
+   mem_cgroup_uncharge_dmabuf(target_cg, nr_pages);
+   goto skip_transfer;
+   }
+
+   mem_cgroup_uncharge_dmabuf(current_cg, nr_pages);
+   mem_cgroup_put(current_cg); /* unref from buffer - buffer keeps new ref to target_cg */
+   return 0;
+
+skip_transfer:
+   mem_cgroup_put(target_cg);
+   return ret;
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 1f0ffb8e4bf5..f25eb8e60fb2 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -634,4 +634,5 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
 int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
+int dma_buf_transfer_charge(struct file *dmabuf_file, struct task_struct *target);
 #endif /* __DMA_BUF_H__ */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c10b8565fdbf..009298a446fe 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1335,6 +1335,11 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
return NULL;
 }
 
+static inline struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
+{
+   return NULL;
+}
+
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 }
-- 
2.39.0.246.g2a6d74b583-goog



[PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-23 Thread T.J. Mercier
When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier 
---
 Documentation/admin-guide/cgroup-v2.rst |  4 +++
 drivers/dma-buf/dma-buf.c   | 13 +
 include/linux/dma-buf.h |  3 ++
 include/linux/memcontrol.h  | 38 +
 mm/memcontrol.c | 19 +
 5 files changed, 77 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used for storing in-kernel data
structures.
 
+ dmabuf (npn)
+   Amount of memory used for exported DMA buffers allocated by the cgroup.
+   Stays with the allocating cgroup regardless of how the buffer is shared.
+
  workingset_refault_anon
Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..a6a8cb5cb32d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,9 @@ static void dma_buf_release(struct dentry *dentry)
 */
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+   mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
+   mem_cgroup_put(dmabuf->memcg);
+
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
 
@@ -673,6 +676,13 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
if (ret)
goto err_dmabuf;
 
+   dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+   if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
+ GFP_KERNEL)) {
+   ret = -ENOMEM;
+   goto err_memcg;
+   }
+
file->private_data = dmabuf;
file->f_path.dentry->d_fsdata = dmabuf;
dmabuf->file = file;
@@ -683,6 +693,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 
return dmabuf;
 
+err_memcg:
+   mem_cgroup_put(dmabuf->memcg);
+   dma_buf_stats_teardown(dmabuf);
 err_dmabuf:
if (!resv)
dma_resv_fini(dmabuf->resv);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
struct dma_buf *dmabuf;
} *sysfs_entry;
 #endif
+   /* The cgroup to which this buffer is currently attributed */
+   struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..c10b8565fdbf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+   MEMCG_DMABUF,
MEMCG_NR_STAT,
 };
 
@@ -673,6 +674,25 @@ static inline int mem_cgroup_charge(struct folio *folio, 
struct mm_struct *mm,
 
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
  gfp_t gfp, swp_entry_t entry);
+
+/**
+ * mem_cgroup_charge_dmabuf - Charge dma-buf memory to a cgroup and update 
stat counter
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ * @gfp_mask: reclaim mode
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if it doesn't.
+ */
+bool __mem_cgroup_charge_dmabuf(struct mem_cgroup *memcg, unsigned int 
nr_pages, gfp_t gfp_mask);
+static inline bool mem_cgroup_charge_dmabuf(struct mem_cgroup *memcg, unsigned 
int nr_pages,
+   gfp_t gfp_mask)
+{
+   if (mem_cgroup_disabled())
+   return true;
+   return __mem_cgroup_charge_dmabuf(memcg, nr_pages, gfp_mask);
+}
+
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
 
 void __mem_cgroup_uncharge(struct folio *folio);
@@ -690,6 +710,14 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
__mem_cgroup_uncharge(folio);
 }
 
+void __mem_cgroup_uncharge_dmabuf(struct mem_cgroup *memcg, unsigned int 
nr_pages);
+static inline void mem_cgroup_uncharge_dmabuf(struct mem_cgroup *memcg, 
unsigned int nr_pages)
+{
+ 

[PATCH v2 0/4] Track exported dma-buffers with memcg

2023-01-23 Thread T.J. Mercier
Based on discussions at LPC, this series adds a memory.stat counter for
exported dmabufs. This counter allows us to continue tracking
system-wide total exported buffer sizes which there is no longer any
way to get without DMABUF_SYSFS_STATS, and adds a new capability to
track per-cgroup exported buffer sizes. The total (root counter) is
helpful for accounting in-kernel dmabuf use (by comparing with the sum
of child nodes or with the sum of sizes of mapped buffers or FD
references in procfs) in addition to helping identify driver memory
leaks when in-kernel use continually increases over time. With
per-application cgroups, the per-cgroup counter allows us to quickly
see how much dma-buf memory an application has caused to be allocated.
This avoids the need to read through all of procfs which can be a
lengthy process, and causes the charge to "stick" to the allocating
process/cgroup as long as the buffer is alive, regardless of how the
buffer is shared (unless the charge is transferred).

The first patch adds the counter to memcg. The next two patches allow
the charge for a buffer to be transferred across cgroups which is
necessary because of the way most dmabufs are allocated from a central
process on Android. The fourth patch adds the binder object flags to
the existing selinux_binder_transfer_file LSM hook and a SELinux
permission for charge transfers.

[1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koe...@amd.com/

v2:
Actually charge memcg vs just mutate the stat counter per Shakeel Butt
and Michal Hocko. Shakeel pointed me at the skmem functions which
turned out to be very similar to how I was thinking the dmabuf tracking
should work. So I've added a pair of dmabuf functions that do
essentially the same thing, except conditionally implemented behind
CONFIG_MEMCG alongside the other charge/uncharge functions.

Drop security_binder_transfer_charge per Casey Schaufler and Paul Moore

Drop BINDER_FDA_FLAG_XFER_CHARGE (and fix commit message) per Carlos
Llamas

Don't expose is_dma_buf_file for use by binder per Hillf Danton

Call dma_buf_stats_teardown in dma_buf_export error handling

Rebase onto v6.2-rc5

Hridya Valsaraju (1):
  binder: Add flags to relinquish ownership of fds

T.J. Mercier (3):
  memcg: Track exported dma-buffers
  dmabuf: Add cgroup charge transfer function
  security: binder: Add binder object flags to
selinux_binder_transfer_file

 Documentation/admin-guide/cgroup-v2.rst |  5 ++
 drivers/android/binder.c| 27 --
 drivers/dma-buf/dma-buf.c   | 69 +
 include/linux/dma-buf.h |  4 ++
 include/linux/lsm_hook_defs.h   |  2 +-
 include/linux/lsm_hooks.h   |  5 +-
 include/linux/memcontrol.h  | 43 +++
 include/linux/security.h|  6 ++-
 include/uapi/linux/android/binder.h | 19 +--
 mm/memcontrol.c | 19 +++
 security/security.c |  4 +-
 security/selinux/hooks.c| 13 -
 security/selinux/include/classmap.h |  2 +-
 13 files changed, 201 insertions(+), 17 deletions(-)


base-commit: 2241ab53cbb5cdb08a6b2d4688feb13971058f65
-- 
2.39.0.246.g2a6d74b583-goog



Re: [PATCH 3/4] binder: Add flags to relinquish ownership of fds

2023-01-20 Thread T.J. Mercier
On Fri, Jan 20, 2023 at 1:25 PM Carlos Llamas  wrote:
>
> On Mon, Jan 09, 2023 at 09:38:06PM +0000, T.J. Mercier wrote:
> > From: Hridya Valsaraju 
> >
> > This patch introduces flags BINDER_FD_FLAG_XFER_CHARGE, and
> > BINDER_FD_FLAG_XFER_CHARGE that a process sending an individual fd or
>
> I believe the second one was meant to be BINDER_FDA_FLAG_XFER_CHARGE.
> However, I don't think a separation of flags is needed. We process each
> fd in the array individually anyway. So, it's OK to reuse the FD flags
> for FDAs too.

Yes, thanks.
>
>
> > fd array to another process over binder IPC can set to relinquish
> > ownership of the fd(s) being sent for memory accounting purposes. If the
> > flag is found to be set during the fd or fd array translation and the
> > fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup
> > and charged to the receiving process's cgroup instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL
> > process allocates memory and sends the fds to the clients over binder
> > IPC. The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets *_FLAG_XFER_CHARGE for fd arrays holding
> > DMA-BUF fds, or individual fd objects, binder will transfer the charge
> > for the buffer from the allocator process cgroup to the client process
> > cgroup.
> >
> > The pad [1] and pad_flags [2] fields of binder_fd_object and
> > binder_fda_array_object come from alignment with flat_binder_object and
> > have never been exposed for use from userspace. This new flags use
> > follows the pattern set by binder_buffer_object.
> >
> > [1] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=feba3900cabb8e7c87368faa28e7a6936809ba22
> > [2] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=5cdcf4c6a638591ec0e98c57404a19e7f9997567
> >
> > Signed-off-by: Hridya Valsaraju 
> > Signed-off-by: T.J. Mercier 
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  3 ++-
> >  drivers/android/binder.c| 31 +
> >  drivers/dma-buf/dma-buf.c   |  4 +---
> >  include/linux/dma-buf.h |  1 +
> >  include/uapi/linux/android/binder.h | 23 ++
> >  5 files changed, 50 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst 
> > b/Documentation/admin-guide/cgroup-v2.rst
> > index 538ae22bc514..d225295932c0 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1457,7 +1457,8 @@ PAGE_SIZE multiple when read back.
> >
> > dmabuf (npn)
> >   Amount of memory used for exported DMA buffers allocated by 
> > the cgroup.
> > - Stays with the allocating cgroup regardless of how the buffer 
> > is shared.
> > + Stays with the allocating cgroup regardless of how the buffer 
> > is shared
> > + unless explicitly transferred.
> >
> > workingset_refault_anon
> >   Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 880224ec6abb..9830848c8d25 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2237,7 +2238,7 @@ static int binder_translate_handle(struct 
> > flat_binder_object *fp,
> >   return ret;
> >  }
> >
> > -static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> > +static int binder_translate_fd(u32 fd, binder_size_t fd_offset, __u32 
> > flags,
> >  struct binder_transaction *t,
> >  struct binder_thread *thread,
> >  struct binder_transaction *in_reply_to)
> > @@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t 
> > fd_offset,
> >   goto err_security;
> >   }
> >
> > + if (IS_ENABLED(CONFIG_MEMCG) && (flags & B

Re: [PATCH 0/4] Track exported dma-buffers with memcg

2023-01-11 Thread T.J. Mercier
On Wed, Jan 11, 2023 at 2:56 PM Daniel Vetter  wrote:
>
> On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> > Hi T.J.,
> >
> > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier  wrote:
> > >
> > > Based on discussions at LPC, this series adds a memory.stat counter for
> > > exported dmabufs. This counter allows us to continue tracking
> > > system-wide total exported buffer sizes which there is no longer any
> > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > > track per-cgroup exported buffer sizes. The total (root counter) is
> > > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > > of child nodes or with the sum of sizes of mapped buffers or FD
> > > references in procfs) in addition to helping identify driver memory
> > > leaks when in-kernel use continually increases over time. With
> > > per-application cgroups, the per-cgroup counter allows us to quickly
> > > see how much dma-buf memory an application has caused to be allocated.
> > > This avoids the need to read through all of procfs which can be a
> > > lengthy process, and causes the charge to "stick" to the allocating
> > > process/cgroup as long as the buffer is alive, regardless of how the
> > > buffer is shared (unless the charge is transferred).
> > >
> > > The first patch adds the counter to memcg. The next two patches allow
> > > the charge for a buffer to be transferred across cgroups which is
> > > necessary because of the way most dmabufs are allocated from a central
> > > process on Android. The fourth patch adds a SELinux hook to binder in
> > > order to control who is allowed to transfer buffer charges.
> > >
> > > [1] 
> > > https://lore.kernel.org/all/20220617085702.4298-1-christian.koe...@amd.com/
> > >
> >
> > I am a bit confused by the term "charge" used in this patch series.
> > From the patches, it seems like only a memcg stat is added and nothing
> > is charged to the memcg.
> >
> > This leads me to the question: Why add this stat in memcg if the
> > underlying memory is not charged to the memcg and if we don't really
> > want to limit the usage?
> >
> > I see two ways forward:
> >
> > 1. Instead of memcg, use bpf-rstat [1] infra to implement the
> > per-cgroup stat for dmabuf. (You may need an additional hook for the
> > stat transfer).
> >
> > 2. Charge the actual memory to the memcg. Since the size of dmabuf is
> > immutable across its lifetime, you will not need to do accounting at
> > page level and instead use something similar to the network memory
> > accounting interface/mechanism (or even more simple). However you
> > would need to handle the reclaim, OOM and charge context and failure
> > cases. However if you are not looking to limit the usage of dmabuf
> > then this option is an overkill.
>
> I think eventually, at least for other "account gpu stuff in cgroups" use
> case we do want to actually charge the memory.
>
Yes, I've been looking at this today.

> The problem is a bit that with gpu allocations reclaim is essentially "we
> pass the error to userspace and they get to sort the mess out". There are
> some exceptions (some gpu drivers to have shrinkers) would we need to make
> sure these shrinkers are tied into the cgroup stuff before we could enable
> charging for them?
>
I'm also not sure that we can depend on the dmabuf being backed at
export time 100% of the time? (They are for dmabuf heaps.) If not,
that'd make calling the existing memcg folio based functions a bit
difficult.

> Also note that at least from the gpu driver side this is all a huge
> endeavour, so if we can split up the steps as much as possible (and get
> something interim useable that doesn't break stuff ofc), that is
> practically need to make headway here. TJ has been trying out various
> approaches for quite some time now already :-/
> -Daniel
>
> > Please let me know if I misunderstood something.
> >
> > [1] https://lore.kernel.org/all/20220824233117.1312810-1-hao...@google.com/
> >
> > thanks,
> > Shakeel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH 3/4] binder: Add flags to relinquish ownership of fds

2023-01-10 Thread T.J. Mercier
On Mon, Jan 9, 2023 at 6:07 PM Hillf Danton  wrote:
>
> On 9 Jan 2023 21:38:06 +0000 T.J. Mercier 
> >
> > @@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t 
> > fd_offset,
> >   goto err_security;
> >   }
> >
> > + if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) 
> > {
> > + struct dma_buf *dmabuf;
> > +
> > + if (unlikely(!is_dma_buf_file(file))) {
> > + binder_user_error(
> > + "%d:%d got transaction with XFER_CHARGE for 
> > non-dmabuf fd, %d\n",
> > + proc->pid, thread->pid, fd);
> > + ret = -EINVAL;
> > + goto err_dmabuf;
> > + }
>
> It barely makes sense to expose is_dma_buf_file() only for this.
> > +
> > + dmabuf = file->private_data;
> > + ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
> > + if (ret) {
> > + pr_warn("%d:%d Unable to transfer DMA-BUF fd charge 
> > to %d\n",
> > + proc->pid, thread->pid, target_proc->pid);
> > + goto err_xfer;
> > + }
> > + }
> > +
>
> This whole hunk should go to dma-buf instead by adding change to
> dma_buf_transfer_charge() for instance.

Fair enough, will change this for v2. I think we'd still want to
distinguish between the two failure modes for logging purposes, so
I'll use the return value of dma_buf_transfer_charge to do that.


Re: [PATCH 1/4] memcg: Track exported dma-buffers

2023-01-10 Thread T.J. Mercier
On Tue, Jan 10, 2023 at 12:58 AM Michal Hocko  wrote:
>
> On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> >
> > Unlike the dmabuf sysfs stats implementation, this memcg accounting
> > avoids contention over the kernfs_rwsem incurred when creating or
> > removing nodes.
>
> I am not familiar with dmabuf infrastructure so please bear with me.
> AFAIU this patch adds a dmabuf specific counter to find out the amount
> of dmabuf memory used. But I do not see any actual charging implemented
> for that memory.
>
> I have looked at two random users of dma_buf_export cma_heap_allocate
> and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
> which doesn't account to memcg, system_heap_allocate uses
> alloc_largest_available which relies on order_flags which doesn't seem
> to ever use __GFP_ACCOUNT.
>
> This would mean that the counter doesn't represent any actual memory
> reflected in the overall memory consumption of a memcg. I believe this
> is rather unexpected and confusing behavior. While some counters
> overlap and their sum would exceed the charged memory we do not have any
> that doesn't correspond to any memory (at least not for non-root memcgs).
>
> --
> Michal Hocko
> SUSE Labs

Thank you, that behavior is not intentional. I'm not looking at the
overall memcg charge yet otherwise I would have noticed this. I think
I understand what's needed for the charging part, but Shakeel
mentioned some additional work for "reclaim, OOM and charge context
and failure cases" on the cover letter which I need to look into.


[PATCH 3/4] binder: Add flags to relinquish ownership of fds

2023-01-09 Thread T.J. Mercier
From: Hridya Valsaraju 

This patch introduces flags BINDER_FD_FLAG_XFER_CHARGE, and
BINDER_FD_FLAG_XFER_CHARGE that a process sending an individual fd or
fd array to another process over binder IPC can set to relinquish
ownership of the fd(s) being sent for memory accounting purposes. If the
flag is found to be set during the fd or fd array translation and the
fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup
and charged to the receiving process's cgroup instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL
process allocates memory and sends the fds to the clients over binder
IPC. The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets *_FLAG_XFER_CHARGE for fd arrays holding
DMA-BUF fds, or individual fd objects, binder will transfer the charge
for the buffer from the allocator process cgroup to the client process
cgroup.

The pad [1] and pad_flags [2] fields of binder_fd_object and
binder_fda_array_object come from alignment with flat_binder_object and
have never been exposed for use from userspace. This new flags use
follows the pattern set by binder_buffer_object.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=feba3900cabb8e7c87368faa28e7a6936809ba22
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=5cdcf4c6a638591ec0e98c57404a19e7f9997567

Signed-off-by: Hridya Valsaraju 
Signed-off-by: T.J. Mercier 
---
 Documentation/admin-guide/cgroup-v2.rst |  3 ++-
 drivers/android/binder.c| 31 +
 drivers/dma-buf/dma-buf.c   |  4 +---
 include/linux/dma-buf.h |  1 +
 include/uapi/linux/android/binder.h | 23 ++
 5 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 538ae22bc514..d225295932c0 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1457,7 +1457,8 @@ PAGE_SIZE multiple when read back.
 
  dmabuf (npn)
Amount of memory used for exported DMA buffers allocated by the 
cgroup.
-   Stays with the allocating cgroup regardless of how the buffer 
is shared.
+   Stays with the allocating cgroup regardless of how the buffer 
is shared
+   unless explicitly transferred.
 
  workingset_refault_anon
Number of refaults of previously evicted anonymous pages.
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 880224ec6abb..9830848c8d25 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -2237,7 +2238,7 @@ static int binder_translate_handle(struct 
flat_binder_object *fp,
return ret;
 }
 
-static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
+static int binder_translate_fd(u32 fd, binder_size_t fd_offset, __u32 flags,
   struct binder_transaction *t,
   struct binder_thread *thread,
   struct binder_transaction *in_reply_to)
@@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t 
fd_offset,
goto err_security;
}
 
+   if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) {
+   struct dma_buf *dmabuf;
+
+   if (unlikely(!is_dma_buf_file(file))) {
+   binder_user_error(
+   "%d:%d got transaction with XFER_CHARGE for 
non-dmabuf fd, %d\n",
+   proc->pid, thread->pid, fd);
+   ret = -EINVAL;
+   goto err_dmabuf;
+   }
+
+   dmabuf = file->private_data;
+   ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
+   if (ret) {
+   pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to 
%d\n",
+   proc->pid, thread->pid, target_proc->pid);
+   goto err_xfer;
+   }
+   }
+
/*
 * Add fixup record for this transaction. The allocation
 * of the fd in the target needs to be done from a
@@ -2294,6 +2315,8 @@ static int binder_translate_fd(u32 fd, binder_size_t 
fd_offset,
return ret;
 
 err_alloc:
+err_xfer:
+err_dmabuf:
 err_security:
fput(file);
 err_fget:
@@ -2604,7 +2627,7 @@ static int binder_translate_fd_array(str

[PATCH 2/4] dmabuf: Add cgroup charge transfer function

2023-01-09 Thread T.J. Mercier
The dma_buf_transfer_charge function provides a way for processes to
transfer charge of a buffer to a different cgroup. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory, and drops all references to the allocated memory.

Signed-off-by: T.J. Mercier 
---
 drivers/dma-buf/dma-buf.c  | 45 ++
 include/linux/dma-buf.h|  1 +
 include/linux/memcontrol.h |  6 +
 3 files changed, 52 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index ac45dd101c4d..fd6c5002032b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -11,6 +11,7 @@
  * refining of this idea.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -1618,6 +1619,50 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, 
struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
 
+/**
+ * dma_buf_transfer_charge - Change the cgroup to which the provided dma_buf 
is charged.
+ * @dmabuf:[in]buffer whose charge will be migrated to a different 
cgroup
+ * @target:[in]the task_struct of the destination process for the 
cgroup charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target)
+{
+   struct mem_cgroup *current_cg, *target_cg;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_MEMCG))
+   return 0;
+
+   if (WARN_ON(!dmabuf) || WARN_ON(!target))
+   return -EINVAL;
+
+   current_cg = mem_cgroup_from_task(current);
+   target_cg = get_mem_cgroup_from_mm(target->mm);
+
+   if (current_cg == target_cg)
+   goto skip_transfer;
+
+   if (cmpxchg(&dmabuf->memcg, current_cg, target_cg) != current_cg) {
+   /* Only the current owner can transfer the charge */
+   ret = -EPERM;
+   goto skip_transfer;
+   }
+
+   mod_memcg_state(current_cg, MEMCG_DMABUF, -dmabuf->size);
+   mod_memcg_state(target_cg, MEMCG_DMABUF, dmabuf->size);
+
+   mem_cgroup_put(current_cg); /* unref from buffer - buffer keeps new ref 
to target_cg */
+   return 0;
+
+skip_transfer:
+   mem_cgroup_put(target_cg);
+   return ret;
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 1f0ffb8e4bf5..6aa128d76aa7 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -634,4 +634,5 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map 
*map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
 int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct 
*target);
 #endif /* __DMA_BUF_H__ */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c1da2da20a6..e5aec27044c7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1298,6 +1298,12 @@ struct mem_cgroup *mem_cgroup_from_css(struct 
cgroup_subsys_state *css)
return NULL;
 }
 
+static inline
+struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
+{
+   return NULL;
+}
+
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 }
-- 
2.39.0.314.g84b9a713c41-goog



[PATCH 1/4] memcg: Track exported dma-buffers

2023-01-09 Thread T.J. Mercier
When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier 
---
 Documentation/admin-guide/cgroup-v2.rst | 4 
 drivers/dma-buf/dma-buf.c   | 5 +
 include/linux/dma-buf.h | 3 +++
 include/linux/memcontrol.h  | 1 +
 mm/memcontrol.c | 4 
 5 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used for storing in-kernel data
structures.
 
+ dmabuf (npn)
+   Amount of memory used for exported DMA buffers allocated by the 
cgroup.
+   Stays with the allocating cgroup regardless of how the buffer 
is shared.
+
  workingset_refault_anon
Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..ac45dd101c4d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,8 @@ static void dma_buf_release(struct dentry *dentry)
 */
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+   mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, -dmabuf->size);
+   mem_cgroup_put(dmabuf->memcg);
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
 
@@ -673,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
if (ret)
goto err_dmabuf;
 
+   dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+   mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, dmabuf->size);
+
file->private_data = dmabuf;
file->f_path.dentry->d_fsdata = dmabuf;
dmabuf->file = file;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
struct dma_buf *dmabuf;
} *sysfs_entry;
 #endif
+   /* The cgroup to which this buffer is currently attributed */
+   struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..1c1da2da20a6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+   MEMCG_DMABUF,
MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..680189bec7e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1502,6 +1502,7 @@ static const struct memory_stat memory_stats[] = {
{ "unevictable",NR_UNEVICTABLE  },
{ "slab_reclaimable",   NR_SLAB_RECLAIMABLE_B   },
{ "slab_unreclaimable", NR_SLAB_UNRECLAIMABLE_B },
+   { "dmabuf", MEMCG_DMABUF},
 
/* The memory events */
{ "workingset_refault_anon",WORKINGSET_REFAULT_ANON },
@@ -1519,6 +1520,7 @@ static int memcg_page_state_unit(int item)
switch (item) {
case MEMCG_PERCPU_B:
case MEMCG_ZSWAP_B:
+   case MEMCG_DMABUF:
case NR_SLAB_RECLAIMABLE_B:
case NR_SLAB_UNRECLAIMABLE_B:
case WORKINGSET_REFAULT_ANON:
@@ -4042,6 +4044,7 @@ static const unsigned int memcg1_stats[] = {
WORKINGSET_REFAULT_ANON,
WORKINGSET_REFAULT_FILE,
MEMCG_SWAP,
+   MEMCG_DMABUF,
 };
 
 static const char *const memcg1_stat_names[] = {
@@ -4057,6 +4060,7 @@ static const char *const memcg1_stat_names[] = {
"workingset_refault_anon",
"workingset_refault_file",
"swap",
+   "dmabuf",
 };
 
 /* Universal VM events cgroup1 shows, original sort order */
-- 
2.39.0.314.g84b9a713c41-goog



[PATCH 0/4] Track exported dma-buffers with memcg

2023-01-09 Thread T.J. Mercier
Based on discussions at LPC, this series adds a memory.stat counter for
exported dmabufs. This counter allows us to continue tracking
system-wide total exported buffer sizes which there is no longer any
way to get without DMABUF_SYSFS_STATS, and adds a new capability to
track per-cgroup exported buffer sizes. The total (root counter) is
helpful for accounting in-kernel dmabuf use (by comparing with the sum
of child nodes or with the sum of sizes of mapped buffers or FD
references in procfs) in addition to helping identify driver memory
leaks when in-kernel use continually increases over time. With
per-application cgroups, the per-cgroup counter allows us to quickly
see how much dma-buf memory an application has caused to be allocated.
This avoids the need to read through all of procfs which can be a
lengthy process, and causes the charge to "stick" to the allocating
process/cgroup as long as the buffer is alive, regardless of how the
buffer is shared (unless the charge is transferred).

The first patch adds the counter to memcg. The next two patches allow
the charge for a buffer to be transferred across cgroups which is
necessary because of the way most dmabufs are allocated from a central
process on Android. The fourth patch adds a SELinux hook to binder in
order to control who is allowed to transfer buffer charges.

[1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koe...@amd.com/

Hridya Valsaraju (1):
  binder: Add flags to relinquish ownership of fds

T.J. Mercier (3):
  memcg: Track exported dma-buffers
  dmabuf: Add cgroup charge transfer function
  security: binder: Add transfer_charge SElinux hook

 Documentation/admin-guide/cgroup-v2.rst |  5 +++
 drivers/android/binder.c| 36 +++--
 drivers/dma-buf/dma-buf.c   | 54 +++--
 include/linux/dma-buf.h |  5 +++
 include/linux/lsm_hook_defs.h   |  2 +
 include/linux/lsm_hooks.h   |  6 +++
 include/linux/memcontrol.h  |  7 
 include/linux/security.h|  2 +
 include/uapi/linux/android/binder.h | 23 +--
 mm/memcontrol.c |  4 ++
 security/security.c |  6 +++
 security/selinux/hooks.c|  9 +
 security/selinux/include/classmap.h |  2 +-
 13 files changed, 149 insertions(+), 12 deletions(-)


base-commit: b7bfaa761d760e72a969d116517eaa12e404c262
-- 
2.39.0.314.g84b9a713c41-goog



Re: [PATCH] dma-buf: fix dma_buf_export init order v2

2022-12-09 Thread T.J. Mercier
On Fri, Dec 9, 2022 at 8:29 AM Ruhl, Michael J  wrote:
>
> >-Original Message-
> >From: dri-devel  On Behalf Of
> >Christian König
> >Sent: Friday, December 9, 2022 2:16 AM
> >To: quic_chara...@quicinc.com; cuigaoshe...@huawei.com;
> >sumit.sem...@linaro.org
> >Cc: linaro-mm-...@lists.linaro.org; dri-devel@lists.freedesktop.org; linux-
> >me...@vger.kernel.org
> >Subject: [PATCH] dma-buf: fix dma_buf_export init order v2
> >
> >The init order and resulting error handling in dma_buf_export
> >was pretty messy.
> >
> >Subordinate objects like the file and the sysfs kernel objects
> >were initializing and wiring itself up with the object in the
> >wrong order resulting not only in complicating and partially
> >incorrect error handling, but also in publishing only halve
> >initialized DMA-buf objects.
> >
> >Clean this up thoughtfully by allocating the file independent
> >of the DMA-buf object. Then allocate and initialize the DMA-buf
> >object itself, before publishing it through sysfs. If everything
> >works as expected the file is then connected with the DMA-buf
> >object and publish it through debugfs.
> >
> >Also adds the missing dma_resv_fini() into the error handling.
> >
> >v2: add some missing changes to dma_bug_getfile() and a missing NULL
> >check in dma_buf_file_release()
>
> Looks good.
>
> Reviewed-by: Michael J. Ruhl 
>
> Mike
>
+1

Reviewed-by: T.J. Mercier 

> >Signed-off-by: Christian König 
> >---
> > drivers/dma-buf/dma-buf-sysfs-stats.c |  7 +--
> > drivers/dma-buf/dma-buf-sysfs-stats.h |  4 +-
> > drivers/dma-buf/dma-buf.c | 84 +--
> > 3 files changed, 43 insertions(+), 52 deletions(-)
> >
> >diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-
> >buf-sysfs-stats.c
> >index 2bba0babcb62..4b680e10c15a 100644
> >--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> >+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> >@@ -168,14 +168,11 @@ void dma_buf_uninit_sysfs_statistics(void)
> >   kset_unregister(dma_buf_stats_kset);
> > }
> >
> >-int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >+int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file *file)
> > {
> >   struct dma_buf_sysfs_entry *sysfs_entry;
> >   int ret;
> >
> >-  if (!dmabuf || !dmabuf->file)
> >-  return -EINVAL;
> >-
> >   if (!dmabuf->exp_name) {
> >   pr_err("exporter name must not be empty if stats
> >needed\n");
> >   return -EINVAL;
> >@@ -192,7 +189,7 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >
> >   /* create the directory for buffer stats */
> >   ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype,
> >NULL,
> >- "%lu", file_inode(dmabuf->file)->i_ino);
> >+ "%lu", file_inode(file)->i_ino);
> >   if (ret)
> >   goto err_sysfs_dmabuf;
> >
> >diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.h b/drivers/dma-buf/dma-
> >buf-sysfs-stats.h
> >index a49c6e2650cc..7a8a995b75ba 100644
> >--- a/drivers/dma-buf/dma-buf-sysfs-stats.h
> >+++ b/drivers/dma-buf/dma-buf-sysfs-stats.h
> >@@ -13,7 +13,7 @@
> > int dma_buf_init_sysfs_statistics(void);
> > void dma_buf_uninit_sysfs_statistics(void);
> >
> >-int dma_buf_stats_setup(struct dma_buf *dmabuf);
> >+int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file *file);
> >
> > void dma_buf_stats_teardown(struct dma_buf *dmabuf);
> > #else
> >@@ -25,7 +25,7 @@ static inline int dma_buf_init_sysfs_statistics(void)
> >
> > static inline void dma_buf_uninit_sysfs_statistics(void) {}
> >
> >-static inline int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >+static inline int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file
> >*file)
> > {
> >   return 0;
> > }
> >diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >index e6f36c014c4c..eb6b59363c4f 100644
> >--- a/drivers/dma-buf/dma-buf.c
> >+++ b/drivers/dma-buf/dma-buf.c
> >@@ -95,10 +95,11 @@ static int dma_buf_file_release(struct inode *inode,
> >struct file *file)
> >   return -EINVAL;
> >
> >   dmabuf = file->private_data;
> >-
> >-  mutex_lock(&db_list.lock);
> >-  list_del(&dmabuf->list_node);
> >-  mutex_unlock(&db_list.lock);
> >+  if (dmabuf) {
> >+   

Re: [PATCH] dma-buf: fix dma_buf_export init order

2022-12-06 Thread T.J. Mercier
On Tue, Dec 6, 2022 at 7:12 AM Christian König
 wrote:
>
> The init order and resulting error handling in dma_buf_export
> was pretty messy.
>
> Subordinate objects like the file and the sysfs kernel objects
> were initialized and wired up with the object in the
> wrong order, resulting not only in complicated and partially
> incorrect error handling, but also in publishing only half-
> initialized DMA-buf objects.
>
> Clean this up thoughtfully by allocating the file independently
> of the DMA-buf object. Then allocate and initialize the DMA-buf
> object itself, before publishing it through sysfs. If everything
> works as expected the file is then connected with the DMA-buf
> object and published through debugfs.
>
> Also adds the missing dma_resv_fini() into the error handling.
>
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-buf-sysfs-stats.c |  7 +--
>  drivers/dma-buf/dma-buf-sysfs-stats.h |  4 +-
>  drivers/dma-buf/dma-buf.c | 65 +--
>  3 files changed, 34 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c 
> b/drivers/dma-buf/dma-buf-sysfs-stats.c
> index 2bba0babcb62..4b680e10c15a 100644
> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> @@ -168,14 +168,11 @@ void dma_buf_uninit_sysfs_statistics(void)
> kset_unregister(dma_buf_stats_kset);
>  }
>
> -int dma_buf_stats_setup(struct dma_buf *dmabuf)
> +int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file *file)
>  {
> struct dma_buf_sysfs_entry *sysfs_entry;
> int ret;
>
> -   if (!dmabuf || !dmabuf->file)
> -   return -EINVAL;
> -
> if (!dmabuf->exp_name) {
> pr_err("exporter name must not be empty if stats needed\n");
> return -EINVAL;
> @@ -192,7 +189,7 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
>
> /* create the directory for buffer stats */
> ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> -  "%lu", file_inode(dmabuf->file)->i_ino);
> +  "%lu", file_inode(file)->i_ino);
> if (ret)
> goto err_sysfs_dmabuf;
>
> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.h 
> b/drivers/dma-buf/dma-buf-sysfs-stats.h
> index a49c6e2650cc..7a8a995b75ba 100644
> --- a/drivers/dma-buf/dma-buf-sysfs-stats.h
> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.h
> @@ -13,7 +13,7 @@
>  int dma_buf_init_sysfs_statistics(void);
>  void dma_buf_uninit_sysfs_statistics(void);
>
> -int dma_buf_stats_setup(struct dma_buf *dmabuf);
> +int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file *file);
>
>  void dma_buf_stats_teardown(struct dma_buf *dmabuf);
>  #else
> @@ -25,7 +25,7 @@ static inline int dma_buf_init_sysfs_statistics(void)
>
>  static inline void dma_buf_uninit_sysfs_statistics(void) {}
>
> -static inline int dma_buf_stats_setup(struct dma_buf *dmabuf)
> +static inline int dma_buf_stats_setup(struct dma_buf *dmabuf, struct file 
> *file)
>  {
> return 0;
>  }
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index e6f36c014c4c..ea08049b70ae 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -614,19 +614,11 @@ struct dma_buf *dma_buf_export(const struct 
> dma_buf_export_info *exp_info)
> size_t alloc_size = sizeof(struct dma_buf);
> int ret;
>
> -   if (!exp_info->resv)
> -   alloc_size += sizeof(struct dma_resv);
> -   else
> > /* prevent &dma_buf[1] == dma_buf->resv */
> -   alloc_size += 1;
> -
> -   if (WARN_ON(!exp_info->priv
> - || !exp_info->ops
> - || !exp_info->ops->map_dma_buf
> - || !exp_info->ops->unmap_dma_buf
> - || !exp_info->ops->release)) {
> +   if (WARN_ON(!exp_info->priv || !exp_info->ops
> +   || !exp_info->ops->map_dma_buf
> +   || !exp_info->ops->unmap_dma_buf
> +   || !exp_info->ops->release))
> return ERR_PTR(-EINVAL);
> -   }
>
> if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> (exp_info->ops->pin || exp_info->ops->unpin)))
> @@ -638,10 +630,21 @@ struct dma_buf *dma_buf_export(const struct 
> dma_buf_export_info *exp_info)
> if (!try_module_get(exp_info->owner))
> return ERR_PTR(-ENOENT);
>
> +   file = dma_buf_getfile(exp_info->size, exp_info->flags);
> +   if (IS_ERR(file)) {
> +   ret = PTR_ERR(file);
> +   goto err_module;
> +   }
> +
> +   if (!exp_info->resv)
> +   alloc_size += sizeof(struct dma_resv);
> +   else
> +   /* prevent &dma_buf[1] == dma_buf->resv */
> +   alloc_size += 1;
> dmabuf = kzalloc(alloc_size, GFP_KERNEL);
> if (!dmabuf) {
> ret 

Re: [PATCH linux-next] dma-buf: use strscpy() to instead of strlcpy()

2022-11-29 Thread T.J. Mercier
On Thu, Nov 24, 2022 at 3:26 AM  wrote:
>
> From: Xu Panda 
>
> The implementation of strscpy() is more robust and safer.
> That's now the recommended way to copy NUL terminated strings.
>
> Signed-off-by: Xu Panda 
> Signed-off-by: Yang Yang 
> ---
>  drivers/dma-buf/dma-buf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index b6c36914e7c6..485cf4f3431e 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -51,7 +51,7 @@ static char *dmabuffs_dname(struct dentry *dentry, char 
> *buffer, int buflen)
> dmabuf = dentry->d_fsdata;
> spin_lock(&dmabuf->name_lock);
> if (dmabuf->name)
> -   ret = strlcpy(name, dmabuf->name, DMA_BUF_NAME_LEN);
> +   ret = strscpy(name, dmabuf->name, DMA_BUF_NAME_LEN);

The type of ret should also be changed to ssize_t to capture the
negative error value that strscpy() can return but strlcpy() never
does. We shouldn't ever see that error condition here with the code
the way it is now, but let's not risk it.

> spin_unlock(&dmabuf->name_lock);
>
> return dynamic_dname(buffer, buflen, "/%s:%s",
> --
> 2.15.2


Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: A collection of typo and documentation fixes

2022-11-25 Thread T.J. Mercier
On Thu, Nov 24, 2022 at 1:43 AM Christian König
 wrote:
>
> > On 24.11.22 at 10:05, Daniel Vetter wrote:
> > On Thu, Nov 24, 2022 at 08:03:09AM +0100, Christian König wrote:
> >> On 23.11.22 at 20:35, T.J. Mercier wrote:
> >>> I've been collecting these typo fixes for a while and it feels like
> >>> time to send them in.
> >>>
> >>> Signed-off-by: T.J. Mercier 
> >> Acked-by: Christian König 
> > Will you also push this? I think tj doesn't have commit rights yet, and I
> > somehow can't see the patch locally (I guess it's stuck in moderation).
>
> I was just about to complain that this doesn't apply cleanly to
> drm-misc-next.
>
> Trivial problem, one of the typos was just removed by Dimitry a few
> weeks ago.
>
> I've fixed that up locally and pushed the result, but nevertheless
> please make sure that DMA-buf patches are based on the drm branches.
>
I'm sorry, this was on top of a random spot in Linus's 6.1-rc5.
(84368d882b96 Merge tag 'soc-fixes-6.1-3') I'm not sure why I did
that, but I suspect it was after a fresh git pull. I have too many
repos.

Thanks all for the reviews.

> Thanks,
> Christian.
>
> > -Daniel
> >
> >>> ---
> >>>drivers/dma-buf/dma-buf.c | 14 +++---
> >>>include/linux/dma-buf.h   |  6 +++---
> >>>2 files changed, 10 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index dd0f83ee505b..614ccd208af4 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -1141,7 +1141,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment, 
> >>> DMA_BUF);
> >>> *
> >>> * @dmabuf:  [in]buffer which is moving
> >>> *
> >>> - * Informs all attachmenst that they need to destroy and recreated all 
> >>> their
> >>> + * Informs all attachments that they need to destroy and recreate all 
> >>> their
> >>> * mappings.
> >>> */
> >>>void dma_buf_move_notify(struct dma_buf *dmabuf)
> >>> @@ -1159,11 +1159,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, 
> >>> DMA_BUF);
> >>>/**
> >>> * DOC: cpu access
> >>> *
> >>> - * There are mutliple reasons for supporting CPU access to a dma buffer 
> >>> object:
> >>> + * There are multiple reasons for supporting CPU access to a dma buffer 
> >>> object:
> >>> *
> >>> * - Fallback operations in the kernel, for example when a device is 
> >>> connected
> >>> *   over USB and the kernel needs to shuffle the data around first 
> >>> before
> >>> - *   sending it away. Cache coherency is handled by braketing any 
> >>> transactions
> >>> + *   sending it away. Cache coherency is handled by bracketing any 
> >>> transactions
> >>> *   with calls to dma_buf_begin_cpu_access() and 
> >>> dma_buf_end_cpu_access()
> >>> *   access.
> >>> *
> >>> @@ -1190,7 +1190,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
> >>> *   replace ION buffers mmap support was needed.
> >>> *
> >>> *   There is no special interfaces, userspace simply calls mmap on 
> >>> the dma-buf
> >>> - *   fd. But like for CPU access there's a need to braket the actual 
> >>> access,
> >>> + *   fd. But like for CPU access there's a need to bracket the actual 
> >>> access,
> >>> *   which is handled by the ioctl (DMA_BUF_IOCTL_SYNC). Note that
> >>> *   DMA_BUF_IOCTL_SYNC can fail with -EAGAIN or -EINTR, in which case 
> >>> it must
> >>> *   be restarted.
> >>> @@ -1264,10 +1264,10 @@ static int __dma_buf_begin_cpu_access(struct 
> >>> dma_buf *dmabuf,
> >>> * preparations. Coherency is only guaranteed in the specified range 
> >>> for the
> >>> * specified access direction.
> >>> * @dmabuf:  [in]buffer to prepare cpu access for.
> >>> - * @direction: [in]length of range for cpu access.
> >>> + * @direction: [in]direction of access.
> >>> *
> >>> * After the cpu access is complete the caller should call
> >>> - * dma_buf_end_cpu_access(). Only when cpu access is braketed by both 
> >>> calls is
> >>> 

[PATCH] dma-buf: A collection of typo and documentation fixes

2022-11-23 Thread T.J. Mercier
I've been collecting these typo fixes for a while and it feels like
time to send them in.

Signed-off-by: T.J. Mercier 
---
 drivers/dma-buf/dma-buf.c | 14 +++---
 include/linux/dma-buf.h   |  6 +++---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index dd0f83ee505b..614ccd208af4 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1141,7 +1141,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment, DMA_BUF);
  *
  * @dmabuf:[in]buffer which is moving
  *
- * Informs all attachmenst that they need to destroy and recreated all their
+ * Informs all attachments that they need to destroy and recreate all their
  * mappings.
  */
 void dma_buf_move_notify(struct dma_buf *dmabuf)
@@ -1159,11 +1159,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
 /**
  * DOC: cpu access
  *
- * There are mutliple reasons for supporting CPU access to a dma buffer object:
+ * There are multiple reasons for supporting CPU access to a dma buffer object:
  *
  * - Fallback operations in the kernel, for example when a device is connected
  *   over USB and the kernel needs to shuffle the data around first before
- *   sending it away. Cache coherency is handled by braketing any transactions
+ *   sending it away. Cache coherency is handled by bracketing any transactions
  *   with calls to dma_buf_begin_cpu_access() and dma_buf_end_cpu_access()
  *   access.
  *
@@ -1190,7 +1190,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
  *   replace ION buffers mmap support was needed.
  *
  *   There is no special interfaces, userspace simply calls mmap on the dma-buf
- *   fd. But like for CPU access there's a need to braket the actual access,
+ *   fd. But like for CPU access there's a need to bracket the actual access,
  *   which is handled by the ioctl (DMA_BUF_IOCTL_SYNC). Note that
  *   DMA_BUF_IOCTL_SYNC can fail with -EAGAIN or -EINTR, in which case it must
  *   be restarted.
@@ -1264,10 +1264,10 @@ static int __dma_buf_begin_cpu_access(struct dma_buf 
*dmabuf,
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to prepare cpu access for.
- * @direction: [in]length of range for cpu access.
+ * @direction: [in]direction of access.
  *
  * After the cpu access is complete the caller should call
- * dma_buf_end_cpu_access(). Only when cpu access is braketed by both calls is
+ * dma_buf_end_cpu_access(). Only when cpu access is bracketed by both calls is
  * it guaranteed to be coherent with other DMA access.
  *
  * This function will also wait for any DMA transactions tracked through
@@ -1307,7 +1307,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_begin_cpu_access, DMA_BUF);
  * actions. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to complete cpu access for.
- * @direction: [in]length of range for cpu access.
+ * @direction: [in]direction of access.
  *
  * This terminates CPU access started with dma_buf_begin_cpu_access().
  *
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 71731796c8c3..1d61a4f6db35 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -330,7 +330,7 @@ struct dma_buf {
 * @lock:
 *
 * Used internally to serialize list manipulation, attach/detach and
-* vmap/unmap. Note that in many cases this is superseeded by
+* vmap/unmap. Note that in many cases this is superseded by
 * dma_resv_lock() on @resv.
 */
struct mutex lock;
@@ -365,7 +365,7 @@ struct dma_buf {
 */
const char *name;
 
-   /** @name_lock: Spinlock to protect name acces for read access. */
+   /** @name_lock: Spinlock to protect name access for read access. */
spinlock_t name_lock;
 
/**
@@ -402,7 +402,7 @@ struct dma_buf {
 *   anything the userspace API considers write access.
 *
 * - Drivers may just always add a write fence, since that only
-*   causes unecessarily synchronization, but no correctness issues.
+*   causes unnecessary synchronization, but no correctness issues.
 *
 * - Some drivers only expose a synchronous userspace API with no
 *   pipelining across drivers. These do not set any fences for their
-- 
2.38.1.584.g0f3c55d4c2-goog



Re: [PATCH] dma-buf: Fix possible UAF in dma_buf_export

2022-11-18 Thread T.J. Mercier
On Fri, Nov 18, 2022 at 12:27 AM Christian König 
wrote:

> On 18.11.22 at 03:36, T.J. Mercier wrote:
> > On Thu, Nov 17, 2022 at 2:16 AM Christian König
> >  wrote:
> >> On 17.11.22 at 08:48, Charan Teja Kalla wrote:
> >>> Sometime back Dan also reported the same issue[1] where I do mentioned
> >>> that fput()-->dma_buf_file_release() will remove it from the list.
> >>>
> >>> But it seems that I failed to notice fput() here calls the
> >>> dma_buf_file_release() asynchronously i.e. dmabuf that is accessed in
> >>> the close path is already freed. Am I wrong here?
> >>>
> >>> Should we have the __fput_sync(file) here instead of just fput(file)
> >>> which can solve this problem?
> >>>
> >>> [1] https://lore.kernel.org/all/20220516084704.GG29930@kadam/
> >> That doesn't look like the right solution to me either.
> >>
> >> Essentially we have two separate tear down methods for the dma_buf
> >> object here:
> >>
> >> 1. It's not completely initialized and we can call kfree()+module_put()
> >> to clean up.
> >>   There is actually a dma_resv_fini() here. That should probably be
> >> fixed.
> >>
> >> 2. The dma_buf object is fully initialized, but creating the sysfs stats
> >> file failed.
> >>   In this case we should *not* clean it up like we currently do, but
> >> rather call fput().
> >>
> >> So the right thing to do is a) fix the missing dma_resv_fini() call and
> >> b) drop the setting d_fsdata=NULL hack and properly return after the
> fput().
> >>
> > This looks right to me if by properly return you mean return
> > ERR_PTR(ret); at the end of err_sysfs after the fput. (letting
> > dma_buf_file_release and dma_buf_release do the full cleanup)
>
> Yes, exactly that's the idea.
>
> The only alternatives I can see would be to either move allocating the
> file and so completing the dma_buf initialization last again or just
> ignore errors from sysfs.
>
> > If we still want to avoid calling dmabuf->ops->release(dmabuf) in
> > dma_buf_release like the comment says I guess we could use sysfs_entry
> > and ERR_PTR to flag that, otherwise it looks like we'd need a bit
> > somewhere.
>
> No, this should be dropped as far as I can see. The sysfs cleanup code
> looks like it can handle not initialized kobj pointers.
>

Yeah there is also the null check in dma_buf_stats_teardown() that would
prevent it from running, but I understood the comment to be referring to
the release() dma_buf_ops call into the exporter which comes right after
the teardown call. That looks like it's preventing the fput task work from
calling back into the exporter after the exporter already got an error from
dma_buf_export(). Otherwise the exporter sees a release() for a buffer that
it doesn't know about / thinks shouldn't exist. So I could imagine an
exporter trying to double free: once for the failed dma_buf_export() call,
and again when the release() op is called later.

>
> Regards,
> Christian.
>
> >
> >   >
> >> Regards,
> >> Christian.
> >>
> >>> Thanks,
> >>> Charan
> >>> On 11/17/2022 11:51 AM, Gaosheng Cui wrote:
> >>>> Smatch report warning as follows:
> >>>>
> >>>> drivers/dma-buf/dma-buf.c:681 dma_buf_export() warn:
> >>> '&dmabuf->list_node' not removed from list
> >>>>
> >>>> If dma_buf_stats_setup() fails in dma_buf_export(), goto err_sysfs
> >>>> and dmabuf will be freed, but dmabuf->list_node will not be removed
> >>>> from db_list.head, then list traversal may cause UAF.
> >>>>
> >>> Fix by removing it from db_list.head before free().
> >>>>
> >>>> Fixes: ef3a6b70507a ("dma-buf: call dma_buf_stats_setup after dmabuf
> is in valid list")
> >>>> Signed-off-by: Gaosheng Cui 
> >>>> ---
> >>>>drivers/dma-buf/dma-buf.c | 3 +++
> >>>>1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>> index b809513b03fe..6848f50226d5 100644
> >>>> --- a/drivers/dma-buf/dma-buf.c
> >>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>> @@ -675,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct
> dma_buf_export_info *exp_info)
> >>>>   return dmabuf;
> >>>>
> >>>>err_sysfs:
> >>>> +mutex_lock(&db_list.lock);
> >>>> +list_del(&dmabuf->list_node);
> >>>> +mutex_unlock(&db_list.lock);
> >>>>   /*
> >>>>* Set file->f_path.dentry->d_fsdata to NULL so that when
> >>>>* dma_buf_release() gets invoked by dentry_ops, it exits
>
>


Re: [PATCH] dma-buf: Fix possible UAF in dma_buf_export

2022-11-17 Thread T.J. Mercier
On Thu, Nov 17, 2022 at 2:16 AM Christian König
 wrote:
>
> On 17.11.22 at 08:48, Charan Teja Kalla wrote:
> > Sometime back Dan also reported the same issue[1] where I do mentioned
> > that fput()-->dma_buf_file_release() will remove it from the list.
> >
> > But it seems that I failed to notice fput() here calls the
> > dma_buf_file_release() asynchronously i.e. dmabuf that is accessed in
> > the close path is already freed. Am I wrong here?
> >
> > Should we have the __fput_sync(file) here instead of just fput(file)
> > which can solve this problem?
> >
> > [1] https://lore.kernel.org/all/20220516084704.GG29930@kadam/
>
> That doesn't look like the right solution to me either.
>
> Essentially we have two separate tear down methods for the dma_buf
> object here:
>
> 1. It's not completely initialized and we can call kfree()+module_put()
> to clean up.
>  There is actually a dma_resv_fini() here. That should probably be
> fixed.
>
> 2. The dma_buf object is fully initialized, but creating the sysfs stats
> file failed.
>  In this case we should *not* clean it up like we currently do, but
> rather call fput().
>
> So the right thing to do is a) fix the missing dma_resv_fini() call and
> b) drop the setting d_fsdata=NULL hack and properly return after the fput().
>
This looks right to me if by properly return you mean return
ERR_PTR(ret); at the end of err_sysfs after the fput. (letting
dma_buf_file_release and dma_buf_release do the full cleanup)

If we still want to avoid calling dmabuf->ops->release(dmabuf) in
dma_buf_release like the comment says I guess we could use sysfs_entry
and ERR_PTR to flag that, otherwise it looks like we'd need a bit
somewhere.

 >
> Regards,
> Christian.
>
> >
> > Thanks,
> > Charan
> > On 11/17/2022 11:51 AM, Gaosheng Cui wrote:
> >> Smatch report warning as follows:
> >>
> >> drivers/dma-buf/dma-buf.c:681 dma_buf_export() warn:
> >>'&dmabuf->list_node' not removed from list
> >>
> >> If dma_buf_stats_setup() fails in dma_buf_export(), goto err_sysfs
> >> and dmabuf will be freed, but dmabuf->list_node will not be removed
> >> from db_list.head, then list traversal may cause UAF.
> >>
> >> Fix by removing it from db_list.head before free().
> >>
> >> Fixes: ef3a6b70507a ("dma-buf: call dma_buf_stats_setup after dmabuf is in 
> >> valid list")
> >> Signed-off-by: Gaosheng Cui 
> >> ---
> >>   drivers/dma-buf/dma-buf.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >> index b809513b03fe..6848f50226d5 100644
> >> --- a/drivers/dma-buf/dma-buf.c
> >> +++ b/drivers/dma-buf/dma-buf.c
> >> @@ -675,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct 
> >> dma_buf_export_info *exp_info)
> >>  return dmabuf;
> >>
> >>   err_sysfs:
> >> +mutex_lock(&db_list.lock);
> >> +list_del(&dmabuf->list_node);
> >> +mutex_unlock(&db_list.lock);
> >>  /*
> >>   * Set file->f_path.dentry->d_fsdata to NULL so that when
> >>   * dma_buf_release() gets invoked by dentry_ops, it exits
>


Re: [PATCH v3] dma-buf: fix racing conflict of dma_heap_add()

2022-11-02 Thread T.J. Mercier
On Wed, Nov 2, 2022 at 8:59 AM Dawei Li  wrote:
>
> Racing conflict could be:
> task A task B
> list_for_each_entry
> strcmp(h->name))
>list_for_each_entry
>strcmp(h->name)
> kzallockzalloc
> .. .
> device_create  device_create
> list_add
>list_add
>
> The root cause is that task B has no idea about the fact someone
> else(A) has inserted heap with same name when it calls list_add,
> so a potential collision occurs.
>
> v1: 
> https://lore.kernel.org/all/tycp286mb2323950197f60fc3473123b7ca...@tycp286mb2323.jpnp286.prod.outlook.com/
>
> v1->v2: Narrow down locking scope, check the existence of heap before
> insertion, as suggested by Andrew Davis.
>
> v2->v3: Remove double checking.
>
> Fixes: c02a81fba74f ("dma-buf: Add dma-buf heaps framework")
>
> base-commit: 447fb14bf07905b880c9ed1ea92c53d6dd0649d7
>
> Signed-off-by: Dawei Li 
> ---
>  drivers/dma-buf/dma-heap.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index 8f5848aa144f..7a25e98259ea 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -233,18 +233,6 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> return ERR_PTR(-EINVAL);
> }
>
> -   /* check the name is unique */
> -   mutex_lock(&heap_list_lock);
> -   list_for_each_entry(h, &heap_list, list) {
> -   if (!strcmp(h->name, exp_info->name)) {
> -   mutex_unlock(&heap_list_lock);
> -   pr_err("dma_heap: Already registered heap named %s\n",
> -  exp_info->name);
> -   return ERR_PTR(-EINVAL);
> -   }
> -   }
> -   mutex_unlock(&heap_list_lock);
> -
> heap = kzalloc(sizeof(*heap), GFP_KERNEL);
> if (!heap)
> return ERR_PTR(-ENOMEM);
> @@ -283,13 +271,26 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> err_ret = ERR_CAST(dev_ret);
> goto err2;
> }
> -   /* Add heap to the list */
> +
> mutex_lock(&heap_list_lock);
> +   /* check the name is unique */
> +   list_for_each_entry(h, &heap_list, list) {
> +   if (!strcmp(h->name, exp_info->name)) {
> +   mutex_unlock(&heap_list_lock);
> +   pr_err("dma_heap: Already registered heap named %s\n",
> +  exp_info->name);
> +   err_ret = ERR_PTR(-EINVAL);
> +   goto err3;
> +   }
> +   }
> +
> +   /* Add heap to the list */
> list_add(&heap->list, &heap_list);
> mutex_unlock(&heap_list_lock);
>
> return heap;
> -
> +err3:
> +   device_destroy(dma_heap_class, heap->heap_devt);
>  err2:
> cdev_del(&heap->heap_cdev);
>  err1:
> --
> 2.25.1
>

Reviewed-by: T.J. Mercier 

Thanks!


Re: [PATCH v2] dma-buf: fix racing conflict of dma_heap_add()

2022-11-01 Thread T.J. Mercier
On Mon, Oct 31, 2022 at 7:53 AM Dawei Li  wrote:
>
> Racing conflict could be:
> task A                     task B
> list_for_each_entry
> strcmp(h->name)
>                            list_for_each_entry
>                            strcmp(h->name)
> kzalloc                    kzalloc
> ...                        ...
> device_create              device_create
> list_add
>                            list_add
>
> The root cause is that task B has no idea about the fact someone
> else(A) has inserted heap with same name when it calls list_add,
> so a potential collision occurs.
>
> v1: 
> https://lore.kernel.org/all/tycp286mb2323950197f60fc3473123b7ca...@tycp286mb2323.jpnp286.prod.outlook.com/
>
> v1->v2: Narrow down locking scope, check the existence of heap before
> insertion, as suggested by Andrew Davis.
>
> Fixes: c02a81fba74f ("dma-buf: Add dma-buf heaps framework")
>
> base-commit: 447fb14bf07905b880c9ed1ea92c53d6dd0649d7
>
> Signed-off-by: Dawei Li 
> ---
>  drivers/dma-buf/dma-heap.c | 37 -
>  1 file changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index 8f5848aa144f..1c787a147e3a 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -216,9 +216,21 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> return heap->name;
>  }
>
> +static inline bool dma_heap_exist(const char *name)
> +{
> +   struct dma_heap *h;
> +
> +   list_for_each_entry(h, &heap_list, list) {
> +   if (!strcmp(h->name, name))
> +   return true;
> +   }
> +
> +   return false;
> +}
> +
>  struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
>  {
> -   struct dma_heap *heap, *h, *err_ret;
> +   struct dma_heap *heap, *err_ret;
> struct device *dev_ret;
> unsigned int minor;
> int ret;
> @@ -235,13 +247,11 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
>
> /* check the name is unique */
> mutex_lock(&heap_list_lock);
> -   list_for_each_entry(h, &heap_list, list) {
> -   if (!strcmp(h->name, exp_info->name)) {
> -   mutex_unlock(&heap_list_lock);
> -   pr_err("dma_heap: Already registered heap named %s\n",
> -  exp_info->name);
> -   return ERR_PTR(-EINVAL);
> -   }
> +   if (dma_heap_exist(exp_info->name)) {
> +   mutex_unlock(&heap_list_lock);
> +   pr_err("dma_heap: Already registered heap named %s\n",
> +  exp_info->name);
> +   return ERR_PTR(-EINVAL);

Hi Dawei,
What Andrew was suggesting was that you entirely move the check from
here to the critical section down below, not duplicate the check. I
don't think we want to check this twice. We should be able to do this
by taking the heap_list_lock only once.

> }
> mutex_unlock(&heap_list_lock);
>
> @@ -283,13 +293,22 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
> err_ret = ERR_CAST(dev_ret);
> goto err2;
> }
> +
> /* Add heap to the list */
> mutex_lock(&heap_list_lock);
> +   if (unlikely(dma_heap_exist(exp_info->name))) {
> +   mutex_unlock(&heap_list_lock);
> +   pr_err("dma_heap: Already registered heap named %s\n",
> +  exp_info->name);
> +   err_ret = ERR_PTR(-EINVAL);
> +   goto err3;
> +   }
> list_add(&heap->list, &heap_list);
> mutex_unlock(&heap_list_lock);
>
> return heap;
> -
> +err3:
> +   device_destroy(dma_heap_class, heap->heap_devt);
>  err2:
> cdev_del(&heap->heap_cdev);
>  err1:
> --
> 2.25.1
>


[PATCH] drm/etnaviv: Remove duplicate call to drm_gem_free_mmap_offset

2022-07-06 Thread T.J. Mercier
The docs explicitly say the drm_gem_object_release function already calls this,
and this does not appear to be a prerequisite for the call to
etnaviv_gem_ops.release.

Signed-off-by: T.J. Mercier 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index cc386f8a7116..ecb828e4e156 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -504,7 +504,6 @@ void etnaviv_gem_free_object(struct drm_gem_object *obj)
kfree(mapping);
}
 
-   drm_gem_free_mmap_offset(obj);
etnaviv_obj->ops->release(etnaviv_obj);
drm_gem_object_release(obj);
 
-- 
2.37.0.rc0.161.g10f37bed90-goog



Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path

2022-06-15 Thread T.J. Mercier
On Wed, Jun 1, 2022 at 5:40 AM Daniel Vetter  wrote:
>
> On Mon, May 30, 2022 at 08:12:16AM +0200, Christian König wrote:
> > On 25.05.22 at 23:05, T.J. Mercier wrote:
> > > On Wed, May 25, 2022 at 7:38 AM Daniel Vetter  wrote:
> > > > On Tue, May 17, 2022 at 08:13:24AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote:
> > > > > > On Mon, May 16, 2022 at 12:21 PM Christian König
> > > > > >  wrote:
> > > > > > > Am 16.05.22 um 20:08 schrieb T.J. Mercier:
> > > > > > > > On Mon, May 16, 2022 at 10:20 AM Christian König
> > > > > > > >  wrote:
> > > > > > > > > Am 16.05.22 um 19:13 schrieb T.J. Mercier:
> > > > > > > > > > Recently, we noticed an issue where a process went into 
> > > > > > > > > > direct reclaim
> > > > > > > > > > while holding the kernfs rw semaphore for sysfs in write 
> > > > > > > > > > (exclusive)
> > > > > > > > > > mode. This caused processes who were doing DMA-BUF exports 
> > > > > > > > > > and releases
> > > > > > > > > > to go into uninterruptible sleep since they needed to 
> > > > > > > > > > acquire the same
> > > > > > > > > > semaphore for the DMA-BUF sysfs entry creation/deletion. In 
> > > > > > > > > > order to avoid
> > > > > > > > > > blocking DMA-BUF export for an indeterminate amount of time 
> > > > > > > > > > while
> > > > > > > > > > another process is holding the sysfs rw semaphore in 
> > > > > > > > > > exclusive mode,
> > > > > > > > > > this patch moves the per-buffer sysfs file creation to the 
> > > > > > > > > > default work
> > > > > > > > > > queue. Note that this can lead to a short-term inaccuracy 
> > > > > > > > > > in the dmabuf
> > > > > > > > > > sysfs statistics, but this is a tradeoff to prevent the hot 
> > > > > > > > > > path from
> > > > > > > > > > being blocked. A work_struct is added to dma_buf to achieve 
> > > > > > > > > > this, but as
> > > > > > > > > > it is unioned with the kobject in the sysfs_entry, dma_buf 
> > > > > > > > > > does not
> > > > > > > > > > increase in size.
> > > > > > > > > I'm still not very keen of this approach as it strongly feels 
> > > > > > > > > like we
> > > > > > > > > are working around shortcoming somewhere else.
> > > > > > > > >
> > > > > > > > My read of the thread for the last version is that we're 
> > > > > > > > running into
> > > > > > > > a situation where sysfs is getting used for something it wasn't
> > > > > > > > originally intended for, but we're also stuck with this sysfs
> > > > > > > > functionality for dmabufs.
> > > > > > > >
> > > > > > > > > > Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose 
> > > > > > > > > > DMA-BUF stats in sysfs")
> > > > > > > > > > Originally-by: Hridya Valsaraju 
> > > > > > > > > > Signed-off-by: T.J. Mercier 
> > > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > See the originally submitted patch by Hridya Valsaraju here:
> > > > > > > > > > https://lkml.org/lkml/2022/1/4/1066
> > > > > > > > > >
> > > > > > > > > > v2 changes:
> > > > > > > > > > - Defer only sysfs creation instead of

Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path

2022-05-25 Thread T.J. Mercier
On Wed, May 25, 2022 at 2:05 PM T.J. Mercier  wrote:
>
> On Wed, May 25, 2022 at 7:38 AM Daniel Vetter  wrote:
> >
> > On Tue, May 17, 2022 at 08:13:24AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote:
> > > > On Mon, May 16, 2022 at 12:21 PM Christian König
> > > >  wrote:
> > > > >
> > > > > Am 16.05.22 um 20:08 schrieb T.J. Mercier:
> > > > > > On Mon, May 16, 2022 at 10:20 AM Christian König
> > > > > >  wrote:
> > > > > >> Am 16.05.22 um 19:13 schrieb T.J. Mercier:
> > > > > >>> Recently, we noticed an issue where a process went into direct 
> > > > > >>> reclaim
> > > > > >>> while holding the kernfs rw semaphore for sysfs in write 
> > > > > >>> (exclusive)
> > > > > >>> mode. This caused processes who were doing DMA-BUF exports and 
> > > > > >>> releases
> > > > > >>> to go into uninterruptible sleep since they needed to acquire the 
> > > > > >>> same
> > > > > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order 
> > > > > >>> to avoid
> > > > > >>> blocking DMA-BUF export for an indeterminate amount of time while
> > > > > >>> another process is holding the sysfs rw semaphore in exclusive 
> > > > > >>> mode,
> > > > > >>> this patch moves the per-buffer sysfs file creation to the 
> > > > > >>> default work
> > > > > >>> queue. Note that this can lead to a short-term inaccuracy in the 
> > > > > >>> dmabuf
> > > > > >>> sysfs statistics, but this is a tradeoff to prevent the hot path 
> > > > > >>> from
> > > > > >>> being blocked. A work_struct is added to dma_buf to achieve this, 
> > > > > >>> but as
> > > > > >>> it is unioned with the kobject in the sysfs_entry, dma_buf does 
> > > > > >>> not
> > > > > >>> increase in size.
> > > > > >> I'm still not very keen of this approach as it strongly feels like 
> > > > > >> we
> > > > > >> are working around shortcoming somewhere else.
> > > > > >>
> > > > > > My read of the thread for the last version is that we're running 
> > > > > > into
> > > > > > a situation where sysfs is getting used for something it wasn't
> > > > > > originally intended for, but we're also stuck with this sysfs
> > > > > > functionality for dmabufs.
> > > > > >
> > > > > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose 
> > > > > >>> DMA-BUF stats in sysfs")
> > > > > >>> Originally-by: Hridya Valsaraju 
> > > > > >>> Signed-off-by: T.J. Mercier 
> > > > > >>>
> > > > > >>> ---
> > > > > >>> See the originally submitted patch by Hridya Valsaraju here:
> > > > > >>> https://lkml.org/lkml/2022/1/4/1066
> > > > > >>>
> > > > > >>> v2 changes:
> > > > > >>> - Defer only sysfs creation instead of creation and teardown per
> > > > > >>> Christian König
> > > > > >>>
> > > > > >>> - Use a work queue instead of a kthread for deferred work per
> > > > > >>> Christian König
> > > > > >>> ---
> > > > > >>>drivers/dma-buf/dma-buf-sysfs-stats.c | 56 
> > > > > >>> ---
> > > > > >>>include/linux/dma-buf.h   | 14 ++-
> > > > > >>>2 files changed, 54 insertions(+), 16 deletions(-)
> > > > > >>>
> > > > > >>> diff --git a/drivers/dma-buf/dma-buf-sys

Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path

2022-05-25 Thread T.J. Mercier
On Wed, May 25, 2022 at 7:38 AM Daniel Vetter  wrote:
>
> On Tue, May 17, 2022 at 08:13:24AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote:
> > > On Mon, May 16, 2022 at 12:21 PM Christian König
> > >  wrote:
> > > >
> > > > Am 16.05.22 um 20:08 schrieb T.J. Mercier:
> > > > > On Mon, May 16, 2022 at 10:20 AM Christian König
> > > > >  wrote:
> > > > >> Am 16.05.22 um 19:13 schrieb T.J. Mercier:
> > > > >>> Recently, we noticed an issue where a process went into direct 
> > > > >>> reclaim
> > > > >>> while holding the kernfs rw semaphore for sysfs in write (exclusive)
> > > > >>> mode. This caused processes who were doing DMA-BUF exports and 
> > > > >>> releases
> > > > >>> to go into uninterruptible sleep since they needed to acquire the 
> > > > >>> same
> > > > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order 
> > > > >>> to avoid
> > > > >>> blocking DMA-BUF export for an indeterminate amount of time while
> > > > >>> another process is holding the sysfs rw semaphore in exclusive mode,
> > > > >>> this patch moves the per-buffer sysfs file creation to the default 
> > > > >>> work
> > > > >>> queue. Note that this can lead to a short-term inaccuracy in the 
> > > > >>> dmabuf
> > > > >>> sysfs statistics, but this is a tradeoff to prevent the hot path 
> > > > >>> from
> > > > >>> being blocked. A work_struct is added to dma_buf to achieve this, 
> > > > >>> but as
> > > > >>> it is unioned with the kobject in the sysfs_entry, dma_buf does not
> > > > >>> increase in size.
> > > > >> I'm still not very keen of this approach as it strongly feels like we
> > > > >> are working around shortcoming somewhere else.
> > > > >>
> > > > > My read of the thread for the last version is that we're running into
> > > > > a situation where sysfs is getting used for something it wasn't
> > > > > originally intended for, but we're also stuck with this sysfs
> > > > > functionality for dmabufs.
> > > > >
> > > > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF 
> > > > >>> stats in sysfs")
> > > > >>> Originally-by: Hridya Valsaraju 
> > > > >>> Signed-off-by: T.J. Mercier 
> > > > >>>
> > > > >>> ---
> > > > >>> See the originally submitted patch by Hridya Valsaraju here:
> > > > >>> https://lkml.org/lkml/2022/1/4/1066
> > > > >>>
> > > > >>> v2 changes:
> > > > >>> - Defer only sysfs creation instead of creation and teardown per
> > > > >>> Christian König
> > > > >>>
> > > > >>> - Use a work queue instead of a kthread for deferred work per
> > > > >>> Christian König
> > > > >>> ---
> > > > >>>drivers/dma-buf/dma-buf-sysfs-stats.c | 56 
> > > > >>> ---
> > > > >>>include/linux/dma-buf.h   | 14 ++-
> > > > >>>2 files changed, 54 insertions(+), 16 deletions(-)
> > > > >>>
> > > > >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c 
> > > > >>> b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > > >>> index 2bba0babcb62..67b0a298291c 100644
> > > > >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > > >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > > >>> @@ -11,6 +11,7 @@
> > > > >>>#include 
> > > > >>>#include 
> > > > >>>#include 
> > > > >>> +#include 
> > > >