On Mon, May 06, 2019 at 04:29:41PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> The helper function hmm_vma_fault() calls hmm_range_register() but is
> missing a call to hmm_range_unregister() in one of the error paths.
> This leads to a reference count leak and ultimately a memory leak on
> struct hmm.
>
> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> include/linux/hmm.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 35a429621e1e..fa0671d67269 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>  		return (int)ret;
>
>  	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> +		hmm_range_unregister(range);
>  		/*
>  		 * The mmap_sem was taken by driver we release it here and
>  		 * returns -EAGAIN which correspond to mmap_sem have been
> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>
>  	ret = hmm_range_fault(range, block);
>  	if (ret <= 0) {
> +		hmm_range_unregister(range);

While this is a clear improvement, there still seems to be a bug in
nouveau_svm.c around here: it calls hmm_vma_fault() but never calls
hmm_range_unregister() for its on-stack range, and hmm_vma_fault()
still returns with the range registered.

Since hmm_vma_fault() is used only by nouveau and is marked as
deprecated, I think we need to fix nouveau, either by dropping
hmm_range_fault() or by adding the missing unregister to nouveau in
this patch.
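
To make the pairing rule concrete, here is a minimal userspace sketch in
C. Nothing in it is the real hmm API: struct model_range,
model_range_register(), model_range_unregister(), model_vma_fault() and
both callers are hypothetical stand-ins, and the counter only models the
reference held on struct hmm. It assumes the helper behaves as described
above, that is, it unregisters on failure but returns success with the
range still registered.

```c
/*
 * Userspace model only. Every name here is a hypothetical stand-in and
 * not the kernel's hmm API.
 */
#include <assert.h>
#include <stdbool.h>

struct model_range {
	bool registered;	/* true between register and unregister */
};

/* Stands in for the reference count held on struct hmm. */
static int model_live_ranges;

static int model_range_register(struct model_range *r)
{
	r->registered = true;
	model_live_ranges++;
	return 0;
}

static void model_range_unregister(struct model_range *r)
{
	if (!r->registered)
		return;
	assert(model_live_ranges > 0);
	r->registered = false;
	model_live_ranges--;
}

/*
 * Models the helper with the patch applied: every failure path
 * unregisters, but success returns 0 with the range still registered.
 */
static int model_vma_fault(struct model_range *r, bool fail)
{
	int ret = model_range_register(r);

	if (ret)
		return ret;
	if (fail) {
		model_range_unregister(r);
		return -1;
	}
	return 0;
}

/* Caller that never unregisters its on-stack range after success. */
static int model_caller_leaky(bool fail)
{
	struct model_range r = { 0 };

	return model_vma_fault(&r, fail);
}

/* Caller that pairs a successful fault with an unregister. */
static int model_caller_fixed(bool fail)
{
	struct model_range r = { 0 };
	int ret = model_vma_fault(&r, fail);

	if (ret)
		return ret;
	/* ... the faulted range would be used here ... */
	model_range_unregister(&r);
	return 0;
}
```

In this model the leaky caller leaves model_live_ranges raised by one
after each successful call, while the fixed caller always returns it to
its previous value. Whether the real fix belongs in the caller or in
removing the helper is the open question above.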

Also, I see in linux-next that amdgpu_ttm.c has wrongly copied the use
of this deprecated API, including these bugs.

AMD folks: can you please push a patch for your driver to stop using
hmm_vma_fault() and correct the use-after-free? Ideally I'd like to
delete this function from hmm.git this merge cycle.

Also, if you missed it, I'm running a clean hmm.git that you can pull
into the AMD tree, if necessary, to get the changes that will go into
5.3. If you need or wish to do this, please consult with me before
making a merge commit, thanks. See:
https://lore.kernel.org/lkml/[email protected]/

So Ralph, you'll need to resend this.

Thanks,
Jason