On 3/28/19 3:40 PM, Jerome Glisse wrote: > On Thu, Mar 28, 2019 at 03:25:39PM -0700, John Hubbard wrote: >> On 3/28/19 3:08 PM, Jerome Glisse wrote: >>> On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote: >>>> On 3/28/19 2:30 PM, Jerome Glisse wrote: >>>>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote: >>>>>> On 3/25/19 7:40 AM, jgli...@redhat.com wrote: >>>>>>> From: Jérôme Glisse <jgli...@redhat.com> >> [...] >>>> >>>>>> >>>>>> If you insist on having this wrapper, I think it should have >>>>>> approximately >>>>>> this form: >>>>>> >>>>>> void hmm_mirror_mm_down_read(...) >>>>>> { >>>>>> WARN_ON(...) >>>>>> down_read(...) >>>>>> } >>>>> >>>>> I do insist as it is useful and use by both RDMA and nouveau and the >>>>> above would kill the intent. The intent is do not try to take the lock >>>>> if the process is dying. >>>> >>>> Could you provide me a link to those examples so I can take a peek? I >>>> am still convinced that this whole thing is a race condition at best. >>> >>> The race is fine and ok see: >>> >>> https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec >>> >>> which has been posted and i think i provided a link in the cover >>> letter to that post. The same patch exist for nouveau i need to >>> cleanup that tree and push it. >> >> Thanks for that link, and I apologize for not keeping up with that >> other review thread. >> >> Looking it over, hmm_mirror_mm_down_read() is only used in one place. >> So, what you really want there is not a down_read() wrapper, but rather, >> something like >> >> hmm_sanity_check() >> >> , that ib_umem_odp_map_dma_pages() calls. > > Why ? The device driver pattern is: > if (hmm_is_it_dying()) { > // handle when process die and abort the fault ie useless > // to call within HMM > } > down_read(mmap_sem); > > This pattern is common within nouveau and RDMA and other device driver in > the work. Hence why i am replacing it with just one helper. Also it has the > added benefit that changes being discussed around the mmap sem will be easier > to do as it avoid having to update each driver but instead it can be done > just once for the HMM helpers.
Yes, and I'm saying that the pattern is broken. Because it's racy. :) >>>>>>> +{ >>>>>>> + struct mm_struct *mm; >>>>>>> + >>>>>>> + /* Sanity check ... */ >>>>>>> + if (!mirror || !mirror->hmm) >>>>>>> + return -EINVAL; >>>>>>> + /* >>>>>>> + * Before trying to take the mmap_sem make sure the mm is still >>>>>>> + * alive as device driver context might outlive the mm lifetime. >>>>>> >>>>>> Let's find another way, and a better place, to solve this problem. >>>>>> Ref counting? >>>>> >>>>> This has nothing to do with refcount or use after free or anthing >>>>> like that. It is just about checking wether we are about to do >>>>> something pointless. If the process is dying then it is pointless >>>>> to try to take the lock and it is pointless for the device driver >>>>> to trigger handle_mm_fault(). >>>> >>>> Well, what happens if you let such pointless code run anyway? >>>> Does everything still work? If yes, then we don't need this change. >>>> If no, then we need a race-free version of this change. >>> >>> Yes everything work, nothing bad can happen from a race, it will just >>> do useless work which never hurt anyone. >>> >> >> OK, so let's either drop this patch, or if merge windows won't allow that, >> then *eventually* drop this patch. And instead, put in a hmm_sanity_check() >> that does the same checks. > > RDMA depends on this, so does the nouveau patchset that convert to new API. > So i do not see reason to drop this. They are user for this they are posted > and i hope i explained properly the benefit. > > It is a common pattern. Yes it only save couple lines of code but down the > road i will also help for people working on the mmap_sem patchset. > It *adds* a couple of lines that are misleading, because they look like they make things safer, but they don't actually do so. thanks, -- John Hubbard NVIDIA