On 03/15/2018 05:54 PM, Jerome Glisse wrote:
> On Thu, Mar 15, 2018 at 03:48:29PM -0700, Andrew Morton wrote:
>> On Thu, 15 Mar 2018 14:36:59 -0400 jgli...@redhat.com wrote:
>>
>>> From: Ralph Campbell <rcampb...@nvidia.com>
>>>
>>> The hmm_mirror_register() function registers a callback for when
>>> the CPU pagetable is modified. Normally, the device driver will
>>> call hmm_mirror_unregister() when the process using the device is
>>> finished. However, if the process exits uncleanly, the struct_mm
>>> can be destroyed with no warning to the device driver.
>>
>> The changelog doesn't tell us what the runtime effects of the bug are.
>> This makes it hard for me to answer the "did Jerome consider doing
>> cc:stable" question.
>
> The impact is low, they might be issue only if application is kill,
> and we don't have any upstream user yet hence why i did not cc
> stable.
>
Hi Jerome and Andrew,

I'd claim that it is not possible to write a safe and correct device
driver without this patch. That's because, without the .release
callback that you're adding here, the driver could end up operating on
a stale struct mm_struct, leading to crashes and other disasters.

Even if people think that the window is "small", it's not really any
smaller than the windows in lots of race conditions that we've seen.
And it is definitely not that hard to hit: a good directed stress test,
with multiple threads doing early process termination while also doing
lots of migrations and page faults, should suffice.

For that reason, it is probably best to add this patch to stable.

thanks,
--
John Hubbard
NVIDIA