On Wed, Feb 24, 2016 at 10:35 AM, Linus Torvalds <[email protected]> wrote:
> On Feb 24, 2016 09:38, "Tony Luck" <[email protected]> wrote:
>>
>> 2) But if we want to use this for copy_from_user() as part of the
>> write(2) call stack (and I *do* want to do that)
>
> Think of it this way: if a regular copy_from_user() doesn't work on
> the memory, there's no way in hell we can allow user space to map it
> anyway.

I see I have caused confusion by talking about Dan's NVDIMM case and
copy_from_user() in the same breath. This isn't just about NVDIMMs.
It is about uncorrected errors in any type of memory.

The copy_from_user() case I'd like to fix is when there is an
uncorrected error in memory. This can happen to regular DDR3/DDR4
memory just as it can happen to NVDIMM.

When a user process directly reads the location with the uncorrected
error, the h/w triggers a machine check, and if the error is marked as
recoverable the kernel will SIGBUS the process. See
mm/memory-failure.c.

We do have an issue in current processor implementations: an access
using rep mov generates a fatal machine check. This is a gap and will
at some point be fixed.

Currently, if the user passes the address of the location with the
uncorrected error to the kernel via a system call like write(2), the
kernel will do the access, and we will crash. I'd like to fix that and
SIGBUS the user, just like would happen if they touched the memory
themselves.

To do that the copy_from_user() needs to be able to tell that there
was a machine check (and probably take action inline, as fixing every
place that we call copy_from_user() sounds like a silly idea).

Maybe this won't be ready for inclusion in the kernel until rep mov
generates recoverable machine checks ... otherwise we have a bigger
heap of possible copy routines to choose from as we'd have to look not
only at the speed of various copy algorithms, but also at the behavior
during exceptions ... in particular users may have to choose between
speed (rep mov) and recoverability on current generation cpus (and
given how rare uncorrected errors are, they might all choose speed).

-Tony
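
For contrast with the machine check case discussed above, here is a
minimal user-space sketch of how an ordinary fault during the kernel's
copy from user memory is reported today. It does not involve a machine
check or poisoned memory at all: it uses a PROT_NONE page so that the
read done on behalf of write(2) takes a plain page fault. The helper
name errno_for_write_from_unreadable_page() is made up for this
illustration, and the choice of a pipe as the write target is an
assumption made so that the kernel actually has to read the buffer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustration only: provokes an ordinary page fault (not a machine
 * check) while the kernel copies the write(2) buffer from user memory.
 *
 * Returns the errno reported by write(2), 0 if the write unexpectedly
 * succeeded, or -1 if the setup (pipe or mmap) failed.
 */
int errno_for_write_from_unreadable_page(void)
{
    int fds[2];
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    if (page <= 0 || pipe(fds) != 0)
        return -1;

    /* A mapped page with no access rights: any read of it faults. */
    void *buf = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf != MAP_FAILED) {
        /* The kernel must read from buf to fill the pipe. */
        ssize_t n = write(fds[1], buf, 16);
        result = (n < 0) ? errno : 0;
        munmap(buf, (size_t)page);
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Called from a small test program on Linux, this is expected to return
EFAULT: the fault is caught inside the copy and turned into an error
for the caller rather than a crash. The uncorrected error case is
different because, as described above, the access currently ends in a
fatal machine check and the desired outcome is a SIGBUS to the process
rather than an error return.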

