On Thu, Feb 27, 2020 at 10:03 AM Jason Gunthorpe <j...@ziepe.ca> wrote: > > On Thu, Feb 27, 2020 at 09:55:04AM -0800, Dan Williams wrote: > > On Thu, Feb 27, 2020 at 9:43 AM Jason Gunthorpe <j...@ziepe.ca> wrote: > > > > > > On Thu, Feb 27, 2020 at 10:21:50AM -0700, Logan Gunthorpe wrote: > > > > > > > > > > > > On 2020-02-27 10:17 a.m., Jason Gunthorpe wrote: > > > > >> Instead of this, this series proposes a change to arch_add_memory() > > > > >> to take the pgprot required by the mapping which allows us to > > > > >> explicitly set pagetable entries for P2PDMA memory to WC. > > > > > > > > > > Is there a particular reason why WC was selected here? I thought for > > > > > the p2pdma cases there was no kernel user that touched the memory? > > > > > > > > Yes, that's correct. I choose WC here because the existing users are > > > > registering memory blocks without side effects which fit the WC > > > > semantics well. > > > > > > Hm, AFAIK WC memory is not compatible with the spinlocks/mutexs/etc in > > > Linux, so while it is true the memory has no side effects, there would > > > be surprising concurrency risks if anything in the kernel tried to > > > write to it. > > > > > > Not compatible means the locks don't contain stores to WC memory the > > > way you would expect. AFAIK on many CPUs extra barriers are required > > > to keep WC stores ordered, the same way ARM already has extra barriers > > > to keep UC stores ordered with locking.. > > > > > > The spinlocks are defined to contain UC stores though. > > > > How are spinlocks and mutexes getting into p2pdma ranges in the first > > instance? Even with UC, the system has bigger problems if it's trying > > to send bus locks targeting PCI, see the flurry of activity of trying > > to trigger faults on split locks [1]. > > This is not what I was trying to explain. > > Consider > > static spinlock lock; // CPU DRAM > static idx = 0; > u64 *wc_memory = [..]; > > spin_lock(&lock); > wc_memory[0] = idx++; > spin_unlock(&lock); > > You'd expect that the PCI device will observe stores where idx is > strictly increasing, but this is not guarenteed. idx may decrease, idx > may skip. It just won't duplicate. > > Or perhaps > > wc_memory[0] = foo; > writel(doorbell) > > foo is not guarenteed observable by the device before doorbell reaches > the device. > > All of these are things that do not happen with UC or NC memory, and > are surprising violations of our programming model. > > Generic kernel code should never touch WC memory unless the code is > specifically designed to handle it.
Ah, yes, agree.