Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Tue, May 19, 2015 at 12:18:54PM +0100, Catalin Marinas wrote: > On Tue, May 19, 2015 at 11:03:22AM +0100, Andrew Jones wrote: > > On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote: > > > Another way would be to split the vma containing the non-cacheable > > > memory so that you get a single vma with the vm_page_prot as > > > Non-cacheable. > > > > This sounds interesting. Actually, it even crossed my mind once when I > > first saw that the vma would overwrite the attributes, but then, sigh, > > I let my brain take a stupidity bath. > > > > > > > > Yet another approach could be for KVM to mmap the necessary memory for > > > Qemu via a file_operations.mmap call (but that's only for ranges outside > > > the guest "RAM"). > > > > I guess I prefer the vma splitting, rather than this (the vma creating > > with mmap), as it keeps the KVM interface from changing (as you point out > > below). Well, unless there are other advantages to this that are worth > > considering? > > The advantage is that you don't need to deal with the mm internals in > the KVM code. > > But you can probably add such code directly to mm/ and reuse some of the > existing code in there already as part of change_protection(), > mprotect_fixup(), sys_mprotect(). Actually, once you split the vma and > set the new protection (something similar to mprotect_fixup), it looks > to me like you can just call change_protection(vma->vm_page_prot). I'll start playing around with this today. > > > > I didn't have time to follow these threads in details, but just to > > > recap my understanding, we have two main use-cases: > > > > > > 1. Qemu handling guest I/O to device (e.g. PCIe BARs) > > > 2. Qemu emulating device DMA > > > > > > For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about > > > this memory slot. The memory attributes in this case could be Device > > > because that's how the guest would normally map it. The > > > file_operations.mmap trick would work in this case but this means > > > expanding the KVM ABI beyond just an ioctl(). > > > > > > For (2), since Qemu is writing to the guest "RAM" (e.g. video > > > framebuffer allocated by the guest), I still think the simplest is to > > > tell the guest (via DT) that such device is cache coherent rather than > > > trying to remap the Qemu mapping as non-cacheable. > > > > If we need a solution for (1), then I'd prefer that it work and be > > applied to (2) as well. Anyway, I'm still not 100% sure we can count on > > all guest types (booloaders, different OSes) to listen to us. They may > > assume non-cacheable is typical and safe, and thus just do that always. > > We can certainly change some of those bootloaders and OSes, but probably > > not all of them. > > That's fine by me. Once you get the vma splitting and attributes > changing done, I think you get the second one for free. > > Do we want to differentiate between Device and Normal Non-cacheable > memory? Something like KVM_MEMSLOT_DEVICE? > > Nitpick: I'm not sure whether "uncached" is clear enough. In Linux, > pgprot_noncached() returns Strongly Ordered memory. For Normal > Non-cachable we used pgprot_writecombine (e.g. a video framebuffer). > > Maybe something like KVM_MEMSLOT_COHERENT meaning a request to KVM to Sounds good to me. I'll rename for the next round. > ensure that guest and host access it coherently (which would mean > writecombine for ARM). That's similar naming to functions like > dma_alloc_coherent() that return cacheable or non-cacheable memory based > on what the device supports. Anyway, I'm not to bothered with the > naming. > Thanks for your help! drew ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Tue, May 19, 2015 at 11:03:22AM +0100, Andrew Jones wrote: > On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote: > > Another way would be to split the vma containing the non-cacheable > > memory so that you get a single vma with the vm_page_prot as > > Non-cacheable. > > This sounds interesting. Actually, it even crossed my mind once when I > first saw that the vma would overwrite the attributes, but then, sigh, > I let my brain take a stupidity bath. > > > > > Yet another approach could be for KVM to mmap the necessary memory for > > Qemu via a file_operations.mmap call (but that's only for ranges outside > > the guest "RAM"). > > I guess I prefer the vma splitting, rather than this (the vma creating > with mmap), as it keeps the KVM interface from changing (as you point out > below). Well, unless there are other advantages to this that are worth > considering? The advantage is that you don't need to deal with the mm internals in the KVM code. But you can probably add such code directly to mm/ and reuse some of the existing code in there already as part of change_protection(), mprotect_fixup(), sys_mprotect(). Actually, once you split the vma and set the new protection (something similar to mprotect_fixup), it looks to me like you can just call change_protection(vma->vm_page_prot). > > I didn't have time to follow these threads in details, but just to > > recap my understanding, we have two main use-cases: > > > > 1. Qemu handling guest I/O to device (e.g. PCIe BARs) > > 2. Qemu emulating device DMA > > > > For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about > > this memory slot. The memory attributes in this case could be Device > > because that's how the guest would normally map it. The > > file_operations.mmap trick would work in this case but this means > > expanding the KVM ABI beyond just an ioctl(). > > > > For (2), since Qemu is writing to the guest "RAM" (e.g. video > > framebuffer allocated by the guest), I still think the simplest is to > > tell the guest (via DT) that such device is cache coherent rather than > > trying to remap the Qemu mapping as non-cacheable. > > If we need a solution for (1), then I'd prefer that it work and be > applied to (2) as well. Anyway, I'm still not 100% sure we can count on > all guest types (booloaders, different OSes) to listen to us. They may > assume non-cacheable is typical and safe, and thus just do that always. > We can certainly change some of those bootloaders and OSes, but probably > not all of them. That's fine by me. Once you get the vma splitting and attributes changing done, I think you get the second one for free. Do we want to differentiate between Device and Normal Non-cacheable memory? Something like KVM_MEMSLOT_DEVICE? Nitpick: I'm not sure whether "uncached" is clear enough. In Linux, pgprot_noncached() returns Strongly Ordered memory. For Normal Non-cachable we used pgprot_writecombine (e.g. a video framebuffer). Maybe something like KVM_MEMSLOT_COHERENT meaning a request to KVM to ensure that guest and host access it coherently (which would mean writecombine for ARM). That's similar naming to functions like dma_alloc_coherent() that return cacheable or non-cacheable memory based on what the device supports. Anyway, I'm not to bothered with the naming. -- Catalin ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
Hi Catalin, Thanks for the feedback. Some comments to your comments below. On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote: > On Thu, May 14, 2015 at 02:46:44PM +0100, Andrew Jones wrote: > > On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote: > > > On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote: > > > > Provide a method to change normal, cacheable memory to non-cacheable. > > > > KVM will make use of this to keep emulated device memory regions > > > > coherent with the guest. > > > > > > > > Signed-off-by: Andrew Jones > > > > > > Reviewed-by: Christoffer Dall > > > > > > But you obviously need Russell and Will/Catalin to ack/merge this. > > > > I guess this patch is going to go away in the next round. You've > > pointed out that I screwed stuff up royally with my over eagerness > > to reuse code. I need to reimplement change_memory_common, but a > > version that takes an mm, which is more or less what I did in the > > last version of this series, back when I was pinning pages. > > I kept wondering what this patch and the next one are doing with > set_memory_nc(). Basically you were trying to set the cache attributes > for the kernel linear mapping or kmap address (in the 32-bit arm case, > which is removed anyway on kunmap). Yeah, I was way to hasty with this version... Sorry for wasting people's time with it. > > What you need is changing the attributes of the user mapping as accessed > by Qemu but I don't think simply re-implementing change_memory_common() > would work, you probably need to pin the pages in memory as well. > Otherwise, the kernel may remove such pages and, when bringing them > back, would set the default cacheability attributes. Yup, I read that code and saw it would inherit the memory attributes from the vma. At the time I wasn't thinking about the userspace mapping though, and thus had convinced myself that if "the" mapping goes away, then we'll be invalidating the guest's mapping too, and thus we'd end up faulting it in again when necessary, and at that time resetting the attributes. If the guest never faulted it in again, then it wouldn't have mattered what the attributes were anyway. Of course I was thinking about the wrong mapping... > > Another way would be to split the vma containing the non-cacheable > memory so that you get a single vma with the vm_page_prot as > Non-cacheable. This sounds interesting. Actually, it even crossed my mind once when I first saw that the vma would overwrite the attributes, but then, sigh, I let my brain take a stupidity bath. > > Yet another approach could be for KVM to mmap the necessary memory for > Qemu via a file_operations.mmap call (but that's only for ranges outside > the guest "RAM"). I guess I prefer the vma splitting, rather than this (the vma creating with mmap), as it keeps the KVM interface from changing (as you point out below). Well, unless there are other advantages to this that are worth considering? > > I didn't have time to follow these threads in details, but just to > recap my understanding, we have two main use-cases: > > 1. Qemu handling guest I/O to device (e.g. PCIe BARs) > 2. Qemu emulating device DMA > > For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about > this memory slot. The memory attributes in this case could be Device > because that's how the guest would normally map it. The > file_operations.mmap trick would work in this case but this means > expanding the KVM ABI beyond just an ioctl(). > > For (2), since Qemu is writing to the guest "RAM" (e.g. video > framebuffer allocated by the guest), I still think the simplest is to > tell the guest (via DT) that such device is cache coherent rather than > trying to remap the Qemu mapping as non-cacheable. If we need a solution for (1), then I'd prefer that it work and be applied to (2) as well. Anyway, I'm still not 100% sure we can count on all guest types (booloaders, different OSes) to listen to us. They may assume non-cacheable is typical and safe, and thus just do that always. We can certainly change some of those bootloaders and OSes, but probably not all of them. Thanks, drew ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm