Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

Marcel Apfelbaum Thu, 01 Feb 2018 10:14:14 -0800

On 01/02/2018 14:57, Michael S. Tsirkin wrote:
> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>>>> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
>>>>> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
>>>>>> On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
>>>>>>> Currently only a file-backed memory backend can
>>>>>>> be created with a "share" flag in order to allow
>>>>>>> sharing guest RAM with other processes in the host.
>>>>>>>
>>>>>>> Add the "share" flag also to RAM Memory Backend
>>>>>>> in order to allow remapping parts of the guest RAM
>>>>>>> to different host virtual addresses. This is needed
>>>>>>> by the RDMA devices in order to remap non-contiguous
>>>>>>> QEMU virtual addresses to a contiguous virtual address range.
>>>>>>>
>>>>>>
>>>>>> Why do we need to make this configurable?  Would anything break
>>>>>> if MAP_SHARED was always used if possible?
>>>>>
>>>>> See Documentation/vm/numa_memory_policy.txt for a list
>>>>> of complications.
>>>>
>>>> Ew.
>>>>
>>>>>
>>>>> Maybe we should make more of an effort to detect and report these
>>>>> issues.
>>>>
>>>> Probably.  Having other features breaking silently when using
>>>> pvrdma doesn't sound good.  We must at least document those
>>>> problems in the documentation for memory-backend-ram.
>>>>
>>>> BTW, what's the root cause for requiring HVAs in the buffer?
>>>
>>> It's a side effect of the kernel/userspace API which always wants
>>> a single HVA/len pair to map memory for the application.
>>>
>>>
>>
>> Hi Eduardo and Michael,
>>
>>>>  Can
>>>> this be fixed?
>>>
>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
>>> mapping an s/g list with actual memory. The HVA/len pair would then just
>>> be used to refer to the region, without creating the two mappings.
>>>
>>> Something like splitting the register mr into
>>>
>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>
>>> addmemory(mr, offset, hva, len) - pin memory
>>>
>>> register mr - pass it to HW
>>>
>>> As a nice side effect we won't burn so much virtual address space.
>>>
>>
>> We would still need a contiguous virtual address space range (for post-send)
>> which we don't have since guest contiguous virtual address space
>> will always end up as non-contiguous host virtual address space.
> 
> It just needs to be contiguous in the HCA virtual address space.
> Software never accesses through this pointer.
> In other words - basically expose register physical mr to userspace.
> 
> 
>>
>> I am not sure the RDMA HW can handle a large VA with holes.
>>
>> An alternative would be 0-based MR, QEMU intercepts the post-send
>> operations and can subtract the guest VA base address.
>> However I didn't see the implementation in kernel for 0 based MRs
>> and also the RDMA maintainer said it would work for local keys
>> and not for remote keys.
>>
>>> This will fix rdma with hugetlbfs as well which is currently broken.
>>>
>>>
>>
>> There is already a discussion on the linux-rdma list:
>>     https://www.spinics.net/lists/linux-rdma/msg60079.html
>> But it will take some (actually a lot of) time; we are currently
>> talking about a possible API.
> 
> You probably need to pass the s/g piece by piece since it might exceed
> any reasonable array size.


Right. They say the new API is ioctl-based, so this is not a limitation.
We also proposed a bitmap representation of a large range,
but what we really need is what you mentioned:
  to pass the Guest VA directly to reg_mr.
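
For illustration only, the three-step split of register mr quoted above
(create mr, addmemory, register mr) could be modeled roughly as below.
This is a toy model in Python, not the kernel or libibverbs API: the
names create_mr, add_memory, register_mr and translate, the MemoryRegion
class and the 4096-byte page size are all assumptions made for this
sketch. The memory is added piece by piece, and the MR's VA only has to
be contiguous in the HCA's address space, not in HVA.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assumed page size, for illustration only


@dataclass
class MemoryRegion:
    """Hypothetical MR handle: the VA/len the HCA sees, plus its backing chunks."""
    va: int
    length: int
    chunks: list = field(default_factory=list)  # (offset, hva, length) tuples
    registered: bool = False


def create_mr(va, length):
    # Step 1: allocate a handle and record va/len; nothing is pinned yet.
    return MemoryRegion(va, length)


def add_memory(mr, offset, hva, length):
    # Step 2: attach one piece of host memory at an offset inside the MR.
    if mr.registered:
        raise RuntimeError("MR already registered")
    if offset < 0 or offset + length > mr.length:
        raise ValueError("chunk outside the MR")
    for o, _, l in mr.chunks:
        if offset < o + l and o < offset + length:
            raise ValueError("chunk overlaps an existing one")
    mr.chunks.append((offset, hva, length))


def register_mr(mr):
    # Step 3: hand the MR to the (imaginary) hardware; refuse if it has holes.
    if sum(l for _, _, l in mr.chunks) != mr.length:
        raise ValueError("MR has holes")
    mr.chunks.sort()
    mr.registered = True


def translate(mr, addr):
    # Model of the HCA lookup: address in the MR's VA space -> backing HVA.
    off = addr - mr.va
    for o, hva, l in mr.chunks:
        if o <= off < o + l:
            return hva + (off - o)
    raise ValueError("address outside the MR")


# Two non-contiguous host pages exposed as one contiguous 2-page MR.
mr = create_mr(va=0x10000000, length=2 * PAGE)
add_memory(mr, 0, hva=0x7F0000400000, length=PAGE)
add_memory(mr, PAGE, hva=0x7F0000100000, length=PAGE)
register_mr(mr)
print(hex(translate(mr, 0x10000000 + PAGE + 8)))  # prints 0x7f0000100008
```

In this model the VA given to create_mr could simply be the guest VA,
and no second contiguous host mapping is ever created.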

Thanks,
Marcel

> 
>> And it does not solve the re-mapping...
>>
>> Thanks,
>> Marcel
> 
> Haven't read through that discussion. But at least what I posted solves
> it since you do not need it contiguous in HVA any longer.
> 
>>>> -- 
>>>> Eduardo

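As background to the patch description at the top of the thread, here is
a small sketch of why a shared mapping is what makes remapping possible:
the same backing pages can be mapped at a second virtual address and both
mappings see the same data, while a private mapping gets its own
copy-on-write pages. It uses a temporary file as a stand-in for guest RAM
and Python's mmap module on a Unix host; the helper name map_twice is an
assumption of this sketch, and this is not how QEMU itself implements
the remapping.

```python
import mmap
import tempfile

SIZE = mmap.PAGESIZE


def map_twice(flags):
    """Map the same page twice with the given flags and report whether a
    write through the first mapping is visible through the second."""
    with tempfile.TemporaryFile() as f:
        f.truncate(SIZE)  # back the mappings with one zero-filled page
        a = mmap.mmap(f.fileno(), SIZE, flags=flags)
        b = mmap.mmap(f.fileno(), SIZE, flags=flags)
        try:
            a[:5] = b"hello"
            return bytes(b[:5]) == b"hello"
        finally:
            a.close()
            b.close()


shared_visible = map_twice(mmap.MAP_SHARED)    # second mapping sees the write
private_visible = map_twice(mmap.MAP_PRIVATE)  # write stays in a private copy
print(shared_visible, private_visible)
```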