Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-23 Thread Ian Molton
Alexander Graf wrote:

 I guess what you really want is some shm region between host and guest
 that you can use as a ring buffer. Then you could run a timer on the host
 side to flush it, or have some sort of callback for when you urgently need
 to flush it manually.
 
 The benefit here is that you can actually make use of multiple threads.
 There's no need to intercept the guest at all just because it wants to
 issue some GL operations.

Something like that should work. The problem right now is mostly the
'some sort of callback'. I'm not sure there exists any mechanism for the
guest's userspace to interrupt qemu directly when running under kvm...




Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-23 Thread Alexander Graf

On 23.02.2010, at 16:46, Ian Molton wrote:

 Alexander Graf wrote:
 
  I guess what you really want is some shm region between host and guest
  that you can use as a ring buffer. Then you could run a timer on the host
  side to flush it, or have some sort of callback for when you urgently need
  to flush it manually.
 
  The benefit here is that you can actually make use of multiple threads.
  There's no need to intercept the guest at all just because it wants to
  issue some GL operations.
 
 Something like that should work. The problem right now is mostly the
 'some sort of callback'. I'm not sure there exists any mechanism for the
 guest's userspace to interrupt qemu directly when running under kvm...

I'm not aware of any mechanism, but you could easily write a simple UIO driver
that takes over this exact task.
Or you could build on top of Cam's shm patches and create a device node that
exposes a poke ioctl.


Alex



Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-23 Thread Anthony Liguori

On 02/22/2010 11:47 AM, Ian Molton wrote:

 Anthony Liguori wrote:

  On 02/22/2010 10:46 AM, Ian Molton wrote:

   Anthony Liguori wrote:

    cpu_physical_memory_map().

    But this function has some subtle characteristics.  It may return a
    bounce buffer if you attempt to map MMIO memory.  There is a limited
    pool of bounce buffers available so it may return NULL in the event that
    it cannot allocate a bounce buffer.

    It may also return a partial result if you're attempting to map a region
    that straddles multiple memory slots.

   Thanks. I had found this, but was unsure as to whether it was quite what
   I wanted. (Also, is it possible to tell when it has, e.g., allocated a
   bounce buffer?)

   Basically, I need to get buffer(s) from guest userspace into the host's
   address space. The buffers are virtually contiguous but likely
   physically discontiguous. They are allocated with malloc() and there's
   nothing I can do about that.

   The obvious but slow solution would be to copy all the buffers into nice
   virtio-based scatter/gather buffers and feed them to the host that way;
   however, it's not fast enough.

  Why is this slow?

 Because the buffers will all have to be copied.

Why?

It sounds like your kernel driver is doing the wrong thing if you can't 
preserve zero-copy from userspace.

 So far, switching from
 abusing an instruction to interrupt qemu to using virtio has incurred a
 roughly 5x slowdown.

If you post patches, we can help determine what you're doing that's 
causing such a slowdown.

Regards,

Anthony Liguori

 I'd guess much of this is down to the fact we have
 to switch to kernel-mode on the guest and back again for every single GL
 call...

 If I can establish some kind of stable guest_virt-phys-host_virt
 mapping, many of the problems will just 'go away'. A way to interrupt
 qemu from user-mode on the guest without involving the guest kernel
 would also be quite awesome (there's really nothing we want the kernel to
 actually /do/ here; it just adds overhead).

 -Ian






Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-22 Thread Anthony Liguori

On 02/22/2010 07:59 AM, Ian Molton wrote:

 Hi folks,

 I've been updating some old patches which make use of a function to
 translate guest virtual addresses into pointers into the guest RAM.

 As I understand it, qemu has guest virtual and physical addresses, the
 latter of which map somehow to host RAM addresses.

 The function which the code had been using appears not to work under
 kvm, which leads me to think that qemu doesn't emulate the MMU (or at
 least not in the same manner) when it is using kvm as opposed to pure
 emulation.

 If I turn off kvm, the patch works, albeit slowly. If I enable it, the
 code takes the path which looks for the magic value (below).

 Is there a 'proper' way to translate guest virtual addresses into host
 RAM addresses?


cpu_physical_memory_map().

But this function has some subtle characteristics.  It may return a 
bounce buffer if you attempt to map MMIO memory.  There is a limited 
pool of bounce buffers available so it may return NULL in the event that 
it cannot allocate a bounce buffer.


It may also return a partial result if you're attempting to map a region 
that straddles multiple memory slots.


Regards,

Anthony Liguori




Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-22 Thread Ian Molton
Anthony Liguori wrote:

 cpu_physical_memory_map().
 
 But this function has some subtle characteristics.  It may return a
 bounce buffer if you attempt to map MMIO memory.  There is a limited
 pool of bounce buffers available so it may return NULL in the event that
 it cannot allocate a bounce buffer.
 
 It may also return a partial result if you're attempting to map a region
 that straddles multiple memory slots.

Thanks. I had found this, but was unsure as to whether it was quite what
I wanted. (Also, is it possible to tell when it has, e.g., allocated a
bounce buffer?)

Basically, I need to get buffer(s) from guest userspace into the host's
address space. The buffers are virtually contiguous but likely
physically discontiguous. They are allocated with malloc() and there's
nothing I can do about that.

The obvious but slow solution would be to copy all the buffers into nice
virtio-based scatter/gather buffers and feed them to the host that way;
however, it's not fast enough.

Right now I have a little driver I have written that allows a buffer to
be mmap()ed by the guest userspace; this is pushed to the host via
virtio s/g io when the guest calls fsync(). This buffer contains the
data that must be passed to the host; however, this data may often
contain pointers to (that is, userspace virtual addresses of) buffers of
unknown sizes which the host also needs to access. These buffers are
what I need to read from the guest's RAM.

The buffers will likely remain active across multiple different calls to
the host, so their pages will need to be available. As the calls always
happen when that specific process is active, I'd guess the worst we need
to do is generate a page fault to swap the page(s) back in. Can that be
caused by qemu (under kvm)?

It seems that cpu_physical_memory_map() deals with physically contiguous
areas of guest address space. I need to get a host-side mapping of a
*virtually* contiguous (possibly physically discontiguous) set of guest
pages. If this can be done, it'd mean direct transfer of data from guest
application to host shared library, which would be a major win.

-Ian




Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-22 Thread Anthony Liguori

On 02/22/2010 10:46 AM, Ian Molton wrote:

 Anthony Liguori wrote:

  cpu_physical_memory_map().

  But this function has some subtle characteristics.  It may return a
  bounce buffer if you attempt to map MMIO memory.  There is a limited
  pool of bounce buffers available so it may return NULL in the event that
  it cannot allocate a bounce buffer.

  It may also return a partial result if you're attempting to map a region
  that straddles multiple memory slots.

 Thanks. I had found this, but was unsure as to whether it was quite what
 I wanted. (Also, is it possible to tell when it has, e.g., allocated a
 bounce buffer?)

 Basically, I need to get buffer(s) from guest userspace into the host's
 address space. The buffers are virtually contiguous but likely
 physically discontiguous. They are allocated with malloc() and there's
 nothing I can do about that.

 The obvious but slow solution would be to copy all the buffers into nice
 virtio-based scatter/gather buffers and feed them to the host that way;
 however, it's not fast enough.

Why is this slow?

Regards,

Anthony Liguori

 Right now I have a little driver I have written that allows a buffer to
 be mmap()ed by the guest userspace; this is pushed to the host via
 virtio s/g io when the guest calls fsync(). This buffer contains the
 data that must be passed to the host; however, this data may often
 contain pointers to (that is, userspace virtual addresses of) buffers of
 unknown sizes which the host also needs to access. These buffers are
 what I need to read from the guest's RAM.

 The buffers will likely remain active across multiple different calls to
 the host, so their pages will need to be available. As the calls always
 happen when that specific process is active, I'd guess the worst we need
 to do is generate a page fault to swap the page(s) back in. Can that be
 caused by qemu (under kvm)?

 It seems that cpu_physical_memory_map() deals with physically contiguous
 areas of guest address space. I need to get a host-side mapping of a
 *virtually* contiguous (possibly physically discontiguous) set of guest
 pages. If this can be done, it'd mean direct transfer of data from guest
 application to host shared library, which would be a major win.

 -Ian






Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-22 Thread Ian Molton
Anthony Liguori wrote:
 On 02/22/2010 10:46 AM, Ian Molton wrote:

  Anthony Liguori wrote:

   cpu_physical_memory_map().

   But this function has some subtle characteristics.  It may return a
   bounce buffer if you attempt to map MMIO memory.  There is a limited
   pool of bounce buffers available so it may return NULL in the event that
   it cannot allocate a bounce buffer.

   It may also return a partial result if you're attempting to map a region
   that straddles multiple memory slots.

  Thanks. I had found this, but was unsure as to whether it was quite what
  I wanted. (Also, is it possible to tell when it has, e.g., allocated a
  bounce buffer?)

  Basically, I need to get buffer(s) from guest userspace into the host's
  address space. The buffers are virtually contiguous but likely
  physically discontiguous. They are allocated with malloc() and there's
  nothing I can do about that.

  The obvious but slow solution would be to copy all the buffers into nice
  virtio-based scatter/gather buffers and feed them to the host that way;
  however, it's not fast enough.

 Why is this slow?

Because the buffers will all have to be copied. So far, switching from
abusing an instruction to interrupt qemu to using virtio has incurred a
roughly 5x slowdown. I'd guess much of this is down to the fact we have
to switch to kernel-mode on the guest and back again for every single GL
call...

If I can establish some kind of stable guest_virt-phys-host_virt
mapping, many of the problems will just 'go away'. A way to interrupt
qemu from user-mode on the guest without involving the guest kernel
would also be quite awesome (there's really nothing we want the kernel to
actually /do/ here; it just adds overhead).

-Ian





Re: [Qemu-devel] Address translation - virt-phys-ram

2010-02-22 Thread Alexander Graf
Ian Molton wrote:

 Anthony Liguori wrote:

  On 02/22/2010 10:46 AM, Ian Molton wrote:

   Anthony Liguori wrote:

    cpu_physical_memory_map().

    But this function has some subtle characteristics.  It may return a
    bounce buffer if you attempt to map MMIO memory.  There is a limited
    pool of bounce buffers available so it may return NULL in the event that
    it cannot allocate a bounce buffer.

    It may also return a partial result if you're attempting to map a region
    that straddles multiple memory slots.

   Thanks. I had found this, but was unsure as to whether it was quite what
   I wanted. (Also, is it possible to tell when it has, e.g., allocated a
   bounce buffer?)

   Basically, I need to get buffer(s) from guest userspace into the host's
   address space. The buffers are virtually contiguous but likely
   physically discontiguous. They are allocated with malloc() and there's
   nothing I can do about that.

   The obvious but slow solution would be to copy all the buffers into nice
   virtio-based scatter/gather buffers and feed them to the host that way;
   however, it's not fast enough.

  Why is this slow?

 Because the buffers will all have to be copied. So far, switching from
 abusing an instruction to interrupt qemu to using virtio has incurred a
 roughly 5x slowdown. I'd guess much of this is down to the fact we have
 to switch to kernel-mode on the guest and back again for every single GL
 call...

 If I can establish some kind of stable guest_virt-phys-host_virt
 mapping, many of the problems will just 'go away'. A way to interrupt
 qemu from user-mode on the guest without involving the guest kernel
 would also be quite awesome (there's really nothing we want the kernel to
 actually /do/ here; it just adds overhead).

I guess what you really want is some shm region between host and guest
that you can use as a ring buffer. Then you could run a timer on the host
side to flush it, or have some sort of callback for when you urgently need
to flush it manually.

The benefit here is that you can actually make use of multiple threads.
There's no need to intercept the guest at all just because it wants to
issue some GL operations.


Alex