Re: [Qemu-devel] Re: QEMU-KVM and video performance - Update

2011-02-17 Thread Gerhard Wiesinger

Hello,

Some update on this issue, archive: 
http://www.mail-archive.com/kvm@vger.kernel.org/msg32600.html


Cirrus VGA now seems to be OK (>1000MB/s, up to 2000MB/s). But cirrus has 
only the 320x200x256 colors mode (Mode 13h) implemented in its VESA 
BIOS.


VMWare and std VGA still have the performance issue.

I guess the improvement is related to the following commit:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=0d14905b5eb8aa1c2e195e13478bb7c74e1776db
Especially, I guess, the change in hw/cirrus_vga.c.

Any idea how to:
1.) Get more VESA modes in cirrus VGA (is VESA emulation done by Seabios or by 
KVM cirrus BIOS?)
2.) Fix the performance in VMWare and std VGA modes, too


Versions are latest dev versions of KVM user part and Seabios from GIT.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 12 May 2010, Avi Kivity wrote:


On 05/12/2010 09:14 AM, Gerhard Wiesinger wrote:

On Mon, 10 May 2010, Avi Kivity wrote:


On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:




For 256 color mode the first priority is to find out why direct mapping is 
not used.  I'd suggest tracing the code that makes this decision (in 
hw/*vga.c) and seeing if it's right or not.


I think this is because A000 is not initialized for KVM (see log below and 
logging patch attached).


Why isn't it initialized?

Did the guest configure things such that it is impossible to map it directly? 
Or does the configuration allow direct mapping and qemu incorrectly decides 
that it cannot direct map?


Best would be to print out all the configuration registers and interpret them 
according to the specification.





vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, 
len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff


Why does this happen?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-12 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> Can one switch to the old software vmm in VMWare?

Perhaps you can install a very old version of VMWare.
Maybe run it under KVM ;-)

> That was one of the reasons why I was looking for alternatives for 
> graphical DOS programs. Overall summary so far:
> 1.) QEMU without KVM: Problem with 286 DOS Extender instruction set, but 
> fast VGA
> 2.) QEMU with KVM: 286 DOS Extender apps ok, but slow VGA memory 
> performance
> 3.) VMWare Server 2.0 under Linux, application ok, but slow VGA memory 
> performance
> 4.) Virtual PC: Problems with 286 DOS Extender
> 5.) Bochs: Works well, but very slow.

I would be interested in the 286 DOS Extender issue, as I'd like to
use some 286 programs in QEMU at some point.

There were some changes to KVM in the kernel recently.  Were those
needed to get the 286 apps working?

> It looks like VMWare Server and QEMU with KVM may have the same 
> architectural problem of going through the whole slow chain from guest OS 
> to virtualization layer for VGA writes.

They do have a similar architecture.

The VGA write speed is a bit surprising, as it should be fast in
256-colour non-modeX modes for both.  But maybe there's something
we've missed that makes it architecturally slow.  It will be
interesting to see what you find :-)

Thanks,
-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-12 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> On Wed, 21 Apr 2010, Jamie Lokier wrote:
> 
> >Gerhard Wiesinger wrote:
> >>Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW
> >>of QEMU even from KVM must be possible (e.g. memory and port accesses are
> >>done on nearly every virtual device) and therefore I'm ending in C code in
> >>the QEMU hw/*.c directory. Therefore also the VGA memory area should be
> >>able to be accessible from KVM but with the specialized and fast memory
> >>access of QEMU.  Am I missing something?
> >
> >What you're missing is that when KVM calls out to QEMU to handle
> >hw/*.c traps, that call is very slow.  It's because the hardware-VM
> >support is a bit slow when the trap happens, and then the call
> >from KVM in the kernel up to QEMU is a bit slow again.  Then all the
> >way back.  It adds up to a lot, for every I/O operation.
> 
> Isn't that then a general problem of KVM virtualization (or hardware 
> virtualization)? Is this CPU dependent (AMD vs. Intel)?

Yes it is a general problem, but KVM emulates some time-critical
things in the kernel (like APIC and CPU instructions), so it's not too bad.

KVM is about 5x faster than TCG for most things, and slower for a few
things, so on balance it is usually faster.

The slow 256-colour mode writes sound like just a simple bug, though.
No need for complicated changes.

> >In 256-colour mode, KVM should be writing to the VGA memory at high
> >speed a lot like normal RAM, not trapping at the hardware-VM level,
> >and not calling up to the code in hw/*.c for every byte.
> 
> Yes, same picture to me: 256 color mode should be only a memory write (16 
> color mode is more difficult as pixel/byte mapping is not the same).
> But it looks like this isn't the case in this test scenario.
> 
> >You might double-check if your guest is using VGA "Mode X".  (See 
> >Wikipedia.)
> >
> >That was a way to accelerate VGA on real PCs, but it will be slow in
> >KVM for the same reasons as 16-colour mode.
> 
> Which way do you mean?

Look up Mode X on Wikipedia if you're interested, but it isn't
relevant to the problem you've reported.  Mode X cannot be enabled
with a BIOS call; it's a VGA hardware programming trick.  It would not
be useful in a VM environment.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-11 Thread Avi Kivity

On 05/12/2010 09:14 AM, Gerhard Wiesinger wrote:

On Mon, 10 May 2010, Avi Kivity wrote:


On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:




For 256 color mode the first priority is to find out why direct 
mapping is not used.  I'd suggest tracing the code that makes this 
decision (in hw/*vga.c) and seeing if it's right or not.


I think this is because A000 is not initialized for KVM (see log below 
and logging patch attached).


Why isn't it initialized?

Did the guest configure things such that it is impossible to map it 
directly?  Or does the configuration allow direct mapping and qemu 
incorrectly decides that it cannot direct map?


Best would be to print out all the configuration registers and interpret 
them according to the specification.





vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, 
len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff


Why does this happen?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-11 Thread Gerhard Wiesinger

On Mon, 10 May 2010, Avi Kivity wrote:


On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:




For 256 color mode the first priority is to find out why direct mapping is 
not used.  I'd suggest tracing the code that makes this decision (in 
hw/*vga.c) and seeing if it's right or not.


I think this is because A000 is not initialized for KVM (see log below 
and logging patch attached).


Switches tried without success:
-vga std (log is from this one)
-vga cirrus
-vga vmware

I also tried to force the mapping (see patch where it is commented out) 
but some errors occur (see 2nd log below) and performance is still 
low at ~1MB/s:

s->lfb_vram_mapped = 1;

During testing the following lines occur:
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
...

Any ideas? Can you reproduce it?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF000, len=0x0100
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF000, len=0x0100
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF000, len=0x0100
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
--
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a8000-000a
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a8000-000a
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a8000-000a
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a8000-000a
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE000, len=0x0100
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A, len=0x8000
BUG: kvm_dirty_pages_log_change: invalid parameters 
000a-000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x8000
BUG: kvm_dirty_pages_log_chan

Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-10 Thread Avi Kivity

On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:



Please run kvm_stat and report output for both tests to confirm.



See below. 2nd column is per second statistic when running the test.


efer_reload                 0       0
exits                18470836  554582
fpu_reload            2147833    3469
halt_exits               2083       0
halt_wakeup              2047       0
host_state_reload     2148186    3470
hypercalls                  0       0
insn_emulation        7688203  554244

This indicates that kvm is emulating instead of direct mapping.  
That's probably a bug.  If you fix it, performance will increase 
dramatically.



Where can I start here?
Any ideas how to?

One of my ideas: Move hw/vga.c functions
vga_mem_readb
vga_mem_readw
vga_mem_readl
vga_mem_writeb
vga_mem_writew
vga_mem_writel
to KVM to avoid switching from KVM to QEMU (I can write C code, even 
kernel code, but I'm not familiar with KVM). How?


That is already done (generically), it's called coalesced mmio.  You 
only have 3470 qemu exits/sec compared to 554244 kvm writes/sec.
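
The batching idea behind coalesced mmio can be sketched as below. This is a toy model only, not the KVM implementation: the struct and function names are hypothetical, and RING_SLOTS is an assumed capacity picked so that the ratio resembles the 554244 writes/sec to 3470 exits/sec seen above (about 160 writes per exit). Writes are queued, and the device model only sees them when the queue is flushed.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 160  /* assumed capacity, for illustration only */

struct mmio_write { uint32_t addr; uint8_t val; };

struct coalesce_ring {
    struct mmio_write slot[RING_SLOTS];
    size_t used;       /* queued writes not yet seen by the device model */
    unsigned flushes;  /* stands in for exits to userspace */
};

typedef void (*apply_fn)(uint32_t addr, uint8_t val);

/* Hand all queued writes to the device model in one go. */
void ring_flush(struct coalesce_ring *r, apply_fn apply)
{
    if (r->used == 0)
        return;
    for (size_t i = 0; i < r->used; i++)
        if (apply)
            apply(r->slot[i].addr, r->slot[i].val);
    r->used = 0;
    r->flushes++;
}

/* Queue one write; flush first only when the ring is already full. */
void ring_write(struct coalesce_ring *r, uint32_t addr, uint8_t val,
                apply_fn apply)
{
    if (r->used == RING_SLOTS)
        ring_flush(r, apply);
    r->slot[r->used].addr = addr;
    r->slot[r->used].val = val;
    r->used++;
}
```

Each write is still trapped and emulated in this model; only the trip to the device model is amortized, which is why coalescing alone does not make the writes fast.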


Switching between tcg and kvm is hard, but not needed.  For 256 color 
modes, direct map is possible and should yield good performance.  
Bank switching can be improved perhaps 3x, but will never be fast.


Where can I start for KVM performance for the bank switching (256 
color mode)? (e.g. BIOS writes to VGA window I/O port to switch the bank)

Any ideas how to improve (architecture for the change)?


For 256 color mode the first priority is to find out why direct mapping 
is not used.  I'd suggest tracing the code that makes this decision (in 
hw/*vga.c) and seeing if it's right or not.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-09 Thread Gerhard Wiesinger

On Thu, 22 Apr 2010, Avi Kivity wrote:


On 04/22/2010 09:04 AM, Gerhard Wiesinger wrote:

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:

I don't think changing VGA window is a problem because there are
500.000-1Mio changes/s possible.


1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?



To clarify:
Memory performance writing to segment A000 is about 1MB/s.


That indicates a fault every write (assuming 8-16 bit writes).  If you're 
using 256 color vga and not switching banks, this indicates a bug.




Yes, 256 color VGA and no bank switches involved.

Calling INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) is about 500.000 to 1Mio function 
calls per second.


That's surprisingly fast. I'd expect 100-200k/sec.



Sorry, I mixed up the numbers:
1.) QEMU-KVM: ~111k
2.) QEMU only: 500k-1Mio


Please run kvm_stat and report output for both tests to confirm.



See below. 2nd column is per second statistic when running the test.


efer_reload                 0       0
exits                18470836  554582
fpu_reload            2147833    3469
halt_exits               2083       0
halt_wakeup              2047       0
host_state_reload     2148186    3470
hypercalls                  0       0
insn_emulation        7688203  554244

This indicates that kvm is emulating instead of direct mapping.  That's 
probably a bug.  If you fix it, performance will increase dramatically.


Where can I start here?
Any ideas how to?

One of my ideas: Move hw/vga.c functions
vga_mem_readb
vga_mem_readw
vga_mem_readl
vga_mem_writeb
vga_mem_writew
vga_mem_writel
to KVM to avoid switching from KVM to QEMU (I can write C code, even 
kernel code, but I'm not familiar with KVM). How?



To get real good VGA performance both parameters should be:
About >50MB/s for writes to segment A000
~500.000 bank switches per second.


First should be doable easily, second is borderline.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU and not 
through KVM (because it is much faster). Also 16 color modes should 
be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM 
(there is a specialized driver loaded for that PCI device in the non 
legacy OS).


Is that easily possible?


No.  Code can run in either qemu or kvm, not both.  You can switch between 
them based on access statistics (early versions of qemu-kvm did that, 
without the statistics part), but this isn't trivial.


Hmmm. Ok, 2 different opinions about the memory write performance:
Easily or not possible?


Switching between tcg and kvm is hard, but not needed.  For 256 color modes, 
direct map is possible and should yield good performance.  Bank switching can 
be improved perhaps 3x, but will never be fast.


Where can I start for KVM performance for the bank switching (256 color 
mode)? (e.g. BIOS writes to VGA window I/O port to switch the bank)

Any ideas how to improve (architecture for the change)?

Thnx and sorry for long delay, was busy.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-22 Thread Avi Kivity

On 04/22/2010 09:04 AM, Gerhard Wiesinger wrote:

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:

I don't think changing VGA window is a problem because there are
500.000-1Mio changes/s possible.


1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?



To clarify:
Memory performance writing to segment A000 is about 1MB/s.


That indicates a fault every write (assuming 8-16 bit writes).  If 
you're using 256 color vga and not switching banks, this indicates a 
bug.




Yes, 256 color VGA and no bank switches involved.

Calling INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) is about 500.000 to 1Mio 
function calls per second.


That's surprisingly fast. I'd expect 100-200k/sec.



Sorry, I mixed up the numbers:
1.) QEMU-KVM: ~111k
2.) QEMU only: 500k-1Mio


Please run kvm_stat and report output for both tests to confirm.



See below. 2nd column is per second statistic when running the test.


 efer_reload                 0       0
 exits                18470836  554582
 fpu_reload            2147833    3469
 halt_exits               2083       0
 halt_wakeup              2047       0
 host_state_reload     2148186    3470
 hypercalls                  0       0
 insn_emulation        7688203  554244

This indicates that kvm is emulating instead of direct mapping.  That's 
probably a bug.  If you fix it, performance will increase dramatically.
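
As a rough sanity check of that reading, here is a minimal sketch: if every write to the A000 window is emulated, throughput is just the emulation rate times the bytes per write. The function name is hypothetical; the rate plugged in below is the insn_emulation per-second figure from the statistics above.

```c
#include <stdint.h>

/* Bytes per second reachable when every guest write is emulated.
 * traps_per_sec:   emulated writes per second (e.g. insn_emulation/s)
 * bytes_per_write: 1 for 8-bit writes, 2 for 16-bit writes */
uint64_t trapped_write_throughput(uint64_t traps_per_sec,
                                  unsigned bytes_per_write)
{
    return traps_per_sec * bytes_per_write;
}
```

With 554244 emulated writes per second this gives about 0.55 MB/s for 8-bit writes and about 1.1 MB/s for 16-bit writes, which is in line with the ~1MB/s measured.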






To get real good VGA performance both parameters should be:
About >50MB/s for writes to segment A000
~500.000 bank switches per second.


First should be doable easily, second is borderline.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU 
and not through KVM (because it is much faster). Also 16 color 
modes should be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM 
(there is a specialized driver loaded for that PCI device in the non 
legacy OS).


Is that easily possible?


No.  Code can run in either qemu or kvm, not both.  You can switch 
between them based on access statistics (early versions of qemu-kvm 
did that, without the statistics part), but this isn't trivial.


Hmmm. Ok, 2 different opinions about the memory write performance:
Easily or not possible?


Switching between tcg and kvm is hard, but not needed.  For 256 color 
modes, direct map is possible and should yield good performance.  Bank 
switching can be improved perhaps 3x, but will never be fast.
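
To illustrate why bank switches come up at all in a windowed 256 color mode, here is a small sketch of the address arithmetic. It assumes a 64 KB window at segment A000 with a window granularity of 64 KB as well (real adapters report their granularity through the VESA mode information); the names are hypothetical.

```c
#include <stdint.h>

#define WINDOW_SIZE 65536u  /* assumed: 64 KB window, 64 KB granularity */

struct banked_addr {
    uint16_t bank;    /* window position to select before the access */
    uint16_t offset;  /* offset inside the A000 segment */
};

/* Locate pixel (x, y) in an 8 bits per pixel mode with 'pitch' bytes
 * per scan line. */
struct banked_addr pixel_to_bank(uint32_t x, uint32_t y, uint32_t pitch)
{
    uint32_t linear = y * pitch + x;
    struct banked_addr a;

    a.bank = (uint16_t)(linear / WINDOW_SIZE);
    a.offset = (uint16_t)(linear % WINDOW_SIZE);
    return a;
}
```

In mode 101h (640x480, 256 colors) a full frame spans five banks, so even a simple top-to-bottom redraw needs four switches, and a program that draws scattered pixels switches far more often.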


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-22 Thread Avi Kivity

On 04/22/2010 08:37 AM, Gerhard Wiesinger wrote:

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:


Can you explain which code files/functions of KVM are involved in 
handling the VGA memory window and page switching through the port 
write to the VGA window register (or is that part handled through 
QEMU)? A little bit of architecture explanation would be nice.


qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
vbe_ioport_write_data() and vga_ioport_write().




Yes, I was already in that part of the code. Those are very simple 
functions, as already explained, and are therefore very fast in QEMU 
alone. But I meant: what is the calling path from the KVM guest OS to 
hw/vga.c for memory and I/O accesses, and which parts are done in 
hardware directly (to understand the speed gap and maybe to find a 
solution)?


The speed gap is mostly due to hardware constraints (it takes ~2000 
cycles for an exit from guest mode, plus we need to switch a few msrs to 
get to userspace).
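
For a feel of what that exit cost means, a back-of-the-envelope sketch follows. The host clock rate is an assumption (it is not given in this thread), and the bound ignores the emulation work itself, so real numbers will be lower. The function name is hypothetical.

```c
#include <stdint.h>

/* Upper bound on guest exits per second if each exit costs a fixed
 * number of CPU cycles and the CPU does nothing else.
 * cpu_hz is an assumed host clock rate. */
uint64_t max_exits_per_sec(uint64_t cpu_hz, uint64_t cycles_per_exit)
{
    if (cycles_per_exit == 0)
        return 0;  /* no meaningful bound; also avoids dividing by zero */
    return cpu_hz / cycles_per_exit;
}
```

On an assumed 2 GHz host, ~2000 cycles per exit caps the rate at about one million exits per second, so a guest that exits on every byte written cannot get much beyond 1 MB/s.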


See vmx_vcpu_run(), the vmresume instruction is where an exit starts.





BTW: Which part of the KVM code decides whether "direct code" or 
"emulated device code" is used?




Same place.  Look for calls to cpu_register_physical_memory().  If 
the last argument was obtained by a call to cpu_register_io_memory(), 
then all writes trap.  Otherwise, it was obtained by qemu_ram_alloc() 
and writes will not trap (except the first write to a page in a 30ms 
window, used to note that the page is dirty and needs redrawing).
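
The difference between the two kinds of registration can be modelled with a toy sketch. Nothing below is QEMU or KVM code: the types and functions are hypothetical and only count how often a write would trap. An I/O region traps on every write; a RAM-backed region traps only on the first write to a page after the dirty flags were cleared by the display refresh.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  16

enum region_kind { REGION_RAM, REGION_IO };  /* hypothetical names */

struct region {
    enum region_kind kind;
    uint8_t mem[NUM_PAGES << PAGE_SHIFT];
    uint8_t dirty[NUM_PAGES];  /* one flag per page */
    unsigned traps;            /* writes that needed an exit */
};

void region_init(struct region *r, enum region_kind kind)
{
    memset(r, 0, sizeof(*r));
    r->kind = kind;
}

/* Display refresh: clear the flags so the next write to each page
 * traps once again. */
void region_reset_dirty(struct region *r)
{
    memset(r->dirty, 0, sizeof(r->dirty));
}

void region_write8(struct region *r, uint32_t off, uint8_t val)
{
    uint32_t page = off >> PAGE_SHIFT;

    if (page >= NUM_PAGES)
        return;                /* outside the region: ignored here */
    if (r->kind == REGION_IO) {
        r->traps++;            /* every access is emulated */
    } else if (!r->dirty[page]) {
        r->traps++;            /* first write since the last refresh */
        r->dirty[page] = 1;
    }
    r->mem[off] = val;
}
```

Filling two pages byte by byte costs 8192 traps in the I/O case and 2 in the RAM case, which is the gap between the slow and the fast path discussed here.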


Ok, that finally ends in:
cpu_register_physical_memory_offset()
...
    // 0.12.3
    if (kvm_enabled())
        kvm_set_phys_mem(start_addr, size, phys_offset);
    // KVM
    cpu_notify_set_memory(start_addr, size, phys_offset);
...

I/O is always done through:
cpu_register_io_memory => cpu_register_io_memory_fixed
cpu_register_io_memory_fixed()
...
No call to KVM?


kvm_set_phys_mem() is a call to kvm.


...

Where is the trap from KVM to QEMU?


See kvm_cpu_exec().

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Jamie Lokier wrote:


Gerhard Wiesinger wrote:

Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW
of QEMU even from KVM must be possible (e.g. memory and port accesses are
done on nearly every virtual device) and therefore I'm ending in C code in
the QEMU hw/*.c directory. Therefore also the VGA memory area should be
able to be accessible from KVM but with the specialized and fast memory
access of QEMU.  Am I missing something?


What you're missing is that when KVM calls out to QEMU to handle
hw/*.c traps, that call is very slow.  It's because the hardware-VM
support is a bit slow when the trap happens, and then the call
from KVM in the kernel up to QEMU is a bit slow again.  Then all the
way back.  It adds up to a lot, for every I/O operation.


Isn't that then a general problem of KVM virtualization (or hardware 
virtualization)? Is this CPU dependent (AMD vs. Intel)?



When QEMU does the same thing, it's fast because it's inside the same
process; it's just a function call.


Yes, that's clear to me.


That's why the most often called devices are emulated separately in
KVM's kernel code, things like the interrupt controller, timer chip
etc.  It's also why individual instructions that need help are
emulated in KVM's kernel code, instead of passing control up to QEMU
just for one instruction.



BTW: Still not clear why performance is low with KVM since there are
no window changes in the testcase involved which could cause a (slow) page
fault.


It sounds like a bug.  Avi gave suggestions about what to look for.
If it fixes my OS install speeds too, I'll be very happy :-)



See other post for details.


In 256-colour mode, KVM should be writing to the VGA memory at high
speed a lot like normal RAM, not trapping at the hardware-VM level,
and not calling up to the code in hw/*.c for every byte.




Yes, same picture to me: 256 color mode should be only a memory write (16 
color mode is more difficult as pixel/byte mapping is not the same).

But it looks like this isn't the case in this test scenario.


You might double-check if your guest is using VGA "Mode X".  (See Wikipedia.)



Code:
inregs.x.ax = 0x4F02;
inregs.x.bx = 0xC000 | 0x101; // bh=bit 15=0 (clear), bit14=0 (windowed)
int86x(INT_SCREEN, &inregs, &outregs, &outsregs);   /* Call INT 10h */

I can post the whole code/exes if you want (I already planned to post all 
my tools, but I have to do some cleanup before I publish the whole 
package).



That was a way to accelerate VGA on real PCs, but it will be slow in
KVM for the same reasons as 16-colour mode.


Which way do you mean?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:

I don't think changing VGA window is a problem because there are
500.000-1Mio changes/s possible.


1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?



To clarify:
Memory performance writing to segment A000 is about 1MB/s.


That indicates a fault every write (assuming 8-16 bit writes).  If you're 
using 256 color vga and not switching banks, this indicates a bug.




Yes, 256 color VGA and no bank switches involved.

Calling INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) is about 500.000 to 1Mio function 
calls per second.


That's surprisingly fast. I'd expect 100-200k/sec.



Sorry, I mixed up the numbers:
1.) QEMU-KVM: ~111k
2.) QEMU only: 500k-1Mio


Please run kvm_stat and report output for both tests to confirm.



See below. 2nd column is per second statistic when running the test.



To get real good VGA performance both parameters should be:
About >50MB/s for writes to segment A000
~500.000 bank switches per second.


First should be doable easily, second is borderline.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU and not 
through KVM (because it is much faster). Also 16 color modes should be 
fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM (there 
is a specialized driver loaded for that PCI device in the non legacy OS).


Is that easily possible?


No.  Code can run in either qemu or kvm, not both.  You can switch between 
them based on access statistics (early versions of qemu-kvm did that, without 
the statistics part), but this isn't trivial.


Hmmm. Ok, 2 different opinions about the memory write performance:
Easily or not possible?

Thnx for your help so far.

Ciao,
Gerhard

--
http://www.wiesinger.com/

-
kvm_stat
Please mount debugfs ('mount -t debugfs debugfs /sys/kernel/debug')
and ensure the kvm modules are loaded

lsmod|grep -i kvm
kvm_amd                38276  0
kvm                   162288  1 kvm_amd

mount|grep -i debug

=>
mount -t debugfs debugfs /sys/kernel/debug

int10perf: INT10h Performance tests:
kvm statistics
 efer_reload  0   0
 exits 37648629  456206
 fpu_reload 8512535  455983
 halt_exits 2084   0
 halt_wakeup   2047   0
 host_state_reload  8513213  456011
 hypercalls   0   0
 insn_emulation 29182065   0
 insn_emulation_fail  0   0
 invlpg   0   0
 io_exits   8386082  455975
 irq_exits51713 214
 irq_injections   21797  36
 irq_window   0   0
 largepages   0   0
 mmio_exits  242781   0
 mmu_cache_miss 150   0
 mmu_flooded  0   0
 mmu_pde_zapped   0   0
 mmu_pte_updated  0   0
 mmu_pte_write 8192   0
 mmu_recycled 0   0
 mmu_shadow_zapped  151   0
 mmu_unsync   0   0
 mmu_unsync_global 0   0
 nmi_injections   0   0
 nmi_window   0   0
 pf_fixed 16935   0
 pf_guest 0   0
 remote_tlb_flush 2   0
 request_irq  0   0
 request_nmi  0   0
 signal_exits 1   0
 tlb_flush 2251   0

Running VGA memory tests in same VGA page in Video Mode VESA 101h:
kvm statistics

 efer_reload  0   0
 exits 18470836  554582
 fpu_reload 2147833  3469
 halt_exits 2083   0
 halt_wakeup   2047   0
 host_state_reload  2148186  3470
 hypercalls   0   0
 insn_emulation 7688203  554244
 insn_emulation_fail  0   0
 invlpg   0   0
 io_exits  10701583  18
 irq_exits50781 321
 irq_injections   25251  18
 irq_window   0   0
 largepages   0   0
 mmio_exits  1628473241
 mmu_cache_miss 154   0
 mmu_flooded  0   0
 mmu_pde_zapped   0   0
 mmu_pte_updated  0   0
 mmu_pte_write 8192   0
 mmu_recycled 0   0
 mmu_shadow_zapped  155   0
 mmu_unsync   0   0
 mmu_unsync_global 0   0
 nmi_injections   0   0
 nmi_window   0   0
 pf_fixed 16936   0
 pf_guest 0   0
 remote_tlb_flush 5   0
 request_irq  0   0
 request_nmi  0

Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:39 PM, Jamie Lokier wrote:

Avi Kivity wrote:


Writes to vga in 16-color mode don't set a single memory location to a
value, instead they change multiple memory locations.
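
A heavily simplified sketch of why that is: in the planar 16-color modes, one CPU byte write lands in up to four bit planes at the same offset, selected by the sequencer's Map Mask. The model below keeps only that one aspect and leaves out set/reset, bit mask, rotate, logical operations and the latches; the names are hypothetical.

```c
#include <stdint.h>

#define PLANE_SIZE 65536u

struct planar_vga {
    uint8_t plane[4][PLANE_SIZE];
    uint8_t map_mask;  /* bit n set: writes reach plane n */
};

/* One CPU byte write at 'off'. Returns how many bytes of video memory
 * were actually modified. */
unsigned planar_write8(struct planar_vga *v, uint16_t off, uint8_t val)
{
    unsigned touched = 0;

    for (unsigned p = 0; p < 4; p++) {
        if (v->map_mask & (1u << p)) {
            v->plane[p][off] = val;
            touched++;
        }
    }
    return touched;
}
```

Because the result depends on device register state and fans out to several locations, such a write cannot be satisfied by a plain page mapping, unlike the one-byte-per-pixel 256 color case.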


While code is just writing to the VGA memory, not reading(*) and not
touching the VGA I/O registers that control the write latches, is it
possible in principle to swizzle the format around in memory to make
regular writes work?



Not in software.  We can map pages, not cross address lines.


(*) Reading should be ok for some settings of the write latches, I
think.

I wonder if guests of interest behave like that.



Guests that use 16 color vga are usually of little interest.



I tested 256 color modes.


Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?


Yes.


$ date
Wed Apr 21 19:37:38 2015
$ modprobe ktcg



That's why the vmware software vmm was faster than the hardware vmm for the 
initial iterations of vmx.




On VMWare Server 2.0 the picture is the same:
Calling INT10h interrupts is fast, but writing to VGA memory is very slow 
(1.0MB/s). Can one switch to the old software vmm in VMWare?


That was one of the reasons why I was looking for alternatives for 
graphical DOS programs. Overall summary so far:
1.) QEMU without KVM: Problem with 286 DOS Extender instruction set, but 
fast VGA
2.) QEMU with KVM: 286 DOS Extender apps ok, but slow VGA memory 
performance
3.) VMWare Server 2.0 under Linux, application ok, but slow VGA memory 
performance

4.) Virtual PC: Problems with 286 DOS Extender
5.) Bochs: Works well, but very slow.

It looks like VMWare Server and QEMU with KVM may have the same 
architectural problem of going through the whole slow chain from guest OS 
to virtualization layer for VGA writes.


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:


Can you explain which code files/functions of KVM are involved in handling 
the VGA memory window and page switching through the port write to the VGA 
window register (or is that part handled through QEMU)? A little bit of 
architecture explanation would be nice.


qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
vbe_ioport_write_data() and vga_ioport_write().




Yes, I was already in that part of the code. Those are very simple functions, 
as already explained, and are therefore very fast in QEMU alone. But I meant: 
what is the calling path from the KVM guest OS to hw/vga.c for memory and I/O 
accesses, and which parts are done in hardware directly (to understand the 
speed gap and maybe to find a solution)?




BTW: In which part of the KVM code is it decided whether "direct code" or 
"emulated device code" is used?




Same place.  Look for calls to cpu_register_physical_memory().  If the last 
argument was obtained by a call to cpu_register_io_memory(), then all writes 
trap.  Otherwise, it was obtained by qemu_ram_alloc() and writes will not 
trap (except the first write to a page in a 30ms window, used to note that 
the page is dirty and needs redrawing).
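
To make the distinction above concrete, here is a toy model of the dispatch 
decision. It is only a sketch: the types and names (struct region, 
guest_write8, record_write) are made up for illustration and are not QEMU's or 
KVM's API. A region backed by RAM takes the write directly; a region backed by 
an I/O handler calls out on every write, which stands in for the trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the QEMU/KVM interfaces. */
enum region_kind { REGION_RAM, REGION_MMIO };

struct region {
    uint32_t start;             /* guest-physical start address */
    uint32_t size;              /* length in bytes */
    enum region_kind kind;
    uint8_t *ram;               /* backing store when kind == REGION_RAM */
    void (*write_cb)(void *opaque, uint32_t offset, uint8_t val);
    void *opaque;               /* handler state when kind == REGION_MMIO */
};

struct stats {
    unsigned long direct_writes;   /* writes that went straight to RAM */
    unsigned long trapped_writes;  /* writes that needed the handler */
};

/* Example handler state: remembers the last trapped write. */
struct last_write { uint32_t offset; uint8_t val; unsigned count; };

static void record_write(void *opaque, uint32_t offset, uint8_t val)
{
    struct last_write *lw = opaque;
    lw->offset = offset;
    lw->val = val;
    lw->count++;
}

static const struct region *find_region(const struct region *map, size_t n,
                                        uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= map[i].start && addr - map[i].start < map[i].size)
            return &map[i];
    }
    return NULL;
}

/* Returns 0 for a direct write, 1 for a trapped write, -1 if unmapped. */
static int guest_write8(const struct region *map, size_t n, struct stats *st,
                        uint32_t addr, uint8_t val)
{
    const struct region *r = find_region(map, n, addr);
    if (r == NULL)
        return -1;
    if (r->kind == REGION_RAM) {
        assert(r->ram != NULL);
        r->ram[addr - r->start] = val;   /* fast path: plain memory store */
        st->direct_writes++;
        return 0;
    }
    assert(r->write_cb != NULL);
    r->write_cb(r->opaque, addr - r->start, val);  /* slow path: handler */
    st->trapped_writes++;
    return 1;
}
```

In this model the cost difference is invisible because the handler is an 
ordinary function call; under KVM the slow path additionally involves a VM 
exit and a round trip to userspace.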


Ok, that finally ends in:
cpu_register_physical_memory_offset()
...
// 0.12.3
if (kvm_enabled())
kvm_set_phys_mem(start_addr, size, phys_offset);
// KVM
cpu_notify_set_memory(start_addr, size, phys_offset);
...

I/O is always done through:
cpu_register_io_memory => cpu_register_io_memory_fixed
cpu_register_io_memory_fixed()
...
No call to KVM?
...

Where is the trap from KVM to QEMU?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW 
> of QEMU even from KVM must be possible (e.g. memory and port accesses are 
> done on nearly every virtual device) and therefore I'm ending in C code in
> the QEMU hw/*.c directory. Therefore also the VGA memory area should be 
> able to be accessible from KVM but with the specialized and fast memory
> access of QEMU.  Am I missing something?

What you're missing is that when KVM calls out to QEMU to handle
hw/*.c traps, that call is very slow.  It's because the hardware-VM
support is a bit slow when the trap happens, and then the call
from KVM in the kernel up to QEMU is a bit slow again.  Then all the
way back.  It adds up to a lot, for every I/O operation.

When QEMU does the same thing, it's fast because it's inside the same
process; it's just a function call.

That's why the most often called devices are emulated separately in
KVM's kernel code, things like the interrupt controller, timer chip
etc.  It's also why individual instructions that need help are
emulated in KVM's kernel code, instead of passing control up to QEMU
just for one instruction.

> BTW: Still not clear why performance is low with KVM since there are 
> no window changes in the testcase involved which could cause a (slow) page 
> fault.

It sounds like a bug.  Avi gave suggestions about what to look for.
If it fixes my OS install speeds too, I'll be very happy :-)

In 256-colour mode, KVM should be writing to the VGA memory at high
speed a lot like normal RAM, not trapping at the hardware-VM level,
and not calling up to the code in hw/*.c for every byte.

You might double-check if your guest is using VGA "Mode X".  (See Wikipedia.)

That was a way to accelerate VGA on real PCs, but it will be slow in
KVM for the same reasons as 16-colour mode.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
> On 04/21/2010 09:39 PM, Jamie Lokier wrote:
> >Avi Kivity wrote:
> >   
> >>Writes to vga in 16-color mode don't set a single memory location to a
> >>value; instead they change multiple memory locations.
> >> 
> >While code is just writing to the VGA memory, not reading(*) and not
> >touching the VGA I/O registers that control the write latches, is it
> >possible in principle to swizzle the format around in memory to make
> >regular writes work?
> >   
> 
> Not in software.  We can map pages, not cross address lines.

Hence "swizzle".  You rearrange the data inside the page for the
crossed address lines, and undo the swizzle later on demand.  That
doesn't work for other VGA magic though.

> Guests that use 16 color vga are usually of little interest.

Fair enough.  We can move on :-)

It's been said that the super-slow VGA writes triggering this thread
are in 256-colour mode, so there's a different problem.  That should
be fast, shouldn't it?

I vaguely recall extremely slow OS installs I've seen in KVM, which
were fast in QEMU (and fast in KVM after installing), were using text
mode.  Possibly it was Windows 2000, or Windows Server 2003.  Text
mode should be fast too, shouldn't it?  I suppose it's possible that
it just looked like text mode and was really 16-colour mode.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Avi Kivity

On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:

I don't think changing VGA window is a problem because there are
500.000-1Mio changes/s possible.


1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?



To clarify:
Memory performance writing to segment A000 is about 1MB/s.


That indicates a fault every write (assuming 8-16 bit writes).  If 
you're using 256 color vga and not switching banks, this indicates a bug.
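
The arithmetic behind that estimate can be written down as a small sketch. 
The function names are made up for illustration, "1MB/s" is taken as 10^6 
bytes per second, and the per-write cost passed to the second function is a 
hypothetical input, not a measured value.

```c
#include <assert.h>

/* Exits per second implied by a measured throughput, assuming every
 * single write of bytes_per_write bytes causes one fault/trap. */
static double exits_per_second(double bytes_per_second,
                               unsigned bytes_per_write)
{
    assert(bytes_per_write > 0);
    return bytes_per_second / bytes_per_write;
}

/* Throughput reachable if each trapped write costs cost_us microseconds
 * for the full round trip (hypothetical cost, supplied by the caller). */
static double throughput_bytes_per_second(double cost_us_per_write,
                                          unsigned bytes_per_write)
{
    assert(cost_us_per_write > 0.0);
    return bytes_per_write * (1e6 / cost_us_per_write);
}
```

With 8-bit writes, 1MB/s corresponds to about 1M faults per second; with 
16-bit writes, to about 500k per second. That is the same range as the 
500k-1M window switches per second reported above.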


Calling INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) is about 500.000 to 1Mio 
function calls per second.


That's surprisingly fast. I'd expect 100-200k/sec.

Please run kvm_stat and report output for both tests to confirm.



To get real good VGA performance both parameters should be:
About >50MB/s for writes to segment A000
~500.000 bank switches per second.


First should be doable easily, second is borderline.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU and 
not through KVM (because it is much faster). Also 16 color modes 
should be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM 
(there is a specialized driver loaded for that PCI device in the non 
legacy OS).


Is that easily possible?


No.  Code can run in either qemu or kvm, not both.  You can switch 
between them based on access statistics (early versions of qemu-kvm did 
that, without the statistics part), but this isn't trivial.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Avi Kivity

On 04/21/2010 09:39 PM, Jamie Lokier wrote:

Avi Kivity wrote:
   

Writes to vga in 16-color mode don't set a single memory location to a
value; instead they change multiple memory locations.
 

While code is just writing to the VGA memory, not reading(*) and not
touching the VGA I/O registers that control the write latches, is it
possible in principle to swizzle the format around in memory to make
regular writes work?
   


Not in software.  We can map pages, not cross address lines.


(*) Reading should be ok for some settings of the write latches, I
think.

I wonder if guests of interest behave like that.
   


Guests that use 16 color vga are usually of little interest.


Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?
   

Yes.
 

$ date
Wed Apr 21 19:37:38 2015
$ modprobe ktcg
   


That's why the vmware software vmm was faster than the hardware vmm for 
the initial iterations of vmx.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Avi Kivity

On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:


Can you explain which code files/functions of KVM are involved in 
handling the VGA memory window and page switching through the port write 
to the VGA window register (or is that part handled through QEMU)? 
A little bit of architecture explanation would be nice.


qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
vbe_ioport_write_data() and vga_ioport_write().




BTW: In which part of the KVM code is it decided whether "direct code" or 
"emulated device code" is used?




Same place.  Look for calls to cpu_register_physical_memory().  If the 
last argument was obtained by a call to cpu_register_io_memory(), then 
all writes trap.  Otherwise, it was obtained by qemu_ram_alloc() and 
writes will not trap (except the first write to a page in a 30ms window, 
used to note that the page is dirty and needs redrawing).
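
The dirty-tracking exception mentioned in parentheses can be sketched as a toy 
model. Everything here is an assumption for illustration: the names, the 4 KiB 
page size, and the 16-page VRAM size are made up, and the real mechanism works 
through page protection rather than an explicit check in the write path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */
#define VRAM_PAGES 16   /* hypothetical VRAM size for the example */

struct dirty_log {
    uint8_t dirty[VRAM_PAGES];  /* 1 if the page was written this interval */
    unsigned long faults;       /* number of slow first-writes seen */
};

/* Returns 1 if this write took the slow path (first write to the page
 * since the last refresh), 0 if it went through at RAM speed. */
static int vram_write(struct dirty_log *log, uint32_t offset)
{
    uint32_t page = offset >> PAGE_SHIFT;
    assert(page < VRAM_PAGES);
    if (log->dirty[page])
        return 0;
    log->dirty[page] = 1;
    log->faults++;
    return 1;
}

/* Called once per refresh interval: returns how many pages need
 * redrawing and re-arms the tracking for the next interval. */
static unsigned refresh(struct dirty_log *log)
{
    unsigned n = 0;
    for (unsigned i = 0; i < VRAM_PAGES; i++)
        n += log->dirty[i];
    memset(log->dirty, 0, sizeof log->dirty);
    return n;
}
```

The point of the model is that the slow path is paid once per page per 
interval, not once per write, so it should not by itself explain 1MB/s.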



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Jamie Lokier wrote:


Gerhard Wiesinger wrote:

Would it be possible to handle these writes through QEMU directly (without
KVM), because performance there is very good (looking at the code, only
some pointer arithmetic and a memory write are done)?


I've noticed extremely slow VGA performance too, when installing OSes.
It makes the difference between installing in a few minutes, and
installing taking hours - just because of the slow VGA.

So generally I use qemu for installing old versions of Windows, then
change to KVM to run them after installing.

Switching between KVM and qemu automatically based on guest code
behaviour, and making both memory models and device models compatible
at run time, is a difficult thing.  I guess it's not worth the
difficulty just to speed up VGA.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU
and not through KVM (because it is much faster). Also 16 color modes
should be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM
(there is a specialized driver loaded for that PCI device in the non
legacy OS).

Is that easily possible?


No it isn't.  Distinguishing addresses is trivial.  You've ignored the
hard part, which is switching between different virtualisation
architectures...


Hmmm. I'm very new to QEMU and KVM, but at least accessing the virtual HW 
of QEMU even from KVM must be possible (e.g. memory and port accesses are 
done on nearly every virtual device), and therefore I end up in C code in
the QEMU hw/*.c directory. Therefore the VGA memory area should also be
accessible from KVM, but with the specialized and fast memory access of 
QEMU.
Am I missing something?

BTW: Still not clear why performance is low with KVM since there are 
no window changes in the testcase involved which could cause a (slow) page 
fault.


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> >>Would it be possible to handle these writes through QEMU directly
> >>(without KVM), because performance there is very good (looking at the
> >>code, only some pointer arithmetic and a memory write are done)?
> >
> >I've noticed extremely slow VGA performance too, when installing OSes.
> >It makes the difference between installing in a few minutes, and
> >installing taking hours - just because of the slow VGA.
> >
> >So generally I use qemu for installing old versions of Windows, then
> >change to KVM to run them after installing.
> >
> >Switching between KVM and qemu automatically based on guest code
> >behaviour, and making both memory models and device models compatible
> >at run time, is a difficult thing.  I guess it's not worth the
> >difficulty just to speed up VGA.
> 
> I think this is very easy to distinguish:
> 1.) VGA Segment A000 is legacy and should be handled through QEMU 
> and not through KVM (because it is much faster). Also 16 color modes 
> should be fast enough there.
> 2.) All other flat PCI memory accesses should be handled through KVM 
> (there is a specialized driver loaded for that PCI device in the non 
> legacy OS).
> 
> Is that easily possible?

No it isn't.  Distinguishing addresses is trivial.  You've ignored the
hard part, which is switching between different virtualisation
architectures...

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Jamie Lokier wrote:


Gerhard Wiesinger wrote:

I'm using VESA mode 0x101 (640x480 256 colors), but performance is
there very low (~1MB/s). Test is also WITHOUT any vga window change, so
there isn't any page switching overhead involved in this test case.


Any ideas for improvement?


Currently when the physical memory map changes (which is what happens
when the vga window is updated), kvm drops the entire shadow cache.  It's
possible to do this only for vga memory, but not easy.


I don't think changing VGA window is a problem because there are
500.000-1Mio changes/s possible.


1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?


To clarify:
Memory performance writing to segment A000 is about 1MB/s.
Calling INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) is about 500.000 to 1Mio function 
calls per second.


To get real good VGA performance both parameters should be:
About >50MB/s for writes to segment A000
~500.000 bank switches per second.


Would it be possible to handle these writes through QEMU directly (without
KVM), because performance there is very good (looking at the code, only
some pointer arithmetic and a memory write are done)?


I've noticed extremely slow VGA performance too, when installing OSes.
It makes the difference between installing in a few minutes, and
installing taking hours - just because of the slow VGA.

So generally I use qemu for installing old versions of Windows, then
change to KVM to run them after installing.

Switching between KVM and qemu automatically based on guest code
behaviour, and making both memory models and device models compatible
at run time, is a difficult thing.  I guess it's not worth the
difficulty just to speed up VGA.


I think this is very easy to distinguish:
1.) VGA Segment A000 is legacy and should be handled through QEMU 
and not through KVM (because it is much faster). Also 16 color modes 
should be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM 
(there is a specialized driver loaded for that PCI device in the non legacy OS).


Is that easily possible?

Thnx.

Ciao,
Gerhard


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
> Writes to vga in 16-color mode don't set a single memory location to a 
> value; instead they change multiple memory locations.

While code is just writing to the VGA memory, not reading(*) and not
touching the VGA I/O registers that control the write latches, is it
possible in principle to swizzle the format around in memory to make
regular writes work?

(*) Reading should be ok for some settings of the write latches, I
think.

I wonder if guests of interest behave like that.

> >Is this a case where TCG would run significantly faster for code blocks
> >that have been detected to access the VGA memory?
> 
> Yes.

$ date
Wed Apr 21 19:37:38 2015
$ modprobe ktcg
;-)

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> I'm using VESA mode 0x101 (640x480 256 colors), but performance is 
> there very low (~1MB/s). Test is also WITHOUT any vga window change, so 
> there isn't any page switching overhead involved in this test case.
> 
> >>Any ideas for improvement?
> >
> >Currently when the physical memory map changes (which is what happens 
> >when the vga window is updated), kvm drops the entire shadow cache.  It's 
> >possible to do this only for vga memory, but not easy.
> 
> I don't think changing VGA window is a problem because there are 
> 500.000-1Mio changes/s possible.

1MB/s, 500k-1M changes/s Coincidence?  Is it taking a page fault
or trap on every write?

> Would it be possible to handle these writes through QEMU directly (without 
> KVM), because performance there is very good (looking at the code, only 
> some pointer arithmetic and a memory write are done)?

I've noticed extremely slow VGA performance too, when installing OSes.
It makes the difference between installing in a few minutes, and
installing taking hours - just because of the slow VGA.

So generally I use qemu for installing old versions of Windows, then
change to KVM to run them after installing.

Switching between KVM and qemu automatically based on guest code
behaviour, and making both memory models and device models compatible
at run time, is a difficult thing.  I guess it's not worth the
difficulty just to speed up VGA.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/21/2010 01:08 PM, Jamie Lokier wrote:

Avi Kivity wrote:


On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:


Hello,

Finally I got QEMU-KVM to work but video performance under DOS is very
low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM
is slow)

I'm measuring 2 performance critical video performance parameters:
1.) INT 10h, function AX=4F05h (set same window/set window/get window)
2.) Memory performance to segment page A000h

So BIOS performance (which might be port performance to VGA
index/value port) is about factor 5 slower, memory performance is
about factor 100 slower.

QEMU 0.12.3 and QEMU GIT performance is the same (within the measurement
tolerance) and listed only once, QEMU KVM is much slower (details
see below).

Test programs can be provided, source code will be released soon.

Any ideas why KVM is so slow?


16-color vga is slow because kvm cannot map the framebuffer to the guest
(writes are not interpreted as RAM writes).  256+-color vga should be
fast, except when switching the vga window.  Note it's only fast on
average, the first write into a page will be slow as kvm maps it in.


I don't understand: why is 256+-colour mappable and 16-colour not mappable?



Writes to vga in 16-color mode don't set a single memory location to a value; 
instead they change multiple memory locations.



Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?



Yes.


Currently when the physical memory map changes (which is what happens
when the vga window is updated), kvm drops the entire shadow cache.
It's possible to do this only for vga memory, but not easy.


If it's a page fault handled in the kernel, I would expect it to be
about as fast as those old VGA DOS-extender drivers which provide the
illusion of a single flat mapping, and bank switch on page faults -
multiplied by the speed of modern CPUs compared with then.  For many
graphics things those DOS-extender drivers worked perfectly well.

If it's a trap out to qemu on every vga window change, perhaps not
quite so well.



It's much more complicated.



Can you explain which code files/functions of KVM are involved in handling 
the VGA memory window and page switching through the port write to the VGA 
window register (or is that part handled through QEMU)? A little bit of 
architecture explanation would be nice.


BTW: In which part of the KVM code is it decided whether "direct code" or 
"emulated device code" is used?


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Gerhard Wiesinger

On Wed, 21 Apr 2010, Avi Kivity wrote:


On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:

Hello,

Finally I got QEMU-KVM to work but video performance under DOS is very low 
(QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM is slow)


I'm measuring 2 performance critical video performance parameters:
1.) INT 10h, function AX=4F05h (set same window/set window/get window)
2.) Memory performance to segment page A000h

So BIOS performance (which might be port performance to VGA index/value 
port) is about factor 5 slower, memory performance is about factor 100 
slower.


QEMU 0.12.3 and QEMU GIT performance is the same (within the measurement 
tolerance) and listed only once, QEMU KVM is much slower (details see 
below).


Test programs can be provided, source code will be released soon.

Any ideas why KVM is so slow? 


16-color vga is slow because kvm cannot map the framebuffer to the guest 
(writes are not interpreted as RAM writes).  256+-color vga should be fast, 
except when switching the vga window.  Note it's only fast on average, the 
first write into a page will be slow as kvm maps it in.


Which mode are you using?



I'm using VESA mode 0x101 (640x480 256 colors), but performance is 
there very low (~1MB/s). Test is also WITHOUT any vga window change, so 
there isn't any page switching overhead involved in this test case.
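
For reference, this is roughly how a pixel position in a banked 256-color mode 
maps to a window (bank) number and an offset inside the A000 segment. It is 
only a sketch under stated assumptions: the function name is made up, and the 
pitch of 640 bytes per scanline and the 64 KiB window granularity are typical 
values for mode 0x101 that a real program would read from the VBE mode 
information block instead of hard-coding.

```c
#include <assert.h>
#include <stdint.h>

struct bank_addr {
    uint32_t bank;    /* window number to select via INT 10h AX=4F05h */
    uint32_t offset;  /* byte offset inside the A000 window */
};

/* pitch: bytes per scanline; granularity: window granularity in bytes.
 * Both are assumptions of the caller (see the text above). */
static struct bank_addr pixel_to_bank(uint32_t x, uint32_t y,
                                      uint32_t pitch, uint32_t granularity)
{
    assert(granularity > 0);
    uint32_t linear = y * pitch + x;   /* 8 bpp: one byte per pixel */
    struct bank_addr a = { linear / granularity, linear % granularity };
    return a;
}
```

Under these assumptions a full 640x480 frame is 307200 bytes and spans banks 
0 to 4, so a test that never changes the window only touches the part of the 
frame buffer visible through one bank.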



Any ideas for improvement?


Currently when the physical memory map changes (which is what happens when 
the vga window is updated), kvm drops the entire shadow cache.  It's possible 
to do this only for vga memory, but not easy.


I don't think changing VGA window is a problem because there are 
500.000-1Mio changes/s possible.


Would it be possible to handle these writes through QEMU directly (without 
KVM), because performance there is very good (looking at the code, only 
some pointer arithmetic and a memory write are done)?


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Avi Kivity

On 04/21/2010 01:08 PM, Jamie Lokier wrote:

Avi Kivity wrote:
   

On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
 

Hello,

Finally I got QEMU-KVM to work but video performance under DOS is very
low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM
is slow)

I'm measuring 2 performance critical video performance parameters:
1.) INT 10h, function AX=4F05h (set same window/set window/get window)
2.) Memory performance to segment page A000h

So BIOS performance (which might be port performance to VGA
index/value port) is about factor 5 slower, memory performance is
about factor 100 slower.

QEMU 0.12.3 and QEMU GIT performance is the same (within the measurement
tolerance) and listed only once, QEMU KVM is much slower (details
see below).

Test programs can be provided, source code will be released soon.

Any ideas why KVM is so slow?
   

16-color vga is slow because kvm cannot map the framebuffer to the guest
(writes are not interpreted as RAM writes).  256+-color vga should be
fast, except when switching the vga window.  Note it's only fast on
average, the first write into a page will be slow as kvm maps it in.
 

I don't understand: why is 256+-colour mappable and 16-colour not mappable?
   


Writes to vga in 16-color mode don't set a single memory location to a 
value; instead they change multiple memory locations.
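
A heavily simplified sketch of what that means: in planar 16-color modes the 
video memory is organised as four bit planes, and one byte written by the CPU 
is stored into every plane enabled in the sequencer's Map Mask register. The 
model below is illustrative only. The struct and function names are made up, 
and set/reset, bit mask, rotate, the latches and the other write modes are 
all ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLANES 4
#define PLANE_SIZE 65536u

struct planar_vga {
    uint8_t plane[PLANES][PLANE_SIZE];  /* four bit planes of 64 KiB each */
    uint8_t map_mask;                   /* low 4 bits: planes to write */
};

/* One CPU byte write at `offset` lands in every enabled plane.
 * Returns the number of memory locations that changed owner. */
static unsigned planar_write8(struct planar_vga *vga, uint16_t offset,
                              uint8_t val)
{
    unsigned touched = 0;
    assert(vga != NULL);
    for (unsigned p = 0; p < PLANES; p++) {
        if (vga->map_mask & (1u << p)) {
            vga->plane[p][offset] = val;
            touched++;
        }
    }
    return touched;
}
```

Because a single guest store fans out like this, the region cannot simply be 
backed by ordinary RAM, which is why each write has to be emulated.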



Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?
   


Yes.


Currently when the physical memory map changes (which is what happens
when the vga window is updated), kvm drops the entire shadow cache.
It's possible to do this only for vga memory, but not easy.
 

If it's a page fault handled in the kernel, I would expect it to be
about as fast as those old VGA DOS-extender drivers which provide the
illusion of a single flat mapping, and bank switch on page faults -
multiplied by the speed of modern CPUs compared with then.  For many
graphics things those DOS-extender drivers worked perfectly well.

If it's a trap out to qemu on every vga window change, perhaps not
quite so well.
   


It's much more complicated.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
> On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
> >Hello,
> >
> >Finally I got QEMU-KVM to work but video performance under DOS is very 
> >low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM 
> >is slow)
> >
> >I'm measuring 2 performance critical video performance parameters:
> >1.) INT 10h, function AX=4F05h (set same window/set window/get window)
> >2.) Memory performance to segment page A000h
> >
> >So BIOS performance (which might be port performance to VGA 
> >index/value port) is about factor 5 slower, memory performance is 
> >about factor 100 slower.
> >
> >QEMU 0.12.3 and QEMU GIT performance is the same (within the measurement 
> >tolerance) and listed only once, QEMU KVM is much slower (details 
> >see below).
> >
> >Test programs can be provided, source code will be released soon.
> >
> >Any ideas why KVM is so slow? 
> 
> 16-color vga is slow because kvm cannot map the framebuffer to the guest 
> (writes are not interpreted as RAM writes).  256+-color vga should be 
> fast, except when switching the vga window.  Note it's only fast on 
> average, the first write into a page will be slow as kvm maps it in.

I don't understand: why is 256+-colour mappable and 16-colour not mappable?

Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?

> Which mode are you using?
> 
> >Any ideas for improvement?
> 
> Currently when the physical memory map changes (which is what happens 
> when the vga window is updated), kvm drops the entire shadow cache.  
> It's possible to do this only for vga memory, but not easy.

If it's a page fault handled in the kernel, I would expect it to be
about as fast as those old VGA DOS-extender drivers which provide the
illusion of a single flat mapping, and bank switch on page faults -
multiplied by the speed of modern CPUs compared with then.  For many
graphics things those DOS-extender drivers worked perfectly well.

If it's a trap out to qemu on every vga window change, perhaps not
quite so well.

-- Jamie