[Qemu-devel] [PATCH 2/4] vnc: really call zlib if we want zlib

2010-05-06 Thread Corentin Chary
Signed-off-by: Corentin Chary 
---
 vnc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vnc.c b/vnc.c
index a91c3a3..d0c0d00 100644
--- a/vnc.c
+++ b/vnc.c
@@ -655,7 +655,7 @@ static void send_framebuffer_update(VncState *vs, int x, 
int y, int w, int h)
 {
 switch(vs->vnc_encoding) {
 case VNC_ENCODING_ZLIB:
-vnc_hextile_send_framebuffer_update(vs, x, y, w, h);
+vnc_zlib_send_framebuffer_update(vs, x, y, w, h);
 break;
 case VNC_ENCODING_HEXTILE:
 vnc_framebuffer_update(vs, x, y, w, h, VNC_ENCODING_HEXTILE);
-- 
1.7.0.2





[Qemu-devel] [PATCH 3/4] vnc: only use a single zlib stream

2010-05-06 Thread Corentin Chary
According to http://tigervnc.org/cgi-bin/rfbproto#zlib-encoding

Signed-off-by: Corentin Chary 
---
 vnc-encoding-zlib.c |   12 +---
 vnc.h   |2 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/vnc-encoding-zlib.c b/vnc-encoding-zlib.c
index 4a495ad..6a16a79 100644
--- a/vnc-encoding-zlib.c
+++ b/vnc-encoding-zlib.c
@@ -54,9 +54,9 @@ static void vnc_zlib_start(VncState *vs)
 vs->output = vs->zlib;
 }
 
-static int vnc_zlib_stop(VncState *vs, int stream_id)
+static int vnc_zlib_stop(VncState *vs)
 {
-z_streamp zstream = &vs->zlib_stream[stream_id];
+z_streamp zstream = &vs->zlib_stream;
 int previous_out;
 
 // switch back to normal output/zlib buffers
@@ -70,7 +70,7 @@ static int vnc_zlib_stop(VncState *vs, int stream_id)
 if (zstream->opaque != vs) {
 int err;
 
-VNC_DEBUG("VNC: initializing zlib stream %d\n", stream_id);
+VNC_DEBUG("VNC: initializing zlib stream\n");
 VNC_DEBUG("VNC: opaque = %p | vs = %p\n", zstream->opaque, vs);
 zstream->zalloc = zalloc;
 zstream->zfree = zfree;
@@ -122,7 +122,7 @@ void vnc_zlib_send_framebuffer_update(VncState *vs, int x, 
int y, int w, int h)
 // compress the stream
 vnc_zlib_start(vs);
 vnc_raw_send_framebuffer_update(vs, x, y, w, h);
-bytes_written = vnc_zlib_stop(vs, 0);
+bytes_written = vnc_zlib_stop(vs);
 
 if (bytes_written == -1)
 return;
@@ -136,7 +136,5 @@ void vnc_zlib_send_framebuffer_update(VncState *vs, int x, 
int y, int w, int h)
 
 void vnc_zlib_init(VncState *vs)
 {
-int i;
-for (i=0; i<(sizeof(vs->zlib_stream) / sizeof(z_stream)); i++)
-vs->zlib_stream[i].opaque = NULL;
+vs->zlib_stream.opaque = NULL;
 }
diff --git a/vnc.h b/vnc.h
index 1aa71b0..dfdb240 100644
--- a/vnc.h
+++ b/vnc.h
@@ -173,7 +173,7 @@ struct VncState
 /* Zlib */
 Buffer zlib;
 Buffer zlib_tmp;
-z_stream zlib_stream[4];
+z_stream zlib_stream;
 
 Notifier mouse_mode_notifier;
 
-- 
1.7.0.2





[Qemu-devel] Re: [patch uq/master 0/9] enable smp > 1 and related fixes

2010-05-06 Thread Avi Kivity

On 05/05/2010 09:24 PM, Anthony Liguori wrote:

On 05/04/2010 07:45 AM, Marcelo Tosatti wrote:

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How does this work without an in-kernel apic (or does uq/master 
already have an in-kernel apic)?


An in-kernel apic isn't needed (qemu-kvm supports smp with userspace apic).

--
error compiling committee.c: too many arguments to function





[Qemu-devel] Re: [patch uq/master 0/9] enable smp > 1 and related fixes

2010-05-06 Thread Avi Kivity

On 05/04/2010 03:45 PM, Marcelo Tosatti wrote:

Applied all, thanks.

--
error compiling committee.c: too many arguments to function





[Qemu-devel] [PATCH] block: Fix protocol detection for Windows devices

2010-05-06 Thread Kevin Wolf
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.

Signed-off-by: Kevin Wolf 
---
 block.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 10a2b31..adb3e5d 100644
--- a/block.c
+++ b/block.c
@@ -287,16 +287,18 @@ static BlockDriver *find_protocol(const char *filename)
 char protocol[128];
 int len;
 const char *p;
+int is_drive;
 
 /* TODO Drivers without bdrv_file_open must be specified explicitly */
 
 #ifdef _WIN32
-if (is_windows_drive(filename) ||
-is_windows_drive_prefix(filename))
-return bdrv_find_format("file");
+is_drive = is_windows_drive(filename) ||
+is_windows_drive_prefix(filename);
+#else
+is_drive = 0;
 #endif
 p = strchr(filename, ':');
-if (!p) {
+if (!p || is_drive) {
 drv1 = find_hdev_driver(filename);
 if (!drv1) {
 drv1 = bdrv_find_format("file");
-- 
1.6.6.1





Re: [Qemu-devel] Call for 0.12.4

2010-05-06 Thread Kevin Wolf
Am 08.04.2010 20:37, schrieb Aurelien Jarno:
> Hi all,
> 
> A number of fixes have been accumulated in the stable-0.12 branch, and
> I think it's time to release a new stable version. I would like to see
> that happening for the end of next week (around the 18th of April).
> 
> If you want to see some patches included, please send a mail to the
> mailing list with the [STABLE] tag. I would clearly prefer patches that 
> are already in HEAD (if the patch can simply be cherry-picked, there is
> no need to send a patch, just the commit number), though other patches
> might be considered too.

Not sure if it's already too late for 0.12.4, but this one should be
applied to the stable-0.12 branch:

Commit: d6771bfa52744eb4f959198b4b0e59451463eebf
qemu-img: use the heap instead of the huge stack array for win32

On Windows, qemu-img convert is broken without this fix.

Kevin




[Qemu-devel] [PATCH] pckbd: support for commands 0xf0-0xff: Pulse output bit

2010-05-06 Thread Bernhard Kohl

I use a legacy guest OS which sends the command 0xfd to the keyboard
controller during initialization. To get rid of the message
"qemu: unsupported keyboard cmd=0x%02x\n" I added support for
the pulse output bit commands.

Signed-off-by: Bernhard Kohl 
---
 hw/pckbd.c |   23 ---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/pckbd.c b/hw/pckbd.c
index 7998aa6..554fbe4 100644
--- a/hw/pckbd.c
+++ b/hw/pckbd.c
@@ -50,7 +50,9 @@
 #define KBD_CCMD_WRITE_MOUSE0xD4/* Write the following byte to 
the mouse */

 #define KBD_CCMD_DISABLE_A200xDD/* HP vectra only ? */
 #define KBD_CCMD_ENABLE_A20 0xDF/* HP vectra only ? */
-#define KBD_CCMD_RESET0xFE
+#define KBD_CCMD_PULSE_BITS_3_0 0xF0/* Pulse bits 3-0 of the output 
port P2. */
+#define KBD_CCMD_RESET  0xFE/* Pulse bit 0 of the output 
port P2 = CPU reset. */
+#define KBD_CCMD_NO_OP  0xFF/* Pulse no bits of the output 
port P2. */


 /* Keyboard Commands */
 #define KBD_CMD_SET_LEDS0xED/* Set keyboard leds */
@@ -203,6 +205,21 @@ static void kbd_write_command(void *opaque, 
uint32_t addr, uint32_t val)

 #ifdef DEBUG_KBD
 printf("kbd: write cmd=0x%02x\n", val);
 #endif
+
+/* Bits 3-0 of the output port P2 of the keyboard controller may be 
pulsed
+ * low for approximately 6 micro seconds. Bits 3-0 of the 
KBD_CCMD_PULSE

+ * command specify the output port bits to be pulsed.
+ * 0: Bit should be pulsed. 1: Bit should not be modified.
+ * The only useful version of this command is pulsing bit 0,
+ * which does a CPU reset.
+ */
+if((val & KBD_CCMD_PULSE_BITS_3_0) == KBD_CCMD_PULSE_BITS_3_0) {
+if(!(val & 1))
+val = KBD_CCMD_RESET;
+else
+val = KBD_CCMD_NO_OP;
+}
+
 switch(val) {
 case KBD_CCMD_READ_MODE:
 kbd_queue(s, s->mode, 0);
@@ -265,8 +282,8 @@ static void kbd_write_command(void *opaque, uint32_t 
addr, uint32_t val)

 case KBD_CCMD_RESET:
 qemu_system_reset_request();
 break;
-case 0xff:
-/* ignore that - I don't know what is its use */
+case KBD_CCMD_NO_OP:
+/* ignore that */
 break;
 default:
 fprintf(stderr, "qemu: unsupported keyboard cmd=0x%02x\n", val);
--
1.6.6.1





[Qemu-devel] [PATCH] exec: optimize lduw_phys and stw_phys

2010-05-06 Thread Bernhard Kohl

Implementation of the optimized code for these two functions.

This is necessary for virtio which reads and writes VirtQueue index
fields using these functions. The assumption is that this are atomic
operations, which is not the case, if the memcpy() function which is
used in the non optimized code does single byte copying. This happens
for example with an older WindRiver glibc.

Signed-off-by: Bernhard Kohl 
---
 exec.c |   67 
+--

 1 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index 14d1fd7..fb40398 100644
--- a/exec.c
+++ b/exec.c
@@ -3739,12 +3739,36 @@ uint32_t ldub_phys(target_phys_addr_t addr)
 return val;
 }

-/* XXX: optimize */
+/* warning: addr must be aligned */
 uint32_t lduw_phys(target_phys_addr_t addr)
 {
-uint16_t val;
-cpu_physical_memory_read(addr, (uint8_t *)&val, 2);
-return tswap16(val);
+int io_index;
+uint8_t *ptr;
+uint32_t val;
+unsigned long pd;
+PhysPageDesc *p;
+
+p = phys_page_find(addr >> TARGET_PAGE_BITS);
+if (!p) {
+pd = IO_MEM_UNASSIGNED;
+} else {
+pd = p->phys_offset;
+}
+
+if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
+!(pd & IO_MEM_ROMD)) {
+/* I/O case */
+io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+if (p)
+addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+val = io_mem_read[io_index][1](io_mem_opaque[io_index], addr);
+} else {
+/* RAM case */
+ptr = qemu_get_ram_ptr(pd & TARGET_PAGE_MASK) +
+(addr & ~TARGET_PAGE_MASK);
+val = lduw_p(ptr);
+}
+return val;
 }

 /* warning: addr must be aligned. The ram page is not masked as dirty
@@ -3861,11 +3885,40 @@ void stb_phys(target_phys_addr_t addr, uint32_t val)
 cpu_physical_memory_write(addr, &v, 1);
 }

-/* XXX: optimize */
+/* warning: addr must be aligned */
 void stw_phys(target_phys_addr_t addr, uint32_t val)
 {
-uint16_t v = tswap16(val);
-cpu_physical_memory_write(addr, (const uint8_t *)&v, 2);
+int io_index;
+uint8_t *ptr;
+unsigned long pd;
+PhysPageDesc *p;
+
+p = phys_page_find(addr >> TARGET_PAGE_BITS);
+if (!p) {
+pd = IO_MEM_UNASSIGNED;
+} else {
+pd = p->phys_offset;
+}
+
+if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+if (p)
+addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+io_mem_write[io_index][1](io_mem_opaque[io_index], addr, val);
+} else {
+unsigned long addr1;
+addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+/* RAM case */
+ptr = qemu_get_ram_ptr(addr1);
+stw_p(ptr, val);
+if (!cpu_physical_memory_is_dirty(addr1)) {
+/* invalidate code */
+tb_invalidate_phys_page_range(addr1, addr1 + 2, 0);
+/* set dirty bit */
+phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+(0xff & ~CODE_DIRTY_FLAG);
+}
+}
 }

 /* XXX: optimize */
--
1.6.6.1





[Qemu-devel] [PATCH] pckbd: support for commands 0xf0-0xff: Pulse output bit

2010-05-06 Thread Bernhard Kohl

I use a legacy guest OS which sends the command 0xfd to the keyboard
controller during initialization. To get rid of the message
"qemu: unsupported keyboard cmd=0x%02x\n" I added support for
the pulse output bit commands.

Signed-off-by: Bernhard Kohl 
---
 hw/pckbd.c |   23 ---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/pckbd.c b/hw/pckbd.c
index 7998aa6..554fbe4 100644
--- a/hw/pckbd.c
+++ b/hw/pckbd.c
@@ -50,7 +50,9 @@
 #define KBD_CCMD_WRITE_MOUSE0xD4/* Write the following byte to 
the mouse */

 #define KBD_CCMD_DISABLE_A200xDD/* HP vectra only ? */
 #define KBD_CCMD_ENABLE_A20 0xDF/* HP vectra only ? */
-#define KBD_CCMD_RESET0xFE
+#define KBD_CCMD_PULSE_BITS_3_0 0xF0/* Pulse bits 3-0 of the output 
port P2. */
+#define KBD_CCMD_RESET  0xFE/* Pulse bit 0 of the output 
port P2 = CPU reset. */
+#define KBD_CCMD_NO_OP  0xFF/* Pulse no bits of the output 
port P2. */


 /* Keyboard Commands */
 #define KBD_CMD_SET_LEDS0xED/* Set keyboard leds */
@@ -203,6 +205,21 @@ static void kbd_write_command(void *opaque, 
uint32_t addr, uint32_t val)

 #ifdef DEBUG_KBD
 printf("kbd: write cmd=0x%02x\n", val);
 #endif
+
+/* Bits 3-0 of the output port P2 of the keyboard controller may be 
pulsed
+ * low for approximately 6 micro seconds. Bits 3-0 of the 
KBD_CCMD_PULSE

+ * command specify the output port bits to be pulsed.
+ * 0: Bit should be pulsed. 1: Bit should not be modified.
+ * The only useful version of this command is pulsing bit 0,
+ * which does a CPU reset.
+ */
+if((val & KBD_CCMD_PULSE_BITS_3_0) == KBD_CCMD_PULSE_BITS_3_0) {
+if(!(val & 1))
+val = KBD_CCMD_RESET;
+else
+val = KBD_CCMD_NO_OP;
+}
+
 switch(val) {
 case KBD_CCMD_READ_MODE:
 kbd_queue(s, s->mode, 0);
@@ -265,8 +282,8 @@ static void kbd_write_command(void *opaque, uint32_t 
addr, uint32_t val)

 case KBD_CCMD_RESET:
 qemu_system_reset_request();
 break;
-case 0xff:
-/* ignore that - I don't know what is its use */
+case KBD_CCMD_NO_OP:
+/* ignore that */
 break;
 default:
 fprintf(stderr, "qemu: unsupported keyboard cmd=0x%02x\n", val);
--
1.6.6.1





[Qemu-devel] Re: How to map PCI memory into the VM without trapping

2010-05-06 Thread Frank Berreth
Hi there,

this is just an update if you are interested in the outcome. I turns out
that my initial assumption that there would be page faults/trapping on the
memory pages was false. The reason the throughput is so low is because the
memory was mapped non-cached. The VGA driver and the ivshmem driver use
pci_ioremap_bar which will *always* map the PCI bar non-cached (even the
resourceX_wc).
Changing the driver(s) to use ioremap_cache or ioremap_wc speeds up things
quite a bit. I don't know if VGA framebuffer was always mapped this way --
it appears on a real system that usually graphics memory is mapped WC.
Mapping it UC would cause quite a performance degradation. This could be the
reason for the reported VGA performance drop in another email thread. IMHO,
since QEMU emulates VGA, this could be mapped WB.

Thanks,
Frank.

On Mon, May 3, 2010 at 2:07 PM, Frank Berreth  wrote:

> Hello,
>
> I am using a modified patch for the ivshmem device that was posted here
> recently and discovered some performance issues with it. Essentially I
> measure write performance into the PCI memory that is mapped into the VM as
> really slow (about 60mb/sec). I assume that is because there is trapping on
> each memory access (there are no handlers installed for handling read/write
> access to that memory though). The device shows up in the guest as a UIO
> device but I also tried accessing other PCI memory like the vga memory (16mb
> from vga-pci.c) and it has similar performance (through
> /sys/device/.../resource...).
>
> My question is how I can map that memory into the VM so it doesn't get
> trapped on in the normal use case. The mappings (registering PCI bar etc) is
> done as in the patch submitted recently).
>
> Thanks,
> Frank.
>
>


[Qemu-devel] Re: question on virtio

2010-05-06 Thread Rusty Russell
On Wed, 5 May 2010 08:39:47 pm Michael S. Tsirkin wrote:
> Hi!
> I see this in virtio_ring.c:
> 
> /* Put entry in available array (but don't update avail->idx *
>  until they do sync). */
> 
> Why is it done this way?
> It seems that updating the index straight away would be simpler, while
> this might allow the host to specilatively look up the buffer and handle
> it, without waiting for the kick.

I agree.  From my TODO:
what if we actually expose in ->add_buf?

I don't *think* anyone adds buffers without being ready for them to be used,
so changing this should be safe.

Want to give it a try and report back?

Thanks!
Rusty.




[Qemu-devel] [PATCH] A bit optimization for tlb_set_page() (resend)

2010-05-06 Thread Jun Koi
This patch avoids handling write watchpoints on read-only memory access.
It also breaks the searching loop for watchpoint once the setup for
handling watchpoint later is done.

Signed-off-by: Jun Koi 


diff --git a/exec.c b/exec.c
index 14d1fd7..6fd859f 100644
--- a/exec.c
+++ b/exec.c
@@ -2236,10 +2236,12 @@ void tlb_set_page(CPUState *env, target_ulong vaddr,
watchpoint trap routines.  */
 QTAILQ_FOREACH(wp, &env->watchpoints, entry) {
 if (vaddr == (wp->vaddr & TARGET_PAGE_MASK)) {
-iotlb = io_mem_watch + paddr;
-/* TODO: The memory case can be optimized by not trapping
-   reads of pages with a write breakpoint.  */
-address |= TLB_MMIO;
+/* Avoid trapping reads of pages with a write breakpoint. */
+if ((prot & PAGE_WRITE) || (wp->flags & BP_MEM_READ)) {
+iotlb = io_mem_watch + paddr;
+address |= TLB_MMIO;
+break;
+}
 }
 }




Re: [Qemu-devel] Qemu - samba share help

2010-05-06 Thread Sergei Steshenko
You need to export your share on host OS.

AFAIR in the guest OS the address to access SAMBA shares was 10.0.2.4.

Regards,
  Sergei.


--- On Wed, 5/5/10, Arpit Patel  wrote:

From: Arpit Patel 
Subject: [Qemu-devel] Qemu - samba share help
To: qemu-devel@nongnu.org
Date: Wednesday, May 5, 2010, 11:02 AM

Hi All,
I am trying to share files between host OS and guest OS, both are Ubuntu. 

Here is the command line that I am using to start guest OSqemu -kernel 
kernelimage -initrd initrd.img /dev/zero -append "cmdline" -smb /tmp

After which I tried to mount /tmp on guest OS, using IP address of host OS, 
something like mount //10.80.8.100/tmp /tmp, but fails saying "No such device".

Samba service is also running on Host OS.

Any ideas or any other way to share files between Host OS and Guest OS.

Thanks for Help.


Y







Re: [Qemu-devel] [PATCH 6/7] Fix zero-length write(2).

2010-05-06 Thread Aurelien Jarno
On Mon, Mar 29, 2010 at 10:54:42AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 

Thanks, applied.

> ---
>  exec.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 33854e1..d69194c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2461,6 +2461,9 @@ int page_check_range(target_ulong start, target_ulong 
> len, int flags)
>  assert(start < ((abi_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
>  #endif
>  
> +if (len == 0) {
> +return 0;
> +}
>  if (start + len - 1 < start) {
>  /* We've wrapped around.  */
>  return -1;
> -- 
> 1.6.6.1
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Gerd Hoffmann

  Hi,


First, I'd like to be able to load the vgabios via PCI ROM bar on all
pci vga cards (stdvga, vmware, soon qxl). The PCI ID in the bios has
to match the PCI ID of the card, so we'll need a bunch of vga bios
binaries, all identical except for the PCI ID. Or we need some kind of
binary patching. Otherwise seabios will not load them from the PCI ROM
bar.


vgabios could be build multiple times with PCIBIOS set and VENDOR_ID and
DEVICE_ID supplied from the Makefile like it's already done with
VGABIOS_DATE.


Yes, something like this I have in mind.

Works for stdvga and qxl.  vmware fails as it has the memory at pci 
region 1 (region 0 has ioports) and it seems vgabios isn't prepared to 
handle that ...


Anyone knows what the best place to send patches to is? 
vgabios-developers list @ savannah looks pretty much dead.  It has one 
(!) message in 2009 which is an unanswered question.


Or is upstream dead anyway and we should just go with 
http://git.qemu.org/vgabios.git/ ?


cheers,
  Gerd




[Qemu-devel] Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Michael S. Tsirkin
On Thu, May 06, 2010 at 12:01:34PM +0930, Rusty Russell wrote:
> On Thu, 6 May 2010 06:28:14 am Michael S. Tsirkin wrote:
> > Rusty,
> > this is a simplified form of a patch you posted in the past.
> > I have a vhost patch that, using this feature, shows external
> > to host bandwidth grow from 5 to 7 GB/s, by avoiding
> > an interrupt in the window after previous interrupt
> > was sent and before interrupts were disabled for the vq.
> > With vhost under some external to host loads I see
> > this window being hit about 30% sometimes.
> 
> Fascinating.  So you use this to guess if the guest is still processing?

Exactly.

> I haven't thought about it hard, but is that racy?

I thought about this really hard and I don't think it's
necessarily racy, as long as (pseudo code):
guest:
start:
disable interrupts
read(used)
write(last used)
enable interrupts
mb()
if read(used)
goto start

host:
write used
mb()
if (reached(read(last used), used))
interrupt

IOW, guest does read then write then read, host does write then read.

Now, the only way to miss an interrupt is if we read last used
value before it is written so we think guest is still processing.
But if that happens, this means that host has written used before
guest updated last used, and for this reason peek will see
used value and restart polling.

IOW, to miss an interrupt host must read a stale value.
For this host must read before guest write, then
host write already happened, so second read in
guest will see an updated value host has written.


Now, I also added an mb() in guest between read and write so
that last used index write can not get ahead of used index read.
It does feel good to have it there, but I can not say why
it's helpful. Works fine without it, but then these
subtle races might be hard to trigger. What do you think?


> Obviously happy to apply this when you finalize it.
> 
> Thanks!
> Rusty.




Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Rusty Russell
On Wed, 5 May 2010 03:33:43 pm Neil Brown wrote:
> On Wed, 5 May 2010 14:28:41 +0930
> Rusty Russell  wrote:
> 
> > On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
> > > Jens Axboe wrote:
> > > > On Tue, May 04 2010, Rusty Russell wrote:
> > > > > ISTR someone mentioning a desire for such an API years ago, so CC'ing 
> > > > > the
> > > > > usual I/O suspects...
> > > > 
> > > > It would be nice to have a more fuller API for this, but the reality is
> > > > that only the flush approach is really workable. Even just strict
> > > > ordering of requests could only be supported on SCSI, and even there the
> > > > kernel still lacks proper guarantees on error handling to prevent
> > > > reordering there.
> > > 
> > > There's a few I/O scheduling differences that might be useful:
> > > 
> > > 1. The I/O scheduler could freely move WRITEs before a FLUSH but not
> > >before a BARRIER.  That might be useful for time-critical WRITEs,
> > >and those issued by high I/O priority.
> > 
> > This is only because noone actually wants flushes or barriers, though
> > I/O people seem to only offer that.  We really want " must
> > occur before ".  That offers maximum choice to the I/O subsystem
> > and potentially to smart (virtual?) disks.
> > 
> > > 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
> > >only for data belonging to a particular file (e.g. fdatasync with
> > >no file size change, even on btrfs if O_DIRECT was used for the
> > >writes being committed).  That would entail tagging FLUSHes and
> > >WRITEs with a fs-specific identifier (such as inode number), opaque
> > >to the scheduler which only checks equality.
> > 
> > This is closer.  In userspace I'd be happy with a "all prior writes to this
> > struct file before all future writes".  Even if the original guarantees were
> > stronger (ie. inode basis).  We currently implement transactions using 4 
> > fsync
> > /msync pairs.
> > 
> > write_recovery_data(fd);
> > fsync(fd);
> > msync(mmap);
> > write_recovery_header(fd);
> > fsync(fd);
> > msync(mmap);
> > overwrite_with_new_data(fd);
> > fsync(fd);
> > msync(mmap);
> > remove_recovery_header(fd);
> > fsync(fd);
> > msync(mmap);
> 
> Seems over-zealous.
> If the recovery_header held a strong checksum of the recovery_data you would
> not need the first fsync, and as long as you have two places to write recovery
> data, you don't need the 3rd and 4th syncs.
> Just:
>   write_internally_checksummed_recovery_data_and_header_to_unused_log_space()
>   fsync / msync
>   overwrite_with_new_data()
> 
> To recovery you choose the most recent log_space and replay the content.
> That may be a redundant operation, but that is no loss.

I think you missed a checksum for the new data?  Otherwise we can't tell if
the new data is completely written.  But yes, I will steal this scheme for
TDB2, thanks!

In practice, it's the first sync which is glacial, the rest are pretty cheap.

> Also cannot see the point of msync if you have already performed an fsync,
> and if there is a point, I would expect you to call msync before
> fsync... Maybe there is some subtlety there that I am not aware of.

I assume it's this from the msync man page:

   msync()  flushes  changes  made  to the in-core copy of a file that was
   mapped into memory using mmap(2) back to disk.   Without  use  of  this
   call  there  is  no guarantee that changes are written back before mun‐
   map(2) is called. 

> > It's an implementation detail; barrier has less flexibility because it has
> > less information about what is required. I'm saying I want to give you as
> > much information as I can, even if you don't use it yet.
> 
> Only we know that approach doesn't work.
> People will learn that they don't need to give the extra information to still
> achieve the same result - just like they did with ext3 and fsync.
> Then when we improve the implementation to only provide the guarantees that
> you asked for, people will complain that they are getting empty files that
> they didn't expect.

I think that's an oversimplification: IIUC that occurred to people *not*
using fsync().  They weren't using it because it was too slow.  Providing
a primitive which is as fast or faster and more specific doesn't have the
same magnitude of social issues.

And we can't write userspace interfaces for idiots only.

> The abstraction I would like to see is a simple 'barrier' that contains no
> data and has a filesystem-wide effect.

I think you lack ambition ;)

Thinking about the single-file use case (eg. kvm guest or tdb), isn't that
suboptimal for md?  Since you have to hand your barrier to every device
whereas a file-wide primitive may theoretically only go to some.

Cheers,
Rusty.




[Qemu-devel] Re: [PATCH v7 4/6] char: Add qemu_chr_write_nb() for nonblocking writes

2010-05-06 Thread Gerd Hoffmann

I'm all for killing the blocking interface. Problem is that converting
over all users isn't exactly trivial and we have plenty of them. IMHO
it isn't realistic to do the switch with a single patch series.


If we're agreed we ought to kill the blocking interface, let's define a
new proper interface, rename all previous users to something ugly and
deprecated, and approach it that way.


__attribute__ ((deprecated)) ?

cheers,
  Gerd





Re: [Qemu-devel] [PATCH] vnc: set the right prefered encoding

2010-05-06 Thread Corentin Chary
On Tue, May 4, 2010 at 2:01 PM, Corentin Chary  wrote:
> From RFB specs: "The order of the encoding types given in this
> message is a hint by the client as to its preference (the first
> encoding specified being most preferred)"
>
> Signed-off-by: Corentin Chary 
> ---
>  vnc.c |   14 ++
>  1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/vnc.c b/vnc.c
> index 5241a6a..2d05d8f 100644
> --- a/vnc.c
> +++ b/vnc.c
> @@ -1594,7 +1594,7 @@ static void set_encodings(VncState *vs, int32_t 
> *encodings, size_t n_encodings)
>
>     vnc_zlib_init(vs);
>     vs->features = 0;
> -    vs->vnc_encoding = 0;
> +    vs->vnc_encoding = -1;
>     vs->tight_compression = 9;
>     vs->tight_quality = 9;
>     vs->absolute = -1;
> @@ -1603,18 +1603,24 @@ static void set_encodings(VncState *vs, int32_t 
> *encodings, size_t n_encodings)
>         enc = encodings[i];
>         switch (enc) {
>         case VNC_ENCODING_RAW:
> -            vs->vnc_encoding = enc;
> +            if (vs->vnc_encoding != -1) {
> +                vs->vnc_encoding = enc;
> +            }

hum patch is broken, sending a fixed patch today, sorry

should be if (vs->vnc_encoding == -1)

-- 
Corentin Chary
http://xf.iksaif.net




Re: [Qemu-devel] vgabios + qemu: issues and plans.

2010-05-06 Thread Gerd Hoffmann

First, I'd like to be able to load the vgabios via PCI ROM bar on all
pci vga cards (stdvga, vmware, soon qxl).  The PCI ID in the bios has to
match the PCI ID of the card, so we'll need a bunch of vga bios
binaries, all identical except for the PCI ID.  Or we need some kind of
binary patching.  Otherwise seabios will not load them from the PCI ROM
bar.


Make vga bios aware of all ids?


Would work for the code doing the lfb address lookup.  But I don't think 
the PCI option rom header allows multiple entries.



I also want to eliminate the magic number.
Boch vga bios is paravirtualized to get VBE address dynamically via a new
pci device with special pci device of device id 0x1234.


0x1234 happens to be the vendor id the qemu stdvga has.  Probably the 
same is true for bochs.  So this code actually looks up the emulated vga 
card ;)



Are you meaning to follow bochs way?


Yes.

cheers,
  Gerd





[Qemu-devel] [PATCH] block: Fix bdrv_commit

2010-05-06 Thread Kevin Wolf
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.

Signed-off-by: Kevin Wolf 
---
 block.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index adb3e5d..ffe82f8 100644
--- a/block.c
+++ b/block.c
@@ -701,12 +701,12 @@ int bdrv_commit(BlockDriverState *bs)
 bdrv_delete(bs->backing_hd);
 bs->backing_hd = NULL;
 bs_rw = bdrv_new("");
-rw_ret = bdrv_open(bs_rw, filename, open_flags | BDRV_O_RDWR, NULL);
+rw_ret = bdrv_open(bs_rw, filename, open_flags | BDRV_O_RDWR, drv);
 if (rw_ret < 0) {
 bdrv_delete(bs_rw);
 /* try to re-open read-only */
 bs_ro = bdrv_new("");
-ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, NULL);
+ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, drv);
 if (ret < 0) {
 bdrv_delete(bs_ro);
 /* drive not functional anymore */
@@ -758,7 +758,7 @@ ro_cleanup:
 bdrv_delete(bs->backing_hd);
 bs->backing_hd = NULL;
 bs_ro = bdrv_new("");
-ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, NULL);
+ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, drv);
 if (ret < 0) {
 bdrv_delete(bs_ro);
 /* drive not functional anymore */
-- 
1.6.6.1





[Qemu-devel] [PATCH RFC] qemu/virtio: use last used index published by guest

2010-05-06 Thread Michael S. Tsirkin
Reduces irq_window in guest by only injecting
an interrupt if guest has handled all buffers we
used so far.

Signed-off-by: Michael S. Tsirkin 
---

This is the qemu part of the story.

 hw/vhost_net.c |6 ++
 hw/virtio.c|   15 +++
 hw/virtio.h|4 
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 2e292ee..d77c9b2 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -51,6 +51,9 @@ unsigned vhost_net_get_features(struct vhost_net *net, 
unsigned features)
 if (!(net->dev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC))) {
 features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
 }
+if (!(net->dev.features & (1 << VIRTIO_RING_F_PUBLISH_USED))) {
+features &= ~(1 << VIRTIO_RING_F_PUBLISH_USED);
+}
 features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
 return features;
 }
@@ -64,6 +67,9 @@ void vhost_net_ack_features(struct vhost_net *net, unsigned 
features)
 if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
 net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
 }
+if (features & (1 << VIRTIO_RING_F_PUBLISH_USED)) {
+net->dev.acked_features |= (1 << VIRTIO_RING_F_PUBLISH_USED);
+}
 }
 
 static int vhost_net_get_fd(VLANClientState *backend)
diff --git a/hw/virtio.c b/hw/virtio.c
index e7657ae..5d686f0 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -71,6 +71,7 @@ struct VirtQueue
 target_phys_addr_t pa;
 uint16_t last_avail_idx;
 int inuse;
+int num_notify;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 VirtIODevice *vdev;
@@ -139,6 +140,11 @@ static inline uint16_t vring_avail_ring(VirtQueue *vq, int 
i)
 return lduw_phys(pa);
 }
 
+static inline uint16_t vring_last_used_idx(VirtQueue *vq)
+{
+return vring_avail_ring(vq, vq->vring.num);
+}
+
 static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val)
 {
 target_phys_addr_t pa;
@@ -234,6 +240,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 wmb();
 vring_used_idx_increment(vq, count);
 vq->inuse -= count;
+vq->num_notify += count;
 }
 
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
@@ -603,6 +610,14 @@ void virtio_irq(VirtQueue *vq)
 
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
+/* Do not notify if guest did not yet see the last update. */
+if ((vdev->guest_features & (1 << VIRTIO_RING_F_PUBLISH_USED)) &&
+   vring_last_used_idx(vq) !=
+   (uint16_t)(vring_used_idx(vq) + vq->num_notify))
+   return;
+
+vq->num_notify = 0;
+
 /* Always notify when queue is empty (when feature acknowledge) */
 if ((vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT) &&
 (!(vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) ||
diff --git a/hw/virtio.h b/hw/virtio.h
index f885f1b..4bdfded 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -43,6 +43,8 @@
 #define VIRTIO_F_NOTIFY_ON_EMPTY24
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC 28
+/* The Guest publishes last-seen used index at the end of the avail ring. */
+#define VIRTIO_RING_F_PUBLISH_USED 29
 /* A guest should never accept this.  It implies negotiation is broken. */
 #define VIRTIO_F_BAD_FEATURE   30
 
@@ -189,6 +191,8 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev);
 void virtio_net_exit(VirtIODevice *vdev);
 
 #define DEFINE_VIRTIO_COMMON_FEATURES(_state, _field) \
+   DEFINE_PROP_BIT("publish_used", _state, _field, \
+   VIRTIO_RING_F_PUBLISH_USED, true), \
DEFINE_PROP_BIT("indirect_desc", _state, _field, \
VIRTIO_RING_F_INDIRECT_DESC, true)
 
-- 
1.7.1.12.g42b7f




Re: [Qemu-devel] Call for 0.12.4

2010-05-06 Thread Anthony Liguori

On 05/06/2010 08:09 AM, Kevin Wolf wrote:

Am 08.04.2010 20:37, schrieb Aurelien Jarno:
   

Hi all,

A number of fixes have been accumulated in the stable-0.12 branch, and
I think it's time to release a new stable version. I would like to see
that happening for the end of next week (around the 18th of April).

If you want to see some patches included, please send a mail to the
mailing list with the [STABLE] tag. I would clearly prefer patches that
are already in HEAD (if the patch can simply be cherry-picked, there is
no need to send a patch, just the commit number), though other patches
might be considered too.
 

Not sure if it's already too late for 0.12.4, but this one should be
applied to the stable-0.12 branch:
   


I've already tagged 0.12.4.

Regards,

Anthony Liguori


Commit: d6771bfa52744eb4f959198b4b0e59451463eebf
qemu-img: use the heap instead of the huge stack array for win32

On Windows, qemu-img convert is broken without this fix.

Kevin
   






[Qemu-devel] [PATCH 1/4] vnc: set the right prefered encoding

2010-05-06 Thread Corentin Chary
>From RFB specs: "The order of the encoding types given in this
message is a hint by the client as to its preference (the first
encoding specified being most preferred)"

Signed-off-by: Corentin Chary 
---
 vnc.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/vnc.c b/vnc.c
index b1a3fdb..a91c3a3 100644
--- a/vnc.c
+++ b/vnc.c
@@ -1587,6 +1587,13 @@ static void send_ext_audio_ack(VncState *vs)
 vnc_flush(vs);
 }
 
+static void set_encoding(VncState *vs, int encoding)
+{
+if (vs->vnc_encoding == -1) {
+vs->vnc_encoding = encoding;
+}
+}
+
 static void set_encodings(VncState *vs, int32_t *encodings, size_t n_encodings)
 {
 int i;
@@ -1594,7 +1601,7 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 
 vnc_zlib_init(vs);
 vs->features = 0;
-vs->vnc_encoding = 0;
+vs->vnc_encoding = -1;
 vs->tight_compression = 9;
 vs->tight_quality = 9;
 vs->absolute = -1;
@@ -1603,18 +1610,18 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 enc = encodings[i];
 switch (enc) {
 case VNC_ENCODING_RAW:
-vs->vnc_encoding = enc;
+set_encoding(vs, enc);
 break;
 case VNC_ENCODING_COPYRECT:
 vs->features |= VNC_FEATURE_COPYRECT_MASK;
 break;
 case VNC_ENCODING_HEXTILE:
 vs->features |= VNC_FEATURE_HEXTILE_MASK;
-vs->vnc_encoding = enc;
+set_encoding(vs, enc);
 break;
 case VNC_ENCODING_ZLIB:
 vs->features |= VNC_FEATURE_ZLIB_MASK;
-vs->vnc_encoding = enc;
+set_encoding(vs, enc);
 break;
 case VNC_ENCODING_DESKTOPRESIZE:
 vs->features |= VNC_FEATURE_RESIZE_MASK;
-- 
1.7.0.2





Re: [Qemu-devel] [PATCH] qemu: address todo comment in exec.c

2010-05-06 Thread Aurelien Jarno
On Tue, Apr 06, 2010 at 02:18:19PM +0300, Michael S. Tsirkin wrote:
> exec.c has a comment 'XXX: optimize' for lduw_phys/stw_phys,
> so let's do it, along the lines of stl_phys.
> 
> The reason to address 16 bit accesses specifically is that virtio relies
> on these accesses to be done atomically, using memset as we do now
> breaks this assumption, which is reported to cause qemu with kvm
> to read wrong index values under stress.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=525323

Thanks, applied.

> Signed-off-by: Michael S. Tsirkin 
> ---
>  exec.c |   67 +--
>  1 files changed, 60 insertions(+), 7 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 33854e1..262c255 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3788,12 +3788,36 @@ uint32_t ldub_phys(target_phys_addr_t addr)
>  return val;
>  }
>  
> -/* XXX: optimize */
> +/* warning: addr must be aligned */
>  uint32_t lduw_phys(target_phys_addr_t addr)
>  {
> -uint16_t val;
> -cpu_physical_memory_read(addr, (uint8_t *)&val, 2);
> -return tswap16(val);
> +int io_index;
> +uint8_t *ptr;
> +uint64_t val;
> +unsigned long pd;
> +PhysPageDesc *p;
> +
> +p = phys_page_find(addr >> TARGET_PAGE_BITS);
> +if (!p) {
> +pd = IO_MEM_UNASSIGNED;
> +} else {
> +pd = p->phys_offset;
> +}
> +
> +if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
> +!(pd & IO_MEM_ROMD)) {
> +/* I/O case */
> +io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
> +if (p)
> +addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
> +val = io_mem_read[io_index][1](io_mem_opaque[io_index], addr);
> +} else {
> +/* RAM case */
> +ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
> +(addr & ~TARGET_PAGE_MASK);
> +val = lduw_p(ptr);
> +}
> +return val;
>  }
>  
>  /* warning: addr must be aligned. The ram page is not masked as dirty
> @@ -3910,11 +3934,40 @@ void stb_phys(target_phys_addr_t addr, uint32_t val)
>  cpu_physical_memory_write(addr, &v, 1);
>  }
>  
> -/* XXX: optimize */
> +/* warning: addr must be aligned */
>  void stw_phys(target_phys_addr_t addr, uint32_t val)
>  {
> -uint16_t v = tswap16(val);
> -cpu_physical_memory_write(addr, (const uint8_t *)&v, 2);
> +int io_index;
> +uint8_t *ptr;
> +unsigned long pd;
> +PhysPageDesc *p;
> +
> +p = phys_page_find(addr >> TARGET_PAGE_BITS);
> +if (!p) {
> +pd = IO_MEM_UNASSIGNED;
> +} else {
> +pd = p->phys_offset;
> +}
> +
> +if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
> +io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
> +if (p)
> +addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
> +io_mem_write[io_index][1](io_mem_opaque[io_index], addr, val);
> +} else {
> +unsigned long addr1;
> +addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
> +/* RAM case */
> +ptr = phys_ram_base + addr1;
> +stw_p(ptr, val);
> +if (!cpu_physical_memory_is_dirty(addr1)) {
> +/* invalidate code */
> +tb_invalidate_phys_page_range(addr1, addr1 + 2, 0);
> +/* set dirty bit */
> +phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
> +(0xff & ~CODE_DIRTY_FLAG);
> +}
> +}
>  }
>  
>  /* XXX: optimize */
> -- 
> 1.7.0.2.280.gc6f05
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
> > Seems over-zealous.
> > If the recovery_header held a strong checksum of the recovery_data you would
> > not need the first fsync, and as long as you have two places to write 
> > recovery
> > data, you don't need the 3rd and 4th syncs.
> > Just:
> >   
> > write_internally_checksummed_recovery_data_and_header_to_unused_log_space()
> >   fsync / msync
> >   overwrite_with_new_data()
> > 
> > To recovery you choose the most recent log_space and replay the content.
> > That may be a redundant operation, but that is no loss.
> 
> I think you missed a checksum for the new data?  Otherwise we can't tell if
> the new data is completely written.

The data checksum can go in the recovery-data block.  If there's
enough slack in the log, by the time that recovery-data block is
overwritten, you can be sure that an fsync has been done for that
data (by a later commit).

> But yes, I will steal this scheme for TDB2, thanks!

Take a look at the filesystems.  I think ext4 did some optimisations
in this area, and that checksums had to be added anyway due to a
subtle replay-corruption problem that happens when the log is
partially corrupted, and followed by non-corrupt blocks.

Also, you can remove even more fsyncs by adding a bit of slack to the
data space and writing into unused/fresh areas some of the time -
i.e. a bit like btrfs/zfs or anything log-structured, but you don't
have to go all the way with that.

> In practice, it's the first sync which is glacial, the rest are pretty cheap.

The 3rd and 4th fsyncs imply a disk seek each, just because the
preceding writes are to different areas of the disk.  Seeks are quite
slow - but not as slow as ext3 fsyncs :-) What do you mean by cheap?
That it's only a couple of seeks, or that you don't see even that?

> 
> > Also cannot see the point of msync if you have already performed an fsync,
> > and if there is a point, I would expect you to call msync before
> > fsync... Maybe there is some subtlety there that I am not aware of.
> 
> I assume it's this from the msync man page:
> 
>msync()  flushes  changes  made  to the in-core copy of a file that was
>mapped into memory using mmap(2) back to disk.   Without  use  of  this
>call  there  is  no guarantee that changes are written back before mun‐
>map(2) is called. 

Historically, that means msync() ensures dirty mapping data is written
to the file as if with write(), and that mapping pages are removed or
refreshed to get the effect of read() (possibly a lazy one).  It's
more obvious in the early mmap implementations where mappings don't
share pages with the filesystem cache, so msync() has explicit
behaviour.

Like with write(), after calling msync() you would then call fsync()
to ensure the data is flushed to disk.

If you've been calling fsync then msync, I guess that's another fine
example of how these function are so hard to test, that they aren't.

Historically on Linux, msync has been iffy on some architectures, and
I'm still not sure it has the same semantics as other unixes.  fsync
as we know has also been iffy, and even now that fsync is tidier it
does not always issue a hardware-level cache commit.

But then historically writable mmap has been iffy on a boatload of
unixes.

> > > It's an implementation detail; barrier has less flexibility because it has
> > > less information about what is required. I'm saying I want to give you as
> > > much information as I can, even if you don't use it yet.
> > 
> > Only we know that approach doesn't work.
> > People will learn that they don't need to give the extra information to 
> > still
> > achieve the same result - just like they did with ext3 and fsync.
> > Then when we improve the implementation to only provide the guarantees that
> > you asked for, people will complain that they are getting empty files that
> > they didn't expect.
> 
> I think that's an oversimplification: IIUC that occurred to people *not*
> using fsync().  They weren't using it because it was too slow.  Providing
> a primitive which is as fast or faster and more specific doesn't have the
> same magnitude of social issues.

I agree with Rusty.  Let's make it perform well so there is no reason
to deliberately avoid using it, and let's make say what apps actually
want to request without being way too strong.

And please, if anyone has ideas on how we could make correct use of
these functions *testable* by app authors, I'm all ears.  Right now it
is quite difficult - pulling power on hard disks mid-transaction is
not a convenient method :)

> > The abstraction I would like to see is a simple 'barrier' that contains no
> > data and has a filesystem-wide effect.
> 
> I think you lack ambition ;)
> 
> Thinking about the single-file use case (eg. kvm guest or tdb), isn't that
> suboptimal for md?  Since you have to hand your barrier to every device
> whereas a file-wide primitive may theoretically only go to some.

Yes.

Note that database-like programs s

[Qemu-devel] Re: [patch uq/master 5/9] kvm: synchronize state from cpu context

2010-05-06 Thread Avi Kivity

On 05/04/2010 03:45 PM, Marcelo Tosatti wrote:

From: Jan Kiszka

It is not safe to retrieve the KVM internal state of a given cpu
while its potentially modifying it.

Queue the request to run on cpu context, similarly to qemu-kvm.
   


Even better is to query the state in the cpu thread; but that can come 
later.


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
> On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
> > Jens Axboe wrote:
> > > On Tue, May 04 2010, Rusty Russell wrote:
> > > > ISTR someone mentioning a desire for such an API years ago, so CC'ing 
> > > > the
> > > > usual I/O suspects...
> > > 
> > > It would be nice to have a more fuller API for this, but the reality is
> > > that only the flush approach is really workable. Even just strict
> > > ordering of requests could only be supported on SCSI, and even there the
> > > kernel still lacks proper guarantees on error handling to prevent
> > > reordering there.
> > 
> > There's a few I/O scheduling differences that might be useful:
> > 
> > 1. The I/O scheduler could freely move WRITEs before a FLUSH but not
> >before a BARRIER.  That might be useful for time-critical WRITEs,
> >and those issued by high I/O priority.
> 
> This is only because noone actually wants flushes or barriers, though
> I/O people seem to only offer that.  We really want " must
> occur before ".  That offers maximum choice to the I/O subsystem
> and potentially to smart (virtual?) disks.

We do want flushes for the "D" in ACID - such things as after
receiving a mail, or blog update into a database file (could be TDB),
and confirming that to the sender, to have high confidence that the
update won't disappear on system crash or power failure.

Less obviously, it's also needed for the "C" in ACID when more than
one file is involved.  "C" is about differently updated things staying
consistent with each other.

For example, imagine you have a TDB file mapping Samba usernames to
passwords, and another mapping Samba usernames to local usernames.  (I
don't know if you do this; it's just an illustration).

To rename a Samba user involves updating both.  Let's ignore transient
transactional issues :-) and just think about what happens with
per-file barriers and no sync, when a crash happens long after the
updates, and before the system has written out all data and issued low
level cache flushes.

After restarting, due to lack of sync, the Samba username could be
present in one file and not the other.

> > 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
> >only for data belonging to a particular file (e.g. fdatasync with
> >no file size change, even on btrfs if O_DIRECT was used for the
> >writes being committed).  That would entail tagging FLUSHes and
> >WRITEs with a fs-specific identifier (such as inode number), opaque
> >to the scheduler which only checks equality.
> 
> This is closer.  In userspace I'd be happy with a "all prior writes to this
> struct file before all future writes".  Even if the original guarantees were
> stronger (ie. inode basis).  We currently implement transactions using 4 fsync
> /msync pairs.
> 
>   write_recovery_data(fd);
>   fsync(fd);
>   msync(mmap);
>   write_recovery_header(fd);
>   fsync(fd);
>   msync(mmap);
>   overwrite_with_new_data(fd);
>   fsync(fd);
>   msync(mmap);
>   remove_recovery_header(fd);
>   fsync(fd);
>   msync(mmap);
> 
> Yet we really only need ordering, not guarantees about it actually hitting
> disk before returning.
> 
> > In other words, FLUSH can be more relaxed than BARRIER inside the
> > kernel.  It's ironic that we think of fsync as stronger than
> > fbarrier outside the kernel :-)
> 
> It's an implementation detail; barrier has less flexibility because it has
> less information about what is required. I'm saying I want to give you as
> much information as I can, even if you don't use it yet.

I agree, and I've started a few threads about it over the last couple of years.

An fsync_range() system call would be very easy to use and
(most importantly) easy to understand.

With optional flags to weaken it (into fdatasync, barrier without sync,
sync without barrier, one-sided barrier, no lowlevel cache-flush, don't rush,
etc.), it would be very versatile, and still easy to understand.

With an AIO version, and another flag meaning don't rush, just return
when satisfied, and I suspect it would be useful for the most
demanding I/O apps.

-- Jamie




[Qemu-devel] [PATCH 4/4] vnc: adjust compression zstream level

2010-05-06 Thread Corentin Chary
Signed-off-by: Corentin Chary 
---
 vnc-encoding-zlib.c |9 -
 vnc.h   |1 +
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/vnc-encoding-zlib.c b/vnc-encoding-zlib.c
index 6a16a79..29dd1b7 100644
--- a/vnc-encoding-zlib.c
+++ b/vnc-encoding-zlib.c
@@ -83,10 +83,17 @@ static int vnc_zlib_stop(VncState *vs)
 return -1;
 }
 
+vs->zlib_level = vs->tight_compression;
 zstream->opaque = vs;
 }
 
-// XXX what to do if tight_compression changed in between?
+if (vs->tight_compression != vs->zlib_level) {
+if (deflateParams(zstream, vs->tight_compression,
+  Z_DEFAULT_STRATEGY) != Z_OK) {
+return -1;
+}
+vs->zlib_level = vs->tight_compression;
+}
 
 // reserve memory in output buffer
 buffer_reserve(&vs->output, vs->zlib.offset + 64);
diff --git a/vnc.h b/vnc.h
index dfdb240..96f3fe7 100644
--- a/vnc.h
+++ b/vnc.h
@@ -174,6 +174,7 @@ struct VncState
 Buffer zlib;
 Buffer zlib_tmp;
 z_stream zlib_stream;
+int zlib_level;
 
 Notifier mouse_mode_notifier;
 
-- 
1.7.0.2





[Qemu-devel] Bug in net/socket.c: info_str is overwritten

2010-05-06 Thread Miguel Di Ciurcio Filho
Hi there,

I'm working on the conversion of the 'info network' command do QMP,
and I think I've found some problems.

Running qemu like this: qemu -net socket,listen=:

In net/socket.c, the function net_socket_listen_init() [1] is called
and a listening socket is created. There is nothing registering this,
so when using 'info network' as is, no information about this
listening socket is shown.

When a connection is accepted, the handler net_socket_accept() calls
net_socket_fd_init(). If the socket is
UDP/multicast net_socket_fd_init() calls net_socket_fd_init_dgram, if
the socket is TCP it calls net_socket_fd_init_dgram_stream. [2]

In both cases the info_str string is written inside
net_socket_fd_init_(stream|dgram) [3], and after that, it is
overwritten on a subsequent
snprintf() in net_socket_accept() [4].

net_socket_fd_init_(stream|dgram) always puts into the info_str the fd
number, and this information is overwritten latter. So, is the fd
number
relevant to be transmitted over QMP? Right now this information is
being lost, IMHO.

Same thing happens when qemu is run as a client:
net_socket_connect_init() calls net_socket_fd_init_(dgram|stream) and
latter overwrites info_str.

[1] http://git.qemu.org/qemu.git/tree/net/socket.c#n375
[2] http://git.qemu.org/qemu.git/tree/net/socket.c#n336
[3] http://git.qemu.org/qemu.git/tree/net/socket.c#n310
[4] http://git.qemu.org/qemu.git/tree/net/socket.c#n369




[Qemu-devel] [PATCH][RESEND] exec: optimize lduw_phys and stw_phys

2010-05-06 Thread Bernhard Kohl

Implementation of the optimized code for these two functions.

This is necessary for virtio which reads and writes VirtQueue index
fields using these functions. The assumption is that this are atomic
operations, which is not the case, if the memcpy() function which is
used in the non optimized code does single byte copying. This happens
for example with an older WindRiver glibc.

Signed-off-by: Bernhard Kohl 
---
RESEND: This message did not reach the gmane archive and maybe others.
---
 exec.c |   67 
+--

 1 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index 14d1fd7..fb40398 100644
--- a/exec.c
+++ b/exec.c
@@ -3739,12 +3739,36 @@ uint32_t ldub_phys(target_phys_addr_t addr)
 return val;
 }

-/* XXX: optimize */
+/* warning: addr must be aligned */
 uint32_t lduw_phys(target_phys_addr_t addr)
 {
-uint16_t val;
-cpu_physical_memory_read(addr, (uint8_t *)&val, 2);
-return tswap16(val);
+int io_index;
+uint8_t *ptr;
+uint32_t val;
+unsigned long pd;
+PhysPageDesc *p;
+
+p = phys_page_find(addr >> TARGET_PAGE_BITS);
+if (!p) {
+pd = IO_MEM_UNASSIGNED;
+} else {
+pd = p->phys_offset;
+}
+
+if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
+!(pd & IO_MEM_ROMD)) {
+/* I/O case */
+io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+if (p)
+addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+val = io_mem_read[io_index][1](io_mem_opaque[io_index], addr);
+} else {
+/* RAM case */
+ptr = qemu_get_ram_ptr(pd & TARGET_PAGE_MASK) +
+(addr & ~TARGET_PAGE_MASK);
+val = lduw_p(ptr);
+}
+return val;
 }

 /* warning: addr must be aligned. The ram page is not masked as dirty
@@ -3861,11 +3885,40 @@ void stb_phys(target_phys_addr_t addr, uint32_t val)
 cpu_physical_memory_write(addr, &v, 1);
 }

-/* XXX: optimize */
+/* warning: addr must be aligned */
 void stw_phys(target_phys_addr_t addr, uint32_t val)
 {
-uint16_t v = tswap16(val);
-cpu_physical_memory_write(addr, (const uint8_t *)&v, 2);
+int io_index;
+uint8_t *ptr;
+unsigned long pd;
+PhysPageDesc *p;
+
+p = phys_page_find(addr >> TARGET_PAGE_BITS);
+if (!p) {
+pd = IO_MEM_UNASSIGNED;
+} else {
+pd = p->phys_offset;
+}
+
+if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+if (p)
+addr = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+io_mem_write[io_index][1](io_mem_opaque[io_index], addr, val);
+} else {
+unsigned long addr1;
+addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+/* RAM case */
+ptr = qemu_get_ram_ptr(addr1);
+stw_p(ptr, val);
+if (!cpu_physical_memory_is_dirty(addr1)) {
+/* invalidate code */
+tb_invalidate_phys_page_range(addr1, addr1 + 2, 0);
+/* set dirty bit */
+phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+(0xff & ~CODE_DIRTY_FLAG);
+}
+}
 }

 /* XXX: optimize */
--
1.6.6.1





Re: [Qemu-devel] Fate of the read-only block drivers?

2010-05-06 Thread Kevin Wolf
Am 05.05.2010 20:27, schrieb Christoph Hellwig:
> Currently we have four very simple, read-only block drivers in the
> tree:
> 
>  - cloop:
>   This one is known buggy for non-trivial use and didn't get
>   any chance but the usual API changes and cleanup sweeps
>   since it was commited in 2004.
> 
>  - dmg:
>   This one is really grotty and by auditing the code and comparing
>   it to dmg2img completely buggy for non-compressed chunks.  It's
>   also missing lots of the features in modern dmg image.  Also
>   there's no creation tool for Linux.  No non-trivial commits
>   since the initial import in 2004.
> 
>  - bochs:
>   There is an bximage tools to actually create images for it,
>   but not to actually add any content to them.  It had one
>   non-trivial commit in 2007 to add support for growable formats
>   since it's initial commit in 2004.
> 
>  - parallels:
>   I can't find any way to actually create the image except with
>   the parallels image tool which seems to require a commercial
>   parallels license.
>   No non-trivial commits since the initial import in 2007.
> 
> At this point I'm not sure if it's worth bothering to fix these up,
> as they're bitrotted and except for cloop there's no reasy way to
> actually create test images for them.

We can create images at least for cloop and bochs, which is 50%, even
though the latter not automated. (The only way I found was using bochs,
and the biggest challenge there is finding a guest OS that boots up in
finite time. A FreeDOS floppy did the trick for my manual test.)

Likewise, not all of them are known broken, it's only cloop and one path
in dmg. I'm not sure about the good reasons to keep them, but I guess
the code wouldn't exist in the first place if nobody had cared about the
functionality. It's probably easy to remove code that can't possibly
work today, but that's true only for part of this code.

> Any good reason to keep them, and if yes any good idea for a test plan
> to keep them working?

Basically I see two options for testing them:

a) Make the read-write, so you can test consistency of qemu's
implementation. This is what we're doing for formats like VMDK. Probably
not worth it.

b) Put some binary images in the qemu-iotests repository. For each
format we would need someone who owns the respective software and can
convert our test image to that format. For cloop and bochs we can
probably do it ourselves, for dmg and parallels we would need help.

Kevin




[Qemu-devel] Re: [PATCH] block: Fix bdrv_commit

2010-05-06 Thread Naphtali Sprei
Kevin Wolf wrote:
> When reopening the image, don't guess the driver, but use the same driver as
> was used before. This is important if the format=... option was used for that
> image.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block.c b/block.c
> index adb3e5d..ffe82f8 100644
> --- a/block.c
> +++ b/block.c
> @@ -701,12 +701,12 @@ int bdrv_commit(BlockDriverState *bs)
>  bdrv_delete(bs->backing_hd);
>  bs->backing_hd = NULL;
>  bs_rw = bdrv_new("");
> -rw_ret = bdrv_open(bs_rw, filename, open_flags | BDRV_O_RDWR, NULL);
> +rw_ret = bdrv_open(bs_rw, filename, open_flags | BDRV_O_RDWR, drv);
>  if (rw_ret < 0) {
>  bdrv_delete(bs_rw);
>  /* try to re-open read-only */
>  bs_ro = bdrv_new("");
> -ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, 
> NULL);
> +ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, drv);
>  if (ret < 0) {
>  bdrv_delete(bs_ro);
>  /* drive not functional anymore */
> @@ -758,7 +758,7 @@ ro_cleanup:
>  bdrv_delete(bs->backing_hd);
>  bs->backing_hd = NULL;
>  bs_ro = bdrv_new("");
> -ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, NULL);
> +ret = bdrv_open(bs_ro, filename, open_flags & ~BDRV_O_RDWR, drv);
>  if (ret < 0) {
>  bdrv_delete(bs_ro);
>  /* drive not functional anymore */

Acked-by: Naphtali Sprei 




Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Avi Kivity

On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:

+   /* We publish the last-seen used index at the end of the available ring.
+* It is at the end for backwards compatibility. */
+   vr->last_used_idx =&(vr)->avail->ring[num];
+   /* Verify that last used index does not spill over the used ring. */
+   BUG_ON((void *)vr->last_used_idx +
+  sizeof *vr->last_used_idx>  (void *)vr->used);
  }
   


Shouldn't this be on its own cache line?

One way to achieve this is to allocate it at the end.

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] vgabios + qemu: issues and plans.

2010-05-06 Thread Isaku Yamahata
On Wed, May 05, 2010 at 03:38:21PM +0200, Gerd Hoffmann wrote:
>   Hi,
>
> Today we have two vgabios versions in qemu:  The standard one  
> (vgabios.bin) and the cirrus one (vgabios-cirrus.bin).
>
> The cirrus vgabios is a PCI ROM.  We can (and do) load it into the ROM  
> PCI bar.  The vgabios checks the pci config space to figure where the  
> linear framebuffer (for vesa graphics) is mapped to.  It knows how to  
> program the cirrus.
>
> The standard bios isn't a PCI ROM.  We have to load it using the seabios  
> firmware interface.  It expects to find the linear framebuffer at the  
> magic address 0xe000.  It uses the bochs extentions to implement  
> vesa graphics support.
>
> So, what is wrong with this?
>
> First, I'd like to be able to load the vgabios via PCI ROM bar on all  
> pci vga cards (stdvga, vmware, soon qxl).  The PCI ID in the bios has to  
> match the PCI ID of the card, so we'll need a bunch of vga bios  
> binaries, all identical except for the PCI ID.  Or we need some kind of  
> binary patching.  Otherwise seabios will not load them from the PCI ROM 
> bar.

Make vga bios aware of all ids?
Then single vga bios image can handle all vga devices.
If new device is introduced (qxl), new id will be added to id table
I thing that adding a new vga device is rare.
Or do you want to more generic/dynamic solution?


> Second, I want to get rid of the magic address 0xe000 (except for  
> isa-vga).  This is basically just a matter of updating to vgabios  
> version 0.6c.  And this needs one vga bios binary per vga card too as  
> the PCI ID is used to lookup the card (and then the framebuffer address)  
> in PCI config space.

I also want to eliminate the magic number.
Boch vga bios is paravirtualized to get VBE address dynamically via a new
pci device with special pci device of device id 0x1234.
Are you meaning to follow bochs way?

Or how about to use qemu fw interface to get the address dynamically?


>
> Comments?  Especially on the binary patching?  Worth it?  Or just build  
> a bunch of binaries?  They are not *that* big after all ...
>
> cheers,
>   Gerd
>
>

-- 
yamahata




Re: [Qemu-devel] question on virtio

2010-05-06 Thread Jamie Lokier
Michael S. Tsirkin wrote:
> Hi!
> I see this in virtio_ring.c:
> 
> /* Put entry in available array (but don't update avail->idx *
>  until they do sync). */
> 
> Why is it done this way?
> It seems that updating the index straight away would be simpler, while
> this might allow the host to specilatively look up the buffer and handle
> it, without waiting for the kick.

Even better, if the host updates a location containing which index it
has seen recently, you can avoid the kick entirely during sustained
flows - just like your recent patch to avoid sending irqs to the
guest.

-- Jamie




[Qemu-devel] [PATCH] vdi: Fix image creation

2010-05-06 Thread Kevin Wolf
The number of blocks needs to be rounded up to cover all of the virtual hard
disk. Without this fix, we can't even open our own images if their size is not
a multiple of the block size.

Signed-off-by: Kevin Wolf 
---
 block/vdi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2b4d2c2..b990bbc 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -827,7 +827,7 @@ static int vdi_create(const char *filename, 
QEMUOptionParameter *options)
 return -errno;
 }
 
-blocks = bytes / block_size;
+blocks = (bytes + block_size - 1) / block_size;
 bmap_size = blocks * sizeof(uint32_t);
 bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
 
-- 
1.6.6.1





Re: [Qemu-devel] [PATCH] QMP: Spec: Private Extensions support

2010-05-06 Thread Markus Armbruster
Anthony, no reply from you; did it fall through the cracks?  If you're
fine with my draft, I'll turn it into a proper patch.


Markus Armbruster  writes:

> Anthony asked me to take a stab at rewriting his draft to something more
> along the lines of what I'm thinking.  Here goes.  I put some remarks
> [in brackets].
>
> FYI, I'll be out of town until Wednesday.
>
>
> 6. Downstream extension of QMP
> --
>
> We recommend that downstream consumers of QEMU do *not* modify QMP.
> Management tools should be able to support both upstream and downstream
> versions of QMP without special logic, and downstream extensions are
> inherently at odds with that.
>
> However, we recognize that it is sometimes impossible for downstreams to
> avoid modifying QMP.  Both upstream and downstream need to take care to
> preserve long-term compatibility and interoperability.
>
> To help with that, QMP reserves JSON object member names beginning with
> '__' (double underscore) for downstream use ("downstream names").  This
> means upstream will never use any downstream names for its commands,
> arguments, errors, asynchronous events, and so forth.
>
> Any new names downstream wishes to add must begin with '__'.  To ensure
> compatibility with other downstreams, it is strongly recommended that
> you prefix the commands with '__RFQDN_' where RFQDN is a valid, reverse
> fully qualified domain name which you control.  For example, a qemu-kvm
> specific monitor command would be:
>
> (qemu) __org.linux-kvm_enable_irqchip
>
> Downstream must not change the server greeting (section 2.2) other than
> to offer additional capabilities.  But see below for why even that is
> discouraged.
>
> Section '5 Compatibility Considerations' applies to downstream as well
> as to upstream, obviously.  [That section needs work!]  It follows that
> downstream must behave exactly like upstream for any input not
> containing members with downstream names ("downstream members"), except
> it may add members with downstream names to its output.
>
> Thus, a client should not be able to distinguish downstream from
> upstream as long as it doesn't send input with downstream members, and
> properly ignores any downstream members in the output it receives.
>
> [I fully support everything up to this point.  I have some reservations
> on the rest, and I doubt it'll be all that useful, but I don't really
> mind having it, at least not in this form.]
>
> Advice on downstream modifications:
> [I made a honest effort at capturing Anthony's intentions here,
> my apologies if I screwed it up.]
>
> 1. Introducing new commands is okay.  If you want to extend an existing
>command, consider introducing a new one with the new behaviour
>instead.  [FIXME Could use a rationale: why is extending bad?  Make
>sure to cover errors, because that's needed for 3.]
>
> 2. Introducing new asynchronous messages is okay.  If you want to extend
>an existing message, consider adding a new one instead.  [FIXME Could
>use a rationale: why is extending bad?]
>
> 3. Introducing new errors for use in new commands is okay.  Adding new
>errors to existing commands counts as extension, so 1. applies.
>
> 4. New capabilities are strongly discouraged.  Capabilities are for
>evolving the basic protocol, and multiple diverging basic protocol
>dialects are most undesirable.




[Qemu-devel] Adding AUDIO CD support in Qemu : Buffer I/O Error on device sr0

2010-05-06 Thread Shubham Mankhand
Dear Developers,

I am working on a Xen HVM platform and trying to add AUDIO CD
support to Qemu. I modified ide_atapi_cmd function to handle audio media.
This includes changing cdrom_read_toc to set ADR/Control bits to
0x10 if media is CDS_AUDIO. Changed GPCMD_MODE_SENSE_10
case to include cdda bit. I also modified READ_CD to do
a ide_atapi_cmd_read for case 0xf0 and have
added GPCMD_READ_SUBCHANNEL command too.

Beacuse of the above changes, now when i load audio cd and do 'xm
block-configure ...'
Windows Media Player now gets launched but it is still unable to play the track.

Dom0 Dmesg shows the following error:
[ 1296.401294] sr 1:0:0:0: [sr0] Result: hostbyte=0x00 driverbyte=0x08
[ 1296.401483] sr 1:0:0:0: [sr0] Sense Key : 0x5 [current]
[ 1296.401772] sr 1:0:0:0: [sr0] ASC=0x64 ASCQ=0x0
[ 1296.402028] end_request: I/O error, dev sr0, sector 0
[ 1296.402129] Buffer I/O error on device sr0, logical block 0
[ 1296.402232] Buffer I/O error on device sr0, logical block 1
[ 1296.402345] Buffer I/O error on device sr0, logical block 2
[ 1296.402450] Buffer I/O error on device sr0, logical block 3
[ 1296.405520] sr 1:0:0:0: [sr0] Result: hostbyte=0x00 driverbyte=0x08
[ 1296.405712] sr 1:0:0:0: [sr0] Sense Key : 0x5 [current]
[ 1296.406023] sr 1:0:0:0: [sr0] ASC=0x64 ASCQ=0x0
[ 1296.406253] end_request: I/O error, dev sr0, sector 0
[ 1296.406353] Buffer I/O error on device sr0, logical block 0

Tracing qemu i get the following call chain from READ CD handler:

[ide_atapi_cmd](lba=0x200 nb_sec=10 trnsfr=f0) ATAPI Packet= be 04 00
00 02 00 00 00 10 f0 00 00
[ide_atapi_cmd_read_dma]read dma: LBA=512 nb_sectors=16
[[ide_atapi_cmd_read_dma_cb]]aio_read_cd: lba=512 n=16
[bdrv_aio_read]
[bdrv_check_request] ret=0
[bdrv_aio_read]
[ide_atapi_cmd_read_dma_cb] ret<0 goto eot
[dma_bdrv_cb](dbs->sg_cur_index == dbs->sg->nsg || ret < 0)

Frankly, from this point i do not understand what is happening. Can
any one of you please help me out understanding what is the reason
that READ CD command is failing and what is the reason of the Buffer I/O error
in dmesg.

It will be great if someone can give me some pointers to understand
the root cause
or suggest someways to further debug this.


Thanks & Regards,
Shubham




Re: [Qemu-devel] Re: How to map PCI memory into the VM without trapping

2010-05-06 Thread Cam Macdonell
On Wed, May 5, 2010 at 5:20 PM, Frank Berreth  wrote:
> Hi there,
> this is just an update if you are interested in the outcome. I turns out
> that my initial assumption that there would be page faults/trapping on the
> memory pages was false. The reason the throughput is so low is because the
> memory was mapped non-cached. The VGA driver and the ivshmem driver use
> pci_ioremap_bar which will *always* map the PCI bar non-cached (even the
> resourceX_wc).
> Changing the driver(s) to use ioremap_cache or ioremap_wc speeds up things
> quite a bit. I don't know if VGA framebuffer was always mapped this way --
> it appears on a real system that usually graphics memory is mapped WC.
> Mapping it UC would cause quite a performance degradation. This could be the
> reason for the reported VGA performance drop in another email thread. IMHO,
> since QEMU emulates VGA, this could be mapped WB.
> Thanks,
> Frank.
>

Hi Frank,

Thanks for the note.  I'll make the change to cached and see if that
helps with some of the tests I'm running.

Cheers,
Cam




[Qemu-devel] [PATCH 0/5] tcg: Initialize prologue after guest_base fixed

2010-05-06 Thread Richard Henderson
By doing this we can make any number of decisions about code generation
during the prologue.  For instance, we can decide whether or not to
reserve a register to hold the value of GUEST_BASE.

This fixes a latent bug in the two ppc ports in that GUEST_BASE was 
being loaded as an immediate before the final value of GUEST_BASE is
computed.

This also fixes some register reservation problems I noticed in the
ia64 port, while trying to decide which register to use for holding
GUEST_BASE across the TB.


r~



Richard Henderson (5):
  tcg: Initialize the prologue after GUEST_BASE is fixed.
  tcg-hppa: Load GUEST_BASE as an immediate.
  tcg-ia64: Fix some register usage issues.
  tcg-ia64: Load GUEST_BASE into a register.
  tcg-ppc: Conditionally reserve TCG_GUEST_BASE_REG.

 bsd-user/main.c|9 ++-
 exec.c |5 +
 linux-user/main.c  |9 ++-
 tcg/hppa/tcg-target.c  |   12 +--
 tcg/ia64/tcg-target.c  |  199 +++-
 tcg/ppc/tcg-target.c   |8 +-
 tcg/ppc64/tcg-target.c |9 +-
 tcg/tcg.c  |3 +
 tcg/tcg.h  |1 +
 9 files changed, 167 insertions(+), 88 deletions(-)





[Qemu-devel] [PATCH 5/5] tcg-ppc: Conditionally reserve TCG_GUEST_BASE_REG.

2010-05-06 Thread Richard Henderson
We need not reserve the register unless we're going to use it.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c   |8 
 tcg/ppc64/tcg-target.c |9 -
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index ce078e4..654d244 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -933,7 +933,10 @@ void tcg_target_qemu_prologue (TCGContext *s)
 tcg_out32 (s, STW | RS (0) | RA (1) | (frame_size + LR_OFFSET));
 
 #ifdef CONFIG_USE_GUEST_BASE
-tcg_out_movi (s, TCG_TYPE_I32, TCG_GUEST_BASE_REG, GUEST_BASE);
+if (GUEST_BASE) {
+tcg_out_movi (s, TCG_TYPE_I32, TCG_GUEST_BASE_REG, GUEST_BASE);
+tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+}
 #endif
 
 tcg_out32 (s, MTSPR | RS (3) | CTR);
@@ -1914,9 +1917,6 @@ void tcg_target_init(TCGContext *s)
 #ifdef _CALL_SYSV
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13);
 #endif
-#ifdef CONFIG_USE_GUEST_BASE
-tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
-#endif
 
 tcg_add_target_add_op_defs(ppc_op_defs);
 }
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 2d436a5..a1e643c 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -899,7 +899,10 @@ void tcg_target_qemu_prologue (TCGContext *s)
 tcg_out32 (s, STD | RS (0) | RA (1) | (frame_size + 16));
 
 #ifdef CONFIG_USE_GUEST_BASE
-tcg_out_movi (s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
+if (GUEST_BASE) {
+tcg_out_movi (s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
+tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+}
 #endif
 
 tcg_out32 (s, MTSPR | RS (3) | CTR);
@@ -1692,9 +1695,5 @@ void tcg_target_init (TCGContext *s)
 #endif
 tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13);
 
-#ifdef CONFIG_USE_GUEST_BASE
-tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
-#endif
-
 tcg_add_target_add_op_defs (ppc_op_defs);
 }
-- 
1.7.0.1





[Qemu-devel] Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-06 Thread Anthony Liguori

On 04/21/2010 12:53 PM, Cam Macdonell wrote:

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

 -device ivshmem,size=[,shm=]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

 -device ivshmem,size=[,shm=]
 [,chardev=][,msi=on][,irqfd=on][,vectors=n]
 -chardev socket,path=,id=

(shared memory server is qemu.git/contrib/ivshmem-server)

Sample programs and init scripts are in a git repo here:

 www.gitorious.org/nahanni
---
  Makefile.target |3 +
  hw/ivshmem.c|  727 +++
  qemu-char.c |6 +
  qemu-char.h |3 +
  qemu-doc.texi   |   25 ++
  5 files changed, 764 insertions(+), 0 deletions(-)
  create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 1ffd802..bc9a681 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
  obj-y += rtl8139.o
  obj-y += e1000.o

+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
  # Hardware support
  obj-i386-y = pckbd.o dma.o
  obj-i386-y += vga.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..f8d8fdb
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,727 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
   


This will break the Windows along with any non-Linux unix or any Linux 
old enough to not have eventfd support.


If it's based on cirrus_vga.c and rtl8139.c, then it ought to carry the 
respective copyrights, no?


Regards,

Anthony Liguori


+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "msix.h"
+#include "qemu-kvm.h"
+#include "libkvm.h"
+
+#include
+#include
+#include
+#include
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState * chr;
+CharDriverState ** eventfd_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+unsigned long ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+int nr_allocated_vms;
+/* array of eventfds for each guest */
+int ** eventfds;
+/* keep track of # of eventfds for each guest*/
+int * eventfds_posn_count;
+
+int nr_alloc_guests;
+int vm_id;
+int num_eventfds;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 4,
+IVPosition = 8,
+Doorbell = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) {
+return (ivs->features&  (1<<  feature));
+}
+
+static inline int is_power_of_two(int x) {
+return (x&  (x-1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr, (uint32_t)size);
+cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s->intrstatus&  s->intrmask)&  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
+   isr ? 1 : 0, s->intrstatus, s->intrmask);
+}
+
+qemu_set_irq(s->dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
+
+s->intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+uint32_t ret = s->intrmask;
+
+IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
+
+return ret;
+}
+
+static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
+
+s->intrstatus = val;
+
+ivshmem_u

[Qemu-devel] [PATCH 3/5] tcg-ia64: Fix some register usage issues.

2010-05-06 Thread Richard Henderson
(1) The output registers were not marked call-clobbered, even though
they can be modified by called functions.
(2) The thread pointer was not marked reserved.
(3) R4-R6 are call-saved, but not saved by the prologue.  Rather than
save them, mark them reserved so that we don't use them.

Signed-off-by: Richard Henderson 
---
 tcg/ia64/tcg-target.c |   67 ++--
 1 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index 401dfec..acd4ce8 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -2283,39 +2283,56 @@ void tcg_target_init(TCGContext *s)
0xull);
 tcg_regset_set(tcg_target_available_regs[TCG_TYPE_I64],
0xull);
-tcg_regset_set(tcg_target_call_clobber_regs,
-   (1 << TCG_REG_R8)  |
-   (1 << TCG_REG_R9)  |
-   (1 << TCG_REG_R10) |
-   (1 << TCG_REG_R11) |
-   (1 << TCG_REG_R13) |
-   (1 << TCG_REG_R14) |
-   (1 << TCG_REG_R15) |
-   (1 << TCG_REG_R16) |
-   (1 << TCG_REG_R17) |
-   (1 << TCG_REG_R18) |
-   (1 << TCG_REG_R19) |
-   (1 << TCG_REG_R20) |
-   (1 << TCG_REG_R21) |
-   (1 << TCG_REG_R22) |
-   (1 << TCG_REG_R23) |
-   (1 << TCG_REG_R24) |
-   (1 << TCG_REG_R25) |
-   (1 << TCG_REG_R26) |
-   (1 << TCG_REG_R27) |
-   (1 << TCG_REG_R28) |
-   (1 << TCG_REG_R29) |
-   (1 << TCG_REG_R30) |
-   (1 << TCG_REG_R31));
-tcg_regset_clear(s->reserved_regs);
 
+tcg_regset_clear(tcg_target_call_clobber_regs);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R8);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R9);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R10);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R11);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R14);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R15);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R16);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R17);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R18);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R19);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R20);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R21);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R22);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R23);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R24);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R25);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R26);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R27);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R28);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R29);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R30);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R31);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R56);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R57);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R58);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R59);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R60);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R61);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R62);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R63);
+
+tcg_regset_clear(s->reserved_regs);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);   /* zero register */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R1);   /* global pointer */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R2);   /* internal use */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R3);   /* internal use */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R12);  /* stack pointer */
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13);  /* thread pointer */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R32);  /* return address */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R33);  /* PFS */
 
+/* The following 3 are not in use, are call-saved, but *not* saved
+   by the prologue.  Therefore we cannot use them without modifying
+   the prologue.  There doesn't seem to be any good reason to use
+   these as opposed to the windowed registers.  */
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_R4);
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_R5);
+tcg_regset_set

[Qemu-devel] [PATCH 2/5] tcg-hppa: Load GUEST_BASE as an immediate.

2010-05-06 Thread Richard Henderson
Now that the prologue is generated after GUEST_BASE is fixed,
we can load it as an immediate, and also avoid reserving the
register if it isn't necessary.

Signed-off-by: Richard Henderson 
---
 tcg/hppa/tcg-target.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/hppa/tcg-target.c b/tcg/hppa/tcg-target.c
index 012e486..a5f2162 100644
--- a/tcg/hppa/tcg-target.c
+++ b/tcg/hppa/tcg-target.c
@@ -1629,11 +1629,10 @@ void tcg_target_qemu_prologue(TCGContext *s)
 }
 
 #ifdef CONFIG_USE_GUEST_BASE
-/* Note that GUEST_BASE can change after the prologue is generated.
-   To combat that, load the value from the variable instead of
-   embedding a constant here.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG,
-   TCG_REG_R0, (tcg_target_long)&guest_base);
+if (GUEST_BASE != 0) {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
+tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+}
 #endif
 
 /* Jump to TB, and adjust R18 to be the return address.  */
@@ -1679,9 +1678,6 @@ void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_DP);  /* data pointer */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);  /* stack pointer */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R31); /* ble link reg */
-#ifdef CONFIG_USE_GUEST_BASE
-tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
-#endif
 
 tcg_add_target_add_op_defs(hppa_op_defs);
 }
-- 
1.7.0.1





Re: [Qemu-devel] [PATCH] QMP: Spec: Private Extensions support

2010-05-06 Thread Anthony Liguori

On 05/06/2010 10:52 AM, Markus Armbruster wrote:

Anthony, no reply from you; did it fall through the cracks?  If you're
fine with my draft, I'll turn it into a proper patch.
   


Yes, sorry, I thought I had already responded as such.

Regards,

Anthony Liguori


Markus Armbruster  writes:

   

Anthony asked me to take a stab at rewriting his draft to something more
along the lines of what I'm thinking.  Here goes.  I put some remarks
[in brackets].

FYI, I'll be out of town until Wednesday.


6. Downstream extension of QMP
--

We recommend that downstream consumers of QEMU do *not* modify QMP.
Management tools should be able to support both upstream and downstream
versions of QMP without special logic, and downstream extensions are
inherently at odds with that.

However, we recognize that it is sometimes impossible for downstreams to
avoid modifying QMP.  Both upstream and downstream need to take care to
preserve long-term compatibility and interoperability.

To help with that, QMP reserves JSON object member names beginning with
'__' (double underscore) for downstream use ("downstream names").  This
means upstream will never use any downstream names for its commands,
arguments, errors, asynchronous events, and so forth.

Any new names downstream wishes to add must begin with '__'.  To ensure
compatibility with other downstreams, it is strongly recommended that
you prefix the commands with '__RFQDN_' where RFQDN is a valid, reverse
fully qualified domain name which you control.  For example, a qemu-kvm
specific monitor command would be:

 (qemu) __org.linux-kvm_enable_irqchip

Downstream must not change the server greeting (section 2.2) other than
to offer additional capabilities.  But see below for why even that is
discouraged.

Section '5 Compatibility Considerations' applies to downstream as well
as to upstream, obviously.  [That section needs work!]  It follows that
downstream must behave exactly like upstream for any input not
containing members with downstream names ("downstream members"), except
it may add members with downstream names to its output.

Thus, a client should not be able to distinguish downstream from
upstream as long as it doesn't send input with downstream members, and
properly ignores any downstream members in the output it receives.

[I fully support everything up to this point.  I have some reservations
on the rest, and I doubt it'll be all that useful, but I don't really
mind having it, at least not in this form.]

Advice on downstream modifications:
[I made a honest effort at capturing Anthony's intentions here,
my apologies if I screwed it up.]

1. Introducing new commands is okay.  If you want to extend an existing
command, consider introducing a new one with the new behaviour
instead.  [FIXME Could use a rationale: why is extending bad?  Make
sure to cover errors, because that's needed for 3.]

2. Introducing new asynchronous messages is okay.  If you want to extend
an existing message, consider adding a new one instead.  [FIXME Could
use a rationale: why is extending bad?]

3. Introducing new errors for use in new commands is okay.  Adding new
errors to existing commands counts as extension, so 1. applies.

4. New capabilities are strongly discouraged.  Capabilities are for
evolving the basic protocol, and multiple diverging basic protocol
dialects are most undesirable.
 






[Qemu-devel] Re: [Commit 20d97356] Fix OpenBSD build

2010-05-06 Thread Blue Swirl
On 5/5/10, Kevin Wolf  wrote:
> Hi Blue Swirl,
>
>  did I miss the patch posted for this commit? I can't find it on the
>  mailing list. I think the BlockDriver declarations should really stay at
>  the end of the source file of each driver.
>
>  What about using a different way to get rid of the static forward
>  declaration, namely using bdrv_find_format() instead of referencing it
>  directly? If you agree I'll post patches to revert your change and use
>  this solution instead.

Somehow I can't form in my mind the change from your description, but
I'd be fine with that too if it also compiles in OpenBSD.




[Qemu-devel] Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-06 Thread Cam Macdonell
On Thu, May 6, 2010 at 11:32 AM, Anthony Liguori  wrote:
> On 04/21/2010 12:53 PM, Cam Macdonell wrote:
>>
>> Support an inter-vm shared memory device that maps a shared-memory object
>> as a
>> PCI device in the guest.  This patch also supports interrupts between
>> guest by
>> communicating over a unix domain socket.  This patch applies to the
>> qemu-kvm
>> repository.
>>
>>     -device ivshmem,size=[,shm=]
>>
>> Interrupts are supported between multiple VMs by using a shared memory
>> server
>> by using a chardev socket.
>>
>>     -device ivshmem,size=[,shm=]
>>                     [,chardev=][,msi=on][,irqfd=on][,vectors=n]
>>     -chardev socket,path=,id=
>>
>> (shared memory server is qemu.git/contrib/ivshmem-server)
>>
>> Sample programs and init scripts are in a git repo here:
>>
>>     www.gitorious.org/nahanni
>> ---
>>  Makefile.target |    3 +
>>  hw/ivshmem.c    |  727
>> +++
>>  qemu-char.c     |    6 +
>>  qemu-char.h     |    3 +
>>  qemu-doc.texi   |   25 ++
>>  5 files changed, 764 insertions(+), 0 deletions(-)
>>  create mode 100644 hw/ivshmem.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 1ffd802..bc9a681 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>  obj-y += rtl8139.o
>>  obj-y += e1000.o
>>
>> +# Inter-VM PCI shared memory
>> +obj-y += ivshmem.o
>> +
>>  # Hardware support
>>  obj-i386-y = pckbd.o dma.o
>>  obj-i386-y += vga.o
>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
>> new file mode 100644
>> index 000..f8d8fdb
>> --- /dev/null
>> +++ b/hw/ivshmem.c
>> @@ -0,0 +1,727 @@
>> +/*
>> + * Inter-VM Shared Memory PCI device.
>> + *
>> + * Author:
>> + *      Cam Macdonell
>> + *
>> + * Based On: cirrus_vga.c and rtl8139.c
>> + *
>> + * This code is licensed under the GNU GPL v2.
>> + */
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>>
>
> This will break the Windows along with any non-Linux unix or any Linux old
> enough to not have eventfd support.

I'll wrap it with

#ifdef CONFIG_EVENTFD

> If it's based on cirrus_vga.c and rtl8139.c, then it ought to carry the
> respective copyrights, no?

Sure, I can add those

Cam

>
> Regards,
>
> Anthony Liguori
>
>> +#include "hw.h"
>> +#include "console.h"
>> +#include "pc.h"
>> +#include "pci.h"
>> +#include "sysemu.h"
>> +
>> +#include "msix.h"
>> +#include "qemu-kvm.h"
>> +#include "libkvm.h"
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +#define IVSHMEM_IRQFD   0
>> +#define IVSHMEM_MSI     1
>> +
>> +#define DEBUG_IVSHMEM
>> +#ifdef DEBUG_IVSHMEM
>> +#define IVSHMEM_DPRINTF(fmt, args...)        \
>> +    do {printf("IVSHMEM: " fmt, ##args); } while (0)
>> +#else
>> +#define IVSHMEM_DPRINTF(fmt, args...)
>> +#endif
>> +
>> +typedef struct EventfdEntry {
>> +    PCIDevice *pdev;
>> +    int vector;
>> +} EventfdEntry;
>> +
>> +typedef struct IVShmemState {
>> +    PCIDevice dev;
>> +    uint32_t intrmask;
>> +    uint32_t intrstatus;
>> +    uint32_t doorbell;
>> +
>> +    CharDriverState * chr;
>> +    CharDriverState ** eventfd_chr;
>> +    int ivshmem_mmio_io_addr;
>> +
>> +    pcibus_t mmio_addr;
>> +    unsigned long ivshmem_offset;
>> +    uint64_t ivshmem_size; /* size of shared memory region */
>> +    int shm_fd; /* shared memory file descriptor */
>> +
>> +    int nr_allocated_vms;
>> +    /* array of eventfds for each guest */
>> +    int ** eventfds;
>> +    /* keep track of # of eventfds for each guest*/
>> +    int * eventfds_posn_count;
>> +
>> +    int nr_alloc_guests;
>> +    int vm_id;
>> +    int num_eventfds;
>> +    uint32_t vectors;
>> +    uint32_t features;
>> +    EventfdEntry *eventfd_table;
>> +
>> +    char * shmobj;
>> +    char * sizearg;
>> +} IVShmemState;
>> +
>> +/* registers for the Inter-VM shared memory device */
>> +enum ivshmem_registers {
>> +    IntrMask = 0,
>> +    IntrStatus = 4,
>> +    IVPosition = 8,
>> +    Doorbell = 12,
>> +};
>> +
>> +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int
>> feature) {
>> +    return (ivs->features&  (1<<  feature));
>> +}
>> +
>> +static inline int is_power_of_two(int x) {
>> +    return (x&  (x-1)) == 0;
>> +}
>> +
>> +static void ivshmem_map(PCIDevice *pci_dev, int region_num,
>> +                    pcibus_t addr, pcibus_t size, int type)
>> +{
>> +    IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
>> +
>> +    IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr,
>> (uint32_t)size);
>> +    cpu_register_physical_memory(addr, s->ivshmem_size,
>> s->ivshmem_offset);
>> +
>> +}
>> +
>> +/* accessing registers - based on rtl8139 */
>> +static void ivshmem_update_irq(IVShmemState *s, int val)
>> +{
>> +    int isr;
>> +    isr = (s->intrstatus&  s->intrmask)&  0x;
>> +
>> +    /* don't print ISR resets */
>> +    if (isr) {
>> +        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
>> +           isr ? 1 : 0, s->intrstatus, s->int

[Qemu-devel] Re: [PATCH] vdi: Fix image creation

2010-05-06 Thread Stefan Weil

Am 06.05.2010 14:55, schrieb Kevin Wolf:
The number of blocks needs to be rounded up to cover all of the 
virtual hard
disk. Without this fix, we can't even open our own images if their 
size is not

a multiple of the block size.

Signed-off-by: Kevin Wolf 
---
block/vdi.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2b4d2c2..b990bbc 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -827,7 +827,7 @@ static int vdi_create(const char *filename, 
QEMUOptionParameter *options)

return -errno;
}

- blocks = bytes / block_size;
+ blocks = (bytes + block_size - 1) / block_size;
bmap_size = blocks * sizeof(uint32_t);
bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));




'bytes' (for header.disk_size) must be fixed, too, and so does vdi_open.
I'll send a patch which hopefully addresses all these points.

Thanks + kind regards,
Stefan





[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Sebastian Herbszt

Gerd Hoffmann wrote:

  Hi,


First, I'd like to be able to load the vgabios via PCI ROM bar on all
pci vga cards (stdvga, vmware, soon qxl). The PCI ID in the bios has
to match the PCI ID of the card, so we'll need a bunch of vga bios
binaries, all identical except for the PCI ID. Or we need some kind of
binary patching. Otherwise seabios will not load them from the PCI ROM
bar.


vgabios could be build multiple times with PCIBIOS set and VENDOR_ID and
DEVICE_ID supplied from the Makefile like it's already done with
VGABIOS_DATE.


Yes, something like this I have in mind.

Works for stdvga and qxl.  vmware fails as it has the memory at pci 
region 1 (region 0 has ioports) and it seems vgabios isn't prepared to 
handle that ...


Do you mean vgabios currently doesn't work with qemu and "-vga vmware" ?

Anyone knows what the best place to send patches to is? 
vgabios-developers list @ savannah looks pretty much dead.  It has one 
(!) message in 2009 which is an unanswered question.


Or is upstream dead anyway and we should just go with 
http://git.qemu.org/vgabios.git/ ?



From the "amount" of messages on the mailing list i conclude vgabios

just works good enough for most people. I would not say the developement
is dead, but event driven and there is no itch to strach at the moment.

Try sending patches to the mailing list and/or the vgabios home at
http://savannah.nongnu.org/projects/vgabios .

Sebastian


cheers,
  Gerd





Re: [Qemu-devel] [PATCH 1/3] cursor: add cursor functions.

2010-05-06 Thread Blue Swirl
On 5/5/10, Gerd Hoffmann  wrote:
> Add a new cursor type to console.h and a bunch of functions to
>  deal with cursors the (new) cursor.c file.
>
>  Signed-off-by: Gerd Hoffmann 
>  ---
>   Makefile.objs |3 +-
>   console.h |   24 ++-
>   cursor.c  |  208 
> +
>   3 files changed, 232 insertions(+), 3 deletions(-)
>   create mode 100644 cursor.c
>
>  diff --git a/Makefile.objs b/Makefile.objs
>  index ecdd53e..1ee6e9d 100644
>  --- a/Makefile.objs
>  +++ b/Makefile.objs
>  @@ -48,7 +48,8 @@ common-obj-y = $(block-obj-y)
>   common-obj-y += $(net-obj-y)
>   common-obj-y += $(qobject-obj-y)
>   common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
>  -common-obj-y += readline.o console.o async.o qemu-error.o
>  +common-obj-y += readline.o console.o cursor.o async.o qemu-error.o
>  +
>   common-obj-y += tcg-runtime.o host-utils.o
>   common-obj-y += irq.o ioport.o input.o
>   common-obj-$(CONFIG_PTIMER) += ptimer.o
>  diff --git a/console.h b/console.h
>  index 6def115..88861cb 100644
>  --- a/console.h
>  +++ b/console.h
>  @@ -126,6 +126,27 @@ struct DisplaySurface {
>  struct PixelFormat pf;
>   };
>
>  +/* cursor data format is 32bit RGBA */
>  +typedef struct QEMUCursor {
>  +int width, height;
>  +int hot_x, hot_y;
>  +int refcount;
>  +uint32_tdata[];
>  +} QEMUCursor;
>  +
>  +QEMUCursor *cursor_alloc(int width, int height);
>  +void cursor_get(QEMUCursor *c);
>  +void cursor_put(QEMUCursor *c);
>  +QEMUCursor *cursor_builtin_hidden(void);
>  +QEMUCursor *cursor_builtin_left_ptr(void);
>  +void cursor_print_ascii_art(QEMUCursor *c, const char *prefix);
>  +int cursor_get_mono_bpl(QEMUCursor *c);
>  +void cursor_set_mono(QEMUCursor *c,
>  + uint32_t foreground, uint32_t background, uint8_t 
> *image,
>  + int transparent, uint8_t *mask);
>  +void cursor_get_mono_image(QEMUCursor *c, int foreground, uint8_t *mask);
>  +void cursor_get_mono_mask(QEMUCursor *c, int transparent, uint8_t *mask);
>  +
>   struct DisplayChangeListener {
>  int idle;
>  uint64_t gui_timer_interval;
>  @@ -158,8 +179,7 @@ struct DisplayState {
>  struct DisplayChangeListener* listeners;
>
>  void (*mouse_set)(int x, int y, int on);
>  -void (*cursor_define)(int width, int height, int bpp, int hot_x, int 
> hot_y,
>  -  uint8_t *image, uint8_t *mask);
>  +void (*cursor_define)(QEMUCursor *cursor);
>
>  struct DisplayState *next;
>   };
>  diff --git a/cursor.c b/cursor.c
>  new file mode 100644
>  index 000..3995a31
>  --- /dev/null
>  +++ b/cursor.c
>  @@ -0,0 +1,208 @@
>  +#include "qemu-common.h"
>  +#include "console.h"
>  +
>  +static const char cursor_hidden_32[32*32];
>  +static const char cursor_left_ptr_32[32*32] = {
>  +""
>  +" X  "
>  +" XX "
>  +" X.X"
>  +" X..X   "
>  +" X...X  "
>  +" XX "
>  +" X.X"
>  +" X..X   "
>  +" X...X  "
>  +" XX "
>  +" X.X"
>  +" X..X..X"
>  +" X.X X..X   "
>  +" XX  X..X   "
>  +" XX..X  "
>  +"  X..X  "
>  +"   X..X "
>  +"   X..X "
>  +"XX  "
>  +""
>  +};

Is this format standard? How about using X bitmap format instead:
$ cat /usr/include/X11/bitmaps/left_ptr
#define left_ptr_width 16
#define left_ptr_height 16
#define left_ptr_x_hot 3
#define left_ptr_y_hot 1
static char left_ptr_bits[] = {
   0x00, 0x00, 0x08, 0x00, 0x18, 0x00, 0x38, 0x00, 0x78, 0x00, 0xf8, 0x00,
   0xf8, 0x01, 0xf8, 0x03, 0xf8, 0x07, 0xf8, 0x00, 0xd8, 0x00, 0x88, 0x01,
   0x80, 0x01, 0x00, 0x03, 0x00, 0x03, 0x00, 0x00};

Then there would be no need of parsing.

>  +
>  +/* for creating built-in cursors */
>  +static void cursor_parse_ascii_art(QEMUCursor *c, const char *ptr)
>  +{
>  +int i, pixels;
>  +
>  +pixels = c->width * c->height;
>  +for (i = 0; i < pixels; i++) {
>  +switch (ptr[i]) {
>  +case 'X': /* black */
>  +c->data[i] = 0xff00;
>  +break;
>  +case '.': /* white */
>  +c->data[i] = 0x;
>  +break;
>  +case ' ': /* transparent */
>  +default:
>  +c->data[i] = 0x;
>  +break;
>  +}
>  +}
>  +}
>  +
>  +/* nice for debugging */
>  +void cursor_print_ascii_art(QEMUCursor *c, const cha

[Qemu-devel] [PATCH 1/5] tcg: Initialize the prologue after GUEST_BASE is fixed.

2010-05-06 Thread Richard Henderson
This will allow backends to make intelligent choices about how
to implement GUEST_BASE.

Signed-off-by: Richard Henderson 
---
 bsd-user/main.c   |9 -
 exec.c|5 +
 linux-user/main.c |9 -
 tcg/tcg.c |3 +++
 tcg/tcg.h |1 +
 5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index b1c438d..05cc3d9 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -30,7 +30,7 @@
 #include "qemu-common.h"
 /* For tb_lock */
 #include "exec-all.h"
-
+#include "tcg.h"
 #include "qemu-timer.h"
 #include "envlist.h"
 
@@ -970,6 +970,13 @@ int main(int argc, char **argv)
 syscall_init();
 signal_init();
 
+#if defined(CONFIG_USE_GUEST_BASE)
+/* Now that we've loaded the binary, GUEST_BASE is fixed.  Delay
+   generating the prologue until now so that the prologue can take
+   the real value of GUEST_BASE into account.  */
+tcg_prologue_init(&tcg_ctx);
+#endif
+
 /* build Task State */
 memset(ts, 0, sizeof(TaskState));
 init_task_state(ts);
diff --git a/exec.c b/exec.c
index 95b92f7..f417bb6 100644
--- a/exec.c
+++ b/exec.c
@@ -574,6 +574,11 @@ void cpu_exec_init_all(unsigned long tb_size)
 #if !defined(CONFIG_USER_ONLY)
 io_mem_init();
 #endif
+#if !defined(CONFIG_USER_ONLY) || !defined(CONFIG_USE_GUEST_BASE)
+/* There's no guest base to take into account, so go ahead and
+   initialize the prologue now.  */
+tcg_prologue_init(&tcg_ctx);
+#endif
 }
 
 #if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
diff --git a/linux-user/main.c b/linux-user/main.c
index eafcfc1..331cc30 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -31,7 +31,7 @@
 #include "cache-utils.h"
 /* For tb_lock */
 #include "exec-all.h"
-
+#include "tcg.h"
 #include "qemu-timer.h"
 #include "envlist.h"
 
@@ -2984,6 +2984,13 @@ int main(int argc, char **argv, char **envp)
 syscall_init();
 signal_init();
 
+#if defined(CONFIG_USE_GUEST_BASE)
+/* Now that we've loaded the binary, GUEST_BASE is fixed.  Delay
+   generating the prologue until now so that the prologue can take
+   the real value of GUEST_BASE into account.  */
+tcg_prologue_init(&tcg_ctx);
+#endif
+
 #if defined(TARGET_I386)
 cpu_x86_set_cpl(env, 3);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 53da2c4..9105266 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -240,7 +240,10 @@ void tcg_context_init(TCGContext *s)
 }
 
 tcg_target_init(s);
+}
 
+void tcg_prologue_init(TCGContext *s)
+{
 /* init global prologue and epilogue */
 s->code_buf = code_gen_prologue;
 s->code_ptr = s->code_buf;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 166c889..bab13cd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -346,6 +346,7 @@ static inline void *tcg_malloc(int size)
 }
 
 void tcg_context_init(TCGContext *s);
+void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
 
 int tcg_gen_code(TCGContext *s, uint8_t *gen_code_buf);
-- 
1.7.0.1





[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Sebastian Herbszt

Gerd Hoffmann wrote:

First, I'd like to be able to load the vgabios via PCI ROM bar on all
pci vga cards (stdvga, vmware, soon qxl).  The PCI ID in the bios has to
match the PCI ID of the card, so we'll need a bunch of vga bios
binaries, all identical except for the PCI ID.  Or we need some kind of
binary patching.  Otherwise seabios will not load them from the PCI ROM
bar.


Make vga bios aware of all ids?


Would work for the code doing the lfb address lookup.  But I don't think 
the PCI option rom header allows multiple entries.


AFAIK it does not, but you can put multiple PCI option roms (concatenated)
on the BAR (see "6.3.1. PCI Expansion ROM Contents").


I also want to eliminate the magic number.
Boch vga bios is paravirtualized to get VBE address dynamically via a new
pci device with special pci device of device id 0x1234.


0x1234 happens to be the vendor id the qemu stdvga has.  Probably the 
same is true for bochs.  So this code actually looks up the emulated vga 
card ;)


Bochs and qemu use 0x1234:0x.

Sebastian


Are you meaning to follow bochs way?


Yes.

cheers,
  Gerd









[Qemu-devel] [PATCH] vdi: Fix image opening and creation for odd disk sizes

2010-05-06 Thread Stefan Weil
This patch fixes a regression introduced by commit
95a2f9bc588c3f83375d87b0a9394f89a1bcfada.

The fix is based on a patch from Kevin Wolf. Here his comment:

"The number of blocks needs to be rounded up to cover all of the virtual hard
disk. Without this fix, we can't even open our own images if their size is not
a multiple of the block size."

While Kevin's patch addressed vdi_create, my modification also fixes
vdi_open which now accepts any image which is large enough to hold
the blocks.

I also decided to keep the original code in vdi_create which rounds down.
Rounding works in both directions, and there are good arguments for both,
so I just left the original simple code.

It is very important to use the rounded value for the new disk size, too -
otherwise VirtualBox cannot open our disk image.

Cc: Kevin Wolf 
Cc: François Revol 
Signed-off-by: Stefan Weil 
---
 block/vdi.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 1d257b4..2213819 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -414,8 +414,8 @@ static int vdi_open(BlockDriverState *bs, int flags)
 } else if (header.block_size != 1 * MiB) {
 logout("unsupported block size %u B\n", header.block_size);
 goto fail;
-} else if ((header.disk_size + header.block_size - 1) / header.block_size 
!=
-   (uint64_t)header.blocks_in_image) {
+} else if (header.disk_size < 
+   (uint64_t)header.blocks_in_image * header.block_size) {
 logout("unexpected block number %u B\n", header.blocks_in_image);
 goto fail;
 } else if (!uuid_is_null(header.uuid_link)) {
@@ -827,7 +827,10 @@ static int vdi_create(const char *filename, 
QEMUOptionParameter *options)
 return -errno;
 }
 
+/* VirtualBox wants a disk size which is a multiple of the block size. */
 blocks = bytes / block_size;
+bytes = blocks * block_size;
+
 bmap_size = blocks * sizeof(uint32_t);
 bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
 
-- 
1.7.0





Re: [Qemu-devel] eepro100: missing gpxe-eepro100-80862449.rom?

2010-05-06 Thread Stefan Weil

Am 05.05.2010 22:03, schrieb Sebastian Herbszt:

eepro100_register_devices() references three option roms:
- gpxe-eepro100-80861209.rom
- gpxe-eepro100-80861229.rom
- gpxe-eepro100-80862449.rom

The last one seems to be missing because it's not in the pc-bios 
directory.


Sebastian


Yes, it is missing. I did not add it for several reasons:

* The new variant i82801 (which needs the missing ROM) is currently not 
available

  via model=i82801 (it can be selected by experts using other methods).

* Experts can get the missing ROM from ROM-o-matic.

* All three ROMs differ only in one or two bytes (device id). That's a waste
  of binary space (56832 bytes!). Therefore I plan to remove all but 
the first
  ROM file. The missing ones can be created by make or patched on the 
fly by
  QEMU during load. Maybe it is also possible to create ROM files which 
support

  more than one vendor/device entry - that would be the best solution.

Regards
Stefan







[Qemu-devel] [PATCH 4/5] tcg-ia64: Load GUEST_BASE into a register.

2010-05-06 Thread Richard Henderson
Saves one bundle per memory operation.

Signed-off-by: Richard Henderson 
---
 tcg/ia64/tcg-target.c |  132 
 1 files changed, 88 insertions(+), 44 deletions(-)

diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index acd4ce8..0d275e9 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -40,6 +40,12 @@ static const char * const 
tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+#ifdef CONFIG_USE_GUEST_BASE
+#define TCG_GUEST_BASE_REG TCG_REG_R55
+#else
+#define TCG_GUEST_BASE_REG TCG_REG_R0
+#endif
+
 /* Branch registers */
 enum {
 TCG_REG_B0 = 0,
@@ -1641,9 +1647,13 @@ static inline void tcg_out_qemu_st(TCGContext *s, const 
TCGArg *args, int opc)
 
 static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 {
+static uint64_t const opc_ld_m1[4] = {
+OPC_LD1_M1, OPC_LD2_M1, OPC_LD4_M1, OPC_LD8_M1
+};
+static uint64_t const opc_sxt_i29[4] = {
+OPC_SXT1_I29, OPC_SXT2_I29, OPC_SXT4_I29, 0
+};
 int addr_reg, data_reg, mem_index, s_bits, bswap;
-uint64_t opc_ld_m1[4] = { OPC_LD1_M1, OPC_LD2_M1, OPC_LD4_M1, OPC_LD8_M1 };
-uint64_t opc_sxt_i29[8] = { OPC_SXT1_I29, OPC_SXT2_I29, OPC_SXT4_I29, 0 };
 
 data_reg = *args++;
 addr_reg = *args++;
@@ -1656,19 +1666,21 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const 
TCGArg *args, int opc)
 bswap = 0;
 #endif
 
-tcg_out_bundle(s, mLX,
-   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
-   tcg_opc_l2 ((tcg_target_long) GUEST_BASE),
-   tcg_opc_x2 (TCG_REG_P0, OPC_MOVL_X2, TCG_REG_R2,
-   GUEST_BASE));
-
 #if TARGET_LONG_BITS == 32
-tcg_out_bundle(s, mII,
-   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
-   tcg_opc_i29(TCG_REG_P0, OPC_ZXT4_I29,
-   TCG_REG_R3, addr_reg),
-   tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1, TCG_REG_R2,
-   TCG_REG_R2, TCG_REG_R3));
+if (GUEST_BASE != 0) {
+tcg_out_bundle(s, mII,
+   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
+   tcg_opc_i29(TCG_REG_P0, OPC_ZXT4_I29,
+   TCG_REG_R3, addr_reg),
+   tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1, TCG_REG_R2,
+   TCG_GUEST_BASE_REG, TCG_REG_R3));
+} else {
+tcg_out_bundle(s, miI,
+   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
+   tcg_opc_i29(TCG_REG_P0, OPC_ZXT4_I29,
+   TCG_REG_R2, addr_reg),
+   tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0));
+}
 
 if (!bswap || s_bits == 0) {
 if (s_bits == opc) {
@@ -1724,12 +1736,20 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const 
TCGArg *args, int opc)
 }
 }
 #else
-tcg_out_bundle(s, MmI,
-   tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1, TCG_REG_R2,
-   TCG_REG_R2, addr_reg),
-   tcg_opc_m1 (TCG_REG_P0, opc_ld_m1[s_bits],
-   data_reg, TCG_REG_R2),
-   tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0));
+if (GUEST_BASE != 0) {
+tcg_out_bundle(s, MmI,
+   tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1, TCG_REG_R2,
+   TCG_GUEST_BASE_REG, addr_reg),
+   tcg_opc_m1 (TCG_REG_P0, opc_ld_m1[s_bits],
+   data_reg, TCG_REG_R2),
+   tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0));
+} else {
+tcg_out_bundle(s, mmI,
+   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
+   tcg_opc_m1 (TCG_REG_P0, opc_ld_m1[s_bits],
+   data_reg, addr_reg),
+   tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0));
+}
 
 if (bswap && s_bits == 1) {
 tcg_out_bundle(s, mII,
@@ -1764,8 +1784,13 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const 
TCGArg *args, int opc)
 
 static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
+static uint64_t const opc_st_m4[4] = {
+OPC_ST1_M4, OPC_ST2_M4, OPC_ST4_M4, OPC_ST8_M4
+};
 int addr_reg, data_reg, bswap;
-uint64_t opc_st_m4[4] = { OPC_ST1_M4, OPC_ST2_M4, OPC_ST4_M4, OPC_ST8_M4 };
+#if TARGET_LONG_BITS == 64
+uint64_t add_guest_base;
+#endif
 
 data_reg = *args++;
 addr_reg = *args++;
@@ -1776,19 +1801,22 @@ static inline void tcg_out_qemu_st(TCGContext *s, const 
TCGArg *args, int opc)
 bswap = 0;
 #endif
 
-tcg_out_bundle(s, mLX,
-   tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
-   tcg_opc_l2 ((tcg_target_long) GUEST_BASE),
-   tcg_opc_x2 (TCG_REG_P0, OPC_MOVL_X2, TCG_REG_R2,
-   GUEST_BASE));
-
 #if TARGET_LONG_BI

[Qemu-devel] [PATCH] block/vdi: Allow disk images of size 0

2010-05-06 Thread Stefan Weil
Even it is not very useful, users may create images of size 0.

Without the special option CONFIG_ZERO_MALLOC, qemu_mallocz
aborts execution when it is told to allocate 0 bytes,
so avoid this kind of call.

Cc: Kevin Wolf 
Signed-off-by: Stefan Weil 
---
 block/vdi.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2213819..02b9fea 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -435,7 +435,9 @@ static int vdi_open(BlockDriverState *bs, int flags)
 
 bmap_size = header.blocks_in_image * sizeof(uint32_t);
 bmap_size = (bmap_size + SECTOR_SIZE - 1) / SECTOR_SIZE;
-s->bmap = qemu_malloc(bmap_size * SECTOR_SIZE);
+if (bmap_size > 0) {
+s->bmap = qemu_malloc(bmap_size * SECTOR_SIZE);
+}
 if (bdrv_read(bs->file, s->bmap_sector, (uint8_t *)s->bmap, bmap_size) < 
0) {
 goto fail_free_bmap;
 }
@@ -860,7 +862,10 @@ static int vdi_create(const char *filename, 
QEMUOptionParameter *options)
 result = -errno;
 }
 
-bmap = (uint32_t *)qemu_mallocz(bmap_size);
+bmap = NULL;
+if (bmap_size > 0) {
+bmap = (uint32_t *)qemu_mallocz(bmap_size);
+}
 for (i = 0; i < blocks; i++) {
 if (image_type == VDI_TYPE_STATIC) {
 bmap[i] = i;
-- 
1.7.0





[Qemu-devel] Re: sparc64 lazy conditional codes evaluation

2010-05-06 Thread Blue Swirl
On 5/5/10, Igor Kovalenko  wrote:
> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl  wrote:
>  > On 5/3/10, Igor Kovalenko  wrote:
>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl  wrote:
>  >>  > On 5/3/10, Igor Kovalenko  wrote:
>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl  
> wrote:
>  >>  >>  > On 5/3/10, Igor Kovalenko  wrote:
>  >>  >>  >> Hi!
>  >>  >>  >>
>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  >>  >>  we return from trap handler with mismatching conditionals.
>  >>  >>  >>
>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  >>  >>  machine is working through silo initialization. I use gentoo 
> minimal cd
>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with 
> silo boot
>  >>  >>  >>  would experience the same. Once in a while it would report crc 
> error,
>  >>  >>  >>  unable to open cd partition or it would fail to decompress image.
>  >>  >>  >
>  >>  >>  > I think I've also seen this.
>  >>  >>  >
>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  >>  >>  possibly followed by a few instructions which do not touch 
> conditionals,
>  >>  >>  >>  then conditional branch insn. If it happens that we trap while 
> processing
>  >>  >>  >>  conditional branch insn so it is restarted after return from 
> trap then
>  >>  >>  >>  seldom conditional codes are calculated incorrectly.
>  >>  >>  >>
>  >>  >>  >>  I cannot point to exact cause but it appears that after trap 
> return
>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  >>  >>  since adding more cond evaluation flushes over the code helps.
>  >>  >>  >>
>  >>  >>  >>  We already tried doing flush more frequently and it is still not
>  >>  >>  >>  complete, so the question is how to finally do this once and 
> right :)
>  >>  >>  >>
>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  >>  >>  the following list appears to be good start. Plan is to prepare
>  >>  >>  >>  a change to qemu and find a way to test it.
>  >>  >>  >>
>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>  >>  >>use DisasContext->cc_op to predict if flags should be not 
> evaluated
>  >>  >>  >>due to overriding insn. Instead we can drop cc_op from 
> disassembler
>  >>  >>  >>context and simplify code to only use cc_op from env.
>  >>  >>  >
>  >>  >>  > Not currently, but in the future we may use that to do even lazier
>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  >>  >>  > could be much more optimal by changing the branch to do the
>  >>  >>  > comparison. Here's an old unfinished patch to do some of this.
>
>
> I wonder if it buys anything. Sparc RISC architecture means optimizing 
> compiler
>  would prevent any extra flags computation, right? So it is basically 1-to-1
>  conditional computation and use. Or even worse, if we delay computation
>  until there are two or more consumers, correct?

For x86 target, that is the other part of savings from using lazy
condition computation. It's true that it will not benefit RISC targets
much.

But I think you are missing the other part, where the actual flags are
not calculated but instead the original comparison can be used.

For example, consider this Sparc64 code:
IN:
0x01fff000be58:  cmp  %g2, 0x51

OUT: [size=82]
0x40df16b0:  mov0x10(%r14),%rbp
0x40df16b4:  mov%rbp,%rbx
0x40df16b7:  sub$0x51,%rbx
0x40df16bb:  mov%rbx,%r12
0x40df16be:  mov%r12,0x10a60(%r14)
0x40df16c5:  mov%rbp,0x60(%r14)
0x40df16c9:  mov$0x51,%ebp
0x40df16ce:  mov%rbp,0x68(%r14)
0x40df16d2:  mov%rbx,0x70(%r14)
0x40df16d6:  mov$0x7,%ebp
0x40df16db:  mov%ebp,0x78(%r14)
0x40df16df:  mov$0x1fff000be5c,%rbp
0x40df16e9:  mov%rbp,0x48(%r14)
0x40df16ed:  mov$0x1fff000be60,%rbp
0x40df16f7:  mov%rbp,0x50(%r14)
0x40df16fb:  xor%eax,%eax
0x40df16fd:  jmpq   0xbfface

--
IN:
0x01fff000be5c:  bne  0x1fff000c268

OUT: [size=95]
0x40df1710:  callq  0x518260
0x40df1715:  mov0x98(%r14),%ebp
0x40df171c:  mov%ebp,%ebx
0x40df171e:  shr$0x16,%rbx
0x40df1722:  and$0x1,%ebx
0x40df1725:  xor$0x1,%rbx
0x40df1729:  mov%rbx,0x90(%r14)
0x40df1730:  mov$0x1fff000be60,%rbp
0x40df173a:  mov%rbp,0x48(%r14)
0x40df173e:  test   %rbx,%rbx
0x40df1741:  je 0x40df175a
0x40df1747:  mov$0x1fff000c268,%rbp
0x40df1751:  mov%rbp,0x50(%r14)
0x40df1755:  jmpq   0x40df1768
0x40df175a:  mov$0x1fff000be64,%rbp
0x40df1764:  mov%rbp,0x50(%r14)
0x40df1768:  xor%eax,%eax
0x40df176a:  jmpq   0xbfface

Instead of calculating any flags, we should generate for combined
'cmp/btest + branch' sequences something like:
mov0x10(%r14),%rbp
mov%rbp,%rbx
cmp$0x51,%rbx
je 0x40ac275a
mov$0x1fff000c268,%rbp
mov%rbp,0x50(%r14)
jmpq   0x40ac2768
mov$0x1fff000be64,%rbp
mov%rbp,0x50

[Qemu-devel] Re: Re: [RFC] [PATCH] add ahci support into qemu

2010-05-06 Thread Sebastian Herbszt

Stuart Brady wrote:

On Tue, May 04, 2010 at 10:51:37PM +0200, Sebastian Herbszt wrote:

>diff --git a/hw/pci_ids.h b/hw/pci_ids.h
>index fe7a121..4d4de93 100644
>--- a/hw/pci_ids.h
>+++ b/hw/pci_ids.h
>@@ -97,3 +97,4 @@
>#define PCI_DEVICE_ID_INTEL_82371AB  0x7111
>#define PCI_DEVICE_ID_INTEL_82371AB_20x7112
>#define PCI_DEVICE_ID_INTEL_82371AB_30x7113
>+#define PCI_DEVICE_ID_INTEL_ICH6R_AHCI0x2652

The list is sorted by vendor and device id. This entry should go
after "PCI_DEVICE_ID_INTEL_ESB_9". The naming scheme seems
to be VENDOR_DEVICE_FUNCTION, so i suggest something like
PCI_DEVICE_ID_INTEL_ICH6R_2 or PCI_DEVICE_ID_INTEL_82801FR_2.


Linux seems to have called this PCI_DEVICE_ID_INTEL_ICH6_4 at one point.

So this should be function 4, no?


Well, i am not sure if the last part is the function number. pci_ids.h from the 
linux
kernel got this:

#define PCI_DEVICE_ID_INTEL_ICH6_0  0x2640
#define PCI_DEVICE_ID_INTEL_ICH6_1  0x2641
#define PCI_DEVICE_ID_INTEL_ICH6_2  0x2642
#define PCI_DEVICE_ID_INTEL_ICH6_16 0x266a
#define PCI_DEVICE_ID_INTEL_ICH6_17 0x266d
#define PCI_DEVICE_ID_INTEL_ICH6_18 0x266e
#define PCI_DEVICE_ID_INTEL_ICH6_19 0x266f

According to the Intel spec the SATA controller is D31:F2. So we got device 31
and function 2. The IDE controller is D31:F1, but above it's ICH6_19.

Sebastian


Cheers,
--
Stuart Brady








[Qemu-devel] Re: Qemu - samba share help

2010-05-06 Thread Jan Kiszka
Arpit Patel wrote:
> Hi All,
> 
> I am trying to share files between host OS and guest OS, both are Ubuntu.
> 
> Here is the command line that I am using to start guest OS
> *qemu -kernel kernelimage -initrd initrd.img /dev/zero -append "cmdline"
> -smb /tmp*
> *
> *
> After which I tried to *mount* /tmp on guest OS, using IP address of host
> OS,
> something like *mount //10.80.8.100/tmp /tmp*, but fails saying "*No such
> device"*.

Try //10.0.2.4/qemu (a virtual server inside the virtual net that QEMU
creates with its user-mode IP stack, 'qemu' is the default share name).

> 
> Samba service is also running on Host OS.

Actually not needed for -smb, it will start a separate smbd instance.
But recent samba versions can cause troubles here (hard-coded paths
prevent unprivileged smbd instances). In that case you need to fall back
to the main host smb server. That would be reachable under 10.0.2.2
(default IP config), share names as you assigned.

> 
> Any ideas or any other way to share files between Host OS and Guest OS.
> 
> Thanks for Help.
> 

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Gerd Hoffmann

  Hi,


Works for stdvga and qxl. vmware fails as it has the memory at pci
region 1 (region 0 has ioports) and it seems vgabios isn't prepared to
handle that ...


Do you mean vgabios currently doesn't work with qemu and "-vga vmware" ?


It works, but uses the bochs lfb at the magic address 0xe000.

Killing the 0xe00 alias mapping and switching over to use the pci 
region instead requires some changes in pci_get_lfb_addr to handle the 
situation.



Try sending patches to the mailing list and/or the vgabios home at
http://savannah.nongnu.org/projects/vgabios .


Ok, I'll try to send my patch series down that route.

thanks,
  Gerd




Re: [Qemu-devel] [PATCH 1/3] cursor: add cursor functions.

2010-05-06 Thread Blue Swirl
On 5/6/10, Gerd Hoffmann  wrote:
> >
> > >  +static const char cursor_left_ptr_32[32*32] = {
> > >  +""
> > >  +" X  "
> > >  +" XX "
> > >  +" X.X"
> > >  +" X..X   "
> > >  +" X...X  "
> > >  +" XX "
> > >  +" X.X"
> > >  +" X..X   "
> > >  +" X...X  "
> > >  +" XX "
> > >  +" X.X"
> > >  +" X..X..X"
> > >  +" X.X X..X   "
> > >  +" XX  X..X   "
> > >  +" XX..X  "
> > >  +"  X..X  "
> > >  +"   X..X "
> > >  +"   X..X "
> > >  +"XX  "
> > >  +""
> > >  +};
> > >
> >
> > Is this format standard?
> >
>
>  Inspiried by xpm but simplified a bit as full xpm support just for a
> built-in fallback cursor would have been overkill.  The nice thing about xpm
> is that you can use any text editor for editing icons.

But for X bitmaps, you can use any real drawing program. The same
would also apply to full xpm. But read on.

> > How about using X bitmap format instead:
> > $ cat /usr/include/X11/bitmaps/left_ptr
> >
>
>
> > Then there would be no need of parsing.
> >
>
>  Well.  You still would have to convert it as qemu internal cursor format is
> defined as 32bit depth, rgb with alpha channel.  So it doesn't buy you that
> much.

There's still more wasted memory compared to binary format.

I think the best would be to store only the 32 bit image and perform
any conversions offline. This was the approach with for example
keyboard tables, we don't convert them at startup but the original
tables are retained as comments.




Re: [Qemu-devel] [PATCH 1/3] cursor: add cursor functions.

2010-05-06 Thread Gerd Hoffmann

  +static const char cursor_left_ptr_32[32*32] = {
  +""
  +" X  "
  +" XX "
  +" X.X"
  +" X..X   "
  +" X...X  "
  +" XX "
  +" X.X"
  +" X..X   "
  +" X...X  "
  +" XX "
  +" X.X"
  +" X..X..X"
  +" X.X X..X   "
  +" XX  X..X   "
  +" XX..X  "
  +"  X..X  "
  +"   X..X "
  +"   X..X "
  +"XX  "
  +""
  +};


Is this format standard?


Inspiried by xpm but simplified a bit as full xpm support just for a 
built-in fallback cursor would have been overkill.  The nice thing about 
xpm is that you can use any text editor for editing icons.



How about using X bitmap format instead:
$ cat /usr/include/X11/bitmaps/left_ptr



Then there would be no need of parsing.


Well.  You still would have to convert it as qemu internal cursor format 
is defined as 32bit depth, rgb with alpha channel.  So it doesn't buy 
you that much.


cheers,
  Gerd




Re: [Qemu-devel] [PATCH] block: Fix protocol detection for Windows devices

2010-05-06 Thread Christoph Hellwig
On Thu, May 06, 2010 at 02:56:38PM +0200, Kevin Wolf wrote:
> We can't assume the file protocol for Windows devices, they need the same
> detection as other files for which an explicit protocol is not specified.

Looks good,


Reviewed-by: Christoph Hellwig 





Re: [Qemu-devel] [PATCH] block: Fix bdrv_commit

2010-05-06 Thread Christoph Hellwig
On Thu, May 06, 2010 at 04:44:34PM +0200, Kevin Wolf wrote:
> When reopening the image, don't guess the driver, but use the same driver as
> was used before. This is important if the format=... option was used for that
> image.
> 
> Signed-off-by: Kevin Wolf 

Looks good,


Reviewed-by: Christoph Hellwig 





[Qemu-devel] [PATCH 1/2] parallels: use pread

2010-05-06 Thread Christoph Hellwig

Use pread instead of lseek + read in preparation of using the qemu
block API.

Signed-off-by: Christoph Hellwig 

Index: qemu-kevin/block/parallels.c
===
--- qemu-kevin.orig/block/parallels.c   2010-05-06 21:59:25.809255629 +0200
+++ qemu-kevin/block/parallels.c2010-05-06 22:01:11.343284544 +0200
@@ -83,7 +83,7 @@ static int parallels_open(BlockDriverSta
 
 s->fd = fd;
 
-if (read(fd, &ph, sizeof(ph)) != sizeof(ph))
+if (pread(fd, &ph, sizeof(ph), 0) != sizeof(ph))
 goto fail;
 
 if (memcmp(ph.magic, HEADER_MAGIC, 16) ||
@@ -93,14 +93,11 @@ static int parallels_open(BlockDriverSta
 
 bs->total_sectors = le32_to_cpu(ph.nb_sectors);
 
-if (lseek(s->fd, 64, SEEK_SET) != 64)
-   goto fail;
-
 s->tracks = le32_to_cpu(ph.tracks);
 
 s->catalog_size = le32_to_cpu(ph.catalog_entries);
 s->catalog_bitmap = qemu_malloc(s->catalog_size * 4);
-if (read(s->fd, s->catalog_bitmap, s->catalog_size * 4) !=
+if (pread(s->fd, s->catalog_bitmap, s->catalog_size * 4, 64) !=
s->catalog_size * 4)
goto fail;
 for (i = 0; i < s->catalog_size; i++)
@@ -114,28 +111,18 @@ fail:
 return -1;
 }
 
-static inline int seek_to_sector(BlockDriverState *bs, int64_t sector_num)
+static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 {
 BDRVParallelsState *s = bs->opaque;
 uint32_t index, offset;
-uint64_t position;
 
 index = sector_num / s->tracks;
 offset = sector_num % s->tracks;
 
-// not allocated
+/* not allocated */
 if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
return -1;
-
-position = (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
-
-//fprintf(stderr, "sector: %llx index=%x offset=%x pointer=%x 
position=%x\n",
-// sector_num, index, offset, s->catalog_bitmap[index], position);
-
-if (lseek(s->fd, position, SEEK_SET) != position)
-   return -1;
-
-return 0;
+return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
 }
 
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
@@ -144,11 +131,13 @@ static int parallels_read(BlockDriverSta
 BDRVParallelsState *s = bs->opaque;
 
 while (nb_sectors > 0) {
-   if (!seek_to_sector(bs, sector_num)) {
-   if (read(s->fd, buf, 512) != 512)
-   return -1;
-   } else
+int64_t position = seek_to_sector(bs, sector_num);
+if (position >= 0) {
+if (pread(s->fd, buf, 512, position) != 512)
+return -1;
+} else {
 memset(buf, 0, 512);
+}
 nb_sectors--;
 sector_num++;
 buf += 512;




[Qemu-devel] [PATCH 2/2] parallels: use qemu block API

2010-05-06 Thread Christoph Hellwig
Use bdrv_pwrite to access the backing device instead of pread, and
convert the driver to implementing the bdrv_open method which gives
it an already opened BlockDriverState for the underlying device.

Signed-off-by: Christoph Hellwig 

Index: qemu-kevin/block/parallels.c
===
--- qemu-kevin.orig/block/parallels.c   2010-05-03 13:04:37.494261748 +0200
+++ qemu-kevin/block/parallels.c2010-05-03 13:05:55.781255810 +0200
@@ -46,7 +46,6 @@ struct parallels_header {
 } __attribute__((packed));
 
 typedef struct BDRVParallelsState {
-int fd;
 
 uint32_t *catalog_bitmap;
 int catalog_size;
@@ -68,22 +67,15 @@ static int parallels_probe(const uint8_t
 return 0;
 }
 
-static int parallels_open(BlockDriverState *bs, const char *filename, int 
flags)
+static int parallels_open(BlockDriverState *bs, int flags)
 {
 BDRVParallelsState *s = bs->opaque;
-int fd, i;
+int i;
 struct parallels_header ph;
 
-fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
-if (fd < 0) {
-return -1;
-}
-
 bs->read_only = 1; // no write support yet
 
-s->fd = fd;
-
-if (pread(fd, &ph, sizeof(ph), 0) != sizeof(ph))
+if (bdrv_pread(bs->file, 0, &ph, sizeof(ph)) != sizeof(ph))
 goto fail;
 
 if (memcmp(ph.magic, HEADER_MAGIC, 16) ||
@@ -97,7 +89,7 @@ static int parallels_open(BlockDriverSta
 
 s->catalog_size = le32_to_cpu(ph.catalog_entries);
 s->catalog_bitmap = qemu_malloc(s->catalog_size * 4);
-if (pread(s->fd, s->catalog_bitmap, s->catalog_size * 4, 64) !=
+if (bdrv_pread(bs->file, 64, s->catalog_bitmap, s->catalog_size * 4) !=
s->catalog_size * 4)
goto fail;
 for (i = 0; i < s->catalog_size; i++)
@@ -107,7 +99,6 @@ static int parallels_open(BlockDriverSta
 fail:
 if (s->catalog_bitmap)
qemu_free(s->catalog_bitmap);
-close(fd);
 return -1;
 }
 
@@ -128,12 +119,10 @@ static int64_t seek_to_sector(BlockDrive
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
 uint8_t *buf, int nb_sectors)
 {
-BDRVParallelsState *s = bs->opaque;
-
 while (nb_sectors > 0) {
 int64_t position = seek_to_sector(bs, sector_num);
 if (position >= 0) {
-if (pread(s->fd, buf, 512, position) != 512)
+if (bdrv_pread(bs->file, position, buf, 512) != 512)
 return -1;
 } else {
 memset(buf, 0, 512);
@@ -149,14 +138,13 @@ static void parallels_close(BlockDriverS
 {
 BDRVParallelsState *s = bs->opaque;
 qemu_free(s->catalog_bitmap);
-close(s->fd);
 }
 
 static BlockDriver bdrv_parallels = {
 .format_name   = "parallels",
 .instance_size = sizeof(BDRVParallelsState),
 .bdrv_probe= parallels_probe,
-.bdrv_file_open= parallels_open,
+.bdrv_open = parallels_open,
 .bdrv_read = parallels_read,
 .bdrv_close= parallels_close,
 };




[Qemu-devel] [PATCH] Add missing 'static' attribute

2010-05-06 Thread Stefan Weil
Function usage() is only used locally.

Signed-off-by: Stefan Weil 
---
 tests/qruncom.c |2 +-
 tests/runcom.c  |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qruncom.c b/tests/qruncom.c
index a8d0ef6..079f7a2 100644
--- a/tests/qruncom.c
+++ b/tests/qruncom.c
@@ -89,7 +89,7 @@ int errno;
 
 #define COM_BASE_ADDR0x10100
 
-void usage(void)
+static void usage(void)
 {
 printf("qruncom version 0.1 (c) 2003 Fabrice Bellard\n"
"usage: qruncom file.com\n"
diff --git a/tests/runcom.c b/tests/runcom.c
index cbbaf31..6380566 100644
--- a/tests/runcom.c
+++ b/tests/runcom.c
@@ -25,7 +25,7 @@ _syscall2(int, vm86, int, func, struct vm86plus_struct *, v86)
 
 #define COM_BASE_ADDR0x10100
 
-void usage(void)
+static void usage(void)
 {
 printf("runcom version 0.1 (c) 2003 Fabrice Bellard\n"
"usage: runcom file.com\n"
-- 
1.7.0





[Qemu-devel] [PATCH] mips-dis: Add missing static attributes

2010-05-06 Thread Stefan Weil
mips_abi_choices and mips_arch_choices are only used locally.

Signed-off-by: Stefan Weil 
---
 mips-dis.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mips-dis.c b/mips-dis.c
index 56bffe5..4623a1c 100644
--- a/mips-dis.c
+++ b/mips-dis.c
@@ -3035,7 +3035,7 @@ struct mips_abi_choice
   const char * const *fpr_names;
 };
 
-struct mips_abi_choice mips_abi_choices[] =
+static struct mips_abi_choice mips_abi_choices[] =
 {
   { "numeric", mips_gpr_names_numeric, mips_fpr_names_numeric },
   { "32", mips_gpr_names_oldabi, mips_fpr_names_32 },
@@ -3086,7 +3086,7 @@ struct mips_arch_choice
 
 //~ #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
-const struct mips_arch_choice mips_arch_choices[] =
+static const struct mips_arch_choice mips_arch_choices[] =
 {
   { "numeric", 0, 0, 0, 0,
 mips_cp0_names_numeric, NULL, 0, mips_hwr_names_numeric },
-- 
1.7.0





[Qemu-devel] [PATCH] darwin-user: Add missing static attribute

2010-05-06 Thread Stefan Weil
Function usage is only used locally, so add "static".

Signed-off-by: Stefan Weil 
---
 darwin-user/main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/darwin-user/main.c b/darwin-user/main.c
index ade7d48..175e12f 100644
--- a/darwin-user/main.c
+++ b/darwin-user/main.c
@@ -704,7 +704,7 @@ void cpu_loop(CPUX86State *env)
 }
 #endif
 
-void usage(void)
+static void usage(void)
 {
 printf("qemu-" TARGET_ARCH " version " QEMU_VERSION ", Copyright (c) 
2003-2004 Fabrice Bellard\n"
"usage: qemu-" TARGET_ARCH " [-h] [-d opts] [-L path] [-s size] 
program [arguments...]\n"
-- 
1.7.0





[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Sebastian Herbszt

Gerd Hoffmann wrote:

  Hi,


Works for stdvga and qxl. vmware fails as it has the memory at pci
region 1 (region 0 has ioports) and it seems vgabios isn't prepared to
handle that ...


Do you mean vgabios currently doesn't work with qemu and "-vga vmware" ?


It works, but uses the bochs lfb at the magic address 0xe000.

Killing the 0xe00 alias mapping and switching over to use the pci 
region instead requires some changes in pci_get_lfb_addr to handle the 
situation.


Does this only happen with VBE?

Sebastian





Re: [Qemu-devel] [PATCH] sparc64: implement global translation table entries v1

2010-05-06 Thread Blue Swirl
Thanks, applied all.

On 5/4/10, Igor V. Kovalenko  wrote:
> From: Igor V. Kovalenko 
>
>  - match global tte against any context
>  - show global tte in MMU dump
>
>  v0->v1: added default case to switch statement in demap_tlb
>  - should fix gcc warning about uninitialized context variable
>
>  Signed-off-by: Igor V. Kovalenko 
>  ---
>   target-sparc/cpu.h   |   18 
>   target-sparc/helper.c|   33 -
>   target-sparc/op_helper.c |   53 
> ++
>   3 files changed, 76 insertions(+), 28 deletions(-)
>
>  diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
>  index 0e7f390..b705728 100644
>  --- a/target-sparc/cpu.h
>  +++ b/target-sparc/cpu.h
>  @@ -513,6 +513,24 @@ static inline void cpu_set_cwp(CPUSPARCState *env1, int 
> new_cwp)
>   /* sun4m.c, sun4u.c */
>   void cpu_check_irqs(CPUSPARCState *env);
>
>  +#if defined (TARGET_SPARC64)
>  +
>  +static inline int compare_masked(uint64_t x, uint64_t y, uint64_t mask)
>  +{
>  +return (x & mask) == (y & mask);
>  +}
>  +
>  +#define MMU_CONTEXT_BITS 13
>  +#define MMU_CONTEXT_MASK ((1 << MMU_CONTEXT_BITS) - 1)
>  +
>  +static inline int tlb_compare_context(const SparcTLBEntry *tlb,
>  +  uint64_t context)
>  +{
>  +return compare_masked(context, tlb->tag, MMU_CONTEXT_MASK);
>  +}
>  +
>  +#endif
>  +
>   static inline void PUT_PSR(CPUSPARCState *env1, target_ulong val)
>   {
>  env1->psr = val & PSR_ICC;
>  diff --git a/target-sparc/helper.c b/target-sparc/helper.c
>  index 1f0f7d4..4ece01b 100644
>  --- a/target-sparc/helper.c
>  +++ b/target-sparc/helper.c
>  @@ -381,17 +381,11 @@ static inline target_phys_addr_t 
> ultrasparc_truncate_physical(uint64_t x)
>   * UltraSparc IIi I/DMMUs
>   */
>
>  -static inline int compare_masked(uint64_t x, uint64_t y, uint64_t mask)
>  -{
>  -return (x & mask) == (y & mask);
>  -}
>  -
>   // Returns true if TTE tag is valid and matches virtual address value in 
> context
>   // requires virtual address mask value calculated from TTE entry size
>   static inline int ultrasparc_tag_match(SparcTLBEntry *tlb,
> uint64_t address, uint64_t context,
>  -   target_phys_addr_t *physical,
>  -   int is_nucleus)
>  +   target_phys_addr_t *physical)
>   {
>  uint64_t mask;
>
>  @@ -413,8 +407,7 @@ static inline int ultrasparc_tag_match(SparcTLBEntry 
> *tlb,
>
>  // valid, context match, virtual address match?
>  if (TTE_IS_VALID(tlb->tte) &&
>  -((is_nucleus && compare_masked(0, tlb->tag, 0x1fff))
>  - || TTE_IS_GLOBAL(tlb->tte) || compare_masked(context, tlb->tag, 
> 0x1fff))
>  +(TTE_IS_GLOBAL(tlb->tte) || tlb_compare_context(tlb, context))
>  && compare_masked(address, tlb->tag, mask))
>  {
>  // decode physical address
>  @@ -431,7 +424,6 @@ static int get_physical_address_data(CPUState *env,
>   {
>  unsigned int i;
>  uint64_t context;
>  -int is_nucleus;
>
>  if ((env->lsu & DMMU_E) == 0) { /* DMMU disabled */
>  *physical = ultrasparc_truncate_physical(address);
>  @@ -439,14 +431,16 @@ static int get_physical_address_data(CPUState *env,
>  return 0;
>  }
>
>  -context = env->dmmu.mmu_primary_context & 0x1fff;
>  -is_nucleus = env->tl > 0;
>  +if (env->tl == 0) {
>  +context = env->dmmu.mmu_primary_context & 0x1fff;
>  +} else {
>  +context = 0;
>  +}
>
>  for (i = 0; i < 64; i++) {
>  // ctx match, vaddr match, valid?
>  if (ultrasparc_tag_match(&env->dtlb[i],
>  - address, context, physical,
>  - is_nucleus)) {
>  + address, context, physical)) {
>  // access ok?
>  if (((env->dtlb[i].tte & 0x4) && is_user) ||
>  (!(env->dtlb[i].tte & 0x2) && (rw == 1))) {
>  @@ -492,7 +486,6 @@ static int get_physical_address_code(CPUState *env,
>   {
>  unsigned int i;
>  uint64_t context;
>  -int is_nucleus;
>
>  if ((env->lsu & IMMU_E) == 0 || (env->pstate & PS_RED) != 0) {
>  /* IMMU disabled */
>  @@ -501,14 +494,16 @@ static int get_physical_address_code(CPUState *env,
>  return 0;
>  }
>
>  -context = env->dmmu.mmu_primary_context & 0x1fff;
>  -is_nucleus = env->tl > 0;
>  +if (env->tl == 0) {
>  +context = env->dmmu.mmu_primary_context & 0x1fff;
>  +} else {
>  +context = 0;
>  +}
>
>  for (i = 0; i < 64; i++) {
>  // ctx match, vaddr match, valid?
>  if (ultrasparc_tag_match(&env->itlb[i],
>  - address, context, physical,
>  - is_nucleus)) {
>  + address, context, physical))

[Qemu-devel] Re: vgabios + qemu: issues and plans.

2010-05-06 Thread Gerd Hoffmann

On 05/06/10 22:19, Sebastian Herbszt wrote:

Gerd Hoffmann wrote:

Hi,


Works for stdvga and qxl. vmware fails as it has the memory at pci
region 1 (region 0 has ioports) and it seems vgabios isn't prepared to
handle that ...


Do you mean vgabios currently doesn't work with qemu and "-vga vmware" ?


It works, but uses the bochs lfb at the magic address 0xe000.

Killing the 0xe00 alias mapping and switching over to use the pci
region instead requires some changes in pci_get_lfb_addr to handle the
situation.


Does this only happen with VBE?


Yes.

cheers,
  Gerd




Re: [Qemu-devel] Re: [PATCH] Remove IO_MEM_SUBWIDTH.

2010-05-06 Thread Artyom Tarasenko
2010/4/28 Artyom Tarasenko :
> 2010/4/27 Richard Henderson :
>> On 04/26/2010 02:54 PM, Artyom Tarasenko wrote:
>>> This patch introduces a regression. qemu crashes on lance test:
>>
>> I'm not sure how to get to this, since the sparc-test images don't
>> include ifconfig, and I havn't been able to find a sparc install
>> image that works (doesn't support sparc32 or sparc64 fails to load).
>>
>> That said, try this and see if it works.
>>
>>
>> r~
>>
>> ---
>> diff --git a/exec.c b/exec.c
>> index 14d1fd7..572d3fd 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -3286,6 +3286,8 @@ static int cpu_register_io_memory_fixed(int io_index,
>>                                         CPUWriteMemoryFunc * const 
>> *mem_write,
>>                                         void *opaque)
>>  {
>> +    int i;
>> +
>>     if (io_index <= 0) {
>>         io_index = get_free_io_mem_idx();
>>         if (io_index == -1)
>> @@ -3296,8 +3298,14 @@ static int cpu_register_io_memory_fixed(int io_index,
>>             return -1;
>>     }
>>
>> -    memcpy(io_mem_read[io_index], mem_read, 3 * sizeof(CPUReadMemoryFunc*));
>> -    memcpy(io_mem_write[io_index], mem_write, 3 * 
>> sizeof(CPUWriteMemoryFunc*));
>> +    for (i = 0; i < 3; ++i) {
>> +        io_mem_read[io_index][i]
>> +            = (mem_read[i] ? mem_read[i] : unassigned_mem_read[i]);
>> +    }
>> +    for (i = 0; i < 3; ++i) {
>> +        io_mem_write[io_index][i]
>> +            = (mem_write[i] ? mem_write[i] : unassigned_mem_write[i]);
>> +    }
>>     io_mem_opaque[io_index] = opaque;
>>
>>     return (io_index << IO_MEM_SHIFT);
>>

Why the fix didn't make it into the git?
Does it introduce other problems?

> Looks good, thanks.
>
> Acked-by: Artyom Tarasenko 
>

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/




[Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset

2010-05-06 Thread Marcelo Tosatti
On Thu, Nov 19, 2009 at 01:19:55PM +0100, Jan Kiszka wrote:
> Hi,
> 
> I just managed to push a qemu-kvm process (git rev. b496fe3431) into an
> endless loop in qcow2_alloc_cluster_offset, namely over
> QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight):
> 
> (gdb) bt
> #0  0x0048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, 
> offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at 
> /data/qemu-kvm/block/qcow2-cluster.c:750
> #1  0x004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at 
> /data/qemu-kvm/block/qcow2.c:587
> #2  0x00482a44 in qcow_aio_writev (bs=, 
> sector_num=, qiov=, 
> nb_sectors=, cb=, opaque= optimized out>) at /data/qemu-kvm/block/qcow2.c:645
> #3  0x00470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, 
> qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 , 
> opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362
> #4  0x00472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, 
> buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736
> #5  0x00435581 in ide_sector_write (s=0xc92650) at 
> /data/qemu-kvm/hw/ide/core.c:622
> #6  0x00425fc2 in kvm_handle_io (env=) at 
> /data/qemu-kvm/kvm-all.c:553
> #7  kvm_run (env=) at /data/qemu-kvm/qemu-kvm.c:964
> #8  0x00426049 in kvm_cpu_exec (env=0x1000) at 
> /data/qemu-kvm/qemu-kvm.c:1651
> #9  0x0042627d in kvm_main_loop_cpu (_env=) at 
> /data/qemu-kvm/qemu-kvm.c:1893
> #10 ap_main_loop (_env=) at 
> /data/qemu-kvm/qemu-kvm.c:1943
> #11 0x7f48ae89d070 in start_thread () from /lib64/libpthread.so.0
> #12 0x7f48abf0711d in clone () from /lib64/libc.so.6
> #13 0x in ?? ()
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, 
> depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight 
> = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
> 
> So next == first.
> 

Seen the exact same bug twice in a row while installing FC12 with IDE
disk, current qemu-kvm.git. 

qemu-system-x86_64 -drive file=/root/images/fc12-ide.img,cache=writeback \
-m 1000  -vnc :1 \
-net nic,model=virtio \
-net tap,script=/root/ifup.sh -serial stdio \
-cdrom /root/iso/linux/Fedora-12-x86_64-DVD.iso -monitor
telnet::4445,server,nowait -usbdevice tablet

Can't reproduce though.





Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Rusty Russell
On Thu, 6 May 2010 07:30:00 pm Avi Kivity wrote:
> On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:
> > +   /* We publish the last-seen used index at the end of the available ring.
> > +* It is at the end for backwards compatibility. */
> > +   vr->last_used_idx =&(vr)->avail->ring[num];
> > +   /* Verify that last used index does not spill over the used ring. */
> > +   BUG_ON((void *)vr->last_used_idx +
> > +  sizeof *vr->last_used_idx>  (void *)vr->used);
> >   }
> >
> 
> Shouldn't this be on its own cache line?

It's next to the available ring; because that's where the guest publishes
its data.  That whole page is guest-write, host-read.

Putting it on a cacheline by itself would be a slight pessimization; the host
cpu would have to get the last_used_idx cacheline and the avail descriptor
cacheline every time.  This way, they are sometimes the same cacheline.

Hope that clarifies,
Rusty.




[Qemu-devel] Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Rusty Russell
On Thu, 6 May 2010 03:49:46 pm Michael S. Tsirkin wrote:
> Now, I also added an mb() in guest between read and write so
> that last used index write can not get ahead of used index read.
> It does feel good to have it there, but I can not say why
> it's helpful. Works fine without it, but then these
> subtle races might be hard to trigger. What do you think?

I couldn't see that in the patch?  I don't think it's necessary
though, since the write of depends last_used depends on the read of
used (and no platform we care about would reorder such a thing).

I'm reasonably happy, but we should write some convenient test for
missing interrupts.

I'm thinking of a sender which does a loop: blasts 1MB of UDP packets,
then prints the time and sleep(1).  The receiver would print the time
every 1MB of received data.  The two times should almost exactly correspond.

Assuming that the network doesn't overflow and lose stuff, this should
identify any missing wakeup/interrupts (depending on direction used).

Cheers,
Rusty.