date:20110302

Added CC to Tristan. I doubt that he is still interested in EFI, though.

On Thu, Mar 03, 2011 at 04:46:34PM +0900, Isaku Yamahata wrote:
> 
> Seabios has the patch to address the similar issue with
> the changeset of b82a1e49fc0e72fb9bf1a642d6aa707345b0f398,
> which enables memory/io unconditionally.
> 
> I suppose the EFI bios is very old so that it has the same issue.
> I think the following file is the one to modify.
> 
> efi-vfirmware.hg/edk2-sparse/EdkQemuPkg/Pei/BochsPciScan/BochsPciScan.c
> 
> thanks,
> 
> On Thu, Mar 03, 2011 at 08:43:11AM +0200, vagran wrote:
> > I am using TianoCore EFI by Tristan Gingold which is published
> > on http://wiki.qemu.org/download/efi-bios.tar.bz2. If you would try
> > to load it on Qemu 0.14.0 (built either for i386 or x86_64) you will
> > see nothing on VGA display or serial console. But it still will be
> > able to load OS after timeout if you have proper disk image.
> >> It seems your EFI BIOS doesn't enable memor, io or master bits
> >> in command register.
> >>   
> >>
> >> or disableintx.
> >>   
> > I have checked your guess and figured out that it works only
> > if both memory and io bits are not cleared. So the following
> > patch also works:
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 8b76cea..bcf9b16 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -163,8 +163,9 @@ void pci_device_reset(PCIDevice *dev)
> > pci_device_deassert_intx(dev);
> > /* Clear all writeable bits */
> > pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> > - pci_get_word(dev->wmask + PCI_COMMAND) |
> > - pci_get_word(dev->w1cmask + PCI_COMMAND));
> > + (pci_get_word(dev->wmask + PCI_COMMAND) |
> > + pci_get_word(dev->w1cmask +  
> > PCI_COMMAND)) &
> > + ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
> > pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> >  pci_get_word(dev->wmask + PCI_STATUS) |
> >  pci_get_word(dev->w1cmask + PCI_STATUS));
> >
> > So probably the problem is in EFI BIOS. But I was not able to find
> > its source code. Anyone knows how is it built?
> >
> > Best regards,
> > Artyom.
> >
> >
> > Isaku Yamahata wrote:
> >> On Thu, Mar 03, 2011 at 12:03:53PM +0900, Isaku Yamahata wrote:
> >>   
> >>> Hi. Thank you for reporting.
> >>> Can you elaborate on the changeset that you pointed out and
> >>> your work around?
> >>>
> >>> Regarding to the changeset, it had the issue, but I suppose
> >>> 80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
> >>> Do you found any other issue?
> >>>
> >>> Regarding to your workaround, what was the problem?
> >>> What EFI BIOS are you using? Tiano-core derivatives that
> >>> Tristan Gingold worked on? Or other one?
> >>> It seems your EFI BIOS doesn't enable memor, io or master bits
> >>> in command register.
> >>> 
> >>
> >> or disableintx.
> >>
> >>   
> >>> If so, the issue is in the bios, not qemu.
> >>>
> >>> thanks,
> >>>
> >>> On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:
> >>> 
>  vagran wrote:
>    
> > Hi,
> > I have noted that Qemu VGA and serial console with EFI BIOS 
> > stopped  working in
> > 0.14.0 (and in latest development snapshot is still not working). 
> >  Everything was
> > fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
> > properly configured on used disk image. The only effect is that   
> > neither VGA nor
> > serial console is not functioning. After short investigation I 
> > have  discovered
> > that this functionality was broken by this commit:
> >
> > commit 9bb3358627d87d8de25fb41b7276575539d799a7
> > Author: Isaku Yamahata 
> > Date:   Fri Nov 19 18:56:02 2010 +0900
> >
> > Do you have any idea how this change could affect EFI consoles?
> >
> > 
>  After further investigation I have found that the following patch 
>  provides
>  a workaround for the problem, may be it could be useful for somebody who
>  is more familiar with Qemu PCI code:
> 
>  diff --git a/hw/pci.c b/hw/pci.c
>  index 8b76cea..06dd7ab 100644
>  --- a/hw/pci.c
>  +++ b/hw/pci.c
>  @@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
>  pci_update_irq_status(dev);
>  pci_device_deassert_intx(dev);
>  /* Clear all writeable bits */
>  +#if 0
>  pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
>   pci_get_word(dev->wmask + PCI_COMMAND) |
>   pci_get_word(dev->w1cmask + 
>  PCI_COMMAND));
>  +#endif
>  pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
>   pci_get_word(dev->wmask + PCI_STATUS) |
>   pci_get

[Qemu-devel] Re: [PATCH v4 2/2] rtl8139: add vlan tag insertion

2011-03-02 Thread Michael S. Tsirkin

On Wed, Mar 02, 2011 at 05:36:20PM -0500, Benjamin Poirier wrote:
> Add support to the emulated hardware to insert vlan tags in packets
> going from the guest to the network.
> 
> Signed-off-by: Benjamin Poirier 
> Cc: Igor V. Kovalenko 
> Cc: Jason Wang 
> Cc: Michael S. Tsirkin 
> ---
>  hw/rtl8139.c |  102 
> ++
>  1 files changed, 74 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/rtl8139.c b/hw/rtl8139.c
> index 6e770bd..0ea79e4 100644
> --- a/hw/rtl8139.c
> +++ b/hw/rtl8139.c
> @@ -832,11 +832,14 @@ static int rtl8139_can_receive(VLANClientState *nc)
>  }
>  }
>  
> -static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, 
> size_t size_, int do_interrupt)
> +static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
> +size_t buf_size, int do_interrupt, const uint8_t *dot1q_buf)
>  {
>  RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
> +/* size_ is the total length of argument buffers */
> +int size_ = buf_size + (dot1q_buf ? VLAN_HDR_LEN : 0);
> +/* size is the length of the buffer passed to the driver */
>  int size = size_;
> -const uint8_t *dot1q_buf;
>  const uint8_t *next_part;
>  size_t next_part_size;
>  
> @@ -1018,10 +1021,13 @@ static ssize_t rtl8139_do_receive(VLANClientState 
> *nc, const uint8_t *buf, size_
>  /* next_part starts right after the vlan header (if any), at the
>   * ethertype for the payload */
>  next_part = &buf[ETHER_ADDR_LEN * 2];
> -if (s->CpCmd & CPlusRxVLAN && be16_to_cpup((uint16_t *)
> -&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN) {
> -dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
> -next_part += VLAN_HDR_LEN;
> +if (s->CpCmd & CPlusRxVLAN && (dot1q_buf || be16_to_cpup((uint16_t *)
> +&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN)) {
> +if (!dot1q_buf) {
> +/* the tag is in the buffer */
> +dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
> +next_part += VLAN_HDR_LEN;
> +}
>  size -= VLAN_HDR_LEN;
>  
>  rxdw1 &= ~CP_RX_VLAN_TAG_MASK;
> @@ -1034,9 +1040,8 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
> const uint8_t *buf, size_
>  } else {
>  /* reset VLAN tag flag */
>  rxdw1 &= ~CP_RX_TAVA;
> -dot1q_buf = NULL;
>  }
> -next_part_size = buf + size_ - next_part;
> +next_part_size = buf + buf_size - next_part;
>  
>  /* if too small buffer, then expand it */
>  if (size < MIN_BUF_SIZE) {
> @@ -1164,10 +1169,18 @@ static ssize_t rtl8139_do_receive(VLANClientState 
> *nc, const uint8_t *buf, size_
>  
>  /* if too small buffer, then expand it */
>  if (size < MIN_BUF_SIZE) {
> -memcpy(buf1, buf, size);
> +if (unlikely(dot1q_buf)) {

unlikely kind of iffy here

> +memcpy(buf1, buf, 2 * ETHER_ADDR_LEN);
> +memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN);
> +memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 *
> +ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN);
> +} else {
> +memcpy(buf1, buf, size);
> +}
>  memset(buf1 + size, 0, MIN_BUF_SIZE - size);
>  buf = buf1;
>  size = MIN_BUF_SIZE;
> +dot1q_buf = NULL;
>  }
>  
>  /* begin ring receiver mode */
> @@ -1196,8 +1209,19 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
> const uint8_t *buf, size_
>  rtl8139_write_buffer(s, (uint8_t *)&val, 4);
>  
>  /* receive/copy to target memory */
> -rtl8139_write_buffer(s, buf, size);
> -val = rtl8139_crc32(0, buf, size);
> +if (unlikely(dot1q_buf)) {
> +rtl8139_write_buffer(s, buf, 2 * ETHER_ADDR_LEN);
> +val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
> +rtl8139_write_buffer(s, dot1q_buf, VLAN_HDR_LEN);
> +val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
> +rtl8139_write_buffer(s, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
> +ETHER_ADDR_LEN);
> +val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
> +ETHER_ADDR_LEN);
> +} else {
> +rtl8139_write_buffer(s, buf, size);
> +val = rtl8139_crc32(0, buf, size);
> +}
>  
>  /* write checksum */
>  #if defined (RTL8139_CALCULATE_RXCRC)
> @@ -1227,7 +1251,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
> const uint8_t *buf, size_
>  
>  static ssize_t rtl8139_receive(VLANClientState *nc, const uint8_t *buf, 
> size_t size)
>  {
> -return rtl8139_do_receive(nc, buf, size, 1);
> +return rtl8139_do_receive(nc, buf, size, 1, NULL);
>  }
>  
>  static void rtl8139_reset_rxring

Re: [Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0


Seabios has the patch to address the similar issue with
the changeset of b82a1e49fc0e72fb9bf1a642d6aa707345b0f398,
which enables memory/io unconditionally.

I suppose the EFI bios is very old so that it has the same issue.
I think the following file is the one to modify.

efi-vfirmware.hg/edk2-sparse/EdkQemuPkg/Pei/BochsPciScan/BochsPciScan.c

thanks,

On Thu, Mar 03, 2011 at 08:43:11AM +0200, vagran wrote:
> I am using TianoCore EFI by Tristan Gingold which is published
> on http://wiki.qemu.org/download/efi-bios.tar.bz2. If you would try
> to load it on Qemu 0.14.0 (built either for i386 or x86_64) you will
> see nothing on VGA display or serial console. But it still will be
> able to load OS after timeout if you have proper disk image.
>> It seems your EFI BIOS doesn't enable memor, io or master bits
>> in command register.
>>   
>>
>> or disableintx.
>>   
> I have checked your guess and figured out that it works only
> if both memory and io bits are not cleared. So the following
> patch also works:
> diff --git a/hw/pci.c b/hw/pci.c
> index 8b76cea..bcf9b16 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -163,8 +163,9 @@ void pci_device_reset(PCIDevice *dev)
> pci_device_deassert_intx(dev);
> /* Clear all writeable bits */
> pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> - pci_get_word(dev->wmask + PCI_COMMAND) |
> - pci_get_word(dev->w1cmask + PCI_COMMAND));
> + (pci_get_word(dev->wmask + PCI_COMMAND) |
> + pci_get_word(dev->w1cmask +  
> PCI_COMMAND)) &
> + ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
> pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
>  pci_get_word(dev->wmask + PCI_STATUS) |
>  pci_get_word(dev->w1cmask + PCI_STATUS));
>
> So probably the problem is in EFI BIOS. But I was not able to find
> its source code. Anyone knows how is it built?
>
> Best regards,
> Artyom.
>
>
> Isaku Yamahata wrote:
>> On Thu, Mar 03, 2011 at 12:03:53PM +0900, Isaku Yamahata wrote:
>>   
>>> Hi. Thank you for reporting.
>>> Can you elaborate on the changeset that you pointed out and
>>> your work around?
>>>
>>> Regarding to the changeset, it had the issue, but I suppose
>>> 80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
>>> Do you found any other issue?
>>>
>>> Regarding to your workaround, what was the problem?
>>> What EFI BIOS are you using? Tiano-core derivatives that
>>> Tristan Gingold worked on? Or other one?
>>> It seems your EFI BIOS doesn't enable memor, io or master bits
>>> in command register.
>>> 
>>
>> or disableintx.
>>
>>   
>>> If so, the issue is in the bios, not qemu.
>>>
>>> thanks,
>>>
>>> On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:
>>> 
 vagran wrote:
   
> Hi,
> I have noted that Qemu VGA and serial console with EFI BIOS 
> stopped  working in
> 0.14.0 (and in latest development snapshot is still not working). 
>  Everything was
> fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
> properly configured on used disk image. The only effect is that   
> neither VGA nor
> serial console is not functioning. After short investigation I 
> have  discovered
> that this functionality was broken by this commit:
>
> commit 9bb3358627d87d8de25fb41b7276575539d799a7
> Author: Isaku Yamahata 
> Date:   Fri Nov 19 18:56:02 2010 +0900
>
> Do you have any idea how this change could affect EFI consoles?
>
> 
 After further investigation I have found that the following patch provides
 a workaround for the problem, may be it could be useful for somebody who
 is more familiar with Qemu PCI code:

 diff --git a/hw/pci.c b/hw/pci.c
 index 8b76cea..06dd7ab 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
 pci_update_irq_status(dev);
 pci_device_deassert_intx(dev);
 /* Clear all writeable bits */
 +#if 0
 pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
  pci_get_word(dev->wmask + PCI_COMMAND) |
  pci_get_word(dev->w1cmask + PCI_COMMAND));
 +#endif
 pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
  pci_get_word(dev->wmask + PCI_STATUS) |
  pci_get_word(dev->w1cmask + PCI_STATUS));

 Best regards,
 Artyom.

   
>>> -- 
>>> yamahata
>>>
>>> 
>

-- 
yamahata

Re: [Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0

On Thu, Mar 03, 2011 at 08:43:11AM +0200, vagran wrote:
> I am using TianoCore EFI by Tristan Gingold which is published
> on http://wiki.qemu.org/download/efi-bios.tar.bz2. If you would try
> to load it on Qemu 0.14.0 (built either for i386 or x86_64) you will
> see nothing on VGA display or serial console. But it still will be
> able to load OS after timeout if you have proper disk image.

Thank you for the info. Then I can also test it locally.

thanks,

>> It seems your EFI BIOS doesn't enable memor, io or master bits
>> in command register.
>>   
>>
>> or disableintx.
>>   
> I have checked your guess and figured out that it works only
> if both memory and io bits are not cleared. So the following
> patch also works:
> diff --git a/hw/pci.c b/hw/pci.c
> index 8b76cea..bcf9b16 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -163,8 +163,9 @@ void pci_device_reset(PCIDevice *dev)
> pci_device_deassert_intx(dev);
> /* Clear all writeable bits */
> pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> - pci_get_word(dev->wmask + PCI_COMMAND) |
> - pci_get_word(dev->w1cmask + PCI_COMMAND));
> + (pci_get_word(dev->wmask + PCI_COMMAND) |
> + pci_get_word(dev->w1cmask +  
> PCI_COMMAND)) &
> + ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
> pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
>  pci_get_word(dev->wmask + PCI_STATUS) |
>  pci_get_word(dev->w1cmask + PCI_STATUS));
>
> So probably the problem is in EFI BIOS. But I was not able to find
> its source code. Anyone knows how is it built?
>
> Best regards,
> Artyom.
>
>
> Isaku Yamahata wrote:
>> On Thu, Mar 03, 2011 at 12:03:53PM +0900, Isaku Yamahata wrote:
>>   
>>> Hi. Thank you for reporting.
>>> Can you elaborate on the changeset that you pointed out and
>>> your work around?
>>>
>>> Regarding to the changeset, it had the issue, but I suppose
>>> 80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
>>> Do you found any other issue?
>>>
>>> Regarding to your workaround, what was the problem?
>>> What EFI BIOS are you using? Tiano-core derivatives that
>>> Tristan Gingold worked on? Or other one?
>>> It seems your EFI BIOS doesn't enable memor, io or master bits
>>> in command register.
>>> 
>>
>> or disableintx.
>>
>>   
>>> If so, the issue is in the bios, not qemu.
>>>
>>> thanks,
>>>
>>> On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:
>>> 
 vagran wrote:
   
> Hi,
> I have noted that Qemu VGA and serial console with EFI BIOS 
> stopped  working in
> 0.14.0 (and in latest development snapshot is still not working). 
>  Everything was
> fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
> properly configured on used disk image. The only effect is that   
> neither VGA nor
> serial console is not functioning. After short investigation I 
> have  discovered
> that this functionality was broken by this commit:
>
> commit 9bb3358627d87d8de25fb41b7276575539d799a7
> Author: Isaku Yamahata 
> Date:   Fri Nov 19 18:56:02 2010 +0900
>
> Do you have any idea how this change could affect EFI consoles?
>
> 
 After further investigation I have found that the following patch provides
 a workaround for the problem, may be it could be useful for somebody who
 is more familiar with Qemu PCI code:

 diff --git a/hw/pci.c b/hw/pci.c
 index 8b76cea..06dd7ab 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
 pci_update_irq_status(dev);
 pci_device_deassert_intx(dev);
 /* Clear all writeable bits */
 +#if 0
 pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
  pci_get_word(dev->wmask + PCI_COMMAND) |
  pci_get_word(dev->w1cmask + PCI_COMMAND));
 +#endif
 pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
  pci_get_word(dev->wmask + PCI_STATUS) |
  pci_get_word(dev->w1cmask + PCI_STATUS));

 Best regards,
 Artyom.

   
>>> -- 
>>> yamahata
>>>
>>> 
>

-- 
yamahata

[Qemu-devel] [PATCH v2 3/3] correct VNC_DIRTY_WORDS on 64 bit machine

VNC_DIRTY_WORDS is wrong on 64 bit long machine.

Signed-off-by: Wen Congyang 

---
 ui/vnc.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 8a1e7b9..5fc54e5 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -81,7 +81,7 @@ typedef void VncSendHextileTile(VncState *vs,
 
 #define VNC_MAX_WIDTH 2560
 #define VNC_MAX_HEIGHT 2048
-#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))
+#define VNC_DIRTY_WORDS (DIV_ROUND_UP(VNC_MAX_WIDTH, (16 * BITS_PER_LONG)))
 
 #define VNC_STAT_RECT  64
 #define VNC_STAT_COLS (VNC_MAX_WIDTH / VNC_STAT_RECT)
-- 
1.7.1

[Qemu-devel] Re: [PATCH 3/3] correct VNC_DIRTY_WORDS on 64 bit machine

At 03/03/2011 02:41 PM, Corentin Chary Write:
> On Thu, Mar 3, 2011 at 3:44 AM, Wen Congyang  wrote:
>> VNC_DIRTY_WORDS is wrong on 64 bit long machine.
>>
>> Signed-off-by: Wen Congyang 
>>
>> ---
>>  ui/vnc.h |3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/ui/vnc.h b/ui/vnc.h
>> index 8a1e7b9..239a7a7 100644
>> --- a/ui/vnc.h
>> +++ b/ui/vnc.h
>> @@ -81,7 +81,8 @@ typedef void VncSendHextileTile(VncState *vs,
>>
>>  #define VNC_MAX_WIDTH 2560
>>  #define VNC_MAX_HEIGHT 2048
>> -#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))
>> +#define divideup(x, y)  (((x) + ((y) - 1)) / (y))
> 
> osdep.h:#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

Thank you for pointing out it. I will update my patch.

> 
>

Re: [Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0

2011-03-02 Thread vagran


I am using TianoCore EFI by Tristan Gingold which is published
on http://wiki.qemu.org/download/efi-bios.tar.bz2. If you would try
to load it on Qemu 0.14.0 (built either for i386 or x86_64) you will
see nothing on VGA display or serial console. But it still will be
able to load OS after timeout if you have proper disk image.

It seems your EFI BIOS doesn't enable memor, io or master bits
in command register.
  


or disableintx.
  

I have checked your guess and figured out that it works only
if both memory and io bits are not cleared. So the following
patch also works:
diff --git a/hw/pci.c b/hw/pci.c
index 8b76cea..bcf9b16 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -163,8 +163,9 @@ void pci_device_reset(PCIDevice *dev)
pci_device_deassert_intx(dev);
/* Clear all writeable bits */
pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
- pci_get_word(dev->wmask + PCI_COMMAND) |
- pci_get_word(dev->w1cmask + PCI_COMMAND));
+ (pci_get_word(dev->wmask + PCI_COMMAND) |
+ pci_get_word(dev->w1cmask + 
PCI_COMMAND)) &

+ ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
 pci_get_word(dev->wmask + PCI_STATUS) |
 pci_get_word(dev->w1cmask + PCI_STATUS));

So probably the problem is in EFI BIOS. But I was not able to find
its source code. Anyone knows how is it built?

Best regards,
Artyom.


Isaku Yamahata wrote:

On Thu, Mar 03, 2011 at 12:03:53PM +0900, Isaku Yamahata wrote:
  

Hi. Thank you for reporting.
Can you elaborate on the changeset that you pointed out and
your work around?

Regarding to the changeset, it had the issue, but I suppose
80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
Do you found any other issue?

Regarding to your workaround, what was the problem?
What EFI BIOS are you using? Tiano-core derivatives that
Tristan Gingold worked on? Or other one?
It seems your EFI BIOS doesn't enable memor, io or master bits
in command register.



or disableintx.

  

If so, the issue is in the bios, not qemu.

thanks,

On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:


vagran wrote:
  

Hi,
I have noted that Qemu VGA and serial console with EFI BIOS stopped  
working in
0.14.0 (and in latest development snapshot is still not working).  
Everything was

fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
properly configured on used disk image. The only effect is that  
neither VGA nor
serial console is not functioning. After short investigation I have  
discovered

that this functionality was broken by this commit:

commit 9bb3358627d87d8de25fb41b7276575539d799a7
Author: Isaku Yamahata 
Date:   Fri Nov 19 18:56:02 2010 +0900

Do you have any idea how this change could affect EFI consoles?



After further investigation I have found that the following patch provides
a workaround for the problem, may be it could be useful for somebody who
is more familiar with Qemu PCI code:

diff --git a/hw/pci.c b/hw/pci.c
index 8b76cea..06dd7ab 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
pci_update_irq_status(dev);
pci_device_deassert_intx(dev);
/* Clear all writeable bits */
+#if 0
pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
 pci_get_word(dev->wmask + PCI_COMMAND) |
 pci_get_word(dev->w1cmask + PCI_COMMAND));
+#endif
pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
 pci_get_word(dev->wmask + PCI_STATUS) |
 pci_get_word(dev->w1cmask + PCI_STATUS));

Best regards,
Artyom.

  

--
yamahata

[Qemu-devel] Re: [PATCH 3/3] correct VNC_DIRTY_WORDS on 64 bit machine

2011-03-02 Thread Corentin Chary

On Thu, Mar 3, 2011 at 3:44 AM, Wen Congyang  wrote:
> VNC_DIRTY_WORDS is wrong on 64 bit long machine.
>
> Signed-off-by: Wen Congyang 
>
> ---
>  ui/vnc.h |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/ui/vnc.h b/ui/vnc.h
> index 8a1e7b9..239a7a7 100644
> --- a/ui/vnc.h
> +++ b/ui/vnc.h
> @@ -81,7 +81,8 @@ typedef void VncSendHextileTile(VncState *vs,
>
>  #define VNC_MAX_WIDTH 2560
>  #define VNC_MAX_HEIGHT 2048
> -#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))
> +#define divideup(x, y)  (((x) + ((y) - 1)) / (y))

osdep.h:#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))


-- 
Corentin Chary
http://xf.iksaif.net

[Qemu-devel] RE: Printed Roll-up Banners USD7 Only (GIGAPrint Overseas)

2011-03-02 Thread CarmanChow

Dear  IBT Ing. B?ro Trncik V.   

GIGAPrint Ltd. | 11/F, Fu Hop Fty Bldg, 209-211 Wai Yip St, Kwun Tong, Kowloon, 
HK | 23892088

Re: [Qemu-devel] [PATCH v2] disable sigcld handling before calling pclose()

At 12/21/2010 12:05 PM, Wen Congyang Write:
> When I use the command 'virsh save' to save the domain state,
> I receive the following error message:
> operation failed: Migration unexpectedly failed.
> 
> I debug the qemu by adding some printf(), and find the function
> pclose() returns -1.
> 
> I use strace to trace qemu, the log is as the following:
> ==
> close(17)   = 0
> --- SIGCHLD (Child exited) @ 0 (0) ---
> wait4(-1, NULL, WNOHANG, NULL)  = 22016
> rt_sigreturn(0) = 0
> wait4(22016, 0x7fff7f1034fc, 0, NULL)   = -1 ECHILD (No child processes)
> ==
> 
> We wait the child twice: one is in signal SIGCHLD handling and the other
> one is in pclose().
> 
> We should disable sigcld handling before calling pclose().
> 
> v2:
> - Add stub functions for Win32
> 
> Signed-off-by: Wen Congyang 
> 
> ---
>  os-posix.c  |   19 +++
>  qemu-os-posix.h |2 ++
>  qemu-os-win32.h |2 ++
>  savevm.c|2 ++
>  4 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/os-posix.c b/os-posix.c
> index 38c29d1..b163995 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -86,6 +86,25 @@ void os_setup_signal_handling(void)
>  sigaction(SIGCHLD, &act, NULL);
>  }
>  
> +void os_stop_sigchld_handling(void)
> +{
> +struct sigaction act;
> +
> +memset(&act, 0, sizeof(act));
> +act.sa_handler = SIG_DFL;
> +sigaction(SIGCHLD, &act, NULL);
> +}
> +
> +void os_resume_sigchld_handling(void)
> +{
> +struct sigaction act;
> +
> +memset(&act, 0, sizeof(act));
> +act.sa_handler = sigchld_handler;
> +act.sa_flags = SA_NOCLDSTOP;
> +sigaction(SIGCHLD, &act, NULL);
> +}
> +
>  /* Find a likely location for support files using the location of the binary.
> For installed binaries this will be "$bindir/../share/qemu".  When
> running from the build tree this will be "$bindir/../pc-bios".  */
> diff --git a/qemu-os-posix.h b/qemu-os-posix.h
> index 81fd9ab..1c317f1 100644
> --- a/qemu-os-posix.h
> +++ b/qemu-os-posix.h
> @@ -33,6 +33,8 @@ static inline void os_host_main_loop_wait(int *timeout)
>  void os_set_line_buffering(void);
>  void os_set_proc_name(const char *s);
>  void os_setup_signal_handling(void);
> +void os_stop_sigchld_handling(void);
> +void os_resume_sigchld_handling(void);
>  void os_daemonize(void);
>  void os_setup_post(void);
>  
> diff --git a/qemu-os-win32.h b/qemu-os-win32.h
> index 1a07e5e..f31c5ef 100644
> --- a/qemu-os-win32.h
> +++ b/qemu-os-win32.h
> @@ -43,6 +43,8 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc 
> *func, void *opaque);
>  void os_host_main_loop_wait(int *timeout);
>  
>  static inline void os_setup_signal_handling(void) {}
> +static inline void os_stop_sigchld_handling(void) {}
> +static inline void os_resume_sigchld_handling(void) {}
>  static inline void os_daemonize(void) {}
>  static inline void os_setup_post(void) {}
>  void os_set_line_buffering(void);
> diff --git a/savevm.c b/savevm.c
> index 90aa237..387b70b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -234,7 +234,9 @@ static int stdio_pclose(void *opaque)
>  {
>  QEMUFileStdio *s = opaque;
>  int ret;
> +os_stop_sigchld_handling();
>  ret = pclose(s->stdio_file);
> +os_resume_sigchld_handling();
>  qemu_free(s);
>  return ret;
>  }

Ping Again... :)
This is a bug fix.
2 months has gone, but I do not receive any comment.
Should we remove SIGCHLD handling as Paolo said?

Re: [Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0

On Thu, Mar 03, 2011 at 12:03:53PM +0900, Isaku Yamahata wrote:
> Hi. Thank you for reporting.
> Can you elaborate on the changeset that you pointed out and
> your work around?
> 
> Regarding to the changeset, it had the issue, but I suppose
> 80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
> Do you found any other issue?
> 
> Regarding to your workaround, what was the problem?
> What EFI BIOS are you using? Tiano-core derivatives that
> Tristan Gingold worked on? Or other one?
> It seems your EFI BIOS doesn't enable memor, io or master bits
> in command register.

or disableintx.

> If so, the issue is in the bios, not qemu.
> 
> thanks,
> 
> On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:
> > vagran wrote:
> >> Hi,
> >> I have noted that Qemu VGA and serial console with EFI BIOS stopped  
> >> working in
> >> 0.14.0 (and in latest development snapshot is still not working).  
> >> Everything was
> >> fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
> >> properly configured on used disk image. The only effect is that  
> >> neither VGA nor
> >> serial console is not functioning. After short investigation I have  
> >> discovered
> >> that this functionality was broken by this commit:
> >>
> >> commit 9bb3358627d87d8de25fb41b7276575539d799a7
> >> Author: Isaku Yamahata 
> >> Date:   Fri Nov 19 18:56:02 2010 +0900
> >>
> >> Do you have any idea how this change could affect EFI consoles?
> >>
> > After further investigation I have found that the following patch provides
> > a workaround for the problem, may be it could be useful for somebody who
> > is more familiar with Qemu PCI code:
> >
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 8b76cea..06dd7ab 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
> > pci_update_irq_status(dev);
> > pci_device_deassert_intx(dev);
> > /* Clear all writeable bits */
> > +#if 0
> > pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> >  pci_get_word(dev->wmask + PCI_COMMAND) |
> >  pci_get_word(dev->w1cmask + PCI_COMMAND));
> > +#endif
> > pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> >  pci_get_word(dev->wmask + PCI_STATUS) |
> >  pci_get_word(dev->w1cmask + PCI_STATUS));
> >
> > Best regards,
> > Artyom.
> >
> >
> 
> -- 
> yamahata
> 

-- 
yamahata

Re: [Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0

Hi. Thank you for reporting.
Can you elaborate on the changeset that you pointed out and
your work around?

Regarding to the changeset, it had the issue, but I suppose
80376c3fc2c38fdd45354e4b0eb45031f35587ed fixed it.
Do you found any other issue?

Regarding to your workaround, what was the problem?
What EFI BIOS are you using? Tiano-core derivatives that
Tristan Gingold worked on? Or other one?
It seems your EFI BIOS doesn't enable memor, io or master bits
in command register.
If so, the issue is in the bios, not qemu.

thanks,

On Wed, Mar 02, 2011 at 11:27:31PM +0200, vagran wrote:
> vagran wrote:
>> Hi,
>> I have noted that Qemu VGA and serial console with EFI BIOS stopped  
>> working in
>> 0.14.0 (and in latest development snapshot is still not working).  
>> Everything was
>> fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
>> properly configured on used disk image. The only effect is that  
>> neither VGA nor
>> serial console is not functioning. After short investigation I have  
>> discovered
>> that this functionality was broken by this commit:
>>
>> commit 9bb3358627d87d8de25fb41b7276575539d799a7
>> Author: Isaku Yamahata 
>> Date:   Fri Nov 19 18:56:02 2010 +0900
>>
>> Do you have any idea how this change could affect EFI consoles?
>>
> After further investigation I have found that the following patch provides
> a workaround for the problem, may be it could be useful for somebody who
> is more familiar with Qemu PCI code:
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 8b76cea..06dd7ab 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
> pci_update_irq_status(dev);
> pci_device_deassert_intx(dev);
> /* Clear all writeable bits */
> +#if 0
> pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
>  pci_get_word(dev->wmask + PCI_COMMAND) |
>  pci_get_word(dev->w1cmask + PCI_COMMAND));
> +#endif
> pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
>  pci_get_word(dev->wmask + PCI_STATUS) |
>  pci_get_word(dev->w1cmask + PCI_STATUS));
>
> Best regards,
> Artyom.
>
>

-- 
yamahata

[Qemu-devel] [PATCH 3/3] correct VNC_DIRTY_WORDS on 64 bit machine

VNC_DIRTY_WORDS is wrong on 64 bit long machine.

Signed-off-by: Wen Congyang 

---
 ui/vnc.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 8a1e7b9..239a7a7 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -81,7 +81,8 @@ typedef void VncSendHextileTile(VncState *vs,
 
 #define VNC_MAX_WIDTH 2560
 #define VNC_MAX_HEIGHT 2048
-#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))
+#define divideup(x, y)  (((x) + ((y) - 1)) / (y))
+#define VNC_DIRTY_WORDS (divideup(VNC_MAX_WIDTH, (16 * BITS_PER_LONG)))
 
 #define VNC_STAT_RECT  64
 #define VNC_STAT_COLS (VNC_MAX_WIDTH / VNC_STAT_RECT)
-- 
1.7.1

Re: [Qemu-devel] [RFC][PATCH] Preliminary BeBox support

2011-03-02 Thread François Revol


Le 2 mars 2011 à 22:59, Andreas Färber a écrit :

> Hello François,
> 
> Am 01.03.2011 um 01:15 schrieb François Revol:
> 
>> Since Natalia raised the subject I though I'd post my current patch for the 
>> BeBox support.
>> I think the loader stuff can probably be committed already with some cleanup.
>> The rest is mostly a copy of the prep file with tweaks and needs more work.
>> The boot nub images can be extracted with this script:
>> http://revolf.free.fr/beos/extract_bebox_images.sh
>> Running -M bebox -bios bootnub.image makes it try to probe the PCI bridge 
>> for now.
>> Comments ?
> 
> How does this relate to the BeBox emulation I started? I assume not at all?
> I tried to do the PEF parsing and nub extraction inside QEMU.
> http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/bebox

Eh, not at all, I didn't notice this one.
Hopefully I won't be dupping work anymore :p

> I'd prefer the BeBox machine to go into ppc_prep.c and to benefit from the 
> PReP cleanup that I started.

Yeah it needs some.

> I'll have a closer look the weekend, please cc me on future patches.

Ok.

François.

[Qemu-devel] Invitation to connect on LinkedIn

2011-03-02 Thread Starrry Han via LinkedIn

LinkedIn
Starrry Han requested to add you as a connection on LinkedIn:
--

Jiajun,

I'd like to add you to my professional network on LinkedIn.

- Starrry

Accept invitation from Starrry Han
http://www.linkedin.com/e/-kkb1ec-gkt15x24-k/qTMmi8QEI_f3FNXUkL1mvZgy00BGYniwg3/blk/I102453064_11/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYNclYQdz0PdjgOc359bS55t6xakn5KbPsQc3sNcz8Pd38LrCBxbOYWrSlI/EML_comm_afe/

View invitation from Starrry Han
http://www.linkedin.com/e/-kkb1ec-gkt15x24-k/qTMmi8QEI_f3FNXUkL1mvZgy00BGYniwg3/blk/I102453064_11/34NnPgSc3cRd38MckALqnpPbOYWrSlI/svi/
 
--

DID YOU KNOW that LinkedIn can find the answers to your most difficult 
questions? Post those vexing questions on LinkedIn Answers to tap into the 
knowledge of the world's foremost business experts: 
http://www.linkedin.com/e/-kkb1ec-gkt15x24-k/ask/inv-23/

 
-- 
(c) 2011, LinkedIn Corporation

Re: [Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

At 03/03/2011 06:27 AM, Stefan Weil Write:
> Am 02.03.2011 23:01, schrieb Stefan Weil:
>> Am 02.03.2011 19:47, schrieb Peter Maydell:
>>> On 2 March 2011 18:36, Stefan Weil  wrote:
 No. I dont't think that the third parameter of bitmap_clear is
 ok like that. See my patch for the correct value.
>>>
>>> Wen's patch:
>>>
>>> + const size_t width = ds_get_width(vd->ds) / 16;
>>> [...]
>>> -bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
>>> -bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
>>> - VNC_DIRTY_WORDS * BITS_PER_LONG);
>>> +bitmap_set(width_mask, 0, width);
>>> +bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG
>>> - width);
>>>
>>> Your patch:
>>>
>>> bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
>>> - VNC_DIRTY_WORDS * BITS_PER_LONG);
>>> + (VNC_MAX_WIDTH - ds_get_width(vd->ds)) / 16);
>>>
>>> Since ui/vnc.h has:
>>>
>>> #define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))
>>>
>>> the third parameter to bitmap_clear is the same value in
>>> both cases, isn't it? Or is this a rounding bug?
>>>
>>> -- PMM
>>
>> Because of rounding effects, both values can be different.
>>
>> The part missing in my patch is correct handling of another
>> rounding effect:
>>
>> VNC_DIRTY_WORDS is exact for 32 bit long values (and the
>> "old" code which used uint32_t until some weeks ago), where
>> VNC_DIRTY_WORDS = 2560/16/32 = 5.
>>
>> For 64 bit values, VNC_DIRTY_WORDS = 2560/16/64 = 2 (rounded)!
>>
>> Stefan W.
> 
> 
> Is bitmap_clear() really needed here? Meanwhile I think it is not,
> so this might be a new patch variant...

I do not know why we call bitmap_clear() hear. I only know it is
the same as the old code:

-static inline void vnc_set_bits(uint32_t *d, int n, int nb_words)
-{
-int j;
-
-j = 0;
-while (n >= 32) {
-d[j++] = -1;
-n -= 32;
-}
-if (n > 0)
-d[j++] = (1 << n) - 1;
-while (j < nb_words)   <=== bitmap_clear()
-d[j++] = 0;
-}

-vnc_set_bits(width_mask, (ds_get_width(vd->ds) / 16), VNC_DIRTY_WORDS);
+bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
+bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
+ VNC_DIRTY_WORDS * BITS_PER_LONG);


> 
> 
>

Re: [Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Peter Maydell

On 2 March 2011 22:01, Stefan Weil  wrote:
> The part missing in my patch is correct handling of another
> rounding effect:
>
> VNC_DIRTY_WORDS is exact for 32 bit long values (and the
> "old" code which used uint32_t until some weeks ago), where
> VNC_DIRTY_WORDS = 2560/16/32 = 5.
>
> For 64 bit values, VNC_DIRTY_WORDS = 2560/16/64 = 2 (rounded)!

Yes, I noticed that after I posted. Given that we have arrays
like
 unsigned long dirty[VNC_MAX_HEIGHT][VNC_DIRTY_WORDS];
rounding down rather than up looks a bit suspicious...

(Can we just make VNC_MAX_WIDTH a multiple of 64, or is it
baked into the VNC protocol?)

-- PMM

[Qemu-devel] [PATCH v4 1/2] rtl8139: add vlan tag extraction

2011-03-02 Thread Benjamin Poirier

Add support to the emulated hardware to extract vlan tags in packets
going from the network to the guest.

Signed-off-by: Benjamin Poirier 
Cc: Igor V. Kovalenko 
Cc: Jason Wang 
Cc: Michael S. Tsirkin 

--

AFAIK, extraction is optional to get vlans working. The driver
requests rx detagging but should not assume that it was done. Under
Linux, the mac layer will catch the vlan ethertype. I only added this
part for completeness (to emulate the hardware more truthfully...)
---
 hw/rtl8139.c |  100 +++---
 1 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..6e770bd 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -68,6 +68,16 @@
 #if defined(RTL8139_CALCULATE_RXCRC)
 /* For crc32 */
 #include 
+
+static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len)
+{
+return crc32(crc, buf, len);
+}
+#else
+static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len)
+{
+return 0;
+}
 #endif
 
 #define SET_MASKED(input, mask, curr) \
@@ -77,6 +87,14 @@
 #define MOD2(input, size) \
 ( ( input ) & ( size - 1 )  )
 
+#define ETHER_ADDR_LEN 6
+#define ETHER_TYPE_LEN 2
+#define ETHER_HDR_LEN (ETHER_ADDR_LEN * 2 + ETHER_TYPE_LEN)
+#define ETHERTYPE_VLAN 0x8100
+
+#define VLAN_TCI_LEN 2
+#define VLAN_HDR_LEN (ETHER_TYPE_LEN + VLAN_TCI_LEN)
+
 #if defined (DEBUG_RTL8139)
 #  define DEBUG_PRINT(x) do { printf x ; } while (0)
 #else
@@ -818,10 +836,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 {
 RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
 int size = size_;
+const uint8_t *dot1q_buf;
+const uint8_t *next_part;
+size_t next_part_size;
 
 uint32_t packet_header = 0;
 
-uint8_t buf1[60];
+uint8_t buf1[MIN_BUF_SIZE];
 static const uint8_t broadcast_macaddr[6] =
 { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
@@ -933,14 +954,6 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 }
 }
 
-/* if too small buffer, then expand it */
-if (size < MIN_BUF_SIZE) {
-memcpy(buf1, buf, size);
-memset(buf1 + size, 0, MIN_BUF_SIZE - size);
-buf = buf1;
-size = MIN_BUF_SIZE;
-}
-
 if (rtl8139_cp_receiver_enabled(s))
 {
 DEBUG_PRINT(("RTL8139: in C+ Rx mode \n"));
@@ -1001,6 +1014,41 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 
 uint32_t rx_space = rxdw0 & CP_RX_BUFFER_SIZE_MASK;
 
+/* write VLAN info to descriptor variables */
+/* next_part starts right after the vlan header (if any), at the
+ * ethertype for the payload */
+next_part = &buf[ETHER_ADDR_LEN * 2];
+if (s->CpCmd & CPlusRxVLAN && be16_to_cpup((uint16_t *)
+&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN) {
+dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
+next_part += VLAN_HDR_LEN;
+size -= VLAN_HDR_LEN;
+
+rxdw1 &= ~CP_RX_VLAN_TAG_MASK;
+/* BE + ~le_to_cpu()~ + cpu_to_le() = BE */
+rxdw1 |= CP_RX_TAVA | le16_to_cpup((uint16_t *)
+&buf[ETHER_HDR_LEN]);
+
+DEBUG_PRINT(("RTL8139: C+ Rx mode : extracted vlan tag with tci: "
+"%u\n", be16_to_cpup((uint16_t *) &buf[ETHER_HDR_LEN])));
+} else {
+/* reset VLAN tag flag */
+rxdw1 &= ~CP_RX_TAVA;
+dot1q_buf = NULL;
+}
+next_part_size = buf + size_ - next_part;
+
+/* if too small buffer, then expand it */
+if (size < MIN_BUF_SIZE) {
+size_t tmp_size = MIN_BUF_SIZE - ETHER_ADDR_LEN * 2;
+
+memcpy(buf1, next_part, next_part_size);
+memset(buf1 + next_part_size, 0, tmp_size - next_part_size);
+next_part = buf1;
+next_part_size = tmp_size;
+size = MIN_BUF_SIZE;
+}
+
 /* TODO: scatter the packet over available receive ring descriptors 
space */
 
 if (size+4 > rx_space)
@@ -1022,7 +1070,18 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
 /* receive/copy to target memory */
-cpu_physical_memory_write( rx_addr, buf, size );
+if (unlikely(dot1q_buf)) {
+cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN);
+val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
+val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
+cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, next_part,
+next_part_size);
+val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN,
+next_part_size);
+} else {
+cpu_physical_memory_write(rx_addr, buf, size);
+val = rtl8139_crc32(0, buf, size);
+}

[Qemu-devel] [PATCH v4 2/2] rtl8139: add vlan tag insertion

2011-03-02 Thread Benjamin Poirier

Add support to the emulated hardware to insert vlan tags in packets
going from the guest to the network.

Signed-off-by: Benjamin Poirier 
Cc: Igor V. Kovalenko 
Cc: Jason Wang 
Cc: Michael S. Tsirkin 
---
 hw/rtl8139.c |  102 ++
 1 files changed, 74 insertions(+), 28 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 6e770bd..0ea79e4 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -832,11 +832,14 @@ static int rtl8139_can_receive(VLANClientState *nc)
 }
 }
 
-static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, 
size_t size_, int do_interrupt)
+static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
+size_t buf_size, int do_interrupt, const uint8_t *dot1q_buf)
 {
 RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+/* size_ is the total length of argument buffers */
+int size_ = buf_size + (dot1q_buf ? VLAN_HDR_LEN : 0);
+/* size is the length of the buffer passed to the driver */
 int size = size_;
-const uint8_t *dot1q_buf;
 const uint8_t *next_part;
 size_t next_part_size;
 
@@ -1018,10 +1021,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 /* next_part starts right after the vlan header (if any), at the
  * ethertype for the payload */
 next_part = &buf[ETHER_ADDR_LEN * 2];
-if (s->CpCmd & CPlusRxVLAN && be16_to_cpup((uint16_t *)
-&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN) {
-dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
-next_part += VLAN_HDR_LEN;
+if (s->CpCmd & CPlusRxVLAN && (dot1q_buf || be16_to_cpup((uint16_t *)
+&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN)) {
+if (!dot1q_buf) {
+/* the tag is in the buffer */
+dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
+next_part += VLAN_HDR_LEN;
+}
 size -= VLAN_HDR_LEN;
 
 rxdw1 &= ~CP_RX_VLAN_TAG_MASK;
@@ -1034,9 +1040,8 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 } else {
 /* reset VLAN tag flag */
 rxdw1 &= ~CP_RX_TAVA;
-dot1q_buf = NULL;
 }
-next_part_size = buf + size_ - next_part;
+next_part_size = buf + buf_size - next_part;
 
 /* if too small buffer, then expand it */
 if (size < MIN_BUF_SIZE) {
@@ -1164,10 +1169,18 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 
 /* if too small buffer, then expand it */
 if (size < MIN_BUF_SIZE) {
-memcpy(buf1, buf, size);
+if (unlikely(dot1q_buf)) {
+memcpy(buf1, buf, 2 * ETHER_ADDR_LEN);
+memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN);
+memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 *
+ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN);
+} else {
+memcpy(buf1, buf, size);
+}
 memset(buf1 + size, 0, MIN_BUF_SIZE - size);
 buf = buf1;
 size = MIN_BUF_SIZE;
+dot1q_buf = NULL;
 }
 
 /* begin ring receiver mode */
@@ -1196,8 +1209,19 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 rtl8139_write_buffer(s, (uint8_t *)&val, 4);
 
 /* receive/copy to target memory */
-rtl8139_write_buffer(s, buf, size);
-val = rtl8139_crc32(0, buf, size);
+if (unlikely(dot1q_buf)) {
+rtl8139_write_buffer(s, buf, 2 * ETHER_ADDR_LEN);
+val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
+rtl8139_write_buffer(s, dot1q_buf, VLAN_HDR_LEN);
+val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
+rtl8139_write_buffer(s, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+} else {
+rtl8139_write_buffer(s, buf, size);
+val = rtl8139_crc32(0, buf, size);
+}
 
 /* write checksum */
 #if defined (RTL8139_CALCULATE_RXCRC)
@@ -1227,7 +1251,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 
 static ssize_t rtl8139_receive(VLANClientState *nc, const uint8_t *buf, size_t 
size)
 {
-return rtl8139_do_receive(nc, buf, size, 1);
+return rtl8139_do_receive(nc, buf, size, 1, NULL);
 }
 
 static void rtl8139_reset_rxring(RTL8139State *s, uint32_t bufferSize)
@@ -1802,7 +1826,8 @@ static uint32_t rtl8139_RxConfig_read(RTL8139State *s)
 return ret;
 }
 
-static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int 
size, int do_interrupt)
+static void rtl8139_transfer_frame(RTL8139State *s, uint8_t *buf, int size,
+int do_interru

[Qemu-devel] [PATCH v4] rtl8139: add vlan support

2011-03-02 Thread Benjamin Poirier

I've tested v4 with x86_64 host/guest. I used the same testing procedure as
before. I've tested a plain configuration as well as one with tso + vlan
offload, successfully.

I had to hack around the Linux 8139cp driver to be able to enable tso on vlan
which leads me to wonder, can someone with access to the C+ spec or a real
card confirm that it can do tso and vlan offload at the same time? The patch
I used for the kernel is at https://gist.github.com/851895.

Changes since v2:
insertion:
* moved insertion later in the process, to handle tso
* use qemu_sendv_packet() to insert the tag for us
* added dot1q_buf parameter to rtl8139_do_receive() to avoid some
  memcpy() in loopback mode. Note that the code path through that
  function is unchanged when dot1q_buf is NULL.

extraction:
* reduced the amount of copying by moving the "frame too short" logic
  after the removal of the vlan tag (as is done in e1000.c for
  example). Unfortunately, that logic can no longer be shared betwen
  C+ and C mode.

I've posted v2 of these patches back in November
http://article.gmane.org/gmane.comp.emulators.qemu/84252

I've tested v3 on the following combinations of guest and hosts:
host: x86_64, guest: x86_64
host: x86_64, guest: ppc32
host: ppc32, guest: ppc32

Testing on the x86_64 host used '-net tap' and consisted of:
* making an http transfert on the untagged interface.
* ping -s 0-1472 to another host on a vlan.
* making an scp upload to another host on a vlan.

Testing on the ppc32 host used '-net socket' connected to an x86_64 qemu-kvm
running the virtio nic and consisted of:
* establishing an ssh connection between the two using an untagged interface.
* ping -s 0-1472 between the two using a vlan.
* making an scp transfer in both directions using a vlan.

All that was successful. Nevertheless, it doesn't exercise all code paths so
care is in order.

Please note that the lack of vlan support in rtl8139 has taken a few people
aback:
https://bugzilla.redhat.com/show_bug.cgi?id=516587
http://article.gmane.org/gmane.linux.network.general/14266

Thanks,
-Ben

Re: [Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Stefan Weil


Am 02.03.2011 23:01, schrieb Stefan Weil:

Am 02.03.2011 19:47, schrieb Peter Maydell:

On 2 March 2011 18:36, Stefan Weil  wrote:

No. I dont't think that the third parameter of bitmap_clear is
ok like that. See my patch for the correct value.


Wen's patch:

+ const size_t width = ds_get_width(vd->ds) / 16;
[...]
-bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
-bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+bitmap_set(width_mask, 0, width);
+bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG 
- width);


Your patch:

bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+ (VNC_MAX_WIDTH - ds_get_width(vd->ds)) / 16);

Since ui/vnc.h has:

#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))

the third parameter to bitmap_clear is the same value in
both cases, isn't it? Or is this a rounding bug?

-- PMM


Because of rounding effects, both values can be different.

The part missing in my patch is correct handling of another
rounding effect:

VNC_DIRTY_WORDS is exact for 32 bit long values (and the
"old" code which used uint32_t until some weeks ago), where
VNC_DIRTY_WORDS = 2560/16/32 = 5.

For 64 bit values, VNC_DIRTY_WORDS = 2560/16/64 = 2 (rounded)!

Stefan W.



Is bitmap_clear() really needed here? Meanwhile I think it is not,
so this might be a new patch variant...

[Qemu-devel] [PATCH 2/2] net: fix qemu_can_send_packet logic

If any of the clients is not ready to receive (ie it has a can_receive
callback and can_receive() returns false), we don't want to start
sending, else this client may miss/discard the packet.

I got this behaviour with the following setup :
the emulated machine is using an USB-ethernet adapter, it is connected
to the network using SLIRP and I'm dumping the traffic in a .pcap file.
As per the following command line :
-net nic,model=usb,vlan=1 -net user,vlan=1 -net dump,vlan=1,file=/tmp/pkt.pcap
Every time that two packets are coming in a row from the host, the
usb-net code will receive the first one, then returns 0 to can_receive
call since it has a 1 packet long queue. But as the dump code is always
ready to receive, qemu_can_send_packet will return true and the next
packet will discard the previous one in the usb-net code.

Signed-off-by: Vincent Palatin 
---
 net.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net.c b/net.c
index ec4745d..72ac4cf 100644
--- a/net.c
+++ b/net.c
@@ -411,11 +411,11 @@ int qemu_can_send_packet(VLANClientState *sender)
 }
 
 /* no can_receive() handler, they can always receive */
-if (!vc->info->can_receive || vc->info->can_receive(vc)) {
-return 1;
+if (vc->info->can_receive && !vc->info->can_receive(vc)) {
+return 0;
 }
 }
-return 0;
+return 1;
 }
 
 static ssize_t qemu_deliver_packet(VLANClientState *sender,
-- 
1.7.3.1

[Qemu-devel] net: small fixes

Dear Qemu developers,

While debugging a machine emulation using SLIRP based user networking, I ran 
into a couple of issues.
Please find attached the patches for them :
1) fix the SLIRP compilation when the debug traces are activated.
2) avoid packet loss with several receivers on the same vlan.

Regards,
-- 
Vincent

[Qemu-devel] [PATCH 1/2] net: fix trace when debug is activated in slirp

make the code compile correctly when DEBUG is activated.

Signed-off-by: Vincent Palatin 
---
 slirp/bootp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/slirp/bootp.c b/slirp/bootp.c
index 0905c6d..1eb2ed1 100644
--- a/slirp/bootp.c
+++ b/slirp/bootp.c
@@ -284,7 +284,7 @@ static void bootp_reply(Slirp *slirp, const struct bootp_t 
*bp)
 } else {
 static const char nak_msg[] = "requested address not available";
 
-DPRINTF("nak'ed addr=%08x\n", ntohl(preq_addr->s_addr));
+DPRINTF("nak'ed addr=%08x\n", ntohl(preq_addr.s_addr));
 
 *q++ = RFC2132_MSG_TYPE;
 *q++ = 1;
-- 
1.7.3.1

Re: [Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Stefan Weil


Am 02.03.2011 19:47, schrieb Peter Maydell:

On 2 March 2011 18:36, Stefan Weil  wrote:

No. I dont't think that the third parameter of bitmap_clear is
ok like that. See my patch for the correct value.


Wen's patch:

+ const size_t width = ds_get_width(vd->ds) / 16;
[...]
-bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
-bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+bitmap_set(width_mask, 0, width);
+bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG - 
width);


Your patch:

bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+ (VNC_MAX_WIDTH - ds_get_width(vd->ds)) / 16);

Since ui/vnc.h has:

#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))

the third parameter to bitmap_clear is the same value in
both cases, isn't it? Or is this a rounding bug?

-- PMM


Because of rounding effects, both values can be different.

The part missing in my patch is correct handling of another
rounding effect:

VNC_DIRTY_WORDS is exact for 32 bit long values (and the
"old" code which used uint32_t until some weeks ago), where
VNC_DIRTY_WORDS = 2560/16/32 = 5.

For 64 bit values, VNC_DIRTY_WORDS = 2560/16/64 = 2 (rounded)!

Stefan W.

Re: [Qemu-devel] [RFC][PATCH] Preliminary BeBox support

2011-03-02 Thread Andreas Färber


Hello François,

Am 01.03.2011 um 01:15 schrieb François Revol:

Since Natalia raised the subject I though I'd post my current patch  
for the BeBox support.
I think the loader stuff can probably be committed already with some  
cleanup.
The rest is mostly a copy of the prep file with tweaks and needs  
more work.

The boot nub images can be extracted with this script:
http://revolf.free.fr/beos/extract_bebox_images.sh
Running -M bebox -bios bootnub.image makes it try to probe the PCI  
bridge for now.

Comments ?


How does this relate to the BeBox emulation I started? I assume not at  
all?

I tried to do the PEF parsing and nub extraction inside QEMU.
http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/bebox

I'd prefer the BeBox machine to go into ppc_prep.c and to benefit from  
the PReP cleanup that I started.

I'll have a closer look the weekend, please cc me on future patches.

Thanks,

Andreas

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-03-02 Thread Anthony Liguori


On 03/02/2011 11:30 AM, Avi Kivity wrote:

It's really the natural generalization
of what you're proposing.

So basically, the only differences are:

 1) always use the new RAID1 format
 2) drop the progress bitmap
 3) support multiple devices per file
 4) let drive properties be specified beyond filename

All reasonable things to do.


Well, I dislike 3, and the whole "qemu is authoritative source of 
configuration" thing.


Yeah, at this point, I think the new no-stored-state approach is 
obviously better than anything proposed.  So we'll have to revisit this 
discussion if/when there's another use-case that demands it.


Regards,

Anthony Liguori

[Qemu-devel] Re: EFI console stopped working in Qemu 0.14.0

2011-03-02 Thread vagran


vagran wrote:

Hi,
I have noted that Qemu VGA and serial console with EFI BIOS stopped 
working in
0.14.0 (and in latest development snapshot is still not working). 
Everything was

fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
properly configured on used disk image. The only effect is that 
neither VGA nor
serial console is not functioning. After short investigation I have 
discovered

that this functionality was broken by this commit:

commit 9bb3358627d87d8de25fb41b7276575539d799a7
Author: Isaku Yamahata 
Date:   Fri Nov 19 18:56:02 2010 +0900

Do you have any idea how this change could affect EFI consoles?


After further investigation I have found that the following patch provides
a workaround for the problem, may be it could be useful for somebody who
is more familiar with Qemu PCI code:

diff --git a/hw/pci.c b/hw/pci.c
index 8b76cea..06dd7ab 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -162,9 +162,11 @@ void pci_device_reset(PCIDevice *dev)
pci_update_irq_status(dev);
pci_device_deassert_intx(dev);
/* Clear all writeable bits */
+#if 0
pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
 pci_get_word(dev->wmask + PCI_COMMAND) |
 pci_get_word(dev->w1cmask + PCI_COMMAND));
+#endif
pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
 pci_get_word(dev->wmask + PCI_STATUS) |
 pci_get_word(dev->w1cmask + PCI_STATUS));

Best regards,
Artyom.

[Qemu-devel] [PATCH] fix offset for MMIO subpage access

When using a MMIO subpage not starting on a page boundary, the offset
value given to the access handler is based on the start of the MMU page
not on the subpage base.
As a consequence, if you are mapping the same subpage sized MMIO device
at different addresses, this is somewhat impractical and confusing since
the same register will be called with different "offset" depending on the
base address.

My proposal is to workaround this by recording the offset in region_offset
field.

Signed-off-by: Vincent Palatin 
---
 exec.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d611100..b59e7c9 100644
--- a/exec.c
+++ b/exec.c
@@ -2626,6 +2626,7 @@ void 
cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
 CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2,
   need_subpage);
 if (need_subpage) {
+region_offset -= (start_addr & ~TARGET_PAGE_MASK);
 if (!(orig_memory & IO_MEM_SUBPAGE)) {
 subpage = subpage_init((addr & TARGET_PAGE_MASK),
&p->phys_offset, orig_memory,
@@ -2658,6 +2659,7 @@ void 
cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
   end_addr2, need_subpage);
 
 if (need_subpage) {
+region_offset -= (start_addr & ~TARGET_PAGE_MASK);
 subpage = subpage_init((addr & TARGET_PAGE_MASK),
&p->phys_offset, IO_MEM_UNASSIGNED,
addr & TARGET_PAGE_MASK);
-- 
1.7.3.1

Re: [Qemu-devel] Memory Map

Hi,

On Wed, Mar 2, 2011 at 12:11, Salvatore Lionetti
 wrote:
> Still now, some memory region is called with base+offset.
>
> So:
>
> [0x204] <= value (write from uP register)
> cause
> read(opaque, offset=204, value)
>
> while
> [0x504] <= value (write from uP register)
> cause
> read(opaque, offset=4, value)
>
> The two opaque are different as expected.
>
> Where i am wrong?

If you mean 0x5004 and not 0x504 (as stated in your previous email),
IMO it is a current limitation of the Qemu feature called "subpage"
(which is used when you are mapping a memory area smaller than the MMU
page size as in your example).

When using subpages in the current code, the "offset" becomes the
distance to the MMU page boundary instead of the distance to the base
address of the peripheral. This is somewhat impractical and confusing
when you are mapping the same subpage sized MMIO device at different
addresses.
As the emulation I'm working on has a lot of subpage sized
peripherals, I have written a patch to workaround this limitation. I
will send it to the list for comment if you want to try it.

-- 
Vincent

Re: [Qemu-devel] Re: [PATCH] moving eeprom initialization

2011-03-02 Thread William Dauchy

On Wed, Mar 2, 2011 at 7:28 PM, Gerhard Wiesinger  wrote:
> Your patch should be based on fixes for correct EEPROM initialization, see
> for details: http://www.mail-archive.com/qemu-devel@nongnu.org/msg56414.html

This patch is not yet integrated upstream. I will correct it if needed.

-- 
William

[Qemu-devel] EFI console stopped working in Qemu 0.14.0

2011-03-02 Thread vagran


Hi,
I have noted that Qemu VGA and serial console with EFI BIOS stopped 
working in
0.14.0 (and in latest development snapshot is still not working). 
Everything was

fine in 0.13.0. However EFI BIOS itself is able to load kernel if it was
properly configured on used disk image. The only effect is that neither 
VGA nor
serial console is not functioning. After short investigation I have 
discovered

that this functionality was broken by this commit:

commit 9bb3358627d87d8de25fb41b7276575539d799a7
Author: Isaku Yamahata 
Date:   Fri Nov 19 18:56:02 2010 +0900

Do you have any idea how this change could affect EFI consoles?

--
Best regards,
Artyom.

[Qemu-devel] Re: [PATCH v3 uq/master 00/22] Win32 iothread support

2011-03-02 Thread Marcelo Tosatti

On Wed, Mar 02, 2011 at 04:36:34PM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 02, 2011 at 08:03:42PM +0100, Jan Kiszka wrote:
> > On 2011-03-02 19:43, Marcelo Tosatti wrote:
> > > On Tue, Mar 01, 2011 at 02:35:56PM +0200, Avi Kivity wrote:
> > >> On 02/28/2011 04:05 PM, Paolo Bonzini wrote:
> > >>> On 02/28/2011 01:13 PM, Avi Kivity wrote:
> > >
> > 
> >  If there's a git tree of this I'll be happy to do an autotest run.
> > >>>
> > >>> Sure, it's branch iothread-win32 of git://github.com/bonzini/qemu.git
> > >>
> > >> Fails on Fedora 9 i386 install, hangs right after "Performing post
> > >> install configuration...".  The guest is processing interrupts but
> > >> the mouse won't move, and it doesn't make progress.
> > >>
> > >> Configured with --enable-io-thread.  Perhaps the problem exists even
> > >> before the patchset.
> > > 
> > > Probably unrelated, looks similar to the regression seen with qemu-kvm.
> > > 
> > 
> > Do these patches change some behavior or not? 
> 
> Yes, they change some behaviour. Autotest fails.

Sorry, i misunderstood. I don't think these patches change any
behaviour.

Avi's failure case is similar to what i've seen earlier upon
qemu->qemu-kvm merge. Conclusion is there is no new regression 
introduced by these patches.

> 
> > Is it this same effect you saw with my qemu-kvm queue?
> 
> Yes.
>

[Qemu-devel] Re: [PATCH v3 uq/master 00/22] Win32 iothread support

2011-03-02 Thread Marcelo Tosatti

On Wed, Mar 02, 2011 at 08:03:42PM +0100, Jan Kiszka wrote:
> On 2011-03-02 19:43, Marcelo Tosatti wrote:
> > On Tue, Mar 01, 2011 at 02:35:56PM +0200, Avi Kivity wrote:
> >> On 02/28/2011 04:05 PM, Paolo Bonzini wrote:
> >>> On 02/28/2011 01:13 PM, Avi Kivity wrote:
> >
> 
>  If there's a git tree of this I'll be happy to do an autotest run.
> >>>
> >>> Sure, it's branch iothread-win32 of git://github.com/bonzini/qemu.git
> >>
> >> Fails on Fedora 9 i386 install, hangs right after "Performing post
> >> install configuration...".  The guest is processing interrupts but
> >> the mouse won't move, and it doesn't make progress.
> >>
> >> Configured with --enable-io-thread.  Perhaps the problem exists even
> >> before the patchset.
> > 
> > Probably unrelated, looks similar to the regression seen with qemu-kvm.
> > 
> 
> Do these patches change some behavior or not? 

Yes, they change some behaviour. Autotest fails.

> Is it this same effect you saw with my qemu-kvm queue?

Yes.

[Qemu-devel] Re: [PATCH v3 uq/master 00/22] Win32 iothread support

On 2011-03-02 19:43, Marcelo Tosatti wrote:
> On Tue, Mar 01, 2011 at 02:35:56PM +0200, Avi Kivity wrote:
>> On 02/28/2011 04:05 PM, Paolo Bonzini wrote:
>>> On 02/28/2011 01:13 PM, Avi Kivity wrote:
>

 If there's a git tree of this I'll be happy to do an autotest run.
>>>
>>> Sure, it's branch iothread-win32 of git://github.com/bonzini/qemu.git
>>
>> Fails on Fedora 9 i386 install, hangs right after "Performing post
>> install configuration...".  The guest is processing interrupts but
>> the mouse won't move, and it doesn't make progress.
>>
>> Configured with --enable-io-thread.  Perhaps the problem exists even
>> before the patchset.
> 
> Probably unrelated, looks similar to the regression seen with qemu-kvm.
> 

Do these patches change some behavior or not? Is it this same effect you
saw with my qemu-kvm queue?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH v3 uq/master 00/22] Win32 iothread support

2011-03-02 Thread Marcelo Tosatti

On Tue, Mar 01, 2011 at 02:35:56PM +0200, Avi Kivity wrote:
> On 02/28/2011 04:05 PM, Paolo Bonzini wrote:
> >On 02/28/2011 01:13 PM, Avi Kivity wrote:
> >>>
> >>
> >>If there's a git tree of this I'll be happy to do an autotest run.
> >
> >Sure, it's branch iothread-win32 of git://github.com/bonzini/qemu.git
> 
> Fails on Fedora 9 i386 install, hangs right after "Performing post
> install configuration...".  The guest is processing interrupts but
> the mouse won't move, and it doesn't make progress.
> 
> Configured with --enable-io-thread.  Perhaps the problem exists even
> before the patchset.

Probably unrelated, looks similar to the regression seen with qemu-kvm.

Agree this is no uq/master material. Please send directly.

Re: [Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Peter Maydell

On 2 March 2011 18:36, Stefan Weil  wrote:
> No. I dont't think that the third parameter of bitmap_clear is
> ok like that. See my patch for the correct value.

Wen's patch:

+const size_t width = ds_get_width(vd->ds) / 16;
[...]
-    bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
-    bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
-                 VNC_DIRTY_WORDS * BITS_PER_LONG);
+    bitmap_set(width_mask, 0, width);
+    bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG - width);

Your patch:

 bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+ (VNC_MAX_WIDTH - ds_get_width(vd->ds)) / 16);

Since ui/vnc.h has:

#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * BITS_PER_LONG))

the third parameter to bitmap_clear is the same value in
both cases, isn't it? Or is this a rounding bug?

-- PMM

Re: [Qemu-devel] Re: [PATCH] moving eeprom initialization

2011-03-02 Thread Gerhard Wiesinger


Hello,

Your patch should be based on fixes for correct EEPROM initialization, 
see for details: 
http://www.mail-archive.com/qemu-devel@nongnu.org/msg56414.html


Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 2 Mar 2011, William Dauchy wrote:


On Wed, Mar 2, 2011 at 2:36 PM, William Dauchy  wrote:

The initialization should not be only on reset but also when initializing
the device.
It resolves a bug when hot plugging a pci network device: the mac address
was always null.
---
 hw/pcnet.c   |   27 ++-
 hw/rtl8139.c |   24 
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index db52dc5..4e30e9c 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1557,19 +1557,6 @@ uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap)
 void pcnet_h_reset(void *opaque)
 {
    PCNetState *s = opaque;
-    int i;
-    uint16_t checksum;
-
-    /* Initialize the PROM */
-
-    memcpy(s->prom, s->conf.macaddr.a, 6);
-    s->prom[12] = s->prom[13] = 0x00;
-    s->prom[14] = s->prom[15] = 0x57;
-
-    for (i = 0,checksum = 0; i < 16; i++)
-        checksum += s->prom[i];
-    *(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
-

    s->bcr[BCR_MSRDA] = 0x0005;
    s->bcr[BCR_MSWRA] = 0x0005;
@@ -1736,6 +1723,9 @@ void pcnet_common_cleanup(PCNetState *d)

 int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
 {
+    int i;
+    uint16_t checksum;
+
    s->poll_timer = qemu_new_timer(vm_clock, pcnet_poll_timer, s);

    qemu_macaddr_default_if_unset(&s->conf.macaddr);
@@ -1744,5 +1734,16 @@ int pcnet_common_init(DeviceState *dev, PCNetState *s, 
NetClientInfo *info)

    add_boot_device_path(s->conf.bootindex, dev, "/ethernet-phy@0");

+    /* Initialize the PROM */
+
+    memcpy(s->prom, s->conf.macaddr.a, 6);
+    s->prom[12] = s->prom[13] = 0x00;
+    s->prom[14] = s->prom[15] = 0x57;
+
+    for (i = 0, checksum = 0; i < 16; i++) {
+        checksum += s->prom[i];
+    }
+    *(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
+
    return 0;
 }
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..8356d5a 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1189,18 +1189,6 @@ static void rtl8139_reset(DeviceState *d)

    rtl8139_update_irq(s);

-    /* prepare eeprom */
-    s->eeprom.contents[0] = 0x8129;
-#if 1
-    // PCI vendor and device ID should be mirrored here
-    s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
-    s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
-#endif
-
-    s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
-    s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
-    s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
-
    /* mark all status registers as owned by host */
    for (i = 0; i < 4; ++i)
    {
@@ -3392,6 +3380,18 @@ static int pci_rtl8139_init(PCIDevice *dev)

    qemu_macaddr_default_if_unset(&s->conf.macaddr);

+    /* prepare eeprom */
+    s->eeprom.contents[0] = 0x8129;
+#if 1
+    /* PCI vendor and device ID should be mirrored here */
+    s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
+    s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
+#endif
+
+    s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
+    s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
+    s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
+
    s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
                          dev->qdev.info->name, dev->qdev.id, s);
    qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
--
William


hm I just noticed your correction here
http://lists.gnu.org/archive/html/qemu-devel/2011-02/msg03232.html ;
sorry
Anyway, I tested my version and it works fine.

Regards,

--
William

[Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Stefan Weil


Am 02.03.2011 11:57, schrieb Corentin Chary:

On Wed, Mar 2, 2011 at 3:58 AM, Wen Congyang  wrote:

This bug is reported by Stefan Weil:

Commit bc2429b9174ac2d3c56b7fd35884b0d89ec7fb02 introduced
a severe bug (heap corruption).

bitmap_clear was called with a wrong argument
which caused out-of-bound writes to width_mask.

This bug was detected with QEMU running on windows.
It also occurs with wine:

*** stack smashing detected ***:  terminated
wine: Unhandled illegal instruction at address 0x6115c7 (thread 
0009), starting debugger...


The bug is not windows specific!


The third argument of bitmap_clear() is number of bits to be cleared, 
but we pass

the end bits to be cleared to bitmap_clear().

Signed-off-by: Wen Congyang 
Reported-by: Stefan Weil 


Acked-by: Corentin Chary 



No. I dont't think that the third parameter of bitmap_clear is
ok like that. See my patch for the correct value.

My own patch is also incomplete, so I'll send an update.

Stefan



---
  ui/vnc.c |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index e3761b0..e7d0b5b 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2390,6 +2390,7 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
 unsigned long width_mask[VNC_DIRTY_WORDS];
 VncState *vs;
 int has_dirty = 0;
+const size_t width = ds_get_width(vd->ds) / 16;

 struct timeval tv = { 0, 0 };

@@ -2403,9 +2404,8 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
  * Check and copy modified bits from guest to server surface.
  * Update server dirty map.
  */
-bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
-bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
- VNC_DIRTY_WORDS * BITS_PER_LONG);
+bitmap_set(width_mask, 0, width);
+bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG - width);
 cmp_bytes = 16 * ds_get_bytes_per_pixel(vd->ds);
 guest_row  = vd->guest.ds->data;
 server_row = vd->server->data;
--
1.7.1

Re: [Qemu-devel] Re: OMAP3 bug: disassembler disagreed with translator

2011-03-02 Thread Антон Кочков

Good day!
Right now it works, you can see source at the http://gitorious.org/droid/qemu

Don't see at the hw/motorola.c - here is only stub yet, not with real
hardware, i'm experimenting with it.
But omap3* files works ok with every other hardware.

Here you can see examples how it works:
https://www.droid-developers.org/wiki/QEMU

It works ok, but somewhere at the middle of bootrom chain it also
rewrite low bootrom location. (see attached trace files). For this
case i'm only change register for jump to the high bootrom location,
but i think it need to be more smart.

Also i'm dont know where I can add support for Secure
World/Services/Any other early TrustZone staff + efuse driver.

Best regards,
Anton Kochkov.




On Fri, Feb 18, 2011 at 15:39, Peter Maydell  wrote:
> On 18 February 2011 12:24,   wrote:
>> On Feb 18, 2011, at 14:08, ext Антон Кочков wrote:
>>
>>> Here also modified omap3_boot.c file. I'm dont know why it is
>>> successfully load bootrom at 0x40014000, but cant do this for 0x14000
>>> - it always fill by zeroes.
>>> I'm only need copy memory from already loaded rom image from
>>> 0x40014000 to 0x00014000 and nothing more.
>>
>> Our OMAP3 boot emulation is a very simplified model. For example,
>> the boot ROM is never mapped to Q0 region, only Q1 region --
>> the boot ROM execution is started at 0x40014000 rather than
>> 0x00014000. There is no memory mapped at Q0 unless you attach
>> something to the GPMC before reset.
>
> However most boards do attach something to the GPMC;
> the Beagle puts a NAND device on GPMC CS0, which means that
> the GPMC will try to map NAND read/write functions on address
> 0 [*]. If you're trying to map a boot ROM there as well this will
> clash and you need to do something clever to work however
> the hardware does (presumably mapping the ROM there at startup
> and unmapping the ROM later in response to some undocumented
> event).
>
> [*] Only true in the version of omap_gpmc.c in the meego
> tree, but if you're doing omap3 stuff then I assume you're
> using that, not upstream.
>
> -- PMM
>


qemu.log.gz
Description: GNU Zip compressed data


omap3430_boot_rom.log.gz
Description: GNU Zip compressed data


qemu-crash.log.gz
Description: GNU Zip compressed data

[Qemu-devel] [PATCH] target-arm: Set carry flag correctly for Thumb2 ORNS

2011-03-02 Thread Peter Maydell

The code for Thumb2 ORNS (or negated and set flags) was trashing
a TCG input register which was needed later for use in calculating
flags, with the effect that the carry flag was always set with
the wrong sense. Fix this by using a TCG temporary instead.

Signed-off-by: Peter Maydell 
---
 target-arm/translate.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index dbd958b..8f4e16b 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7326,10 +7326,17 @@ gen_thumb2_data_op(DisasContext *s, int op, int conds, 
uint32_t shifter_out, TCG
 logic_cc = conds;
 break;
 case 3: /* orn */
-tcg_gen_not_i32(t1, t1);
-tcg_gen_or_i32(t0, t0, t1);
+{
+/* We can't just invert t1 in place because we might need it
+ * to calculate the carry flag later.
+ */
+TCGv tmp = tcg_temp_new_i32();
+tcg_gen_not_i32(tmp, t1);
+tcg_gen_or_i32(t0, t0, tmp);
+tcg_temp_free_i32(tmp);
 logic_cc = conds;
 break;
+}
 case 4: /* eor */
 tcg_gen_xor_i32(t0, t0, t1);
 logic_cc = conds;
-- 
1.7.1

Re: [Qemu-devel] Memory Map

2011-03-02 Thread Salvatore Lionetti

Hi,

many thanks for your response.

Now i'm i've avoided the unregistering stuff, map done already at desired 
address space.

Still now, some memory region is called with base+offset.

So:

[0x204] <= value (write from uP register)
cause
read(opaque, offset=204, value)

while
[0x504] <= value (write from uP register)
cause
read(opaque, offset=4, value)

The two opaque are different as expected.

Where i am wrong?

--- Ven 25/2/11, Blue Swirl  ha scritto:

> Da: Blue Swirl 
> Oggetto: Re: [Qemu-devel] Memory Map
> A: "Salvatore Lionetti" 
> Cc: qemu-devel@nongnu.org
> Data: Venerdì 25 febbraio 2011, 17:10
> On Thu, Feb 24, 2011 at 11:08 AM,
> Salvatore Lionetti
> 
> wrote:
> > Hi,
> >
> > This is what my board do
> >
> > cpu_register_physical_memory(0, 128*1024*1024, ...)
> > cpu_register_physical_memory(0xFF80, 8*1024*1024,
> ...)
> >
> > and this layout does not change over the entire live
> (virtual) of the board.
> >
> > For the following offset (1st column) and size in
> bytes (2nd column)
> > {0x00, 512},
> > {0x000200, 16},
> > {0x000300, 32},
> > {0x000400, 32},
> > {0x000500, 64},
> > {0x000600, 64},
> > {0x000700, 128},
> > {0x000800, 30},
> > {0x000900, 256},
> > {0x000A00, 44},
> > {0x000B00, 256},
> > {0x000C00, 24},
> > {0x000F00, 20},
> > {0x001000, 20},
> > {0x001100, 20},
> > {0x001400, 168},
> > {0x001800, 24},
> > {0x002000, 4096},
> > {0x003000, 24},
> > {0x003100, 24},
> > {0x004500, 36},
> > {0x005000, 224},
> > {0x008000, 768},
> > {0x008300, 16},
> >
> > i do, for each item,
> >
> > a = cpu_register_io_memory(r, w, o,
> DEVICE_NATIVE_ENDIAN)
> > cpu_register_physical_memory(_base+offset, len, a)
> >
> > And _base could be reprogrammed at any time. So before
> to change _base i:
> >
> > cpu_unregister_io_memory(a)
> >
> > What i see is that accessing to _base+
> > _base+0x005000 => Wake up r/w with offset 0
> > _base+0x000204 => Wake up r/w with offset 0x204
> >
> > So the question
> > - Am i wrong something?
> 
> cpu_unregister_io_memory() is the counterpart of
> cpu_register_io_memory(), it does not affect mappings
> created by
> cpu_register_physical_memory(). They should be removed
> first.
> 
> > - Is possible to map address with last
> TARGET_PAGE_BITS (es 0x200) bits set?
> 
> Yes.
>

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-03-02 Thread Avi Kivity


On 03/01/2011 05:51 PM, Anthony Liguori wrote:
Do a hot unplug of a network device with upstream libvirt with 
acpiphp unloaded, consult libvirt and then consult the monitor to 
see who has the right view of the guests config.




libvirt is right and the monitor is wrong.

On real hardware, calling _EJ0 doesn't affect the configuration one 
little bit (if I understand it correctly).  It just turns off power 
to the slot.  If you power-cycle, the card will be there.


It's up to the hardware vendor.  Since it's ACPI, it can result in any 
number of operations.  Usually, there's some logic to flip on an LED 
or something.


There's nothing that prevents a vendor from ejecting the card.  My 
point is that there aren't cleanly separated lines in the real world.


We can implement out virtual hardware like real hardware, or we can do 
some new stuff, and break our management model in the process.






Unless I'm hallucinating, you're suggesting quite a bit more.  A 
revolution in how qemu is to be managed.


Let me take another route to see if I can't persuade you.

First, let's clarify your proposal.  You want to introduce a new block 
format
that references to block devices.  It may also store a dirty bitmap to 
keep

track of which blocks are out of sync.  Hopefully, it goes without saying
that the dirty bitmap is strictly optional (it's a performance 
optimization) so

let's ignore it.


(as was related elsewhere, the state is also optional)



Your format, as a text file, looks like:

[raid1]
primary=diska.img
secondary=diskb.img
active=primary

To use it, here's the sequence:

0) qemu uses disk A for a block device

1) create a raid1 block device pointing to disk A and disk B.

2) management tool asks qemu to us the new raid1 block device.

3) qemu acks (2)

4) at some point, the mirror completes, writes are going to both disks

5) qemu sends out an event indicating that the disks are in sync

6) management tool then sends a command to fail over to disk B

7) qemu acks (6)

We're making the management tool the "authoritative" source of how to 
launch
QEMU.  That means that the management tool ultimately determines which 
command

line to relaunch QEMU with.

Here are the races:

A) If QEMU crashes between (2) and (3), it may have issues a write to 
the new
   raid1 block device before the management tool sees (3).  If this 
happens,

   when the management tool restarts QEMU with disk A, we're left with a
   dangling raid1 block device.  Not a critical failure, but not ideal.


You can restart qemu with the RAID1 blockdev.



B) If QEMU crashes between (6) and (7), QEMU may have started writing 
to disk
   B before the management tool sees (7).  This means that the 
management tool
   will create the guest with the raid1 block device which no longer 
is the
   correct disk.  This could fail in subtly bad ways.  Depending on 
how read
   is implemented (if you try to do striping for instance), bad data 
could be
   returned.  You could try to implement a policy of always reading 
from B if

   the block has been copied but this gets harry really quickly.  It's
   definitely not RAID1 anymore.


As related elsewhere, you restart qemu with image B.

The trick is to partition the problem into idempotent commands; these 
allow you to recover from any failure.




You may observe that the problem is not the RAID1 mechanism, but 
changing from
using a normal device and the RAID1 mechanism.  It would then be wise 
to say,
let's always use this image format.  Since that eliminates the race, 
we don't

really need the copy bitmap anymore.

Now we're left with a simple format that just refers to two 
filenames.  However,

block devices are more than just a filename.  It needs a format, cache
settings, etc.  So let's put this all in the RAID1 block format.  We 
also need

a way to indicate which block device is selected.

Let's make it a text file for purposes of discussion.  It will look 
something

like:

[primary]
filename=diska.img
cache=none
format=raw

[secondary]
filename=diskb.img
cache=writethrough
format=qcow2

[global]
active=primary

Since we might want to mirror multiple drives at once, we should 
probablyn
support having multiple drives configured which means we need to not 
just have

a single active entry, but an entry associated with a particular device.


Or you have one file per RAID-1 image set.  This is important because 
images are not associated with a qemu instance.  You can hot-unplug an 
image from one qemu and hot-plug it into another.




[drive "diskA"]
filename=diska.img
cache=none
format=raw

[drive "diskB"]
filename=diskb.img
cache=writethrough
format=qcow2

[device "vda"]
drive=diskB

And this is exactly what I'm proposing. 


It's exactly what I'm opposing.  Making qemu manage all this stuff.


It's really the natural generalization
of what you're proposing.

So basically, the only differences are:

 1) always use the new RAID1 format
 2) drop the progress bitmap
 3) support multi

[Qemu-devel] [PATCH] allow to load android binary

2011-03-02 Thread matthieu castet


Hi,

Android binary start with a weird elf program header : the first
one is of size 0 pointing to NULL addr.

Ignore LOAD program where MemSiz is 0.

Elf file type is EXEC (Executable file)
Entry point 0xb0001000
There are 5 program headers, starting at offset 52

Program Headers:
 Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 LOAD   0xd4 0x 0xb000 0x0 0x0 R   0x1000
 LOAD   0x001000 0xb0001000 0xb0001000 0x073d4 0x073d4 R E 0x1000
 LOAD   0x009000 0xb0009000 0xb0009000 0x0068c 0x0969c RW  0x1000
 GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0
 EXIDX  0x00801c 0xb000801c 0xb000801c 0x003b8 0x003b8 R   0x4

Section to Segment mapping:
 Segment Sections...
  00
  01 .text .rodata .ARM.extab .ARM.exidx
  02 .preinit_array .init_array .fini_array .ctors .data.rel.ro .got .data 
.bss
  03
  04 .ARM.exidx
>From 4d986b66e9ae04efeabde9ad73f60d3c2d6912f9 Mon Sep 17 00:00:00 2001
From: Matthieu CASTET 
Date: Wed, 2 Mar 2011 17:04:39 +0100
Subject: [PATCH] allow to load android binary

Android binary start with a weird elf program header : the first
one is of size 0 pointing to NULL addr.

Ignore LOAD program where MemSiz is 0.

Elf file type is EXEC (Executable file)
Entry point 0xb0001000
There are 5 program headers, starting at offset 52

Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD   0xd4 0x 0xb000 0x0 0x0 R   0x1000
  LOAD   0x001000 0xb0001000 0xb0001000 0x073d4 0x073d4 R E 0x1000
  LOAD   0x009000 0xb0009000 0xb0009000 0x0068c 0x0969c RW  0x1000
  GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0
  EXIDX  0x00801c 0xb000801c 0xb000801c 0x003b8 0x003b8 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01 .text .rodata .ARM.extab .ARM.exidx
   02 .preinit_array .init_array .fini_array .ctors .data.rel.ro .got .data .bss
   03
   04 .ARM.exidx
---
 linux-user/elfload.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 33d776d..284f3be 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1201,7 +1201,7 @@ static void load_elf_image(const char *image_name, int image_fd,
amount of memory to handle that.  */
 loaddr = -1, hiaddr = 0;
 for (i = 0; i < ehdr->e_phnum; ++i) {
-if (phdr[i].p_type == PT_LOAD) {
+if (phdr[i].p_type == PT_LOAD && phdr[i].p_memsz) {
 abi_ulong a = phdr[i].p_vaddr;
 if (a < loaddr) {
 loaddr = a;
@@ -1301,7 +1301,7 @@ static void load_elf_image(const char *image_name, int image_fd,
 
 for (i = 0; i < ehdr->e_phnum; i++) {
 struct elf_phdr *eppnt = phdr + i;
-if (eppnt->p_type == PT_LOAD) {
+if (eppnt->p_type == PT_LOAD && eppnt->p_memsz) {
 abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em;
 int elf_prot = 0;
 
-- 
1.7.4.1

[Qemu-devel] Re: [PATCH] moving eeprom initialization

2011-03-02 Thread William Dauchy

On Wed, Mar 2, 2011 at 2:36 PM, William Dauchy  wrote:
> The initialization should not be only on reset but also when initializing
> the device.
> It resolves a bug when hot plugging a pci network device: the mac address
> was always null.
> ---
>  hw/pcnet.c   |   27 ++-
>  hw/rtl8139.c |   24 
>  2 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/hw/pcnet.c b/hw/pcnet.c
> index db52dc5..4e30e9c 100644
> --- a/hw/pcnet.c
> +++ b/hw/pcnet.c
> @@ -1557,19 +1557,6 @@ uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap)
>  void pcnet_h_reset(void *opaque)
>  {
>     PCNetState *s = opaque;
> -    int i;
> -    uint16_t checksum;
> -
> -    /* Initialize the PROM */
> -
> -    memcpy(s->prom, s->conf.macaddr.a, 6);
> -    s->prom[12] = s->prom[13] = 0x00;
> -    s->prom[14] = s->prom[15] = 0x57;
> -
> -    for (i = 0,checksum = 0; i < 16; i++)
> -        checksum += s->prom[i];
> -    *(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
> -
>
>     s->bcr[BCR_MSRDA] = 0x0005;
>     s->bcr[BCR_MSWRA] = 0x0005;
> @@ -1736,6 +1723,9 @@ void pcnet_common_cleanup(PCNetState *d)
>
>  int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
>  {
> +    int i;
> +    uint16_t checksum;
> +
>     s->poll_timer = qemu_new_timer(vm_clock, pcnet_poll_timer, s);
>
>     qemu_macaddr_default_if_unset(&s->conf.macaddr);
> @@ -1744,5 +1734,16 @@ int pcnet_common_init(DeviceState *dev, PCNetState *s, 
> NetClientInfo *info)
>
>     add_boot_device_path(s->conf.bootindex, dev, "/ethernet-phy@0");
>
> +    /* Initialize the PROM */
> +
> +    memcpy(s->prom, s->conf.macaddr.a, 6);
> +    s->prom[12] = s->prom[13] = 0x00;
> +    s->prom[14] = s->prom[15] = 0x57;
> +
> +    for (i = 0, checksum = 0; i < 16; i++) {
> +        checksum += s->prom[i];
> +    }
> +    *(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
> +
>     return 0;
>  }
> diff --git a/hw/rtl8139.c b/hw/rtl8139.c
> index a22530c..8356d5a 100644
> --- a/hw/rtl8139.c
> +++ b/hw/rtl8139.c
> @@ -1189,18 +1189,6 @@ static void rtl8139_reset(DeviceState *d)
>
>     rtl8139_update_irq(s);
>
> -    /* prepare eeprom */
> -    s->eeprom.contents[0] = 0x8129;
> -#if 1
> -    // PCI vendor and device ID should be mirrored here
> -    s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
> -    s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
> -#endif
> -
> -    s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
> -    s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
> -    s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
> -
>     /* mark all status registers as owned by host */
>     for (i = 0; i < 4; ++i)
>     {
> @@ -3392,6 +3380,18 @@ static int pci_rtl8139_init(PCIDevice *dev)
>
>     qemu_macaddr_default_if_unset(&s->conf.macaddr);
>
> +    /* prepare eeprom */
> +    s->eeprom.contents[0] = 0x8129;
> +#if 1
> +    /* PCI vendor and device ID should be mirrored here */
> +    s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
> +    s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
> +#endif
> +
> +    s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
> +    s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
> +    s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
> +
>     s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
>                           dev->qdev.info->name, dev->qdev.id, s);
>     qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
> --
> William

hm I just noticed your correction here
http://lists.gnu.org/archive/html/qemu-devel/2011-02/msg03232.html ;
sorry
Anyway, I tested my version and it works fine.

Regards,

-- 
William

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-03-02 Thread Anthony Liguori


On 03/02/2011 08:00 AM, Avi Kivity wrote:

On 03/02/2011 02:39 PM, Anthony Liguori wrote:


Here is where your race is:

2. Management sends a switch command

3. QEMU receives switch command

4. QEMU stops doubling IO and switches to the destination

5. QEMU sends acknowledgement of switch command

6. Management receives acknowledge of switch command

7. Management changes internal state definition to reflect the new 
destination


If QEMU or the management tool crashes after step 4 and before step 
6, when the management tool restarts QEMU with the source image, data 
loss will have occurred (and potentially corruption if a flush had 
happened).


No.  After step 2, any qemu restart will be with the destination 
image.  If the management tool restarts, it can query the state (or 
just re-issue the switch command, which is idempotent).


Yeah, I think you're right, although I need to think through it a bit more.

Regards,

Anthony Liguori

[Qemu-devel] [PATCH] Don't allow multiwrites against a block device without underlying medium

2011-03-02 Thread Ryan Harper

If the block device has been closed, we no longer have a medium to submit
IO against, check for this before submitting io.  This prevents a segfault
further in the code where we dereference elements of the block driver.

Signed-off-by: Ryan Harper 
---
 block.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 92dd3fe..534e1bc 100644
--- a/block.c
+++ b/block.c
@@ -2407,6 +2407,11 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 return 0;
 }
 
+/* don't submit writes if we don't have a medium */
+if (bs->drv == NULL) {
+   return -1;
+}
+
 // Create MultiwriteCB structure
 mcb = qemu_mallocz(sizeof(*mcb) + num_reqs * sizeof(*mcb->callbacks));
 mcb->num_requests = 0;
-- 
1.7.1


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH] Do not delete BlockDriverState when deleting the drive

2011-03-02 Thread Ryan Harper

When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()
drive_uninit()
bdrv_delete()

When we bdrv_delete() we end up qemu_free()'ing the BlockDriverState pointer
however, the block devices retain a copy of this pointer, see
hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.

We now have a use-after-free situation.  If the guest continues to issue IO
against the device, and we've reallocated the memory that the BlockDriverState
pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into it's own function, bdrv_remove().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.  This state will be deleted if the guest
is asked and responds to a pci device removal request.

Reported-by: Markus Armbruster 
Signed-off-by: Ryan Harper 
---
 block.c|   11 ---
 block.h|1 +
 blockdev.c |2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..92dd3fe 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,19 @@ void bdrv_close_all(void)
 }
 }
 
+void bdrv_remove(BlockDriverState *bs)
+{
+if (bs->device_name[0] != '\0') {
+QTAILQ_REMOVE(&bdrv_states, bs, list);
+}
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs->peer);
 
 /* remove from list, if necessary */
-if (bs->device_name[0] != '\0') {
-QTAILQ_REMOVE(&bdrv_states, bs, list);
-}
+bdrv_remove(bs);
 
 bdrv_close(bs);
 if (bs->file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..8447397 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_remove(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index 0690cc8..1f57b50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 }
 
 /* clean up host side */
-drive_uninit(drive_get_by_blockdev(bs));
+bdrv_remove(bs);
 
 return 0;
 }
-- 
1.7.1


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

2011-03-02 Thread Michael Roth


On 03/02/2011 07:18 AM, Jes Sorensen wrote:

On 03/02/11 14:13, Michael Roth wrote:

On 03/02/2011 04:19 AM, Jes Sorensen wrote:



It is absolutely vital for me that we do not make things much more
complicated for users with this move. I don't want to get into a
situation where we start forcing external packages or daemons in order
to run basic QEMU. If we start requiring such, we have failed! However,
I think it is a reasonable compromise to have one daemon you launch, and
then a simple client you launch straight after which will then provide
the same/similar views and interfaces that users are used to when they
launch current QEMU (monitor, vnc server etc).


I think the challenge with this approach is defining an IPC mechanism
that is robust enough to support it. For instance, on one end of the
spectrum we have Spice, where shared memory paired with some mechanism
for IO notification between the client/server might make sense. But then
we have things like the human monitor where a socket interface or pipe
is clearly more the more straight-forward route.

We may find that it more desirable to have multiple classes of client
components, each contain within a client process with it's own IPC
mechanism. Although, multiple IPC mechanisms doesn't sound particularly
enticing either...but 2 might not be so unreasonable.


Hi Michael,

I think we need two types for sure, even for the video case, we will
still need a control channel as well. However, I don't think it is
desirable to split things up more than we have to, so if we can keep it
within one client process that is good. Maybe there are cases where it
makes sense to split it into more processes, I could be convinced, but I
think we really need to be careful making it too much of a complex mess
either.


Yup, if it's doable I'd prefer a single client process as well. Just 
hard to predict how difficult it'll be to support 2 or more mechanisms. 
Although, I'd imagine we'd end up with something like qemu's io loop, 
with event-driven shmem and fd-based work, which does seem doable.




Cheers,
Jes

[Qemu-devel] [PATCH] moving eeprom initialization

2011-03-02 Thread William Dauchy

The initialization should not be only on reset but also when initializing
the device.
It resolves a bug when hot plugging a pci network device: the mac address
was always null.
---
 hw/pcnet.c   |   27 ++-
 hw/rtl8139.c |   24 
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index db52dc5..4e30e9c 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1557,19 +1557,6 @@ uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap)
 void pcnet_h_reset(void *opaque)
 {
 PCNetState *s = opaque;
-int i;
-uint16_t checksum;
-
-/* Initialize the PROM */
-
-memcpy(s->prom, s->conf.macaddr.a, 6);
-s->prom[12] = s->prom[13] = 0x00;
-s->prom[14] = s->prom[15] = 0x57;
-
-for (i = 0,checksum = 0; i < 16; i++)
-checksum += s->prom[i];
-*(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
-
 
 s->bcr[BCR_MSRDA] = 0x0005;
 s->bcr[BCR_MSWRA] = 0x0005;
@@ -1736,6 +1723,9 @@ void pcnet_common_cleanup(PCNetState *d)
 
 int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
 {
+int i;
+uint16_t checksum;
+
 s->poll_timer = qemu_new_timer(vm_clock, pcnet_poll_timer, s);
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
@@ -1744,5 +1734,16 @@ int pcnet_common_init(DeviceState *dev, PCNetState *s, 
NetClientInfo *info)
 
 add_boot_device_path(s->conf.bootindex, dev, "/ethernet-phy@0");
 
+/* Initialize the PROM */
+
+memcpy(s->prom, s->conf.macaddr.a, 6);
+s->prom[12] = s->prom[13] = 0x00;
+s->prom[14] = s->prom[15] = 0x57;
+
+for (i = 0, checksum = 0; i < 16; i++) {
+checksum += s->prom[i];
+}
+*(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);
+
 return 0;
 }
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..8356d5a 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1189,18 +1189,6 @@ static void rtl8139_reset(DeviceState *d)
 
 rtl8139_update_irq(s);
 
-/* prepare eeprom */
-s->eeprom.contents[0] = 0x8129;
-#if 1
-// PCI vendor and device ID should be mirrored here
-s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
-s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
-#endif
-
-s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
-s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
-s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
-
 /* mark all status registers as owned by host */
 for (i = 0; i < 4; ++i)
 {
@@ -3392,6 +3380,18 @@ static int pci_rtl8139_init(PCIDevice *dev)
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
+/* prepare eeprom */
+s->eeprom.contents[0] = 0x8129;
+#if 1
+/* PCI vendor and device ID should be mirrored here */
+s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
+s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
+#endif
+
+s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
+s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
+s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
+
 s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
   dev->qdev.info->name, dev->qdev.id, s);
 qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
-- 
William

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/02/11 14:13, Michael Roth wrote:
> On 03/02/2011 04:19 AM, Jes Sorensen wrote:

>> It is absolutely vital for me that we do not make things much more
>> complicated for users with this move. I don't want to get into a
>> situation where we start forcing external packages or daemons in order
>> to run basic QEMU. If we start requiring such, we have failed! However,
>> I think it is a reasonable compromise to have one daemon you launch, and
>> then a simple client you launch straight after which will then provide
>> the same/similar views and interfaces that users are used to when they
>> launch current QEMU (monitor, vnc server etc).
> 
> I think the challenge with this approach is defining an IPC mechanism
> that is robust enough to support it. For instance, on one end of the
> spectrum we have Spice, where shared memory paired with some mechanism
> for IO notification between the client/server might make sense. But then
> we have things like the human monitor where a socket interface or pipe
> is clearly more the more straight-forward route.
> 
> We may find that it more desirable to have multiple classes of client
> components, each contain within a client process with it's own IPC
> mechanism. Although, multiple IPC mechanisms doesn't sound particularly
> enticing either...but 2 might not be so unreasonable.

Hi Michael,

I think we need two types for sure, even for the video case, we will
still need a control channel as well. However, I don't think it is
desirable to split things up more than we have to, so if we can keep it
within one client process that is good. Maybe there are cases where it
makes sense to split it into more processes, I could be convinced, but I
think we really need to be careful making it too much of a complex mess
either.

Cheers,
Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

2011-03-02 Thread Michael Roth


On 03/02/2011 04:19 AM, Jes Sorensen wrote:

On 02/28/11 18:44, Anthony Liguori wrote:

On Feb 28, 2011 10:44 AM, "Jes Sorensen"  wrote:

Separating host-side virtagent and other tasks from core QEMU
=

To improve auditing of the core QEMU code, it would be ideal to be
able to separate the core QEMU functionality from utility
functionality by  moving the utility functionality into its own
process. This process will be referred to as the QEMU client below.

Separating as in moving code outside of qemu.git, making code optionally
built in, making code optionally built in or loadable as a separate
executable that is automatically launched, or making code always built
outside the main executable?

I'm very nervous about having a large number of daemons necessary to run
QEMU.  I think a reasonable approach would be a single front-end daemond.

Once QAPI is merged, there is a very incremental approach we can take for
this sort of work.  Take your favorite subsystem (like gdbstub or SDL) and
make it only use QMP apis.  Once we're only using QMP internally in a
subsystem, then building it out of the main QEMU and using libqmp should be
fairly easy.


Hi Anthony,

Back from a day off playing with power drills and concrete walls :)

Sorry I should have made it a little more clear, it was obvious to me,
but not written down:

The idea is to keep everything as part of the QEMU package, ie. part of
qemu.git. My idea is really to have one QEMU host daemon and one QEMU
client, which provides the various services. You type make and you get
two binaries instead of one. We could allow other daemons to connect to
the host daemon, but that wouldn't be a primary target for this in my
book, and I am not sure we really want to do this.

It is absolutely vital for me that we do not make things much more
complicated for users with this move. I don't want to get into a
situation where we start forcing external packages or daemons in order
to run basic QEMU. If we start requiring such, we have failed! However,
I think it is a reasonable compromise to have one daemon you launch, and
then a simple client you launch straight after which will then provide
the same/similar views and interfaces that users are used to when they
launch current QEMU (monitor, vnc server etc).


I think the challenge with this approach is defining an IPC mechanism 
that is robust enough to support it. For instance, on one end of the 
spectrum we have Spice, where shared memory paired with some mechanism 
for IO notification between the client/server might make sense. But then 
we have things like the human monitor where a socket interface or pipe 
is clearly more the more straight-forward route.


We may find that it more desirable to have multiple classes of client 
components, each contain within a client process with it's own IPC 
mechanism. Although, multiple IPC mechanisms doesn't sound particularly 
enticing either...but 2 might not be so unreasonable.




I hope this clarifies things.

Cheers,
Jes

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-03-02 Thread Avi Kivity


On 03/02/2011 02:39 PM, Anthony Liguori wrote:


Here is where your race is:

2. Management sends a switch command

3. QEMU receives switch command

4. QEMU stops doubling IO and switches to the destination

5. QEMU sends acknowledgement of switch command

6. Management receives acknowledge of switch command

7. Management changes internal state definition to reflect the new 
destination


If QEMU or the management tool crashes after step 4 and before step 6, 
when the management tool restarts QEMU with the source image, data 
loss will have occurred (and potentially corruption if a flush had 
happened).


No.  After step 2, any qemu restart will be with the destination image.  
If the management tool restarts, it can query the state (or just 
re-issue the switch command, which is idempotent).




This all boils down to the Two Generals Problem[1].  It's simply not 
fixable without making one end reliable and that means that someone 
needs to fsync() something *after* the switchover happens but before 
the first write happens.  That can be QEMU (Avi's RAID proposal and my 
state file proposal) or it can be the management tool (if we introduce 
synchronous events).


The two problems are not equivalent.  Once the management tool receives 
acknowledgement that the switch occurred, the protocol terminates.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] spice/qxl: locking fix for qemu-kvm

On Wed, Mar 02, 2011 at 02:32:03PM +0200, Alon Levy wrote:
> From: Gerd Hoffmann 

Err, that "From" got there by mistake, and the title should of course
not say "for qemu-kvm"..

> 
> qxl needs to release the qemu lock before calling some libspice
> functions (and re-aquire it later).  In upstream qemu qxl can just
> use qemu_mutex_{unlock,lock}_iothread.  In qemu-kvm this doesn't
> work, qxl needs additionally save+restore the cpu_single_env pointer
> on unlock+lock.
> 
> This fixes the following assertion in kvm_mutex_unlock that happened in the
> released qemu-kvm 0.14.0 on gentoo when using spice's qxl device:
> 
> > /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724:
> > kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
> 
> Happening as a result of io from the guest (qxl reset):
> > (gdb) bt
> > #0  0x75daa165 in raise () from /lib/libc.so.6
> > #1  0x75dab580 in abort () from /lib/libc.so.6
> > #2  0x75da3201 in __assert_fail () from /lib/libc.so.6
> > #3  0x00436f7e in kvm_mutex_unlock ()
> > at 
> > /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724
> > #4  qemu_mutex_unlock_iothread ()
> > at 
> > /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1737
> > #5  0x005e84ee in qxl_hard_reset (d=0x15d3080, loadvm=0)
> > at 
> > /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/hw/qxl.c:665
> > #6  0x005e9f9a in ioport_write (opaque=0x15d3080, addr= 
> According to Jan, this bug (the wrong value for cpu_single_env) is also 
> present
> in qemu, but no abort is triggered because it isn't asserted.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/qxl.c   |   37 +
>  ui/spice-display.c |   12 ++--
>  ui/spice-display.h |6 ++
>  3 files changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/qxl.c b/hw/qxl.c
> index fe4212b..117f7c8 100644
> --- a/hw/qxl.c
> +++ b/hw/qxl.c
> @@ -125,6 +125,27 @@ static void qxl_reset_memslots(PCIQXLDevice *d);
>  static void qxl_reset_surfaces(PCIQXLDevice *d);
>  static void qxl_ring_set_dirty(PCIQXLDevice *qxl);
>  
> +/* qemu-kvm locking ... */
> +void qxl_unlock_iothread(SimpleSpiceDisplay *ssd)
> +{
> +if (cpu_single_env) {
> +assert(ssd->env == NULL);
> +ssd->env = cpu_single_env;
> +cpu_single_env = NULL;
> +}
> +qemu_mutex_unlock_iothread();
> +}
> +
> +void qxl_lock_iothread(SimpleSpiceDisplay *ssd)
> +{
> +qemu_mutex_lock_iothread();
> +if (ssd->env) {
> +assert(cpu_single_env == NULL);
> +cpu_single_env = ssd->env;
> +ssd->env = NULL;
> +}
> +}
> +
>  static inline uint32_t msb_mask(uint32_t val)
>  {
>  uint32_t mask;
> @@ -662,10 +683,10 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
>  dprint(d, 1, "%s: start%s\n", __FUNCTION__,
> loadvm ? " (loadvm)" : "");
>  
> -qemu_mutex_unlock_iothread();
> +qxl_unlock_iothread(&d->ssd);
>  d->ssd.worker->reset_cursor(d->ssd.worker);
>  d->ssd.worker->reset_image_cache(d->ssd.worker);
> -qemu_mutex_lock_iothread();
> +qxl_lock_iothread(&d->ssd);
>  qxl_reset_surfaces(d);
>  qxl_reset_memslots(d);
>  
> @@ -795,9 +816,9 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
>  {
>  dprint(d, 1, "%s:\n", __FUNCTION__);
>  d->mode = QXL_MODE_UNDEFINED;
> -qemu_mutex_unlock_iothread();
> +qxl_unlock_iothread(&d->ssd);
>  d->ssd.worker->destroy_surfaces(d->ssd.worker);
> -qemu_mutex_lock_iothread();
> +qxl_lock_iothread(&d->ssd);
>  memset(&d->guest_surfaces.cmds, 0, sizeof(d->guest_surfaces.cmds));
>  }
>  
> @@ -866,9 +887,9 @@ static void qxl_destroy_primary(PCIQXLDevice *d)
>  dprint(d, 1, "%s\n", __FUNCTION__);
>  
>  d->mode = QXL_MODE_UNDEFINED;
> -qemu_mutex_unlock_iothread();
> +qxl_unlock_iothread(&d->ssd);
>  d->ssd.worker->destroy_primary_surface(d->ssd.worker, 0);
> -qemu_mutex_lock_iothread();
> +qxl_lock_iothread(&d->ssd);
>  }
>  
>  static void qxl_set_mode(PCIQXLDevice *d, int modenr, int loadvm)
> @@ -938,10 +959,10 @@ static void ioport_write(void *opaque, uint32_t addr, 
> uint32_t val)
>  case QXL_IO_UPDATE_AREA:
>  {
>  QXLRect update = d->ram->update_area;
> -qemu_mutex_unlock_iothread();
> +qxl_unlock_iothread(&d->ssd);
>  d->ssd.worker->update_area(d->ssd.worker, d->ram->update_surface,
> &update, NULL, 0, 0);
> -qemu_mutex_lock_iothread();
> +qxl_lock_iothread(&d->ssd);
>  break;
>  }
>  case QXL_IO_NOTIFY_CMD:
> diff --git a/ui/spice-display.c b/ui/spice-display.c
> index 020b423..defe652 100644
> --- a/ui/spice-display.c
> +++ b/ui/spice-display.c
> @@ -186,18 +186,18 @@ void qemu_spice_create_host_primary(SimpleSpiceDisplay 
> *ssd)
>  surface.mem= (i

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On Wed, Mar 02, 2011 at 01:04:58PM +0200, Dor Laor wrote:
> On 03/02/2011 12:58 PM, Alon Levy wrote:
> >On Wed, Mar 02, 2011 at 11:25:44AM +0100, Jes Sorensen wrote:
> >>On 03/01/11 15:25, Dor Laor wrote:
> >>>On 03/01/2011 02:40 PM, Anthony Liguori wrote:
> 
> On Mar 1, 2011 7:07 AM, "Dor Laor"   >  Qemu is the one that should spawn them and they should be transparent
> from the management. This way running qemu stays the same and qemu just
> need to add the logic to get a SIGCHILD and potentially re-execute an
> dead son process.
> 
> Spice is the logical place to start, no?  It's the largest single
> dependency we have and it does some scary things with qemu_mutex.  I
> would use spice as a way to prove the concept.
> >>>
> >>>I agree it is desirable to the this for spice but it is allot more
> >>>complex than virtagent isolation. Spice is performance sensitive and
> >>>contains much more state. It needs to access the guest memory for
> >>>reading the surfaces. It can be solved but needs some major changes.
> >>>Adding spice-devel to the discussion.
> >>
> >>I had a few thoughts about this already, which I think will work for
> >>both spice and vnc. What we could do is to expose the video memory via
> >>shared memory. That way a spice or vnc daemon could get direct access to
> >>the memory, this would limit communication to keyboard/mouse events, as
> >>well as video mode info, and possibly notifications to the client about
> >>which ranges of memory have been updated.
> >>
> >>Using shared memory this way should allow us to implement the video
> >>clients without performance loss, in fact it should be beneficial since
> >>it would allow them to run fully separate from the host daemon.
> >>
> >
> >I think that would work well for spice. Spice uses shared memory from the
> >pci device for both the framebuffer and surfaces/commands, but this is
> 
> Is that the only DMA do you do? That's good for this model.
> 

Yes. We have two memory bars (another for rom) and an io bar. The bars contain
the framebuffer (primary surface backing store, also used in vga mode), the
non primary surfaces backing store and the rings (display, cursor, and release).

In qxl (native) mode the rings are read/written to all the time, the rest by 
demand.
In vga mode it's identical to vnc, the framebuffer is written from the guest and
we track the dirty regions.

> >not really relevant at this level. What about IO and irq? that would add
> >additional latencies, no? because each io exit would need to be ipc'ed over
> >to the spice/vnc process? and same way in the other direction, requesting
> >qemu to trigger an interrupt in the next vm entry.
> 
> The qxl device can be in the privileged qemu (as a start) and it
> will handle irqs directly. Even today you need to notify the spice
> server thread, so nothing will change

ok, so qxl remains in qemu process. But all those io exits (well, most) end
up as calls to spice-server, now in another process. So latency wise it's the
same.

> >
> >>Cheers,
> >>Jes
> >>
> 
>

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-03-02 Thread Anthony Liguori

On 03/01/2011 03:59 AM, Dor Laor wrote:

On 02/28/2011 08:12 PM, Anthony Liguori wrote:

On Feb 28, 2011 11:47 AM, "Avi Kivity" mailto:a...@redhat.com>> wrote:
>
> On 02/28/2011 07:33 PM, Anthony Liguori wrote:
>>
>>
>> >
>> > You're just ignoring what I've written.
>>
>> No, you're just impervious to my subtle attempt to refocus the
discussion on solving a practical problem.
>>
>> There's a lot of good, reasonably straight forward changes we can
make that have a high return on investment.
>>
>
> Is making qemu the authoritative source of configuration information
a straightforward change?  Is the return on it high?  Is the 
investment low?

I think this is where we fundamentally disagree.  My position is that
QEMU is already the authoritative source.  Having a state file doesn't
change anything.

Do a hot unplug of a network device with upstream libvirt with acpiphp
unloaded, consult libvirt and then consult the monitor to see who has
the right view of the guests config.

To me, that's the definition of authoritative.

> "No" to all three (ignoring for the moment whether it is good or not,
which we were debating).
>
>
>> The only suggestion I'm making beyond Marcelo's original patch is
that we use a structured format and that we make it possible to use the
same file to solve this problem in multiple places.
>>
>
> No, you're suggesting a lot more than that.

That's exactly what I'm suggesting from a technical perspective.

>> I don't think this creates a fundamental break in how management
tools interact with QEMU.  I don't think introducing RAID support in the
block layer is a reasonable alternative.
>>
>>
>
> Why not?

Because its a lot of complexity and code that can go wrong while only
solving the race for one specific case.  Not to mention that we double
the iop rate.

> Something that avoids the whole state thing altogether:
>
> - instead of atomically switching when live copy is done, keep on
issuing writes to both the origin and the live copy
> - issue a notification to management
> - management receives the notification, and issues an atomic blockdev
switch command

> this is really the RAID-1 solution but without the state file (credit
Dor).  An advantage is that there is no additional latency when trying
to catch up to the dirty bitmap.

It still suffers from the two generals problem.  You cannot solve this
without making one node reliable and that takes us back to it being
either QEMU (posted event and state file) or the management tool (sync
event).

It is safe w/o a state file by changing the basic live copy algorithm:

1. Live copy in progress stage
   Once live copy command is issued, a dirty bit map is created for
   tracking. There is a single pass over the entire image where we copy
   blocks from the src to the dst.

   Write commands for blocks that were already copied will be done
   twice for the src and dst.

   Once the full copy single pass ends, we trigger a QMP event that
   this stage can end.

   The live copy stage keeps running till the management issue a switch
   command. When it will happen, the switch is immediate and no need to
   copy additional blocks (but flush pending IOs).

2. Management sends a switch command.
   Qemu stops the doubling the IO and switches to the destination.
   End.

Here is where your race is:

2. Management sends a switch command

3. QEMU receives switch command

4. QEMU stops doubling IO and switches to the destination

5. QEMU sends acknowledgement of switch command

6. Management receives acknowledge of switch command

7. Management changes internal state definition to reflect the new 
destination

If QEMU or the management tool crashes after step 4 and before step 6, 
when the management tool restarts QEMU with the source image, data loss 
will have occurred (and potentially corruption if a flush had happened).

This all boils down to the Two Generals Problem[1].  It's simply not 
fixable without making one end reliable and that means that someone 
needs to fsync() something *after* the switchover happens but before the 
first write happens.  That can be QEMU (Avi's RAID proposal and my state 
file proposal) or it can be the management tool (if we introduce 
synchronous events).

[1] http://en.wikipedia.org/wiki/Two_Generals%27_Problem

Regards,

Anthony Liguori

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

On Wed, Mar 02, 2011 at 12:34:24PM +0100, Jan Kiszka wrote:
> On 2011-03-02 11:56, Alon Levy wrote:
> > On Wed, Mar 02, 2011 at 09:22:35AM +0100, Jan Kiszka wrote:
> >> On 2011-03-01 13:58, Alon Levy wrote:
> >>> On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
>  On 2011-02-27 20:03, Alon Levy wrote:
> > On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
> >> On 2011-02-26 12:43, xming wrote:
> >>> When trying to start X (and it loads qxl driver) the kvm process just 
> >>> crashes.
> >
> > This is fixed by Gerd's attached patch (taken from rhel repository, 
> > don't know
> > why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
> > well (separate email).
> 
>  Patch looks OK on first glance, but the changelog is misleading: This
>  was broken for _both_ trees, but upstream didn't detect the bug.
> >>>
> >>> So I didn't test with qemu not having this patch, but according to the 
> >>> discussion in the
> >>> launchpad bug the problem only happens with qemu-kvm. This doesn't rule 
> >>> out it being a
> >>> bug, perhaps it is just triggered much less frequently I guess.
> >>
> >> Again: qemu-kvm has the instrumentation to detect the bug, qemu is
> >> lacking this, but both trees will break subtly if cpu_current_env is not
> >> properly restored.
> > 
> > ok, so what do you want to be done further before this patch is applied?
> 
> The patch posted to qemu-devel just requires a changelog that correctly
> reflects what it addresses (and where).

Just sent,

Alon

> 
> Jan
>

[Qemu-devel] [PATCH] spice/qxl: locking fix for qemu-kvm

From: Gerd Hoffmann 

qxl needs to release the qemu lock before calling some libspice
functions (and re-aquire it later).  In upstream qemu qxl can just
use qemu_mutex_{unlock,lock}_iothread.  In qemu-kvm this doesn't
work, qxl needs additionally save+restore the cpu_single_env pointer
on unlock+lock.

This fixes the following assertion in kvm_mutex_unlock that happened in the
released qemu-kvm 0.14.0 on gentoo when using spice's qxl device:

> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724:
> kvm_mutex_unlock: Assertion `!cpu_single_env' failed.

Happening as a result of io from the guest (qxl reset):
> (gdb) bt
> #0  0x75daa165 in raise () from /lib/libc.so.6
> #1  0x75dab580 in abort () from /lib/libc.so.6
> #2  0x75da3201 in __assert_fail () from /lib/libc.so.6
> #3  0x00436f7e in kvm_mutex_unlock ()
> at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724
> #4  qemu_mutex_unlock_iothread ()
> at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1737
> #5  0x005e84ee in qxl_hard_reset (d=0x15d3080, loadvm=0)
> at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/hw/qxl.c:665
> #6  0x005e9f9a in ioport_write (opaque=0x15d3080, addr=
---
 hw/qxl.c   |   37 +
 ui/spice-display.c |   12 ++--
 ui/spice-display.h |6 ++
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index fe4212b..117f7c8 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -125,6 +125,27 @@ static void qxl_reset_memslots(PCIQXLDevice *d);
 static void qxl_reset_surfaces(PCIQXLDevice *d);
 static void qxl_ring_set_dirty(PCIQXLDevice *qxl);
 
+/* qemu-kvm locking ... */
+void qxl_unlock_iothread(SimpleSpiceDisplay *ssd)
+{
+if (cpu_single_env) {
+assert(ssd->env == NULL);
+ssd->env = cpu_single_env;
+cpu_single_env = NULL;
+}
+qemu_mutex_unlock_iothread();
+}
+
+void qxl_lock_iothread(SimpleSpiceDisplay *ssd)
+{
+qemu_mutex_lock_iothread();
+if (ssd->env) {
+assert(cpu_single_env == NULL);
+cpu_single_env = ssd->env;
+ssd->env = NULL;
+}
+}
+
 static inline uint32_t msb_mask(uint32_t val)
 {
 uint32_t mask;
@@ -662,10 +683,10 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
 dprint(d, 1, "%s: start%s\n", __FUNCTION__,
loadvm ? " (loadvm)" : "");
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(&d->ssd);
 d->ssd.worker->reset_cursor(d->ssd.worker);
 d->ssd.worker->reset_image_cache(d->ssd.worker);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(&d->ssd);
 qxl_reset_surfaces(d);
 qxl_reset_memslots(d);
 
@@ -795,9 +816,9 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
 {
 dprint(d, 1, "%s:\n", __FUNCTION__);
 d->mode = QXL_MODE_UNDEFINED;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(&d->ssd);
 d->ssd.worker->destroy_surfaces(d->ssd.worker);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(&d->ssd);
 memset(&d->guest_surfaces.cmds, 0, sizeof(d->guest_surfaces.cmds));
 }
 
@@ -866,9 +887,9 @@ static void qxl_destroy_primary(PCIQXLDevice *d)
 dprint(d, 1, "%s\n", __FUNCTION__);
 
 d->mode = QXL_MODE_UNDEFINED;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(&d->ssd);
 d->ssd.worker->destroy_primary_surface(d->ssd.worker, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(&d->ssd);
 }
 
 static void qxl_set_mode(PCIQXLDevice *d, int modenr, int loadvm)
@@ -938,10 +959,10 @@ static void ioport_write(void *opaque, uint32_t addr, 
uint32_t val)
 case QXL_IO_UPDATE_AREA:
 {
 QXLRect update = d->ram->update_area;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(&d->ssd);
 d->ssd.worker->update_area(d->ssd.worker, d->ram->update_surface,
&update, NULL, 0, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(&d->ssd);
 break;
 }
 case QXL_IO_NOTIFY_CMD:
diff --git a/ui/spice-display.c b/ui/spice-display.c
index 020b423..defe652 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -186,18 +186,18 @@ void qemu_spice_create_host_primary(SimpleSpiceDisplay 
*ssd)
 surface.mem= (intptr_t)ssd->buf;
 surface.group_id   = MEMSLOT_GROUP_HOST;
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(ssd);
 ssd->worker->create_primary_surface(ssd->worker, 0, &surface);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_destroy_host_primary(SimpleSpiceDisplay *ssd)
 {
 dprint(1, "%s:\n", __FUNCTION__);
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(ssd);
 ssd->worker->destroy_primary_surface(ssd->worker, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_vm_chang

[Qemu-devel] [Request for inputs]Qemu parameters that need runtime change.

2011-03-02 Thread Prerna Saxena


Hi,
QEMU at present can be started with a huge list of parameters, and only 
a subset of these can be changed at runtime. For the remaining ones, one 
needs to restart the qemu instance.
I've been trying to put together a list of some such parameters, which 
would make good candidates for a runtime change. Request inputs on more 
such parameters that could make it here, and also whether the following 
are good-to-have features:
1. Allowing a runtime change from one chardev backend to another ( Eg, 
from TCP socket to unix, and vice-versa )

2. Changing the network interfaces (from -net user to -net tap ? )

I'm presently aware of these; it would be good to get more inputs on 
what more can be done here.


--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

On 2011-03-02 11:56, Alon Levy wrote:
> On Wed, Mar 02, 2011 at 09:22:35AM +0100, Jan Kiszka wrote:
>> On 2011-03-01 13:58, Alon Levy wrote:
>>> On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:03, Alon Levy wrote:
> On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
>> On 2011-02-26 12:43, xming wrote:
>>> When trying to start X (and it loads qxl driver) the kvm process just 
>>> crashes.
>
> This is fixed by Gerd's attached patch (taken from rhel repository, don't 
> know
> why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
> well (separate email).

 Patch looks OK on first glance, but the changelog is misleading: This
 was broken for _both_ trees, but upstream didn't detect the bug.
>>>
>>> So I didn't test with qemu not having this patch, but according to the 
>>> discussion in the
>>> launchpad bug the problem only happens with qemu-kvm. This doesn't rule out 
>>> it being a
>>> bug, perhaps it is just triggered much less frequently I guess.
>>
>> Again: qemu-kvm has the instrumentation to detect the bug, qemu is
>> lacking this, but both trees will break subtly if cpu_current_env is not
>> properly restored.
> 
> ok, so what do you want to be done further before this patch is applied?

The patch posted to qemu-devel just requires a changelog that correctly
reflects what it addresses (and where).

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH -V2 4/6] hw/9pfs: Implement syncfs

2011-03-02 Thread Aneesh Kumar K. V

On Wed, 2 Mar 2011 10:20:41 +, Stefan Hajnoczi  wrote:
> On Wed, Mar 2, 2011 at 5:05 AM, Aneesh Kumar K. V
>  wrote:
> > On Tue, 1 Mar 2011 20:27:19 +, Stefan Hajnoczi  
> > wrote:
> >> On Tue, Mar 1, 2011 at 6:02 PM, Aneesh Kumar K. V
> >>  wrote:
> >> > On Tue, 1 Mar 2011 15:59:19 +, Stefan Hajnoczi  
> >> > wrote:
> >> >> >> Please explain the semantics of P9_TSYNCFS.  Won't returning success
> >> >> >> without doing anything lead to data integrity issues?
> >> >> >
> >> >> > I should actually do the 9P Operation format as commit message. Will
> >> >> > add in the next update. Whether returning here would cause a data
> >> >> > integrity issue, it depends what sort of guarantee we want to
> >> >> > provide. So calling sync on the guest will cause all the dirty pages 
> >> >> > in
> >> >> > the guest to be flushed to host. Now all those changes are in the host
> >> >> > page cache and it would be nice to flush them  as a part of sync but
> >> >> > then since we don't have a per file system sync, the above would imply
> >> >> > we flush all dirty pages on the host which can result in large
> >> >> > performance impact.
> >> >>
> >> >> You get the define the semantics of P9_TSYNCFS?  I thought this is
> >> >> part of a well-defined protocol :).  If this is a .L extension then
> >> >> it's probably a bad design and shouldn't be added to the protocol if
> >> >> we can't implement it.
> >> >
> >> > It is a part of .L extension and we can definitely implement it. There
> >> > is patch out there which is yet to be merged
> >> >
> >> > http://thread.gmane.org/gmane.linux.file-systems/44628
> >>
> >> A future Linux-only ioctl :/.
> >>
> >> >> Is this operation supposed to flush the disk write cache too?
> >> >
> >> > I am not sure we need to specify that as a part of 9p operation. I guess
> >> > we can only say maximum possible data integrity. Whether a sync will
> >> > cause disk write cache flush depends on the file system. For ext* that
> >> > can be controlled by mount option barrier.
> >>
> >> So on a host with a safe configuration this operation should put data
> >> on stable storage?
> >>
> >> >>
> >> >> I think virtio-9p has a file descriptor cache.  Would it be possible
> >> >> to fsync() those file descriptors?
> >> >>
> >> >
> >> > Ideally we should. But that would involve a large number of fsync calls.
> >>
> >> Yep, that's why this is a weird operation to support, especially since
> >> it's a .L add-on and not original 9P.  What's the use-case since
> >> today's Linux userland cannot directly make use of this operation?  I
> >> guess it has been added in order to pass-through a Linux internal vfs
> >> super block sync function?
> >
> > IMHO it would be nice to have a syncfs 9p operation because that enables
> > the client to say "if possible" flush the dirty data on the server. I
> > guess we should consider this as something server can choose to
> > ignore. In a cloud setup even doing a per file system sync can imply
> > performance impact because VirtFS export may not 1:1 map to mount point
> > on host. There is also plan to add a new option to the VirtFs export point
> > which enable write to exported files to be either O_SYNC or
> > O_DIRECT, similar to the way done for image files. That would imply
> > Tsyncfs doesn't have much to do because we don't have dirty data on host
> > pagecache anymore.
> >
> > So from 9p .L protocol point of view, it is a valid operation which
> > enables client to request a flush of server cache if possible. And qemu
> > 9p server choose to ignore because of the performance impact. If you are
> > not comfortable with not doing anything specific in Tsyncfs
> > operation, we can add sync(2) call as part of this 9p operation and
> > later switch to  FS_IOC_SYNCFS when it become available.
> 
> The case we need to prevent is where applications running on virtfs
> think they are getting guarantees that the implementation does not
> provide.  Silently noping a sync operation is a step in that direction
> so I agree that sync(2) would be safer.

I should capture as a part of the commit message that some
servers can chose to ignore the request and return success.

> 
> I'm not sure I understand your 1:1 mount point mapping argument.  The
> FS_IOC_SYNCFS ioctl does not help us there since it syncs the entire
> filesystem, not the directory tree that virtfs is mapped to.  It will
> do a bunch of extra I/O similar to how sync(2) does this across all
> filesystems today.  This suggests again that P9_TSYNCFS is hard to
> implement because FS_IOC_SYNCFS ends up not being useful.  Did I miss
> something?

What i meant was that even with 1:1  mount point mapping some
servers can choose to not implement cache flush due to the performance
implication associated with TSYNCFS. (That argument also justifies Qemu
9p server ignoring the TSYNCFS request.) So from the client perspective
it is just a hint to the server to flush the server cache "if possible".

> 
> I'm looking for

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

2011-03-02 Thread Dor Laor

On 03/02/2011 12:58 PM, Alon Levy wrote:

On Wed, Mar 02, 2011 at 11:25:44AM +0100, Jes Sorensen wrote:

On 03/01/11 15:25, Dor Laor wrote:

On 03/01/2011 02:40 PM, Anthony Liguori wrote:

On Mar 1, 2011 7:07 AM, "Dor Laor" Qemu is the one that should spawn them and they should be transparent
from the management. This way running qemu stays the same and qemu just
need to add the logic to get a SIGCHILD and potentially re-execute an
dead son process.

Spice is the logical place to start, no? It's the largest single
dependency we have and it does some scary things with qemu_mutex. I
would use spice as a way to prove the concept.

I agree it is desirable to the this for spice but it is allot more
complex than virtagent isolation. Spice is performance sensitive and
contains much more state. It needs to access the guest memory for
reading the surfaces. It can be solved but needs some major changes.
Adding spice-devel to the discussion.

I had a few thoughts about this already, which I think will work for
both spice and vnc. What we could do is to expose the video memory via
shared memory. That way a spice or vnc daemon could get direct access to
the memory, this would limit communication to keyboard/mouse events, as
well as video mode info, and possibly notifications to the client about
which ranges of memory have been updated.

Using shared memory this way should allow us to implement the video
clients without performance loss, in fact it should be beneficial since
it would allow them to run fully separate from the host daemon.

I think that would work well for spice. Spice uses shared memory from the
pci device for both the framebuffer and surfaces/commands, but this is

Is that the only DMA do you do? That's good for this model.

not really relevant at this level. What about IO and irq? that would add
additional latencies, no? because each io exit would need to be ipc'ed over
to the spice/vnc process? and same way in the other direction, requesting
qemu to trigger an interrupt in the next vm entry.

The qxl device can be in the privileged qemu (as a start) and it will
handle irqs directly. Even today you need to notify the spice server
thread, so nothing will change

Cheers,
Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/02/11 11:58, Alon Levy wrote:
> On Wed, Mar 02, 2011 at 11:25:44AM +0100, Jes Sorensen wrote:
>> I had a few thoughts about this already, which I think will work for
>> both spice and vnc. What we could do is to expose the video memory via
>> shared memory. That way a spice or vnc daemon could get direct access to
>> the memory, this would limit communication to keyboard/mouse events, as
>> well as video mode info, and possibly notifications to the client about
>> which ranges of memory have been updated.
>>
>> Using shared memory this way should allow us to implement the video
>> clients without performance loss, in fact it should be beneficial since
>> it would allow them to run fully separate from the host daemon.
>>
> 
> I think that would work well for spice. Spice uses shared memory from the
> pci device for both the framebuffer and surfaces/commands, but this is
> not really relevant at this level. What about IO and irq? that would add
> additional latencies, no? because each io exit would need to be ipc'ed over
> to the spice/vnc process? and same way in the other direction, requesting
> qemu to trigger an interrupt in the next vm entry.

I am glad the shmem approach will work for Spice. There would be
something there with IRQs etc, I agree. but there are methods to help
that too, we could use a shared memory event ring for example.

Cheers,
Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/02/11 11:56, Dor Laor wrote:
> On 03/02/2011 12:25 PM, Jes Sorensen wrote:
>> On 03/01/11 15:25, Dor Laor wrote:
>> Using shared memory this way should allow us to implement the video
>> clients without performance loss, in fact it should be beneficial since
>> it would allow them to run fully separate from the host daemon.
> 
> Why do you call it a daemon? Each VM instance should have only one, the
> 'host daemon' naming is misleading.

I refer to it as a daemon because it is something the client(s) will
connect to. But yes, there will be a daemon per VM.

> The proper solution long term is to sandbox qemu in a way that there
> privileged mode and non privileged mode. It might be implemented using
> separate address space or not. Most operations like vnc/rpc/spice/usb
> should be run with less privileges.
> 
> The main issue is that doing it right will take time and we'll want
> virt-agent be merged before the long term solution is ready. The best
> approach would be gradual development

Yes I agree, I don't think this will happen overnight, and blocking
virtagent with this would be bad.

Cheers,
Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On Wed, Mar 02, 2011 at 11:25:44AM +0100, Jes Sorensen wrote:
> On 03/01/11 15:25, Dor Laor wrote:
> > On 03/01/2011 02:40 PM, Anthony Liguori wrote:
> >>
> >> On Mar 1, 2011 7:07 AM, "Dor Laor"  >>  > Qemu is the one that should spawn them and they should be transparent
> >> from the management. This way running qemu stays the same and qemu just
> >> need to add the logic to get a SIGCHILD and potentially re-execute an
> >> dead son process.
> >>
> >> Spice is the logical place to start, no?  It's the largest single
> >> dependency we have and it does some scary things with qemu_mutex.  I
> >> would use spice as a way to prove the concept.
> > 
> > I agree it is desirable to the this for spice but it is allot more
> > complex than virtagent isolation. Spice is performance sensitive and
> > contains much more state. It needs to access the guest memory for
> > reading the surfaces. It can be solved but needs some major changes.
> > Adding spice-devel to the discussion.
> 
> I had a few thoughts about this already, which I think will work for
> both spice and vnc. What we could do is to expose the video memory via
> shared memory. That way a spice or vnc daemon could get direct access to
> the memory, this would limit communication to keyboard/mouse events, as
> well as video mode info, and possibly notifications to the client about
> which ranges of memory have been updated.
> 
> Using shared memory this way should allow us to implement the video
> clients without performance loss, in fact it should be beneficial since
> it would allow them to run fully separate from the host daemon.
> 

I think that would work well for spice. Spice uses shared memory from the
pci device for both the framebuffer and surfaces/commands, but this is
not really relevant at this level. What about IO and irq? that would add
additional latencies, no? because each io exit would need to be ipc'ed over
to the spice/vnc process? and same way in the other direction, requesting
qemu to trigger an interrupt in the next vm entry.

> Cheers,
> Jes
>

[Qemu-devel] Re: [PATCH RESEND v2 1/2] fix vnc regression

2011-03-02 Thread Corentin Chary

On Wed, Mar 2, 2011 at 3:46 AM, Wen Congyang  wrote:
> This patch fix the following two regressions:
> 1. we should use bitmap_set() and bitmap_clear() to replace vnc_set_bits().
> 2. The unit of bitmap_intersects()'third parameter is bit, not words.
>   But we pass the num of words to bitmap_intersects().
>
> Changes from v1 to v2:
> 1. fix the third argument of bitmap_clear()
>
> Signed-off-by: Wen Congyang 

Acked-by: Corentin Chary 

> ---
>  ui/vnc.c |   11 ---
>  1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/ui/vnc.c b/ui/vnc.c
> index 610f884..e3761b0 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -1645,17 +1645,21 @@ static void framebuffer_update_request(VncState *vs, 
> int incremental,
>                                        int x_position, int y_position,
>                                        int w, int h)
>  {
> +    int i;
> +    const size_t width = ds_get_width(vs->ds) / 16;
> +
>     if (y_position > ds_get_height(vs->ds))
>         y_position = ds_get_height(vs->ds);
>     if (y_position + h >= ds_get_height(vs->ds))
>         h = ds_get_height(vs->ds) - y_position;
>
> -    int i;
>     vs->need_update = 1;
>     if (!incremental) {
>         vs->force_update = 1;
>         for (i = 0; i < h; i++) {
> -            bitmap_set(vs->dirty[y_position + i], x_position / 16, w / 16);
> +            bitmap_set(vs->dirty[y_position + i], 0, width);
> +            bitmap_clear(vs->dirty[y_position + i], width,
> +                         VNC_DIRTY_WORDS * BITS_PER_LONG - width);
>         }
>     }
>  }
> @@ -2406,7 +2410,8 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
>     guest_row  = vd->guest.ds->data;
>     server_row = vd->server->data;
>     for (y = 0; y < vd->guest.ds->height; y++) {
> -        if (bitmap_intersects(vd->guest.dirty[y], width_mask, 
> VNC_DIRTY_WORDS)) {
> +        if (bitmap_intersects(vd->guest.dirty[y], width_mask,
> +            VNC_DIRTY_WORDS * BITS_PER_LONG)) {
>             int x;
>             uint8_t *guest_ptr;
>             uint8_t *server_ptr;
> --
> 1.7.1
>



-- 
Corentin Chary
http://xf.iksaif.net

[Qemu-devel] Re: [PATCH RESEND 2/2] vnc: Fix heap corruption

2011-03-02 Thread Corentin Chary

On Wed, Mar 2, 2011 at 3:58 AM, Wen Congyang  wrote:
> This bug is reported by Stefan Weil:
> 
> Commit bc2429b9174ac2d3c56b7fd35884b0d89ec7fb02 introduced
> a severe bug (heap corruption).
>
> bitmap_clear was called with a wrong argument
> which caused out-of-bound writes to width_mask.
>
> This bug was detected with QEMU running on windows.
> It also occurs with wine:
>
> *** stack smashing detected ***:  terminated
> wine: Unhandled illegal instruction at address 0x6115c7 (thread 0009), 
> starting debugger...
>
> The bug is not windows specific!
> 
>
> The third argument of bitmap_clear() is number of bits to be cleared, but we 
> pass
> the end bits to be cleared to bitmap_clear().
>
> Signed-off-by: Wen Congyang 
> Reported-by: Stefan Weil 

Acked-by: Corentin Chary 

> ---
>  ui/vnc.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/ui/vnc.c b/ui/vnc.c
> index e3761b0..e7d0b5b 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -2390,6 +2390,7 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
>     unsigned long width_mask[VNC_DIRTY_WORDS];
>     VncState *vs;
>     int has_dirty = 0;
> +    const size_t width = ds_get_width(vd->ds) / 16;
>
>     struct timeval tv = { 0, 0 };
>
> @@ -2403,9 +2404,8 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
>      * Check and copy modified bits from guest to server surface.
>      * Update server dirty map.
>      */
> -    bitmap_set(width_mask, 0, (ds_get_width(vd->ds) / 16));
> -    bitmap_clear(width_mask, (ds_get_width(vd->ds) / 16),
> -                 VNC_DIRTY_WORDS * BITS_PER_LONG);
> +    bitmap_set(width_mask, 0, width);
> +    bitmap_clear(width_mask, width, VNC_DIRTY_WORDS * BITS_PER_LONG - width);
>     cmp_bytes = 16 * ds_get_bytes_per_pixel(vd->ds);
>     guest_row  = vd->guest.ds->data;
>     server_row = vd->server->data;
> --
> 1.7.1
>
>



-- 
Corentin Chary
http://xf.iksaif.net

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

On Wed, Mar 02, 2011 at 09:22:35AM +0100, Jan Kiszka wrote:
> On 2011-03-01 13:58, Alon Levy wrote:
> > On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
> >> On 2011-02-27 20:03, Alon Levy wrote:
> >>> On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
>  On 2011-02-26 12:43, xming wrote:
> > When trying to start X (and it loads qxl driver) the kvm process just 
> > crashes.
> >>>
> >>> This is fixed by Gerd's attached patch (taken from rhel repository, don't 
> >>> know
> >>> why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
> >>> well (separate email).
> >>
> >> Patch looks OK on first glance, but the changelog is misleading: This
> >> was broken for _both_ trees, but upstream didn't detect the bug.
> > 
> > So I didn't test with qemu not having this patch, but according to the 
> > discussion in the
> > launchpad bug the problem only happens with qemu-kvm. This doesn't rule out 
> > it being a
> > bug, perhaps it is just triggered much less frequently I guess.
> 
> Again: qemu-kvm has the instrumentation to detect the bug, qemu is
> lacking this, but both trees will break subtly if cpu_current_env is not
> properly restored.

ok, so what do you want to be done further before this patch is applied?

> 
> Jan
>

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

2011-03-02 Thread Dor Laor

On 03/02/2011 12:25 PM, Jes Sorensen wrote:

On 03/01/11 15:25, Dor Laor wrote:

On 03/01/2011 02:40 PM, Anthony Liguori wrote:

Spice is the logical place to start, no? It's the largest single
dependency we have and it does some scary things with qemu_mutex. I
would use spice as a way to prove the concept.

Why do you call it a daemon? Each VM instance should have only one, the
'host daemon' naming is misleading.

The proper solution long term is to sandbox qemu in a way that there
privileged mode and non privileged mode. It might be implemented using
separate address space or not. Most operations like vnc/rpc/spice/usb
should be run with less privileges.

The main issue is that doing it right will take time and we'll want
virt-agent be merged before the long term solution is ready. The best
approach would be gradual development

Cheers,
Jes

Re: [Qemu-devel] [Bug 723871] Re: qemu-kvm-0.14.0 Aborts with -vga qxl

On Tue, Mar 01, 2011 at 12:53:40PM -0600, Rick Vernam wrote:
> On Tuesday 01 March 2011 12:29:14 Serge Hallyn wrote:
> > @Rick,
> > 
> > would you expect a fedora guest to reproduce this?  Would it have the
> > qxl driver?  Or must it be Windows?
> 
> I don't have a fedora guest to test on, and I don't know the implementation 
> details well enough to postulate.
> -Rick
> 

This is guest independent, as long as the guest uses a qxl driver.

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/02/11 11:42, Dor Laor wrote:
> On 03/02/2011 12:28 PM, Jes Sorensen wrote:
>> On 03/01/11 15:25, Dor Laor wrote:
>>> I agree it is desirable to the this for spice but it is allot more
>>> complex than virtagent isolation. Spice is performance sensitive and
>>> contains much more state. It needs to access the guest memory for
>>> reading the surfaces. It can be solved but needs some major changes.
>>> Adding spice-devel to the discussion.
>>
>> Please don't add broken misconfigured mailing lists, which require
>> moderator or subscription, to discussions on public lists.
> 
> Isn't it simpler to ask the spice list maintainer (Alon Levy, CCed) to
> do it?

We cannot have a discussion on a mailing list if we constantly have to
wait for the list maintainer to approve it. There is really zero
justification for closing a development list of an open source project.
Spam is handled in other ways, this doesn't solve it.

Cheers,
Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

2011-03-02 Thread Dor Laor


On 03/02/2011 12:28 PM, Jes Sorensen wrote:

On 03/01/11 15:25, Dor Laor wrote:

On 03/01/2011 02:40 PM, Anthony Liguori wrote:

Spice is the logical place to start, no?  It's the largest single
dependency we have and it does some scary things with qemu_mutex.  I
would use spice as a way to prove the concept.


I agree it is desirable to the this for spice but it is allot more
complex than virtagent isolation. Spice is performance sensitive and
contains much more state. It needs to access the guest memory for
reading the surfaces. It can be solved but needs some major changes.
Adding spice-devel to the discussion.


Please don't add broken misconfigured mailing lists, which require
moderator or subscription, to discussions on public lists.


Isn't it simpler to ask the spice list maintainer (Alon Levy, CCed) to 
do it?




Jes

Re: [Qemu-devel] REPOST: [PATCH v3] tracetool: Add optional argument to specify dtrace probe names

2011-03-02 Thread Stefan Hajnoczi

On Wed, Mar 2, 2011 at 8:22 AM,   wrote:
> From: Jes Sorensen 
>
> Optional feature allowing a user to generate the probe list to match
> the name of the binary, in case they wish to install qemu under a
> different name than qemu-{system,user},
>
> Signed-off-by: Jes Sorensen 
> ---
>  scripts/tracetool |   19 +--
>  1 files changed, 13 insertions(+), 6 deletions(-)

Yes, please merge.

Acked-by: Stefan Hajnoczi

[Qemu-devel] [Bug 723871] Re: qemu-kvm-0.14.0 Aborts with -vga qxl

2011-03-02 Thread Dave Walker

@Serge, I had to re-target your branch (and merge) against 0.14.0~rc1
+noroms-0ubuntu4 as *ubuntu3 had already been uploaded for a different
fix, and the package-importer failed to suck it in.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/723871

Title:
  qemu-kvm-0.14.0 Aborts with -vga qxl

Status in QEMU:
  Confirmed
Status in “qemu-kvm” package in Ubuntu:
  Fix Released

Bug description:
  Host CPU is Core i7 Q820.  KVM is from 2.6.35-gentoo-r5 kernel (x86_64).
  Host has spice-0.7.2 and spice-protocol-0.7.0.
  Guest is Windows XP SP3 with qxl driver 0.6.1, virtio-serial 1.1.6 and 
vdagent 0.6.3.

  qemu-kvm is started like so:
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid 
-drive 
file=/home/rick/qemu/hds/wxp.raw,if=virtio,media=disk,aio=native,snapshot=on -m 
768 -name WinXP -net nic,model=virtio -net user -localtime -usb -vga qxl 
-device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent -device 
virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice 
port=1234,disable-ticketing -monitor stdio
  and crashes with:
  qemu-system-x86_64: /home/rick/qemu/src/qemu-kvm-0.14.0/qemu-kvm.c:1724: 
kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
  Aborted

  If I use -no-kvm, it works fine.  If I use -vga std, it works fine.
  -enable-kvm and -vga qxl crashes.

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/01/11 15:25, Dor Laor wrote:
> On 03/01/2011 02:40 PM, Anthony Liguori wrote:
>> Spice is the logical place to start, no?  It's the largest single
>> dependency we have and it does some scary things with qemu_mutex.  I
>> would use spice as a way to prove the concept.
> 
> I agree it is desirable to the this for spice but it is allot more
> complex than virtagent isolation. Spice is performance sensitive and
> contains much more state. It needs to access the guest memory for
> reading the surfaces. It can be solved but needs some major changes.
> Adding spice-devel to the discussion.

Please don't add broken misconfigured mailing lists, which require
moderator or subscription, to discussions on public lists.

Jes

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/01/11 15:25, Dor Laor wrote:
> On 03/01/2011 02:40 PM, Anthony Liguori wrote:
>>
>> On Mar 1, 2011 7:07 AM, "Dor Laor" >  > Qemu is the one that should spawn them and they should be transparent
>> from the management. This way running qemu stays the same and qemu just
>> need to add the logic to get a SIGCHILD and potentially re-execute an
>> dead son process.
>>
>> Spice is the logical place to start, no?  It's the largest single
>> dependency we have and it does some scary things with qemu_mutex.  I
>> would use spice as a way to prove the concept.
> 
> I agree it is desirable to the this for spice but it is allot more
> complex than virtagent isolation. Spice is performance sensitive and
> contains much more state. It needs to access the guest memory for
> reading the surfaces. It can be solved but needs some major changes.
> Adding spice-devel to the discussion.

I had a few thoughts about this already, which I think will work for
both spice and vnc. What we could do is to expose the video memory via
shared memory. That way a spice or vnc daemon could get direct access to
the memory, this would limit communication to keyboard/mouse events, as
well as video mode info, and possibly notifications to the client about
which ranges of memory have been updated.

Using shared memory this way should allow us to implement the video
clients without performance loss, in fact it should be beneficial since
it would allow them to run fully separate from the host daemon.

Cheers,
Jes

[Qemu-devel] [Bug 723871] Re: qemu-kvm-0.14.0 Aborts with -vga qxl

2011-03-02 Thread Launchpad Bug Tracker

This bug was fixed in the package qemu-kvm - 0.14.0~rc1+noroms-0ubuntu4

---
qemu-kvm (0.14.0~rc1+noroms-0ubuntu4) natty; urgency=low

  * Apply spice-qxl-locking-fix-for-qemu-kvm.patch to fix bug with -qxl.
(LP: #723871)
 -- Serge HallynTue, 01 Mar 2011 11:12:44 -0600

** Changed in: qemu-kvm (Ubuntu)
   Status: In Progress => Fix Released

** Branch linked: lp:ubuntu/qemu-kvm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/723871

Title:
  qemu-kvm-0.14.0 Aborts with -vga qxl

Status in QEMU:
  Confirmed
Status in “qemu-kvm” package in Ubuntu:
  Fix Released

Bug description:
  Host CPU is Core i7 Q820.  KVM is from 2.6.35-gentoo-r5 kernel (x86_64).
  Host has spice-0.7.2 and spice-protocol-0.7.0.
  Guest is Windows XP SP3 with qxl driver 0.6.1, virtio-serial 1.1.6 and 
vdagent 0.6.3.

  qemu-kvm is started like so:
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid 
-drive 
file=/home/rick/qemu/hds/wxp.raw,if=virtio,media=disk,aio=native,snapshot=on -m 
768 -name WinXP -net nic,model=virtio -net user -localtime -usb -vga qxl 
-device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent -device 
virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice 
port=1234,disable-ticketing -monitor stdio
  and crashes with:
  qemu-system-x86_64: /home/rick/qemu/src/qemu-kvm-0.14.0/qemu-kvm.c:1724: 
kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
  Aborted

  If I use -no-kvm, it works fine.  If I use -vga std, it works fine.
  -enable-kvm and -vga qxl crashes.

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 03/01/11 13:07, Dor Laor wrote:
> On 02/28/2011 07:44 PM, Anthony Liguori wrote:
>> I'm very nervous about having a large number of daemons necessary to run
>> QEMU.  I think a reasonable approach would be a single front-end daemond.
> 
> s/daemon/son processes/
> Qemu is the one that should spawn them and they should be transparent
> from the management. This way running qemu stays the same and qemu just
> need to add the logic to get a SIGCHILD and potentially re-execute an
> dead son process.

I disagree here, I do not want to require QEMU to spawn the new
processes. Having a daemon you can use to connect will provide more
flexibility and isn't unreasonably complex.

Cheers,
Jes

Re: [Qemu-devel] [PATCH -V2 4/6] hw/9pfs: Implement syncfs

2011-03-02 Thread Stefan Hajnoczi

On Wed, Mar 2, 2011 at 5:05 AM, Aneesh Kumar K. V
 wrote:
> On Tue, 1 Mar 2011 20:27:19 +, Stefan Hajnoczi  wrote:
>> On Tue, Mar 1, 2011 at 6:02 PM, Aneesh Kumar K. V
>>  wrote:
>> > On Tue, 1 Mar 2011 15:59:19 +, Stefan Hajnoczi  
>> > wrote:
>> >> >> Please explain the semantics of P9_TSYNCFS.  Won't returning success
>> >> >> without doing anything lead to data integrity issues?
>> >> >
>> >> > I should actually do the 9P Operation format as commit message. Will
>> >> > add in the next update. Whether returning here would cause a data
>> >> > integrity issue, it depends what sort of guarantee we want to
>> >> > provide. So calling sync on the guest will cause all the dirty pages in
>> >> > the guest to be flushed to host. Now all those changes are in the host
>> >> > page cache and it would be nice to flush them  as a part of sync but
>> >> > then since we don't have a per file system sync, the above would imply
>> >> > we flush all dirty pages on the host which can result in large
>> >> > performance impact.
>> >>
>> >> You get the define the semantics of P9_TSYNCFS?  I thought this is
>> >> part of a well-defined protocol :).  If this is a .L extension then
>> >> it's probably a bad design and shouldn't be added to the protocol if
>> >> we can't implement it.
>> >
>> > It is a part of .L extension and we can definitely implement it. There
>> > is patch out there which is yet to be merged
>> >
>> > http://thread.gmane.org/gmane.linux.file-systems/44628
>>
>> A future Linux-only ioctl :/.
>>
>> >> Is this operation supposed to flush the disk write cache too?
>> >
>> > I am not sure we need to specify that as a part of 9p operation. I guess
>> > we can only say maximum possible data integrity. Whether a sync will
>> > cause disk write cache flush depends on the file system. For ext* that
>> > can be controlled by mount option barrier.
>>
>> So on a host with a safe configuration this operation should put data
>> on stable storage?
>>
>> >>
>> >> I think virtio-9p has a file descriptor cache.  Would it be possible
>> >> to fsync() those file descriptors?
>> >>
>> >
>> > Ideally we should. But that would involve a large number of fsync calls.
>>
>> Yep, that's why this is a weird operation to support, especially since
>> it's a .L add-on and not original 9P.  What's the use-case since
>> today's Linux userland cannot directly make use of this operation?  I
>> guess it has been added in order to pass-through a Linux internal vfs
>> super block sync function?
>
> IMHO it would be nice to have a syncfs 9p operation because that enables
> the client to say "if possible" flush the dirty data on the server. I
> guess we should consider this as something server can choose to
> ignore. In a cloud setup even doing a per file system sync can imply
> performance impact because VirtFS export may not 1:1 map to mount point
> on host. There is also plan to add a new option to the VirtFs export point
> which enable write to exported files to be either O_SYNC or
> O_DIRECT, similar to the way done for image files. That would imply
> Tsyncfs doesn't have much to do because we don't have dirty data on host
> pagecache anymore.
>
> So from 9p .L protocol point of view, it is a valid operation which
> enables client to request a flush of server cache if possible. And qemu
> 9p server choose to ignore because of the performance impact. If you are
> not comfortable with not doing anything specific in Tsyncfs
> operation, we can add sync(2) call as part of this 9p operation and
> later switch to  FS_IOC_SYNCFS when it become available.

The case we need to prevent is where applications running on virtfs
think they are getting guarantees that the implementation does not
provide.  Silently noping a sync operation is a step in that direction
so I agree that sync(2) would be safer.

I'm not sure I understand your 1:1 mount point mapping argument.  The
FS_IOC_SYNCFS ioctl does not help us there since it syncs the entire
filesystem, not the directory tree that virtfs is mapped to.  It will
do a bunch of extra I/O similar to how sync(2) does this across all
filesystems today.  This suggests again that P9_TSYNCFS is hard to
implement because FS_IOC_SYNCFS ends up not being useful.  Did I miss
something?

I'm looking for a use case where guests need a P9_TSYNCFS operation.
P9_TSYNCFS is not in linux-2.6 yet so I don't have any example code
that exploits it.  Can you point me to something that shows off why
this operation is necessary?  It must an optimization if 9P and NFS
make do without an equivalent?

Stefan

Re: [Qemu-devel] QEMU: Discussion of separating core functionality vs supportive features

On 02/28/11 18:44, Anthony Liguori wrote:
> On Feb 28, 2011 10:44 AM, "Jes Sorensen"  wrote:
>> > Separating host-side virtagent and other tasks from core QEMU
>> > =
>> >
>> > To improve auditing of the core QEMU code, it would be ideal to be
>> > able to separate the core QEMU functionality from utility
>> > functionality by  moving the utility functionality into its own
>> > process. This process will be referred to as the QEMU client below.
> Separating as in moving code outside of qemu.git, making code optionally
> built in, making code optionally built in or loadable as a separate
> executable that is automatically launched, or making code always built
> outside the main executable?
> 
> I'm very nervous about having a large number of daemons necessary to run
> QEMU.  I think a reasonable approach would be a single front-end daemond.
> 
> Once QAPI is merged, there is a very incremental approach we can take for
> this sort of work.  Take your favorite subsystem (like gdbstub or SDL) and
> make it only use QMP apis.  Once we're only using QMP internally in a
> subsystem, then building it out of the main QEMU and using libqmp should be
> fairly easy.

Hi Anthony,

Back from a day off playing with power drills and concrete walls :)

Sorry I should have made it a little more clear, it was obvious to me,
but not written down:

The idea is to keep everything as part of the QEMU package, ie. part of
qemu.git. My idea is really to have one QEMU host daemon and one QEMU
client, which provides the various services. You type make and you get
two binaries instead of one. We could allow other daemons to connect to
the host daemon, but that wouldn't be a primary target for this in my
book, and I am not sure we really want to do this.

It is absolutely vital for me that we do not make things much more
complicated for users with this move. I don't want to get into a
situation where we start forcing external packages or daemons in order
to run basic QEMU. If we start requiring such, we have failed! However,
I think it is a reasonable compromise to have one daemon you launch, and
then a simple client you launch straight after which will then provide
the same/similar views and interfaces that users are used to when they
launch current QEMU (monitor, vnc server etc).

I hope this clarifies things.

Cheers,
Jes

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

On 2011-03-01 13:58, Alon Levy wrote:
> On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
>> On 2011-02-27 20:03, Alon Levy wrote:
>>> On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
> When trying to start X (and it loads qxl driver) the kvm process just 
> crashes.
>>>
>>> This is fixed by Gerd's attached patch (taken from rhel repository, don't 
>>> know
>>> why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
>>> well (separate email).
>>
>> Patch looks OK on first glance, but the changelog is misleading: This
>> was broken for _both_ trees, but upstream didn't detect the bug.
> 
> So I didn't test with qemu not having this patch, but according to the 
> discussion in the
> launchpad bug the problem only happens with qemu-kvm. This doesn't rule out 
> it being a
> bug, perhaps it is just triggered much less frequently I guess.

Again: qemu-kvm has the instrumentation to detect the bug, qemu is
lacking this, but both trees will break subtly if cpu_current_env is not
properly restored.

Jan



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v3 11/17] kvm: x86: Inject pending MCE events on state writeback

The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.

As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).

This will also allow to unify TCG and KVM injection code.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/kvm.c |   60 +
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a416554..939edc8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -467,6 +467,38 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t 
status,
 #endif /* !KVM_CAP_MCE*/
 }
 
+static int kvm_inject_mce_oldstyle(CPUState *env)
+{
+#ifdef KVM_CAP_MCE
+if (!kvm_has_vcpu_events() && env->exception_injected == EXCP12_MCHK) {
+unsigned int bank, bank_num = env->mcg_cap & 0xff;
+struct kvm_x86_mce mce;
+
+env->exception_injected = -1;
+
+/*
+ * There must be at least one bank in use if an MCE is pending.
+ * Find it and use its values for the event injection.
+ */
+for (bank = 0; bank < bank_num; bank++) {
+if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
+break;
+}
+}
+assert(bank < bank_num);
+
+mce.bank = bank;
+mce.status = env->mce_banks[bank * 4 + 1];
+mce.mcg_status = env->mcg_status;
+mce.addr = env->mce_banks[bank * 4 + 2];
+mce.misc = env->mce_banks[bank * 4 + 3];
+
+return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
+}
+#endif /* KVM_CAP_MCE */
+return 0;
+}
+
 static void cpu_update_state(void *opaque, int running, int reason)
 {
 CPUState *env = opaque;
@@ -1539,6 +1571,11 @@ int kvm_arch_put_registers(CPUState *env, int level)
 if (ret < 0) {
 return ret;
 }
+/* must be before kvm_put_msrs */
+ret = kvm_inject_mce_oldstyle(env);
+if (ret < 0) {
+return ret;
+}
 ret = kvm_put_msrs(env, level);
 if (ret < 0) {
 return ret;
@@ -1677,6 +1714,29 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run 
*run)
 
 int kvm_arch_process_async_events(CPUState *env)
 {
+if (env->interrupt_request & CPU_INTERRUPT_MCE) {
+/* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
+assert(env->mcg_cap);
+
+env->interrupt_request &= ~CPU_INTERRUPT_MCE;
+
+kvm_cpu_synchronize_state(env);
+
+if (env->exception_injected == EXCP08_DBLE) {
+/* this means triple fault */
+qemu_system_reset_request();
+env->exit_request = 1;
+return 0;
+}
+env->exception_injected = EXCP12_MCHK;
+env->has_error_code = 0;
+
+env->halted = 0;
+if (kvm_irqchip_in_kernel() && env->mp_state == KVM_MP_STATE_HALTED) {
+env->mp_state = KVM_MP_STATE_RUNNABLE;
+}
+}
+
 if (kvm_irqchip_in_kernel()) {
 return 0;
 }
-- 
1.7.1

[Qemu-devel] [PATCH v3 02/17] kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking

Original fix by David Gibson.

CC: David Gibson 
Signed-off-by: Jan Kiszka 
---
 kvm-all.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index e6a7de4..7753c8a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -998,7 +998,9 @@ int kvm_cpu_exec(CPUState *env)
 }
 ret = EXCP_INTERRUPT;
 
+#ifdef KVM_CAP_SET_GUEST_DEBUG
 out:
+#endif
 env->exit_request = 0;
 cpu_single_env = NULL;
 return ret;
-- 
1.7.1

[Qemu-devel] [PATCH v3 13/17] kvm: x86: Consolidate TCG and KVM MCE injection code

This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
in the VCPUState also for KVM.

We have to drop the MSRs writeback restrictions for this purpose which
is now safe as every uncoordinated MSR injection is removed with this
patch.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/helper.c  |   34 +++-
 target-i386/kvm.c |  238 +---
 target-i386/kvm_x86.h |   25 -
 3 files changed, 37 insertions(+), 260 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

diff --git a/target-i386/helper.c b/target-i386/helper.c
index a32960c..a08309f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -27,7 +27,6 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
-#include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
 #include "monitor.h"
@@ -1167,7 +1166,6 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int 
bank,
 };
 unsigned bank_num = cenv->mcg_cap & 0xff;
 CPUState *env;
-int flag = 0;
 
 if (!cenv->mcg_cap) {
 monitor_printf(mon, "MCE injection not supported\n");
@@ -1187,27 +1185,19 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, 
int bank,
 return;
 }
 
-if (kvm_enabled()) {
-if (flags & MCE_INJECT_BROADCAST) {
-flag |= MCE_BROADCAST;
-}
-
-kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
-} else {
-run_on_cpu(cenv, do_inject_x86_mce, ¶ms);
-if (flags & MCE_INJECT_BROADCAST) {
-params.bank = 1;
-params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
-params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
-params.addr = 0;
-params.misc = 0;
-for (env = first_cpu; env != NULL; env = env->next_cpu) {
-if (cenv == env) {
-continue;
-}
-params.env = env;
-run_on_cpu(cenv, do_inject_x86_mce, ¶ms);
+run_on_cpu(cenv, do_inject_x86_mce, ¶ms);
+if (flags & MCE_INJECT_BROADCAST) {
+params.bank = 1;
+params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
+params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
+params.addr = 0;
+params.misc = 0;
+for (env = first_cpu; env != NULL; env = env->next_cpu) {
+if (cenv == env) {
+continue;
 }
+params.env = env;
+run_on_cpu(cenv, do_inject_x86_mce, ¶ms);
 }
 }
 }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 939edc8..be896dd 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -28,7 +28,6 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
-#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include 
@@ -193,164 +192,23 @@ static int kvm_setup_mce(CPUState *env, uint64_t 
*mcg_cap)
 return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
 }
 
-static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
-return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
-}
-
-static int kvm_get_msr(CPUState *env, struct kvm_msr_entry *msrs, int n)
-{
-struct kvm_msrs *kmsrs = qemu_malloc(sizeof *kmsrs + n * sizeof *msrs);
-int r;
-
-kmsrs->nmsrs = n;
-memcpy(kmsrs->entries, msrs, n * sizeof *msrs);
-r = kvm_vcpu_ioctl(env, KVM_GET_MSRS, kmsrs);
-memcpy(msrs, kmsrs->entries, n * sizeof *msrs);
-free(kmsrs);
-return r;
-}
-
-/* FIXME: kill this and kvm_get_msr, use env->mcg_status instead */
-static int kvm_mce_in_progress(CPUState *env)
-{
-struct kvm_msr_entry msr_mcg_status = {
-.index = MSR_MCG_STATUS,
-};
-int r;
-
-r = kvm_get_msr(env, &msr_mcg_status, 1);
-if (r == -1 || r == 0) {
-fprintf(stderr, "Failed to get MCE status\n");
-return 0;
-}
-return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
-}
-
-struct kvm_x86_mce_data
-{
-CPUState *env;
-struct kvm_x86_mce *mce;
-int abort_on_error;
-};
-
-static void kvm_do_inject_x86_mce(void *_data)
-{
-struct kvm_x86_mce_data *data = _data;
-int r;
-
-/* If there is an MCE exception being processed, ignore this SRAO MCE */
-if ((data->env->mcg_cap & MCG_SER_P) &&
-!(data->mce->status & MCI_STATUS_AR)) {
-if (kvm_mce_in_progress(data->env)) {
-return;
-}
-}
-
-r = kvm_set_mce(data->env, data->mce);
-if (r < 0) {
-perror("kvm_set_mce FAILED");
-if (data->abort_on_error) {
-abort();
-}
-}
-}
-
-static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
-  int flag)
-{
-struct kvm_x86_mce_data data = {
-.env = env,
-.mce = mce,
-.ab

[Qemu-devel] REPOST: [PATCH v3] tracetool: Add optional argument to specify dtrace probe names

2011-03-02 Thread Jes . Sorensen

From: Jes Sorensen 

Optional feature allowing a user to generate the probe list to match
the name of the binary, in case they wish to install qemu under a
different name than qemu-{system,user},

Signed-off-by: Jes Sorensen 
---
 scripts/tracetool |   19 +--
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index e046683..412f695 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -30,9 +30,11 @@ Output formats:
   --stap Generate .stp file (DTrace with SystemTAP only)
 
 Options:
-  --binary  [path]  Full path to QEMU binary
-  --target-arch [arch]  QEMU emulator target arch
-  --target-type [type]  QEMU emulator target type ('system' or 'user')
+  --binary   [path]Full path to QEMU binary
+  --target-arch  [arch]QEMU emulator target arch
+  --target-type  [type]QEMU emulator target type ('system' or 'user')
+  --probe-prefix [prefix]  Prefix for dtrace probe names
+   (default: qemu-\$targettype-\$targetarch)
 
 EOF
 exit 1
@@ -472,7 +474,7 @@ linetostap_dtrace()
 
 # Define prototype for probe arguments
 cat <

[Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition)

This is mostly a rebase of the previous round, just including one
additional ppc build fix (patch 2) for a regression in a previous part.
Please merge.

CC: David Gibson 
CC: Hidetoshi Seto 
CC: Huang Ying 
CC: Jin Dongming 

Huang Ying (2):
  Add qemu_ram_remap
  KVM, MCE, unpoison memory address across reboot

Jan Kiszka (15):
  kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events
  kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking
  x86: Account for MCE in cpu_has_work
  x86: Perform implicit mcg_status reset
  x86: Small cleanups of MCE helpers
  x86: Refine error reporting of MCE injection services
  x86: Optionally avoid injecting AO MCEs while others are pending
  Synchronize VCPU states before reset
  kvm: x86: Move MCE functions together
  kvm: Rename kvm_arch_process_irqchip_events to async_events
  kvm: x86: Inject pending MCE events on state writeback
  x86: Run qemu_inject_x86_mce on target VCPU
  kvm: x86: Consolidate TCG and KVM MCE injection code
  kvm: x86: Clean up kvm_setup_mce
  kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails

 cpu-all.h |8 +-
 cpu-common.h  |1 +
 exec.c|   63 +++-
 kvm-all.c |4 +-
 kvm.h |2 +-
 monitor.c |   11 +-
 qemu-common.h |6 +-
 target-i386/cpu.h |   11 +-
 target-i386/exec.h|   15 +-
 target-i386/helper.c  |  185 +---
 target-i386/kvm.c |  463 -
 target-i386/kvm_x86.h |   25 ---
 target-ppc/kvm.c  |6 +-
 target-s390x/kvm.c|2 +-
 vl.c  |1 +
 15 files changed, 403 insertions(+), 400 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

[Qemu-devel] [PATCH v3 09/17] kvm: x86: Move MCE functions together

Pure function suffling to avoid multiple #ifdef KVM_CAP_MCE sections,
no functional changes. While at it, annotate some #ifdef sections.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/kvm.c |  346 ++---
 1 files changed, 171 insertions(+), 175 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0aa0a41..f909661 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -172,7 +172,7 @@ static int get_para_features(CPUState *env)
 #endif
 return features;
 }
-#endif
+#endif /* CONFIG_KVM_PARA */
 
 #ifdef KVM_CAP_MCE
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
@@ -273,8 +273,174 @@ static void kvm_inject_x86_mce_on(CPUState *env, struct 
kvm_x86_mce *mce,
 run_on_cpu(env, kvm_do_inject_x86_mce, &data);
 }
 
-static void kvm_mce_broadcast_rest(CPUState *env);
-#endif
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+struct kvm_x86_mce mce = {
+.bank = 1,
+.status = MCI_STATUS_VAL | MCI_STATUS_UC,
+.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+.addr = 0,
+.misc = 0,
+};
+CPUState *cenv;
+
+/* Broadcast MCA signal for processor version 06H_EH and above */
+if (cpu_x86_support_mca_broadcast(env)) {
+for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+if (cenv == env) {
+continue;
+}
+kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
+}
+}
+}
+
+static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
+{
+struct kvm_x86_mce mce = {
+.bank = 9,
+.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+  | MCI_STATUS_AR | 0x134,
+.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
+.addr = paddr,
+.misc = (MCM_ADDR_PHYS << 6) | 0xc,
+};
+int r;
+
+r = kvm_set_mce(env, &mce);
+if (r < 0) {
+fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+abort();
+}
+kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
+{
+struct kvm_x86_mce mce = {
+.bank = 9,
+.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+  | 0xc0,
+.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+.addr = paddr,
+.misc = (MCM_ADDR_PHYS << 6) | 0xc,
+};
+int r;
+
+r = kvm_set_mce(env, &mce);
+if (r < 0) {
+fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+abort();
+}
+kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
+{
+struct kvm_x86_mce mce = {
+.bank = 9,
+.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+  | 0xc0,
+.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+.addr = paddr,
+.misc = (MCM_ADDR_PHYS << 6) | 0xc,
+};
+
+kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
+kvm_mce_broadcast_rest(env);
+}
+#endif /* KVM_CAP_MCE */
+
+static void hardware_memory_error(void)
+{
+fprintf(stderr, "Hardware memory error!\n");
+exit(1);
+}
+
+int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
+{
+#ifdef KVM_CAP_MCE
+void *vaddr;
+ram_addr_t ram_addr;
+target_phys_addr_t paddr;
+
+if ((env->mcg_cap & MCG_SER_P) && addr
+&& (code == BUS_MCEERR_AR
+|| code == BUS_MCEERR_AO)) {
+vaddr = (void *)addr;
+if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+!kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, 
&paddr)) {
+fprintf(stderr, "Hardware memory error for memory used by "
+"QEMU itself instead of guest system!\n");
+/* Hope we are lucky for AO MCE */
+if (code == BUS_MCEERR_AO) {
+return 0;
+} else {
+hardware_memory_error();
+}
+}
+
+if (code == BUS_MCEERR_AR) {
+/* Fake an Intel architectural Data Load SRAR UCR */
+kvm_mce_inj_srar_dataload(env, paddr);
+} else {
+/*
+ * If there is an MCE excpetion being processed, ignore
+ * this SRAO MCE
+ */
+if (!kvm_mce_in_progress(env)) {
+/* Fake an Intel architectural Memory scrubbing UCR */
+kvm_mce_inj_srao_memscrub(env, paddr);
+}
+}
+} else
+#endif /* KVM_CAP_MCE */
+{
+if (code == BUS_MCEERR_AO) {
+return 0;
+} else if (code == BUS_MCEERR_AR) {
+hardware_memo

[Qemu-devel] [PATCH v3 06/17] x86: Refine error reporting of MCE injection services

As this service is used by the human monitor, make sure that errors get
reported to the right channel, and also raise the verbosity.

This requires to move Monitor typedef in qemu-common.h to resolve the
include dependency.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 monitor.c|4 +-
 qemu-common.h|6 ++--
 target-i386/cpu.h|6 ++--
 target-i386/helper.c |   79 +-
 4 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/monitor.c b/monitor.c
index 45b0cc2..662df7c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2712,8 +2712,8 @@ static void do_inject_mce(Monitor *mon, const QDict 
*qdict)
 int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
 
 for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
+if (cenv->cpu_index == cpu_index) {
+cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
broadcast);
 break;
 }
diff --git a/qemu-common.h b/qemu-common.h
index 40dad52..3ce6b6c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -18,6 +18,9 @@ typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
 
+struct Monitor;
+typedef struct Monitor Monitor;
+
 /* we put basic includes here to avoid repeating them in device drivers */
 #include 
 #include 
@@ -326,9 +329,6 @@ void qemu_iovec_memset(QEMUIOVector *qiov, int c, size_t 
count);
 void qemu_iovec_memset_skip(QEMUIOVector *qiov, int c, size_t count,
 size_t skip);
 
-struct Monitor;
-typedef struct Monitor Monitor;
-
 /* Convert a byte between binary and BCD.  */
 static inline uint8_t to_bcd(uint8_t val)
 {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 52bb48e..486af1d 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,8 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, 
target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
-void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
-uint64_t mcg_status, uint64_t addr, uint64_t misc,
-int broadcast);
+void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+uint64_t status, uint64_t mcg_status, uint64_t addr,
+uint64_t misc, int broadcast);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index ba3bed9..462d332 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -30,6 +30,7 @@
 #include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
+#include "monitor.h"
 #endif
 
 //#define DEBUG_MMU
@@ -1067,33 +1068,38 @@ static void breakpoint_handler(CPUState *env)
 }
 
 static void
-qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
 uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
 uint64_t mcg_cap = cenv->mcg_cap;
-uint64_t *banks = cenv->mce_banks;
-
-/*
- * if MSR_MCG_CTL is not all 1s, the uncorrected error
- * reporting is disabled
- */
-if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-cenv->mcg_ctl != ~(uint64_t)0) {
-return;
-}
-banks += 4 * bank;
-/*
- * if MSR_MCi_CTL is not all 1s, the uncorrected error
- * reporting is disabled for the bank
- */
-if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
-return;
-}
+uint64_t *banks = cenv->mce_banks + 4 * bank;
+
 if (status & MCI_STATUS_UC) {
+/*
+ * if MSR_MCG_CTL is not all 1s, the uncorrected error
+ * reporting is disabled
+ */
+if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+monitor_printf(mon,
+   "CPU %d: Uncorrected error reporting disabled\n",
+   cenv->cpu_index);
+return;
+}
+
+/*
+ * if MSR_MCi_CTL is not all 1s, the uncorrected error
+ * reporting is disabled for the bank
+ */
+if (banks[0] != ~(uint64_t)0) {
+monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
+   "for bank %d\n", cenv->cpu_index, bank);
+return;
+}
+
 if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
 !(cenv->cr[4] & CR4_MCE_MASK)) {
-fprintf(stderr, "injects mce exception while previous "
-"one is in progress!\n");
+monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
+"raising triple fault\n", cenv->cpu_index);
 qemu_log_mask(C

[Qemu-devel] [PATCH v3 07/17] x86: Optionally avoid injecting AO MCEs while others are pending

Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 monitor.c|7 +--
 target-i386/cpu.h|5 -
 target-i386/helper.c |   26 +++---
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/monitor.c b/monitor.c
index 662df7c..ae20927 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2709,12 +2709,15 @@ static void do_inject_mce(Monitor *mon, const QDict 
*qdict)
 uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
 uint64_t addr = qdict_get_int(qdict, "addr");
 uint64_t misc = qdict_get_int(qdict, "misc");
-int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
+int flags = MCE_INJECT_UNCOND_AO;
 
+if (qdict_get_try_bool(qdict, "broadcast", 0)) {
+flags |= MCE_INJECT_BROADCAST;
+}
 for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
 if (cenv->cpu_index == cpu_index) {
 cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
-   broadcast);
+   flags);
 break;
 }
 }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 486af1d..d0eae75 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,11 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, 
target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
+#define MCE_INJECT_BROADCAST1
+#define MCE_INJECT_UNCOND_AO2
+
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 uint64_t status, uint64_t mcg_status, uint64_t addr,
-uint64_t misc, int broadcast);
+uint64_t misc, int flags);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 462d332..e3ef40c 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1069,11 +1069,20 @@ static void breakpoint_handler(CPUState *env)
 
 static void
 qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-uint64_t mcg_status, uint64_t addr, uint64_t misc)
+uint64_t mcg_status, uint64_t addr, uint64_t misc,
+int flags)
 {
 uint64_t mcg_cap = cenv->mcg_cap;
 uint64_t *banks = cenv->mce_banks + 4 * bank;
 
+/*
+ * If there is an MCE exception being processed, ignore this SRAO MCE
+ * unless unconditional injection was requested.
+ */
+if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+&& (cenv->mcg_status & MCG_STATUS_MCIP)) {
+return;
+}
 if (status & MCI_STATUS_UC) {
 /*
  * if MSR_MCG_CTL is not all 1s, the uncorrected error
@@ -1127,7 +1136,7 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int 
bank, uint64_t status,
 
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 uint64_t status, uint64_t mcg_status, uint64_t addr,
-uint64_t misc, int broadcast)
+uint64_t misc, int flags)
 {
 unsigned bank_num = cenv->mcg_cap & 0xff;
 CPUState *env;
@@ -1145,27 +1154,30 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, 
int bank,
 monitor_printf(mon, "Invalid MCE status code\n");
 return;
 }
-if (broadcast && !cpu_x86_support_mca_broadcast(cenv)) {
+if ((flags & MCE_INJECT_BROADCAST)
+&& !cpu_x86_support_mca_broadcast(cenv)) {
 monitor_printf(mon, "Guest CPU does not support MCA broadcast\n");
 return;
 }
 
 if (kvm_enabled()) {
-if (broadcast) {
+if (flags & MCE_INJECT_BROADCAST) {
 flag |= MCE_BROADCAST;
 }
 
 kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
 } else {
-qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc);
-if (broadcast) {
+qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc,
+flags);
+if (flags & MCE_INJECT_BROADCAST) {
 for (env = first_cpu; env != NULL; env = env->next_cpu) {
 if (cenv == env) {
 continue;
 }
 qemu_inject_x86_mce(mon, env, 1,
 MCI_STATUS_VAL | MCI_STATUS_UC,
-MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
+MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
+flags);
 }
 }
 }
-- 
1.7.1

[Qemu-devel] [PATCH v3 05/17] x86: Small cleanups of MCE helpers

Fix some code style issues, use proper headers, and align to cpu_x86
naming scheme. No functional changes.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 cpu-all.h|4 
 monitor.c|2 +-
 target-i386/cpu.h|5 +
 target-i386/helper.c |   41 -
 4 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 87b0f86..caf5e6c 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -971,8 +971,4 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
 uint8_t *buf, int len, int is_write);
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-uint64_t mcg_status, uint64_t addr, uint64_t misc,
-int broadcast);
-
 #endif /* CPU_ALL_H */
diff --git a/monitor.c b/monitor.c
index 22ae3bb..45b0cc2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2713,7 +2713,7 @@ static void do_inject_mce(Monitor *mon, const QDict 
*qdict)
 
 for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
 if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
+cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
broadcast);
 break;
 }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 75156e7..52bb48e 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -986,4 +986,9 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, 
target_ulong *pc,
 
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
+
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
+uint64_t mcg_status, uint64_t addr, uint64_t misc,
+int broadcast);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f41416f..ba3bed9 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -28,6 +28,9 @@
 #include "qemu-common.h"
 #include "kvm.h"
 #include "kvm_x86.h"
+#ifndef CONFIG_USER_ONLY
+#include "sysemu.h"
+#endif
 
 //#define DEBUG_MMU
 
@@ -1063,11 +1066,9 @@ static void breakpoint_handler(CPUState *env)
 prev_debug_excp_handler(env);
 }
 
-/* This should come from sysemu.h - if we could include it here... */
-void qemu_system_reset_request(void);
-
-static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-uint64_t mcg_status, uint64_t addr, uint64_t misc)
+static void
+qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
 uint64_t mcg_cap = cenv->mcg_cap;
 uint64_t *banks = cenv->mce_banks;
@@ -1077,15 +1078,17 @@ static void qemu_inject_x86_mce(CPUState *cenv, int 
bank, uint64_t status,
  * reporting is disabled
  */
 if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-cenv->mcg_ctl != ~(uint64_t)0)
+cenv->mcg_ctl != ~(uint64_t)0) {
 return;
+}
 banks += 4 * bank;
 /*
  * if MSR_MCi_CTL is not all 1s, the uncorrected error
  * reporting is disabled for the bank
  */
-if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0)
+if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
 return;
+}
 if (status & MCI_STATUS_UC) {
 if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
 !(cenv->cr[4] & CR4_MCE_MASK)) {
@@ -1095,8 +1098,9 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, 
uint64_t status,
 qemu_system_reset_request();
 return;
 }
-if (banks[1] & MCI_STATUS_VAL)
+if (banks[1] & MCI_STATUS_VAL) {
 status |= MCI_STATUS_OVER;
+}
 banks[2] = addr;
 banks[3] = misc;
 cenv->mcg_status = mcg_status;
@@ -1104,16 +1108,18 @@ static void qemu_inject_x86_mce(CPUState *cenv, int 
bank, uint64_t status,
 cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
 } else if (!(banks[1] & MCI_STATUS_VAL)
|| !(banks[1] & MCI_STATUS_UC)) {
-if (banks[1] & MCI_STATUS_VAL)
+if (banks[1] & MCI_STATUS_VAL) {
 status |= MCI_STATUS_OVER;
+}
 banks[2] = addr;
 banks[3] = misc;
 banks[1] = status;
-} else
+} else {
 banks[1] |= MCI_STATUS_OVER;
+}
 }
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
 uint64_t mcg_status, uint64_t addr, uint64_t misc,
 int broadcast)
 {
@@ -1155,15 +1161,16 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, 
uint64_t status,
 
 static void mce_init(CPUX86State *cenv)
 {
-unsigned int bank, bank_num;
+unsigned int

[Qemu-devel] [PATCH v3 17/17] KVM, MCE, unpoison memory address across reboot

From: Huang Ying 

In Linux kernel HWPoison processing implementation, the virtual
address in processes mapping the error physical memory page is marked
as HWPoison.  So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.

If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS.  If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system.  But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated.  That is, guest system can not recover via rebooting.

In fact, across rebooting, the contents of guest physical memory page
need not to be kept.  We can allocate a new host physical page to
back the corresponding guest physical address.

This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
to clear the corresponding page table entry, so that make it possible
to allocate a new page to recover the issue.

[ Jan: rebasing and tiny cleanups]

Signed-off-by: Huang Ying 
Signed-off-by: Jan Kiszka 
---
 target-i386/kvm.c |   36 
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 44e5504..7b7105d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -173,7 +173,40 @@ static int get_para_features(CPUState *env)
 }
 #endif /* CONFIG_KVM_PARA */
 
+typedef struct HWPoisonPage {
+ram_addr_t ram_addr;
+QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+static void kvm_unpoison_all(void *param)
+{
+HWPoisonPage *page, *next_page;
+
+QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+QLIST_REMOVE(page, list);
+qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+qemu_free(page);
+}
+}
+
 #ifdef KVM_CAP_MCE
+static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+HWPoisonPage *page;
+
+QLIST_FOREACH(page, &hwpoison_page_list, list) {
+if (page->ram_addr == ram_addr) {
+return;
+}
+}
+page = qemu_malloc(sizeof(HWPoisonPage));
+page->ram_addr = ram_addr;
+QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
  int *max_banks)
 {
@@ -233,6 +266,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void 
*addr)
 hardware_memory_error();
 }
 }
+kvm_hwpoison_page_add(ram_addr);
 kvm_mce_inject(env, paddr, code);
 } else
 #endif /* KVM_CAP_MCE */
@@ -263,6 +297,7 @@ int kvm_arch_on_sigbus(int code, void *addr)
 "QEMU itself instead of guest system!: %p\n", addr);
 return 0;
 }
+kvm_hwpoison_page_add(ram_addr);
 kvm_mce_inject(first_cpu, paddr, code);
 } else
 #endif /* KVM_CAP_MCE */
@@ -571,6 +606,7 @@ int kvm_arch_init(KVMState *s)
 fprintf(stderr, "e820_add_entry() table is full\n");
 return ret;
 }
+qemu_register_reset(kvm_unpoison_all, NULL);
 
 return 0;
 }
-- 
1.7.1

[Qemu-devel] [PATCH v3 16/17] Add qemu_ram_remap

From: Huang Ying 

qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again.  This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.

[ Jan: style fixlets, WIN32 fix ]

Signed-off-by: Huang Ying 
Signed-off-by: Jan Kiszka 
---
 cpu-all.h|4 +++
 cpu-common.h |1 +
 exec.c   |   63 +-
 3 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index caf5e6c..4f4631d 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -863,10 +863,14 @@ target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, 
target_ulong addr);
 extern int phys_ram_fd;
 extern ram_addr_t ram_size;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC_MASK   (1 << 0)
+
 typedef struct RAMBlock {
 uint8_t *host;
 ram_addr_t offset;
 ram_addr_t length;
+uint32_t flags;
 char idstr[256];
 QLIST_ENTRY(RAMBlock) next;
 #if defined(__linux__) && !defined(TARGET_S390X)
diff --git a/cpu-common.h b/cpu-common.h
index 54d21d4..ef4e8da 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -50,6 +50,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const 
char *name,
 ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
 void qemu_ram_free(ram_addr_t addr);
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* Same but slower, to use for migration, where the order of
diff --git a/exec.c b/exec.c
index d611100..9308a97 100644
--- a/exec.c
+++ b/exec.c
@@ -2867,6 +2867,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, 
const char *name,
 
 if (host) {
 new_block->host = host;
+new_block->flags |= RAM_PREALLOC_MASK;
 } else {
 if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
@@ -2920,7 +2921,9 @@ void qemu_ram_free(ram_addr_t addr)
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
-if (mem_path) {
+if (block->flags & RAM_PREALLOC_MASK) {
+;
+} else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
 if (block->fd) {
 munmap(block->host, block->length);
@@ -2943,6 +2946,64 @@ void qemu_ram_free(ram_addr_t addr)
 
 }
 
+#ifndef _WIN32
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+{
+RAMBlock *block;
+ram_addr_t offset;
+int flags;
+void *area, *vaddr;
+
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+offset = addr - block->offset;
+if (offset < block->length) {
+vaddr = block->host + offset;
+if (block->flags & RAM_PREALLOC_MASK) {
+;
+} else {
+flags = MAP_FIXED;
+munmap(vaddr, length);
+if (mem_path) {
+#if defined(__linux__) && !defined(TARGET_S390X)
+if (block->fd) {
+#ifdef MAP_POPULATE
+flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
+MAP_PRIVATE;
+#else
+flags |= MAP_PRIVATE;
+#endif
+area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+flags, block->fd, offset);
+} else {
+flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+flags, -1, 0);
+}
+#endif
+} else {
+#if defined(TARGET_S390X) && defined(CONFIG_KVM)
+flags |= MAP_SHARED | MAP_ANONYMOUS;
+area = mmap(vaddr, length, PROT_EXEC|PROT_READ|PROT_WRITE,
+flags, -1, 0);
+#else
+flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+flags, -1, 0);
+#endif
+}
+if (area != vaddr) {
+fprintf(stderr, "Could not remap addr: %lx@%lx\n",
+length, addr);
+exit(1);
+}
+qemu_madvise(vaddr, length, QEMU_MADV_MERGEABLE);
+}
+return;
+}
+}
+}
+#endif /* !_WIN32 */
+
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
With the exception of the softmmu code in this file, this should
only be used for local memory (e.g. video ram) that the device owns,
-- 
1.7.1

[Qemu-devel] [PATCH v3 04/17] x86: Perform implicit mcg_status reset

Reorder mcg_status in CPUState to achieve automatic clearing on reset.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/cpu.h|3 ++-
 target-i386/helper.c |2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5f1df8b..75156e7 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -687,6 +687,8 @@ typedef struct CPUX86State {
 
 uint64_t pat;
 
+uint64_t mcg_status;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
@@ -741,7 +743,6 @@ typedef struct CPUX86State {
 struct DeviceState *apic_state;
 
 uint64_t mcg_cap;
-uint64_t mcg_status;
 uint64_t mcg_ctl;
 uint64_t mce_banks[MCE_BANKS_DEF*4];
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f0c546d..f41416f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -101,8 +101,6 @@ void cpu_reset(CPUX86State *env)
 env->dr[7] = DR7_FIXED_1;
 cpu_breakpoint_remove_all(env, BP_CPU);
 cpu_watchpoint_remove_all(env, BP_CPU);
-
-env->mcg_status = 0;
 }
 
 void cpu_x86_close(CPUX86State *env)
-- 
1.7.1

[Qemu-devel] [PATCH v3 14/17] kvm: x86: Clean up kvm_setup_mce

There is nothing to abstract here. Fold kvm_setup_mce into its caller
and fix up the error reporting (return code of kvm_vcpu_ioctl holds the
error value).

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/kvm.c |   11 ---
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index be896dd..486efb9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -187,11 +187,6 @@ static int kvm_get_mce_cap_supported(KVMState *s, uint64_t 
*mce_cap,
 return -ENOSYS;
 }
 
-static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
-{
-return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
-}
-
 static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
@@ -440,6 +435,7 @@ int kvm_arch_init_vcpu(CPUState *env)
 && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
 uint64_t mcg_cap;
 int banks;
+int ret;
 
 if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
 perror("kvm_get_mce_cap_supported FAILED");
@@ -448,8 +444,9 @@ int kvm_arch_init_vcpu(CPUState *env)
 banks = MCE_BANKS_DEF;
 mcg_cap &= MCE_CAP_DEF;
 mcg_cap |= banks;
-if (kvm_setup_mce(env, &mcg_cap)) {
-perror("kvm_setup_mce FAILED");
+ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
+if (ret < 0) {
+fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
 } else {
 env->mcg_cap = mcg_cap;
 }
-- 
1.7.1

[Qemu-devel] [PATCH v3 03/17] x86: Account for MCE in cpu_has_work

MCEs can be injected asynchronously, so they can also terminate the halt
state.

Signed-off-by: Jan Kiszka 
CC: Huang Ying 
CC: Hidetoshi Seto 
CC: Jin Dongming 
---
 target-i386/exec.h |   15 ++-
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target-i386/exec.h b/target-i386/exec.h
index fc8945b..d050dd0 100644
--- a/target-i386/exec.h
+++ b/target-i386/exec.h
@@ -293,15 +293,12 @@ static inline void load_eflags(int eflags, int 
update_mask)
 
 static inline int cpu_has_work(CPUState *env)
 {
-int work;
-
-work = (env->interrupt_request & CPU_INTERRUPT_HARD) &&
-   (env->eflags & IF_MASK);
-work |= env->interrupt_request & CPU_INTERRUPT_NMI;
-work |= env->interrupt_request & CPU_INTERRUPT_INIT;
-work |= env->interrupt_request & CPU_INTERRUPT_SIPI;
-
-return work;
+return ((env->interrupt_request & CPU_INTERRUPT_HARD) &&
+(env->eflags & IF_MASK)) ||
+   (env->interrupt_request & (CPU_INTERRUPT_NMI |
+  CPU_INTERRUPT_INIT |
+  CPU_INTERRUPT_SIPI |
+  CPU_INTERRUPT_MCE));
 }
 
 static inline int cpu_halted(CPUState *env) {
-- 
1.7.1

[Qemu-devel] [PATCH v3 12/17] x86: Run qemu_inject_x86_mce on target VCPU

We will use the current TCG-only MCE injection path for KVM as well, and
then this read-modify-write of the target VCPU state has to be performed
synchronously in the corresponding thread.

Signed-off-by: Jan Kiszka 
---
 target-i386/helper.c |   87 +
 1 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index e3ef40c..a32960c 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1067,29 +1067,42 @@ static void breakpoint_handler(CPUState *env)
 prev_debug_excp_handler(env);
 }
 
-static void
-qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-uint64_t mcg_status, uint64_t addr, uint64_t misc,
-int flags)
+typedef struct MCEInjectionParams {
+Monitor *mon;
+CPUState *env;
+int bank;
+uint64_t status;
+uint64_t mcg_status;
+uint64_t addr;
+uint64_t misc;
+int flags;
+} MCEInjectionParams;
+
+static void do_inject_x86_mce(void *data)
 {
-uint64_t mcg_cap = cenv->mcg_cap;
-uint64_t *banks = cenv->mce_banks + 4 * bank;
+MCEInjectionParams *params = data;
+CPUState *cenv = params->env;
+uint64_t *banks = cenv->mce_banks + 4 * params->bank;
+
+cpu_synchronize_state(cenv);
 
 /*
  * If there is an MCE exception being processed, ignore this SRAO MCE
  * unless unconditional injection was requested.
  */
-if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+if (!(params->flags & MCE_INJECT_UNCOND_AO)
+&& !(params->status & MCI_STATUS_AR)
 && (cenv->mcg_status & MCG_STATUS_MCIP)) {
 return;
 }
-if (status & MCI_STATUS_UC) {
+
+if (params->status & MCI_STATUS_UC) {
 /*
  * if MSR_MCG_CTL is not all 1s, the uncorrected error
  * reporting is disabled
  */
-if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
-monitor_printf(mon,
+if ((cenv->mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+monitor_printf(params->mon,
"CPU %d: Uncorrected error reporting disabled\n",
cenv->cpu_index);
 return;
@@ -1100,35 +1113,39 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int 
bank, uint64_t status,
  * reporting is disabled for the bank
  */
 if (banks[0] != ~(uint64_t)0) {
-monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
-   "for bank %d\n", cenv->cpu_index, bank);
+monitor_printf(params->mon,
+   "CPU %d: Uncorrected error reporting disabled for"
+   " bank %d\n",
+   cenv->cpu_index, params->bank);
 return;
 }
 
 if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
 !(cenv->cr[4] & CR4_MCE_MASK)) {
-monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
-"raising triple fault\n", cenv->cpu_index);
+monitor_printf(params->mon,
+   "CPU %d: Previous MCE still in progress, raising"
+   " triple fault\n",
+   cenv->cpu_index);
 qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
 qemu_system_reset_request();
 return;
 }
 if (banks[1] & MCI_STATUS_VAL) {
-status |= MCI_STATUS_OVER;
+params->status |= MCI_STATUS_OVER;
 }
-banks[2] = addr;
-banks[3] = misc;
-cenv->mcg_status = mcg_status;
-banks[1] = status;
+banks[2] = params->addr;
+banks[3] = params->misc;
+cenv->mcg_status = params->mcg_status;
+banks[1] = params->status;
 cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
 } else if (!(banks[1] & MCI_STATUS_VAL)
|| !(banks[1] & MCI_STATUS_UC)) {
 if (banks[1] & MCI_STATUS_VAL) {
-status |= MCI_STATUS_OVER;
+params->status |= MCI_STATUS_OVER;
 }
-banks[2] = addr;
-banks[3] = misc;
-banks[1] = status;
+banks[2] = params->addr;
+banks[3] = params->misc;
+banks[1] = params->status;
 } else {
 banks[1] |= MCI_STATUS_OVER;
 }
@@ -1138,6 +1155,16 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, 
int bank,
 uint64_t status, uint64_t mcg_status, uint64_t addr,
 uint64_t misc, int flags)
 {
+MCEInjectionParams params = {
+.mon = mon,
+.env = cenv,
+.bank = bank,
+.status = status,
+.mcg_status = mcg_status,
+.addr = addr,
+.misc = misc,
+.flags = flags,
+};
 unsigned bank_num = cenv->mcg_cap & 0xff;
 CPUState *env;
 int flag = 0;
@@ -1167,17 +1

[Qemu-devel] [PATCH v3 10/17] kvm: Rename kvm_arch_process_irqchip_events to async_events