date:20110225

[Qemu-devel] [PATCH] net: Add the missing option declaration of vhostforce

2011-02-25 Thread Jason Wang

Signed-off-by: Jason Wang jasow...@redhat.com
---
 net.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index 9ba5be2..21d4443 100644
--- a/net.c
+++ b/net.c
@@ -1025,7 +1025,11 @@ static const struct {
                 .name = "vhostfd",
                 .type = QEMU_OPT_STRING,
                 .help = "file descriptor of an already opened vhost net device",
-            },
+            }, {
+                .name = "vhostforce",
+                .type = QEMU_OPT_BOOL,
+                .help = "force vhost on for non-MSIX virtio guests",
+            },
 #endif /* _WIN32 */
             { /* end of list */ }
         },
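
To illustrate why the missing declaration matters, here is a minimal sketch of a descriptor table ending in a sentinel entry, with a lookup that fails for undeclared options. The `OptDesc` type, the `tap_opts` table and the `find_opt` helper are hypothetical simplifications for illustration and are not QEMU's QemuOpts API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an option descriptor table. */
typedef enum { OPT_STRING, OPT_BOOL } OptType;

typedef struct {
    const char *name;   /* NULL marks the end of the list */
    OptType type;
    const char *help;
} OptDesc;

static const OptDesc tap_opts[] = {
    { "vhostfd", OPT_STRING,
      "file descriptor of an already opened vhost net device" },
    { "vhostforce", OPT_BOOL,
      "force vhost on for non-MSIX virtio guests" },
    { NULL, OPT_STRING, NULL } /* end of list */
};

/* Return the descriptor for name, or NULL if it was never declared. */
static const OptDesc *find_opt(const OptDesc *table, const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0) {
            return table;
        }
    }
    return NULL;
}
```

In this model, an option that the rest of the code consumes but that has no entry in the table can never be found by name, which is the situation the patch corrects for vhostforce.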

Re: [Qemu-devel] [PATCH] mst_fpga: correct irq level settings

2011-02-25 Thread andrzej zaborowski

On 16 February 2011 14:22, Dmitry Eremin-Solenikov dbarysh...@gmail.com wrote:
 Final corrections for IRQ levels that are set by mst_fpga:

 * Don't retranslate IRQ if previously IRQ was masked.
 * After setting or clearing IRQs through register, apply mask
  before setting parent IRQ level.

Thanks, applied this change.  However now to have a completely correct
behaviour, I think we need something like the following, what do you
think? (prev_level is now unused, but the main change is not masking
1u << irq)

diff --git a/hw/mst_fpga.c b/hw/mst_fpga.c
index 407bac9..f66de69 100644
--- a/hw/mst_fpga.c
+++ b/hw/mst_fpga.c
@@ -31,7 +31,6 @@ typedef struct mst_irq_state{

     qemu_irq parent;

-    uint32_t prev_level;
     uint32_t leddat1;
     uint32_t leddat2;
     uint32_t ledctrl;
@@ -53,11 +52,6 @@ mst_fpga_set_irq(void *opaque, int irq, int level)
     uint32_t oldint = s->intsetclr & s->intmskena;

     if (level)
-        s->prev_level |= 1u << irq;
-    else
-        s->prev_level &= ~(1u << irq);
-
-    if ((s->intmskena & (1u << irq)) && level)
         s->intsetclr |= 1u << irq;

     if (oldint != (s->intsetclr & s->intmskena))
@@ -193,12 +187,11 @@ static int mst_fpga_init(SysBusDevice *dev)

 static VMStateDescription vmstate_mst_fpga_regs = {
     .name = "mainstone_fpga",
-    .version_id = 0,
-    .minimum_version_id = 0,
-    .minimum_version_id_old = 0,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
     .post_load = mst_fpga_post_load,
     .fields = (VMStateField []) {
-        VMSTATE_UINT32(prev_level, mst_irq_state),
         VMSTATE_UINT32(leddat1, mst_irq_state),
         VMSTATE_UINT32(leddat2, mst_irq_state),
         VMSTATE_UINT32(ledctrl, mst_irq_state),

Cheers
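
A minimal sketch of the behaviour proposed in this reply: the status bit is latched even while the interrupt is masked, and the mask is applied only when the parent level is derived. The `FpgaIrqModel` struct and both helper names are hypothetical; `parent_level` stands in for whatever the real device drives on its parent IRQ line.

```c
#include <stdint.h>

/* Hypothetical model of the FPGA interrupt registers. */
typedef struct {
    uint32_t intmskena;   /* interrupt enable mask */
    uint32_t intsetclr;   /* latched interrupt status */
    int parent_level;     /* level last driven on the parent line */
} FpgaIrqModel;

/* Latch an incoming IRQ regardless of the mask, then update the parent
 * only if the masked status changed. */
static void fpga_set_irq(FpgaIrqModel *s, int irq, int level)
{
    uint32_t oldint = s->intsetclr & s->intmskena;

    if (level) {
        s->intsetclr |= 1u << irq;
    }
    if (oldint != (s->intsetclr & s->intmskena)) {
        s->parent_level = (s->intsetclr & s->intmskena) != 0;
    }
}

/* Register write to the mask: apply it before setting the parent level. */
static void fpga_write_mask(FpgaIrqModel *s, uint32_t mask)
{
    s->intmskena = mask;
    s->parent_level = (s->intsetclr & s->intmskena) != 0;
}
```

With this model, an IRQ raised while masked stays pending in `intsetclr` and reaches the parent as soon as the guest enables it.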

[Qemu-devel] Re: [PATCH] net: Add the missing option declaration of vhostforce

2011-02-25 Thread Michael S. Tsirkin

On Fri, Feb 25, 2011 at 04:11:27PM +0800, Jason Wang wrote:
 Signed-off-by: Jason Wang jasow...@redhat.com


Acked-by: Michael S. Tsirkin m...@redhat.com

 ---
  net.c |6 +-
  1 files changed, 5 insertions(+), 1 deletions(-)
 
 diff --git a/net.c b/net.c
 index 9ba5be2..21d4443 100644
 --- a/net.c
 +++ b/net.c
 @@ -1025,7 +1025,11 @@ static const struct {
                  .name = "vhostfd",
                  .type = QEMU_OPT_STRING,
                  .help = "file descriptor of an already opened vhost net device",
 -            },
 +            }, {
 +                .name = "vhostforce",
 +                .type = QEMU_OPT_BOOL,
 +                .help = "force vhost on for non-MSIX virtio guests",
 +            },
  #endif /* _WIN32 */
              { /* end of list */ }
          },

Re: [Qemu-devel] [PATCH] Remove a detached device from qemu_device_opts.

2011-02-25 Thread Markus Armbruster

Minoru Usui u...@mxm.nes.nec.co.jp writes:

 Hi, William, Markus and other people.

 On Wed, 23 Feb 2011 10:42:02 +0100
 William Dauchy wdau...@gmail.com wrote:

 Hi Minoru,
 
 On Tue, Feb 15, 2011 at 3:32 AM, Minoru Usui u...@mxm.nes.nec.co.jp wrote:
  I can reproduce, too.
  But strangely, it doesn't occur when the acpiphp driver is loaded
  in the guest VM in the environment below.
 
   Host : RHEL6.0
   Guest: RHEL5.5
 
  Unfortunately, I'm not familiar with qemu-kvm.
  I investigated the questions below about this problem, but I couldn't
  resolve them.
 
   - How can qdev_free() be called asynchronously? (How should we fix this problem?)
   - Why doesn't it occur with the acpiphp driver?
 
  If anyone knows the answer to the questions above, or has a clue, please let me know.
 
 In fact, this is not a bug.
 `qdev_free` is called when the ACPI detach succeeds in `pciej_write`.
 The virtual machine has to correctly support ACPI signals.
 Please read the explanation from Markus Armbruster on
 http://lists.nongnu.org/archive/html/qemu-devel/2011-02/msg02637.html

 William, Thank you for your help and telling me about it.

 Markus, Thank you for your detailed explanation.
 Basically, I understand behaviour of device_del command.
 The result of pci hotunplug depends on behaviour of guest OS,
 but device_del command doesn't wait hotunplug's result.

 May I ask you a question?
 Which devices does qemu_device_opts manage: those hotplugged to the
 virtual machine, or those hotplugged to the guest OS?

 In the present implementation, the device_add command adds to
 qemu_device_opts immediately, whether or not the guest OS can hotplug
 the device.
 Nevertheless, the device_del command appropriately waits until the
 device is hotunplugged by the guest OS.

 According to Markus's explanation, the device_del command cannot wait
 for the device to be hotunplugged from the guest OS.
 So I feel it would be better for qemu_device_opts to manage the devices
 that are hotplugged to the guest OS.

 If I am wrong, please let me know.

qemu_device_opts holds the currently defined device configurations.  A
device configuration becomes defined the moment its QemuOpts get created
(for -device and device_add: right when the argument gets parsed, which
is *before* the device gets created, let alone plugged).  It ceases to
be defined when device creation fails, or when the device is deleted
after unplug completed.

qemu_device_opts is *not* the set of devices currently plugged in.  That
information is encoded in the device tree.
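
The lifecycle described above can be sketched as a small state model. This is a hypothetical illustration only: `DevModel` and the four helpers are invented names, with `config_defined` standing in for membership in qemu_device_opts and `plugged` for presence in the device tree.

```c
#include <stdbool.h>

/* Hypothetical model: one device, two independent pieces of state. */
typedef struct {
    bool config_defined;  /* models presence in qemu_device_opts */
    bool plugged;         /* models presence in the device tree */
    bool unplug_pending;  /* device_del issued, guest not done yet */
} DevModel;

/* Parsing -device / device_add defines the configuration first. */
static void parse_device_arg(DevModel *d)
{
    d->config_defined = true;
}

/* Creation either plugs the device or undefines the configuration. */
static void create_device(DevModel *d, bool ok)
{
    if (ok) {
        d->plugged = true;
    } else {
        d->config_defined = false;
    }
}

/* device_del only requests the unplug; it does not wait for the guest. */
static void request_unplug(DevModel *d)
{
    if (d->plugged) {
        d->unplug_pending = true;
    }
}

/* Once the guest completes the unplug, the device is deleted and its
 * configuration ceases to be defined. */
static void guest_completes_unplug(DevModel *d)
{
    if (d->unplug_pending) {
        d->unplug_pending = false;
        d->plugged = false;
        d->config_defined = false;
    }
}
```

The point of the model is that `config_defined` and `plugged` change at different moments, so neither can be used as a proxy for the other.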

Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject

2011-02-25 Thread Markus Armbruster

Anthony Liguori aligu...@linux.vnet.ibm.com writes:

 On 02/24/2011 10:20 AM, Markus Armbruster wrote:
 Anthony Liguori aligu...@linux.vnet.ibm.com  writes:


 On 02/24/2011 02:33 AM, Markus Armbruster wrote:
  
 Anthony Liguori anth...@codemonkey.ws   writes:
[...]
 Please describe all expected errors.

  
 Quoting qmp-commands.hx:

   3. Errors, in special, are not documented. Applications should NOT check
      for specific errors classes or data (it's strongly recommended to only
      check for the error key)

 Indeed, not a single error is documented there.  This is intentional.


 Yeah, but we're not 0.14 anymore and for 0.15, we need to document
 errors.  If you are suggesting I send a patch to remove that section,
 I'm more than happy to.
  
 Two separate issues here: 1. Are we ready to commit to the current
 design of errors, and 2. Is it fair to reject Lai's patch now because he
 doesn't document his errors.

 I'm not commenting on 1. here.

 Regarding 2.: rejecting a patch because it doesn't document an aspect
 that current master intentionally leaves undocumented is not how you
 treat contributors.  At least not if you want any other than certified
 masochists who enjoy pain, and professionals who get adequately
 compensated for it.

 Lead by example, not by fiat.


 http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json

 I am in the process of documenting the errors of every command.  It's
 a royal pain but I'm going to document everything we have right now.
 It's actually the last bit of work I need to finish before sending
 QAPI out.

 So for new commands being added, it would be hugely helpful for the
 authors to document the errors such that I don't have to reverse
 engineer all of the possible error conditions.

The moment this lands in master, you can begin to demand error
descriptions from contributors.  Until then, I'll NAK error descriptions
in qmp-commands.txt.  We left them undocumented there for good reasons:

 Once we have an error design in place that has a reasonable hope to
 stand the test of time, and have errors documented for at least some of
 the commands here, we can start to require proper error documentation
 for new commands.  But not now.

I won't NAK non-normative error descriptions, say in commit messages, or
in comments.  And I won't object to you asking for them.  But I feel you
really shouldn't make it a condition for committing patches.  Especially
not for simple patches that have been on list for months.

Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device

2011-02-25 Thread Ian Campbell

On Thu, 2011-02-24 at 17:36 +, Paolo Bonzini wrote:
  +/* Send bytes to syslog */
  +static void log_writeb(PCIXenPlatformState *s, char val)
  +{
  +    if (val == '\n' || s->log_buffer_off == sizeof(s->log_buffer) - 1) {
  +        /* Flush buffer */
  +        s->log_buffer[s->log_buffer_off] = 0;
  +        DPRINTF("%s\n", s->log_buffer);
 
  This should go to a chardev.
 
 Or it should just go away.  Guests can already write to 0xe9 and see
 the output on the host's xm dmesg ring and serial console. 

Only true if you have configured the guest log level to include debug
messages.

In any case host dmesg is not really the same as going to a file in dom0
from a supportability PoV.

Ian.
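
For readers following the log_writeb() discussion, here is a self-contained sketch of the line-buffering scheme in the quoted hunk: bytes accumulate until a newline arrives or the buffer is full, then the line is flushed. The `LineLog` type, the buffer size and the `last_line` field (standing in for a chardev, trace event or DPRINTF sink) are assumptions. The reset of the offset and the else branch are not visible in the quote and are likewise assumed.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 16  /* assumed size, for illustration only */

typedef struct {
    char buf[LOG_BUF_SIZE];
    size_t off;
    char last_line[LOG_BUF_SIZE];  /* stands in for the output sink */
    unsigned flushes;
} LineLog;

/* Buffer one byte; flush on newline or when only the terminator fits. */
static void line_log_writeb(LineLog *s, char val)
{
    if (val == '\n' || s->off == sizeof(s->buf) - 1) {
        /* Flush buffer */
        s->buf[s->off] = 0;
        memcpy(s->last_line, s->buf, s->off + 1);
        s->flushes++;
        s->off = 0;
    } else {
        s->buf[s->off++] = val;
    }
}
```

Note that in this sketch the byte that triggers a flush on a full buffer is dropped rather than stored.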

Re: [Qemu-devel] [PATCH v3 01/16] vnc: qemu can die if the client is disconnected while updating screen

2011-02-25 Thread Corentin Chary

On Wed, Feb 23, 2011 at 11:23 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
 On 02/04/2011 02:05 AM, Corentin Chary wrote:

 agraf reported that qemu_mutex_destroy(&vs->output_mutex) was failing
 in vnc_disconnect_finish().

 It's because vnc_worker_thread_loop() tries to unlock the mutex while
 not locked. The unlocking call doesn't fail (pthread bug ?), but
 the destroy call does.

 Signed-off-by: Corentin Chary corenti...@iksaif.net


 Applied 2/16.  Thanks!

 Regards,

 Anthony Liguori

Great, thanks!

Please also merge these two patches:
http://patchwork.ozlabs.org/patch/84517/
http://patchwork.ozlabs.org/patch/84496/

-- 
Corentin Chary
http://xf.iksaif.net

[Qemu-devel] [PATCH] linux-user: Fix unlock_user() call in return from poll()

2011-02-25 Thread Peter Maydell

Correct the broken attempt to calculate the third argument
to unlock_user() in the code path which unlocked the pollfd
array on return from poll() and ppoll() emulation. (This
only caused a problem if unlock_user() wasn't a no-op, eg
if DEBUG_REMAP is defined.)

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 linux-user/syscall.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cf8a4c3..822b863 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6314,10 +6314,8 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
                 for(i = 0; i < nfds; i++) {
                     target_pfd[i].revents = tswap16(pfd[i].revents);
                 }
-                ret += nfds * (sizeof(struct target_pollfd)
-                               - sizeof(struct pollfd));
             }
-            unlock_user(target_pfd, arg1, ret);
+            unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds);
         }
         break;
 #endif
-- 
1.7.1
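
A short worked example of the length calculation fixed here, under stated assumptions: `target_pollfd_model` is a hypothetical stand-in for the target's pollfd layout, and `host_size` stands in for sizeof(struct pollfd) on the host. The old expression started from poll()'s return value, so it did not yield the size of the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the target's pollfd layout (8 bytes). */
struct target_pollfd_model {
    int32_t fd;
    int16_t events;
    int16_t revents;
};

/* Fixed calculation: the whole array must be copied back. */
static size_t fixed_unlock_len(unsigned int nfds)
{
    return sizeof(struct target_pollfd_model) * nfds;
}

/* Old calculation: poll()'s return value plus a size difference. */
static long broken_unlock_len(long ret, unsigned int nfds, size_t host_size)
{
    return ret + (long)nfds *
        ((long)sizeof(struct target_pollfd_model) - (long)host_size);
}
```

For three descriptors with one ready, the fixed length is 24 bytes, whereas the old expression gives 1 when host and target structures are the same size.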

Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device

2011-02-25 Thread Paolo Bonzini


On 02/25/2011 10:58 AM, Ian Campbell wrote:

  Or it should just go away.  Guests can already write to 0xe9 and see
  the output on the host's xm dmesg ring and serial console.

Only true if you have configured the guest log level to include debug
messages.


If you can recompile QEMU to add DEBUG_PLATFORM, you can usually do that 
too.  To avoid recompilation, rather than a chardev it would be even 
better to keep it as a trace event.


Paolo

RE: [Qemu-devel] Re: Strategic decision: COW format

2011-02-25 Thread Pavel Dovgaluk


 On 02/23/2011 05:50 PM, Anthony Liguori wrote:
  I still don't see.  What would you do with thousands of checkpoints?
 
 
  For reverse debugging, if you store checkpoints at a rate of, say,
  every 10ms, and then degrade to storing every 100ms after 1 second,
  etc. you'll have quite a large number of snapshots pretty quickly.
  The idea of snapshotting with reverse debugging is that instead of
  undoing every instruction, you can revert to the snapshot before, and
  then replay the instruction stream until you get to the desired point
  in time.
 
 You cannot replay the instruction stream since inputs (interrupts, rdtsc
 or other timers, I/O) will be different.  You need Kemari for this.

  I've created technology for replaying the instruction stream and all of the
inputs. It is similar to deterministic replay in VMware.
  Now I need something to save the machine state at many checkpoints in order to
implement reverse debugging.
  I think COW2 may be useful for this (or I should create something like it).


Pavel Dovgaluk
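
A minimal sketch of the checkpoint scheme discussed in this thread, assuming checkpoint timestamps are kept sorted in milliseconds. Both function names are hypothetical, and the tenfold widening of the interval per decade of age is one reading of the "etc." in the quoted description. To reach a point in time, one reverts to the latest checkpoint at or before it and replays forward from there.

```c
#include <stddef.h>
#include <stdint.h>

/* Spacing to keep between checkpoints of a given age: 10 ms for the
 * first second, 100 ms up to ten seconds, and so on (assumed). */
static uint64_t keep_interval_ms(uint64_t age_ms)
{
    uint64_t interval = 10, horizon = 1000;

    while (age_ms >= horizon && horizon <= UINT64_MAX / 10) {
        interval *= 10;
        horizon *= 10;
    }
    return interval;
}

/* Index of the latest checkpoint at or before target_ms in a sorted
 * array, or -1 if there is none. Replay would start from it. */
static ptrdiff_t latest_checkpoint_before(const uint64_t *times_ms,
                                          size_t n, uint64_t target_ms)
{
    ptrdiff_t best = -1;
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times_ms[mid] <= target_ms) {
            best = (ptrdiff_t)mid;
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

The binary search keeps lookup cheap even with thousands of checkpoints, which is the scale raised earlier in the thread.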

Re: [Qemu-devel] [PATCH 02/10] pxa2xx_pic: update to use qdev and arm-pic

2011-02-25 Thread andrzej zaborowski

Hi Dmitry,

On 20 February 2011 14:50, Dmitry Eremin-Solenikov dbarysh...@gmail.com wrote:
 Use qdev/sysbus framework to handle pxa2xx-pic. Instead of exposing IRQs
 via array, reference them via qdev_get_gpio_in(). Also pxa2xx_pic duplicated
 some code from arm-pic. Drop it, replacing with references to arm-pic,
 as all other ARM SoCs do for their PIC code.

As I said earlier not using arm-pic was deliberate (and I also asked
what the gain was from converting the pic to a separate sysbus device
from the CPU) so I skipped this part of the patch and pushed the rest
of it, please check that everything works.


 Signed-off-by: Dmitry Eremin-Solenikov dbarysh...@gmail.com
 ---
  hw/mainstone.c    |    2 +-
  hw/pxa.h          |   12 +++--
  hw/pxa2xx.c       |   84 +++
  hw/pxa2xx_gpio.c  |   11 +++--
  hw/pxa2xx_pic.c   |  126 
 -
  hw/pxa2xx_timer.c |   16 +++---
  6 files changed, 144 insertions(+), 107 deletions(-)

 diff --git a/hw/mainstone.c b/hw/mainstone.c
 index aec8d34..4eabdb9 100644
 --- a/hw/mainstone.c
 +++ b/hw/mainstone.c
 @@ -140,7 +140,7 @@ static void mainstone_common_init(ram_addr_t ram_size,
      }

      mst_irq = sysbus_create_simple("mainstone-fpga", MST_FPGA_PHYS,
 -                    cpu->pic[PXA2XX_PIC_GPIO_0]);
 +                    qdev_get_gpio_in(cpu->pic, PXA2XX_PIC_GPIO_0));

I'm also wondering if this device should really use the interrupt line
instead of using a GPIO.  It seems wrong that both the fpga and the
gpio module are connected to the same line.


      /* setup keypad */
      printf("map addr %p\n", map);
 diff --git a/hw/pxa.h b/hw/pxa.h
 index f73d33b..7c6fd44 100644
 --- a/hw/pxa.h
 +++ b/hw/pxa.h
 @@ -63,15 +63,16 @@
  # define PXA2XX_INTERNAL_SIZE  0x4

  /* pxa2xx_pic.c */
 -qemu_irq *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env);
 +DeviceState *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env,
 +        qemu_irq *arm_pic);

  /* pxa2xx_timer.c */
 -void pxa25x_timer_init(target_phys_addr_t base, qemu_irq *irqs);
 -void pxa27x_timer_init(target_phys_addr_t base, qemu_irq *irqs, qemu_irq 
 irq4);
 +void pxa25x_timer_init(target_phys_addr_t base, DeviceState *pic);
 +void pxa27x_timer_init(target_phys_addr_t base, DeviceState *pic);

  /* pxa2xx_gpio.c */
  DeviceState *pxa2xx_gpio_init(target_phys_addr_t base,
 -                CPUState *env, qemu_irq *pic, int lines);
 +                CPUState *env, DeviceState *pic, int lines);
  void pxa2xx_gpio_read_notifier(DeviceState *dev, qemu_irq handler);

  /* pxa2xx_dma.c */
 @@ -125,7 +126,7 @@ typedef struct PXA2xxFIrState PXA2xxFIrState;

  typedef struct {
     CPUState *env;
 -    qemu_irq *pic;
 +    DeviceState *pic;
     qemu_irq reset;
     PXA2xxDMAState *dma;
     DeviceState *gpio;
 @@ -180,6 +181,7 @@ typedef struct {
     QEMUTimer *rtc_swal1;
     QEMUTimer *rtc_swal2;
     QEMUTimer *rtc_pi;
 +    qemu_irq rtc_irq;
  } PXA2xxState;

  struct PXA2xxI2SState {
 diff --git a/hw/pxa2xx.c b/hw/pxa2xx.c
 index 9ebbce6..58e6e7b 100644
 --- a/hw/pxa2xx.c
 +++ b/hw/pxa2xx.c
 @@ -16,6 +16,7 @@
  #include "qemu-timer.h"
  #include "qemu-char.h"
  #include "blockdev.h"
 +#include "arm-misc.h"

  static struct {
      target_phys_addr_t io_base;
 @@ -888,7 +889,7 @@ static int pxa2xx_ssp_init(SysBusDevice *dev)

  static inline void pxa2xx_rtc_int_update(PXA2xxState *s)
  {
 -    qemu_set_irq(s->pic[PXA2XX_PIC_RTCALARM], !!(s->rtsr & 0x2553));
 +    qemu_set_irq(s->rtc_irq, !!(s->rtsr & 0x2553));
  }

  static void pxa2xx_rtc_hzupdate(PXA2xxState *s)
 @@ -1197,6 +1198,8 @@ static void pxa2xx_rtc_init(PXA2xxState *s)
      s->rtc_swal1 = qemu_new_timer(rt_clock, pxa2xx_rtc_swal1_tick, s);
      s->rtc_swal2 = qemu_new_timer(rt_clock, pxa2xx_rtc_swal2_tick, s);
      s->rtc_pi    = qemu_new_timer(rt_clock, pxa2xx_rtc_pi_tick,    s);
 +
 +    s->rtc_irq = qdev_get_gpio_in(s->pic, PXA2XX_PIC_RTCALARM);
  }

  static void pxa2xx_rtc_save(QEMUFile *f, void *opaque)
 @@ -2069,6 +2072,8 @@ PXA2xxState *pxa270_init(unsigned int sdram_size, const char *revision)
      PXA2xxState *s;
      int iomemtype, i;
      DriveInfo *dinfo;
 +    qemu_irq *arm_pic;
 +
      s = (PXA2xxState *) qemu_mallocz(sizeof(PXA2xxState));

      if (revision && strncmp(revision, "pxa27", 5)) {
 @@ -2093,12 +2098,13 @@ PXA2xxState *pxa270_init(unsigned int sdram_size, const char *revision)
                      0x4, qemu_ram_alloc(NULL, "pxa270.internal",
                                              0x4) | IO_MEM_RAM);

 -    s->pic = pxa2xx_pic_init(0x40d0, s->env);
 +    arm_pic = arm_pic_init_cpu(s->env);
 +    s->pic = pxa2xx_pic_init(0x40d0, s->env, arm_pic);

 -    s->dma = pxa27x_dma_init(0x4000, s->pic[PXA2XX_PIC_DMA]);
 +    s->dma = pxa27x_dma_init(0x4000,
 +            qdev_get_gpio_in(s->pic, PXA2XX_PIC_DMA));

 -    pxa27x_timer_init(0x40a0, s->pic[PXA2XX_PIC_OST_0],
 -                    s->pic[PXA27X_PIC_OST_4_11]);
 +

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-25 Thread Stefan Hajnoczi

On Fri, Feb 25, 2011 at 11:20 AM, Pavel Dovgaluk
pavel.dovga...@ispras.ru wrote:

 On 02/23/2011 05:50 PM, Anthony Liguori wrote:
  I still don't see.  What would you do with thousands of checkpoints?
 
 
  For reverse debugging, if you store checkpoints at a rate of, say,
  every 10ms, and then degrade to storing every 100ms after 1 second,
  etc. you'll have quite a large number of snapshots pretty quickly.
  The idea of snapshotting with reverse debugging is that instead of
  undoing every instruction, you can revert to the snapshot before, and
  then replay the instruction stream until you get to the desired point
  in time.

 You cannot replay the instruction stream since inputs (interrupts, rdtsc
 or other timers, I/O) will be different.  You need Kemari for this.

  I've created technology for replaying the instruction stream and all of the
 inputs. It is similar to deterministic replay in VMware.
  Now I need something to save the machine state at many checkpoints in order to
 implement reverse debugging.
  I think COW2 may be useful for this (or I should create something like it).

Or the BTRFS_IOC_CLONE ioctl on the btrfs filesystem.  You can
copy-on-write clone a file using it.

Stefan

Re: [Qemu-devel] [PATCH 02/10] pxa2xx_pic: update to use qdev and arm-pic

2011-02-25 Thread Dmitry Eremin-Solenikov

On 2/25/11, andrzej zaborowski balr...@gmail.com wrote:
 Hi Dmitry,

 On 20 February 2011 14:50, Dmitry Eremin-Solenikov dbarysh...@gmail.com
 wrote:
 Use qdev/sysbus framework to handle pxa2xx-pic. Instead of exposing IRQs
 via array, reference them via qdev_get_gpio_in(). Also pxa2xx_pic
 duplicated
 some code from arm-pic. Drop it, replacing with references to arm-pic,
 as all other ARM SoCs do for their PIC code.

 As I said earlier not using arm-pic was deliberate (and I also asked
 what the gain was from converting the pic to a separate sysbus device
 from the CPU) so I skipped this part of the patch and pushed the rest
 of it, please check that everything works.

The primary goal was to use arm-pic IRQs in pxa2xx-gpio without having to
pass CPUEnv around. Moreover, all other ARM SoCs use arm-pic without any
reference to performance gains or losses.

I can still provide a patch that uses arm-pic only for pxa2xx-gpio. Would
that be suitable for you?

BTW: it seems that your version won't work: the use of sysbus_init_mmio()
is hackish, and there is no place where that mmio region gets mapped to base.

About making the pic a separate device from the CPU: initially I wanted to
reuse pxa2xx-pic somehow for sa-11[0-1]0 emulation. That no longer seems
reasonable to me. Second, the pic is already in a separate module, so I
didn't want to disturb the main pxa2xx.c with it. I might still use
pxa2xx-pic later for allocating the main CPU structure and making all
other devices hang off it.

 diff --git a/hw/mainstone.c b/hw/mainstone.c
 index aec8d34..4eabdb9 100644
 --- a/hw/mainstone.c
 +++ b/hw/mainstone.c
 @@ -140,7 +140,7 @@ static void mainstone_common_init(ram_addr_t ram_size,
      }

      mst_irq = sysbus_create_simple("mainstone-fpga", MST_FPGA_PHYS,
 -                    cpu->pic[PXA2XX_PIC_GPIO_0]);
 +                    qdev_get_gpio_in(cpu->pic, PXA2XX_PIC_GPIO_0));

 I'm also wondering if this device should really use the interrupt line
 instead of using a GPIO.  It seems wrong that both the fpga and the
 gpio module are connected to the same line.

Fixed, will submit a fix soon.

 @@ -241,53 +239,33 @@ static CPUWriteMemoryFunc * const pxa2xx_pic_writefn[] = {
      pxa2xx_pic_mem_write,
  };

 -static void pxa2xx_pic_save(QEMUFile *f, void *opaque)
 +static int pxa2xx_pic_post_load(void *opaque, int version_id)
  {
 -    PXA2xxPICState *s = (PXA2xxPICState *) opaque;
 -    int i;
 -
 -    for (i = 0; i < 2; i ++)
 -        qemu_put_be32s(f, &s->int_enabled[i]);
 -    for (i = 0; i < 2; i ++)
 -        qemu_put_be32s(f, &s->int_pending[i]);
 -    for (i = 0; i < 2; i ++)
 -        qemu_put_be32s(f, &s->is_fiq[i]);
 -    qemu_put_be32s(f, &s->int_idle);
 -    for (i = 0; i < PXA2XX_PIC_SRCS; i ++)
 -        qemu_put_be32s(f, &s->priority[i]);
 +    pxa2xx_pic_update(opaque);
 +    return 0;
  }

 -static int pxa2xx_pic_load(QEMUFile *f, void *opaque, int version_id)
 +DeviceState *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env,
 +        qemu_irq *arm_pic)
  {
 -    PXA2xxPICState *s = (PXA2xxPICState *) opaque;
 -    int i;
 -
 -    for (i = 0; i < 2; i ++)
 -        qemu_get_be32s(f, &s->int_enabled[i]);
 -    for (i = 0; i < 2; i ++)
 -        qemu_get_be32s(f, &s->int_pending[i]);
 -    for (i = 0; i < 2; i ++)
 -        qemu_get_be32s(f, &s->is_fiq[i]);
 -    qemu_get_be32s(f, &s->int_idle);
 -    for (i = 0; i < PXA2XX_PIC_SRCS; i ++)
 -        qemu_get_be32s(f, &s->priority[i]);
 +    DeviceState *dev;

 -    pxa2xx_pic_update(opaque);
 -    return 0;
 +    dev = sysbus_create_varargs("pxa2xx_pic", base,
 +            arm_pic[ARM_PIC_CPU_IRQ],
 +            arm_pic[ARM_PIC_CPU_FIQ],
 +            arm_pic[ARM_PIC_CPU_WAKE],
 +            NULL);
 +
 +    /* Enable IC coprocessor access.  */
 +    cpu_arm_set_cp_io(env, 6, pxa2xx_pic_cp_read, pxa2xx_pic_cp_write, dev);

 I changed the last parameter to s as passing dev here was hacky.


Fine with me.


BTW: what about all other patches?

-- 
With best wishes
Dmitry

Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine

2011-02-25 Thread Anthony PERARD

On Thu, Feb 24, 2011 at 17:31, Anthony Liguori anth...@codemonkey.ws wrote:
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 7b74473..0ab8907 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -36,6 +36,10 @@
  #include "sysbus.h"
  #include "arch_init.h"
  #include "blockdev.h"
 +#include "xen.h"
 +#ifdef CONFIG_XEN
 +#  include <xen/hvm/hvm_info_table.h>
 +#endif


 Admittedly a nit, but isn't this a system header?

It belongs to Xen. I use it for HVM_MAX_VCPUS.

I can put it in xen.h, if you prefer.

Regards,

-- 
Anthony PERARD

Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.

2011-02-25 Thread Anthony PERARD

On Thu, Feb 24, 2011 at 17:29, Anthony Liguori anth...@codemonkey.ws wrote:
 On 02/02/2011 08:49 AM, anthony.per...@citrix.com wrote:

 From: Anthony PERARD anthony.per...@citrix.com

 This patch adds a generic layer for xc calls, allowing us to choose
 between the xenner and xen implementations at runtime.

 It also updates the libxenctrl calls in Qemu to use the new interface,
 otherwise Qemu wouldn't be able to build against new versions of the
 library.

 We check libxenctrl version in configure, from Xen 3.3.0 to Xen
 unstable.

 Signed-off-by: Anthony PERARD anthony.per...@citrix.com
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Acked-by: Alexander Graf ag...@suse.de
 ---
  Makefile.target      |    3 +
  configure            |   62 +++-
  hw/xen_backend.c     |   74 ++-
  hw/xen_backend.h     |    7 +-
  hw/xen_common.h      |   38 ++
  hw/xen_console.c     |   10 +-
  hw/xen_devconfig.c   |   10 +-
  hw/xen_disk.c        |   28 ---
  hw/xen_domainbuild.c |   29 
  hw/xen_interfaces.c  |  191
 
  hw/xen_interfaces.h  |  198
 ++
  hw/xen_nic.c         |   36 +-
  hw/xenfb.c           |   14 ++--
  13 files changed, 584 insertions(+), 116 deletions(-)
  create mode 100644 hw/xen_interfaces.c
  create mode 100644 hw/xen_interfaces.h

 diff --git a/Makefile.target b/Makefile.target
 index db29e96..d09719f 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -205,6 +205,9 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
  QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
  QEMU_CFLAGS += $(VNC_PNG_CFLAGS)

 +# xen support
 +obj-$(CONFIG_XEN) += xen_interfaces.o
 +
  # xen backend driver support
  obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o
  obj-$(CONFIG_XEN) += xen_console.o xenfb.o xen_disk.o xen_nic.o
 diff --git a/configure b/configure
 index 5a9121d..fde9bad 100755
 --- a/configure
 +++ b/configure
 @@ -126,6 +126,7 @@ vnc_jpeg=
  vnc_png=
  vnc_thread=no
  xen=
 +xen_ctrl_version=
  linux_aio=
  attr=
  vhost_net=
 @@ -1144,13 +1145,71 @@ fi

  if test "$xen" != "no" ; then
    xen_libs="-lxenstore -lxenctrl -lxenguest"
 +
 +  # Xen unstable
    cat > $TMPC <<EOF
  #include <xenctrl.h>
  #include <xs.h>
 -int main(void) { xs_daemon_open(); xc_interface_open(); return 0; }
 +#include <stdint.h>
 +#include <xen/hvm/hvm_info_table.h>
 +#if !defined(HVM_MAX_VCPUS)
 +# error HVM_MAX_VCPUS not defined
 +#endif
 +int main(void) {
 +  xc_interface *xc;
 +  xs_daemon_open();
 +  xc = xc_interface_open(0, 0, 0);
 +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
 +  xc_gnttab_open(NULL, 0);
 +  return 0;
 +}
  EOF
    if compile_prog "" "$xen_libs" ; then
 +    xen_ctrl_version=410
 +    xen=yes
 +
 +  # Xen 4.0.0
 +  elif (
 +      cat > $TMPC <<EOF
 +#include <xenctrl.h>
 +#include <xs.h>
 +#include <stdint.h>
 +#include <xen/hvm/hvm_info_table.h>
 +#if !defined(HVM_MAX_VCPUS)
 +# error HVM_MAX_VCPUS not defined
 +#endif
 +int main(void) {
 +  xs_daemon_open();
 +  xc_interface_open();
 +  xc_gnttab_open();
 +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
 +  return 0;
 +}
 +EOF
 +      compile_prog "" "$xen_libs"
 +    ) ; then
 +    xen_ctrl_version=400
 +    xen=yes
 +
 +  # Xen 3.3.0, 3.4.0
 +  elif (
 +      cat > $TMPC <<EOF
 +#include <xenctrl.h>
 +#include <xs.h>
 +int main(void) {
 +  xs_daemon_open();
 +  xc_interface_open();
 +  xc_gnttab_open();
 +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
 +  return 0;
 +}
 +EOF
 +      compile_prog "" "$xen_libs"
 +    ) ; then
 +    xen_ctrl_version=330
      xen=yes
 +
 +  # Xen not found or unsupported
    else
      if test "$xen" = "yes" ; then
        feature_not_found "xen"
 @@ -3009,6 +3068,7 @@ case "$target_arch2" in
      if test "$xen" = "yes" -a "$target_softmmu" = "yes" ; then
        echo "CONFIG_XEN=y" >> $config_target_mak
        echo "LIBS+=$xen_libs" >> $config_target_mak
 +      echo "CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version" >> $config_target_mak
      fi
  esac
  case "$target_arch2" in
  case $target_arch2 in
 diff --git a/hw/xen_backend.c b/hw/xen_backend.c
 index 860b038..cf081e1 100644
 --- a/hw/xen_backend.c
 +++ b/hw/xen_backend.c
 @@ -43,7 +43,8 @@
  /* - */

  /* public */
 -int xen_xc;
 +XenXC xen_xc = XC_HANDLER_INITIAL_VALUE;
 +XenGnttab xen_xcg = XC_HANDLER_INITIAL_VALUE;
  struct xs_handle *xenstore = NULL;
  const char *xen_protocol;

 @@ -58,7 +59,7 @@ int xenstore_write_str(const char *base, const char *node, const char *val)
      char abspath[XEN_BUFSIZE];

      snprintf(abspath, sizeof(abspath), "%s/%s", base, node);
 -    if (!xs_write(xenstore, 0, abspath, val, strlen(val)))
 +    if (!xs_ops.write(xenstore, 0, abspath, val, strlen(val)))
          return -1;
      return 0;
  }
 @@ -70,7 +71,7 @@ char *xenstore_read_str(const char *base, const char *node)
      char *str, *ret = NULL;

      snprintf(abspath, sizeof(abspath), "%s/%s", base, node);
 -    str = xs_read(xenstore, 0, abspath, &len);
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.

2011-02-25 Thread Anthony Liguori


On 02/25/2011 08:06 AM, Anthony PERARD wrote:

On Thu, Feb 24, 2011 at 17:29, Anthony Liguorianth...@codemonkey.ws  wrote:
   

On 02/02/2011 08:49 AM, anthony.per...@citrix.com wrote:
 

From: Anthony PERARDanthony.per...@citrix.com

This patch adds a generic layer for xc calls, allowing us to choose
between the
xenner and xen implementations at runtime.

It also update the libxenctrl calls in Qemu to use the new interface,
otherwise Qemu wouldn't be able to build against new versions of the
library.

We check libxenctrl version in configure, from Xen 3.3.0 to Xen
unstable.

Signed-off-by: Anthony PERARDanthony.per...@citrix.com
Signed-off-by: Stefano Stabellinistefano.stabell...@eu.citrix.com
Acked-by: Alexander Grafag...@suse.de
---
  Makefile.target  |3 +
  configure|   62 +++-
  hw/xen_backend.c |   74 ++-
  hw/xen_backend.h |7 +-
  hw/xen_common.h  |   38 ++
  hw/xen_console.c |   10 +-
  hw/xen_devconfig.c   |   10 +-
  hw/xen_disk.c|   28 ---
  hw/xen_domainbuild.c |   29 
  hw/xen_interfaces.c  |  191

  hw/xen_interfaces.h  |  198
++
  hw/xen_nic.c |   36 +-
  hw/xenfb.c   |   14 ++--
  13 files changed, 584 insertions(+), 116 deletions(-)
  create mode 100644 hw/xen_interfaces.c
  create mode 100644 hw/xen_interfaces.h

diff --git a/Makefile.target b/Makefile.target
index db29e96..d09719f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -205,6 +205,9 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
  QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
  QEMU_CFLAGS += $(VNC_PNG_CFLAGS)

+# xen support
+obj-$(CONFIG_XEN) += xen_interfaces.o
+
  # xen backend driver support
  obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o
  obj-$(CONFIG_XEN) += xen_console.o xenfb.o xen_disk.o xen_nic.o
diff --git a/configure b/configure
index 5a9121d..fde9bad 100755
--- a/configure
+++ b/configure
@@ -126,6 +126,7 @@ vnc_jpeg=
  vnc_png=
  vnc_thread=no
  xen=
+xen_ctrl_version=
  linux_aio=
  attr=
  vhost_net=
@@ -1144,13 +1145,71 @@ fi

  if test "$xen" != "no" ; then
    xen_libs="-lxenstore -lxenctrl -lxenguest"
+
+  # Xen unstable
    cat > $TMPC <<EOF
  #include <xenctrl.h>
  #include <xs.h>
-int main(void) { xs_daemon_open(); xc_interface_open(); return 0; }
+#include <stdint.h>
+#include <xen/hvm/hvm_info_table.h>
+#if !defined(HVM_MAX_VCPUS)
+# error HVM_MAX_VCPUS not defined
+#endif
+int main(void) {
+  xc_interface *xc;
+  xs_daemon_open();
+  xc = xc_interface_open(0, 0, 0);
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  xc_gnttab_open(NULL, 0);
+  return 0;
+}
  EOF
    if compile_prog "" "$xen_libs" ; then
+    xen_ctrl_version=410
+    xen=yes
+
+  # Xen 4.0.0
+  elif (
+      cat > $TMPC <<EOF
+#include <xenctrl.h>
+#include <xs.h>
+#include <stdint.h>
+#include <xen/hvm/hvm_info_table.h>
+#if !defined(HVM_MAX_VCPUS)
+# error HVM_MAX_VCPUS not defined
+#endif
+int main(void) {
+  xs_daemon_open();
+  xc_interface_open();
+  xc_gnttab_open();
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  return 0;
+}
+EOF
+      compile_prog "" "$xen_libs"
+    ) ; then
+    xen_ctrl_version=400
+    xen=yes
+
+  # Xen 3.3.0, 3.4.0
+  elif (
+      cat > $TMPC <<EOF
+#include <xenctrl.h>
+#include <xs.h>
+int main(void) {
+  xs_daemon_open();
+  xc_interface_open();
+  xc_gnttab_open();
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  return 0;
+}
+EOF
+      compile_prog "" "$xen_libs"
+    ) ; then
+    xen_ctrl_version=330
      xen=yes
+
+  # Xen not found or unsupported
    else
      if test "$xen" = "yes" ; then
        feature_not_found "xen"
@@ -3009,6 +3068,7 @@ case $target_arch2 in
  if test "$xen" = "yes" -a "$target_softmmu" = "yes" ; then
    echo "CONFIG_XEN=y" >> $config_target_mak
    echo "LIBS+=$xen_libs" >> $config_target_mak
+  echo "CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version" >> $config_target_mak
  fi
  esac
  case $target_arch2 in
diff --git a/hw/xen_backend.c b/hw/xen_backend.c
index 860b038..cf081e1 100644
--- a/hw/xen_backend.c
+++ b/hw/xen_backend.c
@@ -43,7 +43,8 @@
  /* - */

  /* public */
-int xen_xc;
+XenXC xen_xc = XC_HANDLER_INITIAL_VALUE;
+XenGnttab xen_xcg = XC_HANDLER_INITIAL_VALUE;
  struct xs_handle *xenstore = NULL;
  const char *xen_protocol;

@@ -58,7 +59,7 @@ int xenstore_write_str(const char *base, const char *node, const char *val)
      char abspath[XEN_BUFSIZE];

      snprintf(abspath, sizeof(abspath), "%s/%s", base, node);
-    if (!xs_write(xenstore, 0, abspath, val, strlen(val)))
+    if (!xs_ops.write(xenstore, 0, abspath, val, strlen(val)))
          return -1;
      return 0;
  }
@@ -70,7 +71,7 @@ char *xenstore_read_str(const char *base, const char *node)
      char *str, *ret = NULL;

      snprintf(abspath, sizeof(abspath), "%s/%s", base, node);
-    str = xs_read(xenstore, 0, abspath, &len);
+str = xs_ops.read(xenstore, 0,

Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine

2011-02-25 Thread Anthony Liguori


On 02/25/2011 07:55 AM, Anthony PERARD wrote:

On Thu, Feb 24, 2011 at 17:31, Anthony Liguorianth...@codemonkey.ws  wrote:
   

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 7b74473..0ab8907 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -36,6 +36,10 @@
  #include "sysbus.h"
  #include "arch_init.h"
  #include "blockdev.h"
+#include "xen.h"
+#ifdef CONFIG_XEN
+#  include "xen/hvm/hvm_info_table.h"
+#endif

   

Admittedly a nit, but isn't this a system header?
 

It belongs to Xen. I use it for HVM_MAX_VCPUS.

I can put it in xen.h, if you prefer.
   


I meant, you should use:

#include <xen/hvm/hvm_info_table.h>

Regards,

Anthony Liguori


Regards,

Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device

2011-02-25 Thread Anthony PERARD

On Fri, Feb 25, 2011 at 10:54, Paolo Bonzini pbonz...@redhat.com wrote:
 On 02/25/2011 10:58 AM, Ian Campbell wrote:

   Or it should just go away.  Guests can already write to 0xe9 and see
   the output on the host's xm dmesg ring and serial console.

 Only true if you have configured the guest log level to include debug
 messages.

 If you can recompile QEMU to add DEBUG_PLATFORM, you can usually do that
 too.  To avoid recompilation, rather than a chardev it would be even better
 to keep it as a trace event.

The trace event seems a good idea, let's go for that!

Regards,

-- 
Anthony PERARD

Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine

2011-02-25 Thread Anthony PERARD

On Fri, Feb 25, 2011 at 14:09, Anthony Liguori anth...@codemonkey.ws wrote:
 On 02/25/2011 07:55 AM, Anthony PERARD wrote:

 On Thu, Feb 24, 2011 at 17:31, Anthony Liguorianth...@codemonkey.ws
  wrote:


 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 7b74473..0ab8907 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -36,6 +36,10 @@
  #include "sysbus.h"
  #include "arch_init.h"
  #include "blockdev.h"
 +#include "xen.h"
 +#ifdef CONFIG_XEN
 +#  include "xen/hvm/hvm_info_table.h"
 +#endif



 Admittedly a nit, but isn't this a system header?


 It belongs to Xen. I use it for HVM_MAX_VCPUS.

 I can put it in xen.h, if you prefer.


 I meant, you should use:

 #include <xen/hvm/hvm_info_table.h>

Sure, I will do that.

Thanks,

-- 
Anthony PERARD

[Qemu-devel] [PATCH] target-arm: Don't decode old cp15 WFI instructions on v7 cores

2011-02-25 Thread Peter Maydell

In v7 of the ARM architecture, WFI (wait for interrupt) is a first-class
instruction, but in previous versions this functionality was provided
via a cp15 coprocessor register. Add correct feature checks to the
decoding of the cp15 WFI instructions so that they behave correctly
for newer cores. In particular, the old 0,c7,c8,2 encoding used on
ARM940 has been reused for VA-to-PA translation in v6 and v7.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
This patch stands alone as a fix to target-arm; it's a prerequisite
for Adam's VA-PA translation patch, because otherwise attempting a
user-read translation will get you a WFI instead...

 target-arm/translate.c |   35 ++-
 1 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index dbd958b..baa1256 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2538,13 +2538,38 @@ static int disas_cp15_insn(CPUState *env, DisasContext *s, uint32_t insn)
     if (IS_USER(s) && !cp15_user_ok(insn)) {
         return 1;
     }
-    if ((insn & 0x0fff0fff) == 0x0e070f90
-        || (insn & 0x0fff0fff) == 0x0e070f58) {
-        /* Wait for interrupt.  */
-        gen_set_pc_im(s->pc);
-        s->is_jmp = DISAS_WFI;
+
+    /* Pre-v7 versions of the architecture implemented WFI via coprocessor
+     * instructions rather than a separate instruction.
+     */
+    if ((insn & 0x0fff0fff) == 0x0e070f90) {
+        /* 0,c7,c0,4: Standard v6 WFI (also used in some pre-v6 cores).
+         * In v7, this must NOP.
+         */
+        if (!arm_feature(env, ARM_FEATURE_V7)) {
+            /* Wait for interrupt.  */
+            gen_set_pc_im(s->pc);
+            s->is_jmp = DISAS_WFI;
+        }
         return 0;
     }
+
+    if ((insn & 0x0fff0fff) == 0x0e070f58) {
+        /* 0,c7,c8,2: Not all pre-v6 cores implemented this WFI,
+         * so this is slightly over-broad.
+         */
+        if (!arm_feature(env, ARM_FEATURE_V6)) {
+            /* Wait for interrupt.  */
+            gen_set_pc_im(s->pc);
+            s->is_jmp = DISAS_WFI;
+            return 0;
+        }
+        /* Otherwise fall through to handle via helper function.
+         * In particular, on v7 and some v6 cores this is one of
+         * the VA-PA registers.
+         */
+    }
+
     rd = (insn >> 12) & 0xf;

     if (cp15_tls_load_store(env, s, insn, rd))
-- 
1.7.1
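
For readers following the decode logic in the patch above, here is a minimal standalone sketch of the same decision, written as plain C outside of QEMU. The enum, the function name and the two boolean feature flags are illustrative assumptions and are not part of target-arm; only the mask and the two encodings are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes for one cp15 instruction word (illustrative names). */
enum cp15_wfi_action {
    CP15_WFI,      /* treat the instruction as wait-for-interrupt */
    CP15_NOP,      /* accept the instruction but do nothing */
    CP15_NOT_WFI   /* not a WFI on this core; decode as a normal cp15 access */
};

/* Mask that drops the condition and Rd fields, as in the patch. */
#define CP15_ENC_MASK    0x0fff0fffu
#define CP15_WFI_C7_C0_4 0x0e070f90u   /* 0,c7,c0,4 */
#define CP15_WFI_C7_C8_2 0x0e070f58u   /* 0,c7,c8,2 */

/* is_v6 / is_v7 stand in for arm_feature(env, ARM_FEATURE_V6 / _V7). */
static enum cp15_wfi_action classify_cp15_wfi(uint32_t insn,
                                              bool is_v6, bool is_v7)
{
    uint32_t enc = insn & CP15_ENC_MASK;

    if (enc == CP15_WFI_C7_C0_4) {
        /* v6 and earlier: WFI. v7: must behave as a NOP. */
        return is_v7 ? CP15_NOP : CP15_WFI;
    }
    if (enc == CP15_WFI_C7_C8_2 && !is_v6) {
        /* Only pre-v6 cores treat this encoding as WFI. */
        return CP15_WFI;
    }
    /* On v6/v7 the second encoding is a VA-PA register access. */
    return CP15_NOT_WFI;
}
```

The caller is assumed to pass is_v6 = true for v7 cores as well, which mirrors how the second check in the patch only fires on pre-v6 cores.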

Re: [Qemu-devel] [PATCH 2/2] microblaze: Allow targeting little-endian mb

2011-02-25 Thread Blue Swirl

On Mon, Feb 21, 2011 at 3:44 PM, Edgar E. Iglesias
edgar.igles...@petalogix.com wrote:
 Signed-off-by: Edgar E. Iglesias edgar.igles...@petalogix.com
 ---
  configure                                   |    7 +--
  default-configs/microblazeel-linux-user.mak |    1 +
  default-configs/microblazeel-softmmu.mak    |    4 
  3 files changed, 10 insertions(+), 2 deletions(-)
  create mode 100644 default-configs/microblazeel-linux-user.mak
  create mode 100644 default-configs/microblazeel-softmmu.mak

 diff --git a/configure b/configure
 index 791b71d..3036faf 100755
 --- a/configure
 +++ b/configure
 @@ -984,6 +984,7 @@ arm-softmmu \
  cris-softmmu \
  m68k-softmmu \
  microblaze-softmmu \
 +microblazeel-softmmu \
  mips-softmmu \
  mipsel-softmmu \
  mips64-softmmu \
 @@ -1008,6 +1009,7 @@ armeb-linux-user \
  cris-linux-user \
  m68k-linux-user \
  microblaze-linux-user \
 +microblazeel-linux-user \
  mips-linux-user \
  mipsel-linux-user \
  ppc-linux-user \
 @@ -3005,7 +3007,8 @@ case $target_arch2 in
     target_long_alignment=2
     target_llong_alignment=2
   ;;
 -  microblaze)
 +  microblaze|microblazeel)
 +    TARGET_ARCH=microblaze
     bflt=yes
     target_nptl=yes
     target_phys_bits=32
 @@ -3231,7 +3234,7 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
     echo CONFIG_M68K_DIS=y   $config_target_mak
     echo CONFIG_M68K_DIS=y   $libdis_config_mak
   ;;
 -  microblaze)
 +  microblaze*)
     echo CONFIG_MICROBLAZE_DIS=y   $config_target_mak
     echo CONFIG_MICROBLAZE_DIS=y   $libdis_config_mak
   ;;
 diff --git a/default-configs/microblazeel-linux-user.mak 
 b/default-configs/microblazeel-linux-user.mak
 new file mode 100644
 index 000..566fdc0
 --- /dev/null
 +++ b/default-configs/microblazeel-linux-user.mak
 @@ -0,0 +1 @@
 +# Default configuration for microblaze-linux-user

microblazeel-linux-user?

 diff --git a/default-configs/microblazeel-softmmu.mak 
 b/default-configs/microblazeel-softmmu.mak
 new file mode 100644
 index 000..4399b8b
 --- /dev/null
 +++ b/default-configs/microblazeel-softmmu.mak
 @@ -0,0 +1,4 @@
 +# Default configuration for microblaze-softmmu

microblazeel-softmmu?

Re: [Qemu-devel] when to check external interrupt request ? or what is the timing to check and arise external interrupt ?

2011-02-25 Thread Blue Swirl

On Tue, Feb 22, 2011 at 6:47 AM, wang sheng wans...@gmail.com wrote:
 I'm porting qemu to an new architecture. I come across some difficulty
 that I can't define the timing that enable qemu's main-thread to be
 interrupt and check external interrupt .

 I understand the way that mips used to check external interrupt .

 in qemu-system-mips ,   during do translation ,  if there is an
 instruction that access CP0's Status register and Cause register,  the
 target-mips/translate.c will add  a calling to function 
 helper_interrupt_restart  in the end of the translation_block.

 But in my architecture which use load/st instruction to access  the
 contr register in interrupt controller .  Because   I can't
 distinguish the access for normal memory   and  access for  interrupt
 controller's register ,  I  can't add  interrupt_restart function
 calling in the end of translation block.

 How can I do  to enable qemu have chance to check external interrupt ?

Please try something similar to how cpu_request_exit function and
signal is used by hw/dma.c and hw/pc.c.

Re: [Qemu-devel] [PATCH 3/3] target-arm: Use TCG temporary leak debugging facilities

2011-02-25 Thread Blue Swirl

On Wed, Feb 23, 2011 at 5:19 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 Use the new TCG temporary leak debugging facilities to
 check that each ARM instruction does not leak temporaries.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  target-arm/translate.c |    7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)

 diff --git a/target-arm/translate.c b/target-arm/translate.c
 index 31067d5..b96a136 100644
 --- a/target-arm/translate.c
 +++ b/target-arm/translate.c
 @@ -9125,6 +9125,8 @@ static inline void 
 gen_intermediate_code_internal(CPUState *env,

     gen_icount_start();

 +    tcg_clear_temp_count();
 +
     /* A note on handling of the condexec (IT) bits:
      *
      * We want to avoid the overhead of having to write the updated condexec
 @@ -9234,6 +9236,11 @@ static inline void 
 gen_intermediate_code_internal(CPUState *env,
              gen_set_label(dc->condlabel);
              dc->condjmp = 0;
          }
 +
 +        if (tcg_check_temp_count()) {
 +            fprintf(stderr, "TCG temporary leak before %08x\n", dc->pc);
 +        }

Perhaps this check and tcg_clear_temp_count() calls should be added
instead to tb_gen_code() in exec.c, to benefit all targets at once. PC
information will not be as accurate, though.

Re: [Qemu-devel] checkpatch.pl false positive: wants braces on #if

2011-02-25 Thread Blue Swirl

On Wed, Feb 23, 2011 at 6:07 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 If you run checkpatch.pl on this patch:
 http://patchwork.ozlabs.org/patch/84189/

 it complains:
 WARNING: braces {} are necessary even for single statement blocks
 #29: FILE: tcg/tcg.c:454:
 +#if defined(CONFIG_DEBUG_TCG)
 +    s-temps_in_use++;


 ...but braces on a cpp conditional are a bit tricky :-)

 The script is sufficiently hairy perl that I'm afraid I
 can't suggest a solution, only report the problem.

Maybe this helps:
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 075b614..4b1e2c2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2537,7 +2537,7 @@ sub process {
                }
                if (!defined $suppress_ifbraces{$linenr - 1} &&
                                        $line =~ /\b(if|while|for|else)\b/ &&
-                   $line !~ /\#\s*else/) {
+                   $line !~ /\#\s*(if|else|elif)/) {
                        my $allowed = 0;

# Check the pre-context.

Re: [Qemu-devel] [PATCH 3/3] target-arm: Use TCG temporary leak debugging facilities

2011-02-25 Thread Peter Maydell

On 25 February 2011 15:32, Blue Swirl blauwir...@gmail.com wrote:
 On Wed, Feb 23, 2011 at 5:19 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 +
 +        if (tcg_check_temp_count()) {
 +            fprintf(stderr, "TCG temporary leak before %08x\n", dc->pc);
 +        }

 Perhaps this check and tcg_clear_temp_count() calls should be added
 instead to tb_gen_code() in exec.c, to benefit all targets at once. PC
 information will not be as accurate, though.

You'd get a pile of false positives, for instance target-arm doesn't
bother to destroy the whole-TB temporaries like cpu_F0s because there's
no need to. We're trying to check whether the translator could
unboundedly leak temporaries...

-- PMM

Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.

2011-02-25 Thread Anthony PERARD

On Fri, Feb 25, 2011 at 14:11, Anthony Liguori anth...@codemonkey.ws wrote:
 I think I gave this feedback before but I'd really like to see static
 inlines here.

 It's very likely that you'll either want to have tracing or some commands
 can have a NULL function pointer in which case having a central location
 to
 do this is very useful.

 Plus, it's more natural to read code that's making a function call
 instead
 of going through a function pointer in a structure redirection.

 Can probably do this with just a sed over the current patch.


 Is it good to have a .h with functions like that? :

 static inline XenXC qemu_xc_interface_open(xentoollog_logger *logger,
                             xentoollog_logger *dombuild_logger,
                             unsigned open_flags)
 {
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 410
     return xc_interface_open();
 #else
     return xc_interface_open(logger, dombuild_logger, open_flags);
 #endif
 }


 So there will have no more structure redirection.


 It would be better to have two versions of the header, one that implemented
 the < 410 functions and one that implemented the newer functions.

 If you're just using the new signature for everything, you could even just
 #define in the later header.

Actually, the #define in the later header was done in a previous
version of this patch series. But I changed to the structure
redirection after a comment from Alexander Graf and after taking one of
his patches for Xenner.

Here is Alexander's comment:
http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg01251.html
The function pointers help to switch at run time to either the Xen or
the Xenner implementation.

This message is why I did not use static inline:
http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03125.html


So, I can go for multiple versions of the header that define the
static inline functions, or just have a few defines.
BTW, I think there are now only 4 functions with a different prototype
between the old and new versions of Xen. The other prototype changes are
only the handle parameter, but a typedef handles that.

Regards,

-- 
Anthony PERARD
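
To make the "static inline per library version" option from this thread concrete, here is a rough sketch that compiles on its own. It does not include the real libxenctrl headers: the stub_* functions and types are hypothetical stand-ins for the old and new xc_interface_open(), and qemu_xc_is_open() is an invented helper. Only XenXC, XC_HANDLER_INITIAL_VALUE, CONFIG_XEN_CTRL_INTERFACE_VERSION and qemu_xc_interface_open() are names taken from the discussion.

```c
#include <stddef.h>

/* Normally written by configure; 400 is assumed here for the example. */
#ifndef CONFIG_XEN_CTRL_INTERFACE_VERSION
#define CONFIG_XEN_CTRL_INTERFACE_VERSION 400
#endif

#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 410

/* Old library: the handle is a plain int and open takes no arguments. */
static int stub_old_interface_open(void) { return 3; }

typedef int XenXC;
#define XC_HANDLER_INITIAL_VALUE (-1)

static inline XenXC qemu_xc_interface_open(void *logger,
                                           void *dombuild_logger,
                                           unsigned open_flags)
{
    /* The extra arguments do not exist in the old API, so drop them. */
    (void)logger;
    (void)dombuild_logger;
    (void)open_flags;
    return stub_old_interface_open();
}

#else

/* New library: the handle is a pointer and open takes three arguments. */
typedef struct stub_xc_interface { int fd; } stub_xc_interface;

static stub_xc_interface *stub_new_interface_open(void *logger,
                                                  void *dombuild_logger,
                                                  unsigned open_flags)
{
    static stub_xc_interface handle = { 4 };
    (void)logger;
    (void)dombuild_logger;
    (void)open_flags;
    return &handle;
}

typedef stub_xc_interface *XenXC;
#define XC_HANDLER_INITIAL_VALUE NULL

static inline XenXC qemu_xc_interface_open(void *logger,
                                           void *dombuild_logger,
                                           unsigned open_flags)
{
    return stub_new_interface_open(logger, dombuild_logger, open_flags);
}

#endif

/* Callers only ever see XenXC, whatever the library version is. */
static inline int qemu_xc_is_open(XenXC xc)
{
    return xc != XC_HANDLER_INITIAL_VALUE;
}
```

The trade-off raised in the thread still applies: this form is resolved at compile time, so it cannot switch between the Xen and Xenner implementations at run time the way a table of function pointers can.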

Re: [Qemu-devel] Memory Map

2011-02-25 Thread Blue Swirl

On Thu, Feb 24, 2011 at 11:08 AM, Salvatore Lionetti
salvatorelione...@yahoo.it wrote:
 Hi,

 This is what my board do

 cpu_register_physical_memory(0, 128*1024*1024, ...)
 cpu_register_physical_memory(0xFF80, 8*1024*1024, ...)

 and this layout does not change over the entire live (virtual) of the board.

 For the following offset (1st column) and size in bytes (2nd column)
 {0x00, 512},
 {0x000200, 16},
 {0x000300, 32},
 {0x000400, 32},
 {0x000500, 64},
 {0x000600, 64},
 {0x000700, 128},
 {0x000800, 30},
 {0x000900, 256},
 {0x000A00, 44},
 {0x000B00, 256},
 {0x000C00, 24},
 {0x000F00, 20},
 {0x001000, 20},
 {0x001100, 20},
 {0x001400, 168},
 {0x001800, 24},
 {0x002000, 4096},
 {0x003000, 24},
 {0x003100, 24},
 {0x004500, 36},
 {0x005000, 224},
 {0x008000, 768},
 {0x008300, 16},

 i do, for each item,

 a = cpu_register_io_memory(r, w, o, DEVICE_NATIVE_ENDIAN)
 cpu_register_physical_memory(_base+offset, len, a)

 And _base could be reprogrammed at any time. So before to change _base i:

 cpu_unregister_io_memory(a)

 What i see is that accessing to _base+
 _base+0x005000 = Wake up r/w with offset 0
 _base+0x000204 = Wake up r/w with offset 0x204

 So the question
 - Am i wrong something?

cpu_unregister_io_memory() is the counterpart of
cpu_register_io_memory(), it does not affect mappings created by
cpu_register_physical_memory(). They should be removed first.

 - Is possible to map address with last TARGET_PAGE_BITS (es 0x200) bits set?

Yes.

[Qemu-devel] Re: qemu compiling error on ppc64: kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'

2011-02-25 Thread Dushyant Bansal


On Wednesday 16 February 2011 02:39 PM, Avi Kivity wrote:

On 02/15/2011 05:59 PM, Dushyant Bansal wrote:
2. How to configure makefiles to get output of printk statements 
inside kvm/arch/powerpc/kvm/trace.h
Better don't make them printks - just use the tracing framework. I'd 
write up a small howto here myself, but I'm pretty much on the jump 
to my plane for vacation. Avi, could you please guide him a bit on 
how to get data out of tracepoints?

   Thanks for the quick reply :)
I have added some more trace parameters in the tracing framework and 
currently, it is working fine.

1. Add new field in struct kvm_vcpu_stat (kvm_host.h)
2. Add corresponding entry in struct kvm_stats_debugfs_item 
debugfs_entries[] (book3s.c)

3. Increment or Decrement that field where ever necessary.


Those aren't tracepoints; they're deprecated debug statistics.

For tracepoints, see

  include/trace/events/kvm.h (general kvm tracepoints)
  arch/powerpc/kvm/trace.h (ppc specific tracepoints)
  arch/powerpc/kvm/book3s_mmu_hpte.c (examples of use, look for 
trace_kvm_*)

  Documentation/trace/tracepoints.txt (documentation, likely outdated)


Thanks a lot for the information.

Dushyant

Re: [Qemu-devel] Missing op on SPARC

2011-02-25 Thread Blue Swirl

On Thu, Feb 24, 2011 at 11:12 AM, 陳韋任 che...@iis.sinica.edu.tw wrote:
 Hi, all

  I have a Linux/SPARC machine and want to run QEMU on it.
 Here is the system information.

 --
 $ uname -a
 Linux sparc 2.6.37-rc5-git #1 SMP Tue Dec 21 17:03:53 CST 2010 sparc64 sun4v 
 UltraSparc T2 (Niagara2) GNU/Linux
 $ gcc --version
 gcc (Gentoo 4.3.4 p1.0, pie-10.1.5) 4.3.4
 --

  QEMU is configured with --sparc_cpu=v8plus. QEMU report
 there are some missing op definitions. See below,

 --
 $ qemu-sparc hello
 Missing op definition for qemu_ld64
 Missing op definition for qemu_st64
 /tmp/chenwj/qemu-0.14.0/tcg/tcg.c:1116: tcg fatal error
 Aborted
 --

  Is it possible to fix it? If so, how?

Yes, the place is in tcg/sparc/tcg-target.[ch].  Sparc generator for
TCG only implements the functions qemu_ld64/st64 on V9 (full 64 bit).
These should be implemented also for v8plus.

This can be implemented by adding a helper function to call the V9
versions of tcg_out_qemu_ld/st. One problem is that v8plus gives few
64 bit registers, %g1 to %g7, so addr_reg should probably be set up to
%g1 and data_reg to %g2 in the v8plus helper. Data and address must be
moved to/from these registers from/to 32 bit registers allocated by
TCG.

Re: [Qemu-devel] [PATCH] Split machine creation from the main loop

2011-02-25 Thread Blue Swirl

On Wed, Feb 23, 2011 at 11:38 PM, Anthony Liguori aligu...@us.ibm.com wrote:
 The goal is to enable the monitor to run independently of whether the machine
 has been created such that the monitor can be used to specify all of the
 parameters for machine initialization.

 Signed-off-by: Anthony Liguori aligu...@us.ibm.com

 diff --git a/vl.c b/vl.c
 index b436952..181cc77 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -1917,17 +1917,360 @@ static const QEMUOption *lookup_opt(int argc, char 
 **argv,
     return popt;
  }

 +static int qemu_machine_init(QEMUMachine *machine, const char 
 *kernel_filename,
 +                             const char *kernel_cmdline,
 +                             const char *initrd_filename,
 +                             const char *boot_devices, const char *cpu_model,
 +                             int snapshot, int tb_size, const char 
 *gdbstub_dev,
 +                             const char *loadvm, const char *incoming)

qemu_machine_init() would mix host state initialization and machine
initialization. I'd make instead two functions, qemu_host_init() and
qemu_machine_init(). For example parameters snapshot, tb_size,
gdbstub_dev, (maybe also loadvm and incoming if handled elsewhere) do
not change how the machine is initialized. Also KVM, drive, chardev
and display init should go to qemu_host_init() if possible.

Re: [Qemu-devel] Re: KVM call agenda for Jan 25

2011-02-25 Thread Dushyant Bansal


On Saturday 29 January 2011 04:20 PM, Dushyant Bansal wrote:

Or this: which is faster, qemu-img convert -f <format> -O <format>
<src-image> <dst-image> or cp <src-image> <dst-image>?  What about for
raw images, shouldn't that be the same speed as cp(1)?  Poke around
the source code, profile it, understand what it's doing, think about
ways to improve it.  No need to do everything, just doing part of this
will give you background on QEMU's block layer.

Contributing patches is a good way get up to speed and show your
skills.  If time doesn't permit that, just think about the problem and
how you intend to solve it, and feel free to bounce ideas off me.
   
I explored 'qemu-img create and convert' and got a basic understanding 
of how they work.


cp is faster than qemu-img convert.

For raw to raw:
cp just copies all the disk blocks actually occupied by the file.
qemu-img convert checks every sector and copies only those
that contain at least one non-NUL byte.
The better performance of cp over qemu-img convert is the result of
the overhead of this checking.


I tried a few variations:
1. Just copy all the sectors without checking.
So, the actual size becomes equal to the virtual size.
2. In is_allocated_sectors, out of n sectors, if any sector has a non-NUL
byte then break and copy all n sectors.

As expected, the resulting raw image was quite large.

Looking forward to your comments.

Thanks,
Dushyant
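
The per-sector check described above can be sketched roughly as follows. This is not the actual qemu-img code: the function names and the fixed 512-byte sector size are assumptions made for the example. It only illustrates why the scan costs time (every byte of an all-zero sector is read) and how a run of equal-state sectors can be reported in one call.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512

/* Return 1 if the sector holds at least one non-NUL byte, else 0. */
static int sector_has_data(const uint8_t *sector)
{
    size_t i;

    for (i = 0; i < SKETCH_SECTOR_SIZE; i++) {
        if (sector[i] != 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Examine up to n sectors starting at buf. The return value says whether
 * the first sector has data; *pnum receives how many consecutive sectors
 * share that state, so the caller can copy or skip them as one run.
 */
static int leading_run(const uint8_t *buf, int n, int *pnum)
{
    int first, i;

    if (n <= 0) {
        *pnum = 0;
        return 0;
    }
    first = sector_has_data(buf);
    for (i = 1; i < n; i++) {
        if (sector_has_data(buf + (size_t)i * SKETCH_SECTOR_SIZE) != first) {
            break;
        }
    }
    *pnum = i;
    return first;
}
```

Variation 2 from the message corresponds to returning "has data" for the whole n-sector chunk as soon as any sector in it is non-zero, which saves scanning time but writes zero sectors that the finer-grained check would have skipped.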

Re: [Qemu-devel] [PATCH v3 00/16] vnc: adapative tight, zrle, zywrle, and bitmap module

2011-02-25 Thread Blue Swirl

On Fri, Feb 25, 2011 at 12:43 AM, Corentin Chary
corentin.ch...@gmail.com wrote:
 Is there a special reason why you use __always_inline
 instead of inline in bitops.h?

 Because it's not only a hint, I really want this function to be inlined.

 This breaks compilation for mingw :-(

 mingw also fails at timersub() in vnc.c.

 Then we should define timersub when not available.

There's also this one, struct timeval is missing:

 CCui/vnc-enc-zlib.o
In file included from /src/qemu/ui/vnc.c:27:0:
/src/qemu/ui/vnc.h:105:20: error: array type has incomplete element type
/src/qemu/ui/vnc.h:116:20: error: field 'last_freq_check' has incomplete type
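
A fallback for hosts without timersub() could look roughly like the sketch below. The function name is made up for the example, and it assumes that <sys/time.h> provides struct timeval on the host, which is the declaration the two errors above complain about. Whether the real fix should be a macro in osdep.h or something else is left open.

```c
#include <sys/time.h>

/*
 * Compute res = a - b for two timevals, keeping tv_usec in [0, 1000000).
 * Both inputs are assumed to be normalized already.
 */
static void timeval_sub_sketch(const struct timeval *a,
                               const struct timeval *b,
                               struct timeval *res)
{
    res->tv_sec = a->tv_sec - b->tv_sec;
    res->tv_usec = a->tv_usec - b->tv_usec;
    if (res->tv_usec < 0) {
        /* Borrow one second when the microsecond part went negative. */
        res->tv_sec--;
        res->tv_usec += 1000000;
    }
}
```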

Re: [Qemu-devel] [PATCH] Fixing network over sockets implementation for win32

2011-02-25 Thread Blue Swirl

Thanks, applied.

On Mon, Feb 21, 2011 at 1:46 PM, Pavel Dovgaluk
pavel.dovga...@ispras.ru wrote:
   MSDN includes the following in the WSAEALREADY error description for the
  connect() function: "To preserve backward compatibility, this error is
  reported as WSAEINVAL to Winsock applications that link to either
  Winsock.dll or Wsock32.dll". So a check for this error code was added to
  allow network connections through sockets on Windows.


 Signed-off-by: Pavel Dovgalyuk pavel.dovga...@gmail.com
 ---
 net/socket.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/net/socket.c b/net/socket.c
 index 3182b37..7337f4f 100644
 --- a/net/socket.c
 +++ b/net/socket.c
 @@ -457,7 +457,7 @@ static int net_socket_connect_init(VLANState *vlan,
             } else if (err == EINPROGRESS) {
                 break;
  #ifdef _WIN32
 -            } else if (err == WSAEALREADY) {
 +            } else if (err == WSAEALREADY || err == WSAEINVAL) {
                 break;
  #endif
             } else {

Re: [Qemu-devel] [PATCH] Fixing tap adapter for win32

2011-02-25 Thread Blue Swirl

Thanks, applied.

On Mon, Feb 21, 2011 at 1:47 PM, Pavel Dovgaluk
pavel.dovga...@ispras.ru wrote:
   This fix allows connection of internal VLAN to the external TAP interface.
 If tap_win32_write function always returns 0, the TAP network interface
 in QEMU is disabled.

 Signed-off-by: Pavel Dovgalyuk pavel.dovga...@gmail.com
 ---
 net/tap-win32.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/net/tap-win32.c b/net/tap-win32.c
 index 081904e..596132e 100644
 --- a/net/tap-win32.c
 +++ b/net/tap-win32.c
 @@ -480,7 +480,7 @@ static int tap_win32_write(tap_win32_overlapped_t 
 *overlapped,
         }
     }

 -    return 0;
 +    return write_size;
  }

  static DWORD WINAPI tap_win32_thread_entry(LPVOID param)

Re: [Qemu-devel] [PATCH] slirp: Remove some type casts caused by bad declaration of x.tp_buf

2011-02-25 Thread Blue Swirl

Thanks, applied.

On Wed, Feb 23, 2011 at 8:40 PM, Stefan Weil w...@mail.berlios.de wrote:
 x.tp_buf was declared as a uint8_t array, but always used as
 a char array (which needed a lot of type casts).

 The patch includes these changes:

 * Fix declaration of x.tp_buf and remove all type casts.

 * Use offsetof() to get the offset of x.tp_buf.

 Signed-off-by: Stefan Weil w...@mail.berlios.de
 ---
  slirp/tftp.c |   14 +++---
  slirp/tftp.h |    2 +-
  2 files changed, 8 insertions(+), 8 deletions(-)

 diff --git a/slirp/tftp.c b/slirp/tftp.c
 index 1821648..8055ccc 100644
 --- a/slirp/tftp.c
 +++ b/slirp/tftp.c
 @@ -136,9 +136,9 @@ static int tftp_send_oack(struct tftp_session *spt,
      m->m_data += sizeof(struct udpiphdr);

      tp->tp_op = htons(TFTP_OACK);
 -    n += snprintf((char *)tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%s",
 +    n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%s",
                    key) + 1;
 -    n += snprintf((char *)tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%u",
 +    n += snprintf(tp->x.tp_buf + n, sizeof(tp->x.tp_buf) - n, "%u",
                    value) + 1;

      saddr.sin_addr = recv_tp->ip.ip_dst;
 @@ -283,7 +283,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t
 *tp, int pktlen)

    /* skip header fields */
    k = 0;
 -  pktlen -= ((uint8_t *)&tp->x.tp_buf[0] - (uint8_t *)tp);
 +  pktlen -= offsetof(struct tftp_t, x.tp_buf);

    /* prepend tftp_prefix */
    prefix_len = strlen(slirp->tftp_prefix);
 @@ -299,7 +299,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t
 *tp, int pktlen)
        tftp_send_error(spt, 2, "Access violation", tp);
        return;
      }
 -    req_fname[k] = (char)tp->x.tp_buf[k];
 +    req_fname[k] = tp->x.tp_buf[k];
      if (req_fname[k++] == '\0') {
        break;
      }
 @@ -311,7 +311,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t
 *tp, int pktlen)
      return;
    }

 -  if (strcasecmp((const char *)&tp->x.tp_buf[k], "octet") != 0) {
 +  if (strcasecmp(&tp->x.tp_buf[k], "octet") != 0) {
        tftp_send_error(spt, 4, "Unsupported transfer mode", tp);
        return;
    }
 @@ -340,7 +340,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t
 *tp, int pktlen)
    while (k < pktlen) {
        const char *key, *value;

 -      key = (const char *)&tp->x.tp_buf[k];
 +      key = &tp->x.tp_buf[k];
        k += strlen(key) + 1;

        if (k >= pktlen) {
 @@ -348,7 +348,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t
 *tp, int pktlen)
           return;
        }

 -      value = (const char *)&tp->x.tp_buf[k];
 +      value = &tp->x.tp_buf[k];
        k += strlen(value) + 1;

        if (strcasecmp(key, "tsize") == 0) {
 diff --git a/slirp/tftp.h b/slirp/tftp.h
 index b9f0847..72e5e91 100644
 --- a/slirp/tftp.h
 +++ b/slirp/tftp.h
 @@ -26,7 +26,7 @@ struct tftp_t {
       uint16_t tp_error_code;
       uint8_t tp_msg[512];
     } tp_error;
 -    uint8_t tp_buf[512 + 2];
 +    char tp_buf[512 + 2];
   } x;
  };

 --
 1.7.2.3

Re: [Qemu-devel] [PATCH] bitops: fix test_and_change_bit()

2011-02-25 Thread Blue Swirl

Thanks, applied.

On Fri, Feb 25, 2011 at 12:47 AM, Corentin Chary corenti...@iksaif.net wrote:
 ./bitops.h:192: warning: ‘old’ is used uninitialized in this function

 Signed-off-by: Corentin Chary corenti...@iksaif.net
 ---
  bitops.h |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/bitops.h b/bitops.h
 index ae7bcb1..e2b9df3 100644
 --- a/bitops.h
 +++ b/bitops.h
 @@ -187,7 +187,7 @@ static inline int test_and_change_bit(int nr, volatile 
 unsigned long *addr)
  {
        unsigned long mask = BIT_MASK(nr);
        unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
 -       unsigned long old;
 +       unsigned long old = *p;

        *p = old ^ mask;
         return (old & mask) != 0;
 --
 1.7.4.1
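
For reference, a standalone version of the fixed function might look like the sketch below. The macro definitions are assumptions modelled on the usual bitops helpers and are not copied from bitops.h; the volatile qualifier is dropped to keep the example short. The point is only that old must be loaded from *p before it is used for the toggle and for the return value.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BIT_MASK(nr)  (1UL << ((nr) % SKETCH_BITS_PER_LONG))
#define SKETCH_BIT_WORD(nr)  ((nr) / SKETCH_BITS_PER_LONG)

/* Toggle bit nr in the bitmap at addr and return its previous value. */
static inline int test_and_change_bit_sketch(int nr, unsigned long *addr)
{
    unsigned long mask = SKETCH_BIT_MASK(nr);
    unsigned long *p = addr + SKETCH_BIT_WORD(nr);
    unsigned long old = *p;   /* the load that was missing before the fix */

    *p = old ^ mask;
    return (old & mask) != 0;
}
```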

Re: [Qemu-devel] [PATCH v3 00/16] vnc: adapative tight, zrle, zywrle, and bitmap module

2011-02-25 Thread Blue Swirl

On Fri, Feb 25, 2011 at 12:43 AM, Corentin Chary
corentin.ch...@gmail.com wrote:
 Is there a special reason why you use __always_inline
 instead of inline in bitops.h?

 Because it's not only a hint, I really want this function to be inlined.

I applied a patch which changes this to just inline. See osdep.h.

Re: [Qemu-devel] null mac address

2011-02-25 Thread Blue Swirl

On Fri, Feb 25, 2011 at 4:55 AM, Wen Congyang we...@cn.fujitsu.com wrote:
 At 02/24/2011 10:40 PM, William Dauchy Write:
 Hi,

 I got some troubles hot plugging network pci devices. An attach works
 as expected but the mac address is still set to 00:00:00:00:00:00 on
 the guest machine. I have to reboot the guest to get the correct mac
 address.
 I first tried through libvirt with:
 # virsh attach-interface dom0 network default --mac 52:54:00:f6:84:ba

 and then through qemu monitor to make sure that it wasn't a libvirt issue:
 device_add rtl8139
 or
 device_add rtl8139,mac=01:02:03:04:05:06

 Always the same result on the guest. A device info on qemu give the
 correct result, that is to say, with a correct mac address.
 I went through rtl8139.c and saw that the mac address is set in 
 `rtl8139_reset`.
 This function was called in `pci_rtl8139_init` but removed since
 c169998802505c244b8bcad562633f29de7d74a4 commit, because it doesn't
 make sense to call it when the virtual machine is shutdown.
 I'm now wondering where I am supposed to call this reset function when
 live attaching a pci device. I think it could fix the mac address
 issue.
 I will be very pleased to receive some tips to create a patch for this issue.

 Please try the following patch.

 Thanks
 Wen Congyang

 From efa0632f563a69dc299daaf4b235c1a0521d6e02 Mon Sep 17 00:00:00 2001
 From: Wen Congyang we...@cn.fujitsu.com
 Date: Fri, 25 Feb 2011 09:56:27 +0800
 Subject: [PATCH] move eeprom init from reset function to init function

 ---
  hw/pcnet-pci.c |   12 
  hw/pcnet.c     |   13 -
  hw/rtl8139.c   |   24 
  3 files changed, 24 insertions(+), 25 deletions(-)

 diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
 index 339a401..d7c4fc3 100644
 --- a/hw/pcnet-pci.c
 +++ b/hw/pcnet-pci.c
 @@ -270,6 +270,8 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
     PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev, pci_dev);
     PCNetState *s = &d->state;
     uint8_t *pci_conf;
 +    int i;
 +    uint16_t checksum;

  #if 0
     printf("sizeof(RMD)=%d, sizeof(TMD)=%d\n",
 @@ -292,6 +294,16 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
     pci_conf[PCI_MIN_GNT] = 0x06;
     pci_conf[PCI_MAX_LAT] = 0xff;

 +    /* Initialize the PROM */
 +
 +    memcpy(s->prom, s->conf.macaddr.a, 6);
 +    s->prom[12] = s->prom[13] = 0x00;
 +    s->prom[14] = s->prom[15] = 0x57;
 +
 +    for (i = 0,checksum = 0; i < 16; i++)

Please add braces to fix the CODING_STYLE problem while moving.

 +        checksum += s->prom[i];
 +    *(uint16_t *)&s->prom[12] = cpu_to_le16(checksum);

This is not the right place, since lance.c uses the common part of
pcnet.c. Please put the lines instead to pcnet_common_init().

 +    // PCI vendor and device ID should be mirrored here

Also here it would be nice to convert C99 comments to C89 while moving.
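
For readers following the review, the PROM initialization being moved can be sketched on its own. This is only an illustration under stated assumptions: prom_init() and PROM_SIZE are hypothetical names, not QEMU functions, and the sketch zeroes the whole PROM first, which the quoted patch does not do. It keeps the braces asked for above and stores the checksum byte by byte in little-endian order instead of using the cast from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PROM_SIZE 16

/* Hypothetical helper: fill a 16-byte PROM image from a 6-byte MAC address. */
static void prom_init(uint8_t prom[PROM_SIZE], const uint8_t mac[6])
{
    uint16_t checksum = 0;
    int i;

    assert(prom != NULL && mac != NULL);

    memset(prom, 0, PROM_SIZE);   /* assumption: unused bytes start as zero */
    memcpy(prom, mac, 6);
    prom[12] = prom[13] = 0x00;   /* checksum slot stays zero while summing */
    prom[14] = prom[15] = 0x57;   /* same constant bytes as in the patch */

    for (i = 0; i < PROM_SIZE; i++) {
        checksum += prom[i];
    }

    /* Little-endian store without a cast, so alignment does not matter. */
    prom[12] = (uint8_t)(checksum & 0xff);
    prom[13] = (uint8_t)(checksum >> 8);
}
```

With the MAC address from the original report (52:54:00:f6:84:ba) the byte sum is 904 (0x388), so bytes 12 and 13 end up as 0x88 and 0x03.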

Re: [Qemu-devel] [FYI] memory leak in 0.14.0rc1 ?

2011-02-25 Thread Torsten Förtsch

On Tuesday, February 15, 2011 21:16:49 Stefan Hajnoczi wrote:
 2011/2/15 Torsten Förtsch torsten.foert...@gmx.net:
  On Tuesday, February 15, 2011 15:43:32 Stefan Hajnoczi wrote:
   I have installed winxp and run the machine as /usr/bin/qemu-kvm -name
   xp.home -m 768 
  
  Are you able to try QEMU 0.14.0-rc2 from source?
  
  $ git clone git://git.qemu.org/qemu.git
  $ git checkout v0.14.0-rc2
  $ ./configure --target-list=x86_64-softmmu --enable-io-thread
  --disable-strip --prefix=/usr
  $ make
  $ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 768 -name xp.home ...
  
  Now, the process size stays around 1300 Mb and RSS is very constant at
  794 Mb.
 
 Thank you for checking this.  This is probably a Suse-specific or
 qemu-kvm issue.

Just for your information, it turns out that --enable-vnc-thread is the 
culprit, see

  https://bugzilla.novell.com/show_bug.cgi?id=671809

The method explained there (comment 4) also makes a 0.14.0 compiled from the 
sources and configured as

  ./configure --target-list=x86_64-softmmu \
  --enable-io-thread --enable-vnc-thread

grow.

Torsten Förtsch

-- 
Need professional modperl support? Hire me! (http://foertsch.name)

Like fantasy? http://kabatinte.net

Re: [Qemu-devel] [PATCH 2/3] target-arm: Implement cp15 VA-PA translation

2011-02-25 Thread Peter Maydell

On 21 February 2011 23:19, Adam Lackorzynski a...@os.inf.tu-dresden.de wrote:
 Implement VA-PA translations by cp15-c7 that went through unchanged
 previously.

 Signed-off-by: Adam Lackorzynski a...@os.inf.tu-dresden.de

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

(Sorry for the delay, I only got time to knock up a test program
for this functionality this afternoon.)

Note that without the patch I posted today that cleans up
cp15 wfi decoding, you won't be able to get at one of
the translation types.

-- PMM

Re: [Qemu-devel] [FYI] memory leak in 0.14.0rc1 ?

2011-02-25 Thread Bruce Rogers

  On 2/25/2011 at 11:21 AM, Torsten Förtsch torsten.foert...@gmx.net wrote:
 On Tuesday, February 15, 2011 21:16:49 Stefan Hajnoczi wrote:
 2011/2/15 Torsten Förtsch torsten.foert...@gmx.net:
  On Tuesday, February 15, 2011 15:43:32 Stefan Hajnoczi wrote:
   I have installed winxp and run the machine as /usr/bin/qemu-kvm -name
   xp.home -m 768 
  
  Are you able to try QEMU 0.14.0-rc2 from source?
  
  $ git clone git://git.qemu.org/qemu.git
  $ git checkout v0.14.0-rc2
  $ ./configure --target-list=x86_64-softmmu --enable-io-thread
  --disable-strip --prefix=/usr
  $ make
  $ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 768 -name xp.home ...
  
  Now, the process size stays around 1300 Mb and RSS is very constant at
  794 Mb.
 
 Thank you for checking this.  This is probably a Suse-specific or
 qemu-kvm issue.
 
 Just for your information, it turns out that --enable-vnc-thread is the 
 culprit, see
 
   https://bugzilla.novell.com/show_bug.cgi?id=671809
 
 The method explained there (comment 4) also makes a 0.14.0 compiled from the
 sources and configured as
 
   ./configure --target-list=x86_64-softmmu \
   --enable-io-thread --enable-vnc-thread
 
 grow.
 
 Torsten Förtsch

I haven't played much in the vnc code, but the following patch at least gets
rid of the leak. I'm not sure if it's the correct solution. If someone more
familiar with the vnc code wants to look into this, that would be great:

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 0b5d750..ebdba41 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -52,7 +52,6 @@ struct VncJobQueue {
 QemuCond cond;
 QemuMutex mutex;
 QemuThread thread;
-Buffer buffer;
 bool exit;
 QTAILQ_HEAD(, VncJob) jobs;
 };
@@ -171,10 +170,9 @@ static void vnc_async_encoding_start(VncState *orig, VncState *local)
 local->tight = orig->tight;
 local->zlib = orig->zlib;
 local->hextile = orig->hextile;
-local->output =  queue->buffer;
 local->csock = -1; /* Don't do any network work on this thread */
 
-buffer_reset(&local->output);
+buffer_free(&local->output);
 }
 
 static void vnc_async_encoding_end(VncState *orig, VncState *local)
@@ -288,7 +286,6 @@ static void vnc_queue_clear(VncJobQueue *q)
 {
 qemu_cond_destroy(&queue->cond);
 qemu_mutex_destroy(&queue->mutex);
-buffer_free(&queue->buffer);
 qemu_free(q);
 queue = NULL; /* Unset global queue */
 }


Bruce

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-25 Thread Paolo Bonzini

On 02/23/2011 12:39 PM, Jan Kiszka wrote:
 You should try to trace the event flow in qemu, either via strace, via
 the built-in tracer (which likely requires a bit more tracepoints), or
 via a system-level tracer (ftrace / kernelshark).

The apparent problem is that 25% of cycles is spent in mutex locking and
unlocking.  But in fact, the real problem is that 90% of the time is
spent doing something else than executing code.

QEMU exits _a lot_ due to the vm_clock timers.  The deadlines are rarely more
than a few ms ahead, and at 1 MIPS that leaves room for executing a few
thousand instructions for each context switch.  The iothread overhead
is what makes the situation so bad, because it takes a lot more time to
execute those instructions.

We do so many (almost) useless passes through cpu_exec_all that even
microoptimization helps, for example this:

--- a/cpus.c
+++ b/cpus.c
@@ -767,10 +767,6 @@ static void qemu_wait_io_event_common(CPUState *env)
 {
 CPUState *env;
 
-while (all_cpu_threads_idle()) {
-qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
-}
-
 qemu_mutex_unlock(&qemu_global_mutex);
 
 /*
@@ -1110,7 +,15 @@ bool cpu_exec_all(void)
 }
 }
 exit_request = 0;
+
+#ifdef CONFIG_IOTHREAD
+while (all_cpu_threads_idle()) {
+   qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
+}
+return true;
+#else
 return !all_cpu_threads_idle();
+#endif
 }
 
 void set_numa_modes(void)

is enough to cut all_cpu_threads_idle from 9 to 4.5% (not unexpected: the
number of calls is halved).  But it shouldn't be that high anyway, so
I'm not proposing the patch formally.

Additionally, the fact that the execution is 99.99% lockstep means you cannot
really overlap any part of the I/O and VCPU threads.

I found a couple of inaccuracies in my patches that already cut 50% of the
time, though.

 Did my patches contribute a bit to overhead reduction? They specifically
 target the costly vcpu/iothread switches in TCG mode (caused by TCGs
 excessive lock-holding times).

Yes, they cut 15%.

Paolo

[Qemu-devel] Re: virtio-serial semantics for binary data and guest agents

2011-02-25 Thread Michael Roth


On 02/24/2011 06:48 AM, Amit Shah wrote:

On (Wed) 23 Feb 2011 [08:31:52], Michael Roth wrote:

On 02/22/2011 10:59 PM, Amit Shah wrote:

On (Tue) 22 Feb 2011 [16:40:55], Michael Roth wrote:

If something in the guest is attempting to read/write from the
virtio-serial device, and nothing is connected to virtio-serial's
host character device (say, a socket)

1. writes will block until something connect()s, at which point the
write will succeed

2. reads will always return 0 until something connect()s, at which
point the reads will block until there's data

This makes it difficult (impossible?) to implement the notion of
connect/disconnect or open/close over virtio-serial without layering
another protocol on top using hackish things like length-encoded
payloads or sentinel values to determine the end of one
RPC/request/response/session and the start of the next.

For instance, if the host side disconnects, then reconnects before
we read(), we may never get the read()=0, and our FD remains valid.
Whereas with a tcp/unix socket our FD is no longer valid, and the
read()=0 is an event we can check for at any point after the other
end does a close/disconnect.


There's SIGIO support, so host connect-disconnect notifications can be
caught via the signal.


I recall looking into this at some point, but don't we get a SIGIO
for read/write-ability in general?


I don't get you -- the virtio_console driver emits the SIGIO signal
only when the host side connects or disconnects.  See

http://www.linux-kvm.org/page/Virtio-serial_API

So whenever you receive a SIGIO, poll() in the signal handler for all
fds of interest and whichever has POLLIN set is writable.  Whichever
has POLLHUP set is not.  If you maintain previous state of the fd
(before signal), you can figure out if something happened on the host
side.
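
The poll()-based check described above can be sketched roughly as follows. This is a minimal illustration, not code from the virtio_console driver or from any guest agent: check_port(), drain_port() and the enum are hypothetical names, and the sketch relies only on standard poll() semantics (POLLHUP when the other end is gone, POLLIN when data is pending). It does not address the reconnect race discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum port_state { PORT_IDLE, PORT_HAS_DATA, PORT_HOST_DOWN, PORT_ERROR };

/* Non-blocking check of one fd; POLLHUP is tested before POLLIN. */
static enum port_state check_port(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, 0);            /* timeout 0: return immediately */

    if (n < 0) {
        return PORT_ERROR;
    }
    if (n == 0) {
        return PORT_IDLE;              /* nothing to report right now */
    }
    if (p.revents & POLLHUP) {
        return PORT_HOST_DOWN;         /* other end is not connected */
    }
    if (p.revents & POLLIN) {
        return PORT_HAS_DATA;
    }
    return PORT_ERROR;                 /* POLLERR or POLLNVAL */
}

/* Read pending bytes after PORT_HAS_DATA; returns read()'s result. */
static ssize_t drain_port(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    return read(fd, buf, len);
}
```

An application that remembers the previous state per fd could compare it with the value returned here each time SIGIO arrives.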



I tried this on RHEL6+rhn updates but the O_ASYNC flag doesn't seem to 
be supported. Has this been backported?


Either way, it seems we can still lose the disconnect event/poll state 
change if the host reconnects before the signal is delivered. So SIGIO 
in an application would need to be reserved for absolutely 2 things: a 
host connect or disconnect (distinguishing between the 2 may not be so 
important, we could treat either as the previous session having been 
closed). Which limits the application to only having 1 O_ASYNC FD open 
at a time.


But even if we do that, it seems like there might still be a small 
window where the application could read/write data intended for the 
previous connection before the signal handler is invoked. Not too sure 
on that point though. Assuming this isn't the case...it could work. But 
what about windows guests?



So you still need some way to
differentiate, say, readability from a disconnect/EOF, and the
read()=0 that could determine this is still racing with host-side
reconnects.



Also, nonblocking reads/writes will return -EPIPE if the host-side
connection is not up.


But we still essentially need to poll() for a host-side disconnected
state, which is still racy since they may reconnect before we've
done a read/write that would've generated the -EPIPE. It seems like
what we really need is for the FD to be invalid from that point
forward.


This would go against (or abuse) a chardev interface.  It would
effectively treat a host-side port close as a hot-unplug event.



Well, not a complete hot-unplug. The port would still be there, we'd 
just need to re-open it after a read()=0


Personally I'm not necessarily advocating we change the default 
behavior, but couldn't we support this as a separate mode?


-device virtserialport,inv_fd_on_host_close=1

or something along that line?


Also, I focused more on the guest-side connect/disconnect detection,
but as Anthony mentioned I think the host side shares similar
limitations as well. AFAIK once we connect to the chardev that FD
remains valid until the connected process closes it, and so races
with the guest side on detecting connect/disconnect events in a
similar manner. For the host side it looks like virtio-console has
guest_close/guest_open callbacks already that we could potentially
use...seems like it's just a matter of tying them to the chardev...
basically having virtio-serial's guest_close() result in a close()
on the corresponding chardev connection's FD.


Yes, this could be used.

However, the problem with that will be that the chardev can't be
opened again (AFAIR) and a new chardev will have to be used.



Hmm...yeah I was thinking more specifically about the socket chardev, 
where we can leave the listen_fd alone but close anything we've 
accept()'d prior to a guest-side disconnect. But isn't that enough? Just 
add this option for chardevs where this actually makes sense? For instance:


-chardev socket,inv_fd_on_guest_close=1

Although, this wouldn't make sense if we're using the chardev for 
anything other than virtio-serial...so that flag makes more sense as a 
virtio-serial flag, but that

[Qemu-devel] x86_64 debugging while in 32-bit mode

2011-02-25 Thread vagran


Hi,
I have a problem debugging 64-bit emulation using the Qemu GDB stub. The
problem is that Qemu always sends the x86_64 register set, regardless of the
current actual mode of the emulated CPU. This results in an error message in
GDB: Remote 'g' packet reply is too long:

Yes, I understand that if I execute the set architecture i386:x86-64:intel
command, GDB will show me the correct register contents. But in that case it
will incorrectly try to disassemble the code and unwind the stack: it will
interpret the code as 64-bit while it is actually 32-bit. In my understanding,
Qemu should dynamically change the format of the g and G packets depending on
the current CPU mode. On the other end, the user could manually change the
current GDB architecture with the corresponding set architecture command.

Please correct me if I am wrong. Maybe there is some existing methodology for
debugging the Qemu-emulated x86_64 architecture in different CPU modes. For
now, I have a strong intention to make a patch for the Qemu GDB stub, at least
for myself. But I have the impression that this should be corrected in the
official release too.

--
Best regards,
Artyom.

Re: [Qemu-devel] [PATCH] Use sigwait instead of sigwaitinfo.

2011-02-25 Thread Blue Swirl

Thanks, applied.

On Fri, Feb 18, 2011 at 3:17 PM, Tristan Gingold ging...@adacore.com wrote:
 Fix compilation failure on Darwin.

 Signed-off-by: Tristan Gingold ging...@adacore.com
 ---
  compatfd.c |   36 ++--
  1 files changed, 18 insertions(+), 18 deletions(-)

 diff --git a/compatfd.c b/compatfd.c
 index a7cebc4..bd377c4 100644
 --- a/compatfd.c
 +++ b/compatfd.c
 @@ -26,45 +26,45 @@ struct sigfd_compat_info
  static void *sigwait_compat(void *opaque)
  {
     struct sigfd_compat_info *info = opaque;
 -    int err;
     sigset_t all;

     sigfillset(&all);
     sigprocmask(SIG_BLOCK, &all, NULL);

 -    do {
 -        siginfo_t siginfo;
 +    while (1) {
 +        int sig;
 +        int err;

 -        err = sigwaitinfo(&info->mask, &siginfo);
 -        if (err == -1 && errno == EINTR) {
 -            err = 0;
 -            continue;
 -        }
 -
 -        if (err > 0) {
 -            char buffer[128];
 +        err = sigwait(&info->mask, &sig);
 +        if (err != 0) {
 +            if (errno == EINTR) {
 +                continue;
 +            } else {
 +                return NULL;
 +            }
 +        } else {
 +            struct qemu_signalfd_siginfo buffer;
             size_t offset = 0;

 -            memcpy(buffer, &err, sizeof(err));
 +            memset(&buffer, 0, sizeof(buffer));
 +            buffer.ssi_signo = sig;
 +
             while (offset < sizeof(buffer)) {
                 ssize_t len;

 -                len = write(info->fd, buffer + offset,
 +                len = write(info->fd, (char *)&buffer + offset,
                             sizeof(buffer) - offset);
                 if (len == -1 && errno == EINTR)
                     continue;

                 if (len <= 0) {
 -                    err = -1;
 -                    break;
 +                    return NULL;
                 }

                 offset += len;
             }
         }
 -    } while (err >= 0);
 -
 -    return NULL;
 +    }
  }

  static int qemu_signalfd_compat(const sigset_t *mask)
 --
 1.7.3.GIT
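
As a side note for readers, the sigwait()-based forwarding in the patch above can be sketched in isolation. This is a minimal illustration, not the code in compatfd.c: forward_one_signal() is a hypothetical name and it writes a bare int rather than a struct qemu_signalfd_siginfo. One deliberate difference from the quoted patch: POSIX specifies that sigwait() returns the error number directly instead of setting errno, so the sketch tests the return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Wait for one signal from mask and forward its number to fd.
 * Returns the signal number, or -1 on failure.
 */
static int forward_one_signal(const sigset_t *mask, int fd)
{
    int sig = 0;
    int err;

    assert(mask != NULL);

    do {
        err = sigwait(mask, &sig);   /* 0 on success, else an error number */
    } while (err == EINTR);
    if (err != 0) {
        return -1;
    }

    for (;;) {
        ssize_t len = write(fd, &sig, sizeof(sig));

        if (len == (ssize_t)sizeof(sig)) {
            return sig;
        }
        if (len == -1 && errno == EINTR) {
            continue;                /* interrupted before writing anything */
        }
        return -1;                   /* real error or unexpected short write */
    }
}
```

The signals in mask must already be blocked in the calling thread, as the patch arranges with sigprocmask(), otherwise they may be delivered normally instead of being picked up by sigwait().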

[Qemu-devel] Re: [PATCH] target-arm: Don't decode old cp15 WFI instructions on v7 cores

2011-02-25 Thread Adam Lackorzynski


On Fri Feb 25, 2011 at 15:04:12 +, Peter Maydell wrote:
 In v7 of the ARM architecture, WFI (wait for interrupt) is a first-class
 instruction, but in previous versions this functionality was provided
 via a cp15 coprocessor register. Add correct feature checks to the
 decoding of the cp15 WFI instructions so that they behave correctly
 for newer cores. In particular, the old 0,c7,c8,2 encoding used on
 ARM940 has been reused for VA-to-PA translation in v6 and v7.
 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Reviewed-by: Adam Lackorzynski a...@os.inf.tu-dresden.de

 ---
 This patch stands alone as a fix to target-arm; it's a prerequisite
 for Adam's VA-PA translation patch, because otherwise attempting a
 user-read translation will get you a WFI instead...

Thanks, (un)fortunately I never triggered this case in my setup.



Adam
-- 
Adam a...@os.inf.tu-dresden.de
  Lackorzynski http://os.inf.tu-dresden.de/~adam/

[Qemu-devel] [PATCH] vnc: fix a memory leak in threaded vnc server

2011-02-25 Thread Corentin Chary

VncJobQueue's buffer is intended to be used as the output buffer
for all operations in this queue, but unfortunately changes to it
were being lost.

vnc_async_encoding_start() is in charge of setting this
buffer as the current output buffer, but
vnc_async_encoding_end() was not writing the changes back
to VncJobQueue, resulting in a big and ugly memleak.

Signed-off-by: Corentin Chary corenti...@iksaif.net
---
I believe this is a (slightly) better patch than Bruce's one, because
it reduces memory allocations by always using the same buffer.

 ui/vnc-jobs-async.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 1d4c5e7..f596247 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -186,6 +186,8 @@ static void vnc_async_encoding_end(VncState *orig, VncState *local)
 orig->hextile = local->hextile;
 orig->zrle = local->zrle;
 orig->lossy_rect = local->lossy_rect;
+
+queue->buffer = local->output;
 }
 
 static int vnc_worker_thread_loop(VncJobQueue *queue)
-- 
1.7.4
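
The leak mechanism described in the commit message can be reproduced with a small stand-alone sketch. Everything here is a simplified assumption for illustration: Buf, buf_append() and run_job() are hypothetical stand-ins, not QEMU's Buffer API. The point is that the job works on a by-value copy of the shared buffer descriptor, so any allocation made through the copy is lost unless the descriptor is copied back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a growable output buffer. */
typedef struct {
    size_t capacity;
    size_t offset;
    unsigned char *data;
} Buf;

/* Grow the buffer if needed, then append len bytes. */
static void buf_append(Buf *b, const void *src, size_t len)
{
    if (b->capacity - b->offset < len) {
        size_t cap = b->offset + len + 1024;
        unsigned char *p = realloc(b->data, cap);

        assert(p != NULL);
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->offset, src, len);
    b->offset += len;
}

/* One job: copy the shared descriptor, use it, optionally write it back. */
static void run_job(Buf *shared, int write_back)
{
    Buf local = *shared;          /* like local->output = queue->buffer */

    local.offset = 0;             /* like buffer_reset() */
    buf_append(&local, "payload", 7);
    if (write_back) {
        *shared = local;          /* like queue->buffer = local->output */
    }
    /* Without the write-back, shared still holds the old (possibly NULL)
     * pointer and the memory allocated through local is never reused. */
}
```

With write_back set, the second and later jobs reuse the allocation made by the first one; with it cleared, every job that has to grow the buffer allocates again and the previous block is orphaned.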

Re: [Qemu-devel] [PATCH] Outdated comment in HACKING

2011-02-25 Thread Anthony Liguori

This patch won't apply with git-am because your mailer is doing weird 
things.  Please use git-send-email to send the patch.


Regards,

Anthony Liguori

On 02/24/2011 06:27 PM, Joey Trebbien wrote:
All printf-style functions in the source (except for a few in tests/) 
already have a format __attribute__ (via the GCC_ATTR or GCC_FMT_ATTR 
macros).


Signed-off-by: Joseph Trebbien jtrebb...@gmail.com

---
HACKING | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)


diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
This makes it so gcc's -Wformat and -Wformat-security options can do
their jobs and cross-check format strings with the number and types
of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.

[Qemu-devel] [RESENT][PATCH] HACKING: Update status of format checking

2011-02-25 Thread Stefan Weil


This patch was already sent on 2011-01-24:

Hopefully all functions with printf like arguments now use format checking.

This was tested with default build configuration on linux
and windows hosts (including some cross compilations),
so chances are good that there remain few (if any) functions
without format checking.

Therefore the last comment in HACKING is no longer valid but misleading.

Cc: Blue Swirl blauwir...@gmail.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 HACKING |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
 This makes it so gcc's -Wformat and -Wformat-security options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.
--
1.7.2.3
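
For context, the rule in HACKING that this patch touches is about gcc's printf format attribute. A minimal sketch of what such a prototype looks like is given below. format_msg() is a hypothetical function, and the attribute is spelled out directly here, whereas QEMU code uses wrapper macros such as GCC_FMT_ATTR as mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style helper. The attribute says that argument 3 is
 * the format string and the variadic arguments start at position 4, so
 * gcc's -Wformat can cross-check callers.
 */
static int format_msg(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0);

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* number of chars that were needed */
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as format_msg(buf, sizeof(buf), "%s", 42) draws a -Wformat warning at compile time instead of misbehaving at run time.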

Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject

2011-02-25 Thread Anthony Liguori


On 02/25/2011 03:54 AM, Markus Armbruster wrote:
Anthony Liguori aligu...@linux.vnet.ibm.com writes:
On 02/24/2011 10:20 AM, Markus Armbruster wrote:
Anthony Liguori aligu...@linux.vnet.ibm.com writes:
On 02/24/2011 02:33 AM, Markus Armbruster wrote:
Anthony Liguori anth...@codemonkey.ws writes:

[...]
Please describe all expected errors.


 

Quoting qmp-commands.hx:

   3. Errors, in special, are not documented. Applications should NOT check
  for specific errors classes or data (it's strongly recommended to only
  check for the error key)

Indeed, not a single error is documented there.  This is intentional.


   

Yeah, but we're not 0.14 anymore and for 0.15, we need to document
errors.  If you are suggesting I send a patch to remove that section,
I'm more than happy to.

 

Two separate issues here: 1. Are we ready to commit to the current
design of errors, and 2. Is it fair to reject Lai's patch now because he
doesn't document his errors.

I'm not commenting on 1. here.

Regarding 2.: rejecting a patch because it doesn't document an aspect
that current master intentionally leaves undocumented is not how you
treat contributors.  At least not if you want any other than certified
masochists who enjoy pain, and professionals who get adequately
compensated for it.

Lead by example, not by fiat.

   

http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json

I am in the process of documenting the errors of every command.  It's
a royal pain but I'm going to document everything we have right now.
It's actually the last bit of work I need to finish before sending
QAPI out.

So for new commands being added, it would be hugely helpful for the
authors to document the errors such that I don't have to reverse
engineer all of the possible error conditions.
 

The moment this lands in master, you can begin to demand error
descriptions from contributors.  Until then, I'll NAK error descriptions
in qmp-commands.txt.  We left them undocumented there for good reasons:
   


No, it was always a bad reason.  Good documentation is necessary to 
build good commands.  Errors are a huge part of the semantics of a 
command.  We cannot properly assess a command unless its behavior in 
error conditions is well defined.



Once we have an error design in place that has a reasonable hope to
stand the test of time, and have errors documented for at least some of
the commands here, we can start to require proper error documentation
for new commands.  But not now.
   

I won't NAK non-normative error descriptions, say in commit messages, or
in comments.  And I won't object to you asking for them.  But I feel you
really shouldn't make it a condition for committing patches.  Especially
not for simple patches that have been on list for months.
   


I'm strongly committed to making QMP a first class interface in QEMU for 
0.15.  I feel documentation is a crucial part to making that happen.


I'm not asking for test cases even though that's something that we'll 
need for 0.15 because there's not enough infrastructure in master yet to 
do that in a reasonable way.  I realize I'm likely going to have to 
write that test case and I'm happy to do that.


But there's no reason that we shouldn't require thorough documentation 
for all new QMP commands moving forward including error semantics.  This 
is a critical part of having a first class API and no additional 
infrastructure is needed in master to do this.


Regards,

Anthony Liguori



Re: [Qemu-devel] [PATCH] Outdated comment in HACKING

2011-02-25 Thread Stefan Weil


Am 25.02.2011 23:08, schrieb Anthony Liguori:
This patch won't apply with git-am because your mailer is doing weird 
things.  Please use git-send-email to send the patch.


Regards,

Anthony Liguori

On 02/24/2011 06:27 PM, Joey Trebbien wrote:
All printf-style functions in the source (except for a few in tests/) 
already have a format __attribute__ (via the GCC_ATTR or GCC_FMT_ATTR 
macros).


Signed-off-by: Joseph Trebbien jtrebb...@gmail.com

---
HACKING | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)


diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
This makes it so gcc's -Wformat and -Wformat-security options can do
their jobs and cross-check format strings with the number and types
of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.





Hi Anthony,

the same patch is on my list of missing patches which I had sent
weeks ago, so no need for Joey to resend his patch.

I'll resend my version.

Regards,
Stefan W.

[Qemu-devel] [PATCH 10/26] FVD: add impl of interface bdrv_file_open()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_file_open() interface.
It supports opening an FVD image.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-journal.c  |6 +
 block/fvd-open.c |  445 +-
 block/fvd-prefetch.c |   17 ++
 block/fvd.c  |1 +
 4 files changed, 468 insertions(+), 1 deletions(-)
 create mode 100644 block/fvd-prefetch.c

diff --git a/block/fvd-journal.c b/block/fvd-journal.c
index 246f425..5ba34bd 100644
--- a/block/fvd-journal.c
+++ b/block/fvd-journal.c
@@ -22,6 +22,12 @@ static inline int64_t calc_min_journal_size(int64_t table_entries)
 return 512;
 }
 
+static int init_journal(int read_only, BlockDriverState * bs,
+FvdHeader * header)
+{
+return -ENOTSUP;
+}
+
 void fvd_emulate_host_crash(bool cond)
 {
 emulate_host_crash = cond;
diff --git a/block/fvd-open.c b/block/fvd-open.c
index 056b994..8caf8d3 100644
--- a/block/fvd-open.c
+++ b/block/fvd-open.c
@@ -11,7 +11,450 @@
  *
  */
 
+static void init_prefetch_timer(BlockDriverState * bs, BDRVFvdState * s);
+static int init_data_file(BDRVFvdState * s, FvdHeader * header, int flags);
+static int init_bitmap(BlockDriverState * bs, BDRVFvdState * s,
+   FvdHeader * header, const char *const filename);
+static int load_table(BDRVFvdState * s, FvdHeader * header,
+  const char *const filename);
+static int init_journal(int read_only, BlockDriverState * bs,
+FvdHeader * header);
+static int init_compact_image(BDRVFvdState * s, FvdHeader * header,
+  const char *const filename);
+
 static int fvd_open(BlockDriverState * bs, const char *filename, int flags)
 {
-return -ENOTSUP;
+BDRVFvdState *s = bs->opaque;
+int ret;
+FvdHeader header;
+BlockDriver *drv;
+int i;
+
+const char *protocol = strchr(filename, ':');
+if (protocol) {
+drv = bdrv_find_protocol(filename);
+filename = protocol + 1;
+} else {
+/* Use raw instead of file to allow storing the image on device. */
+drv = bdrv_find_format("raw");
+if (!drv) {
+fprintf(stderr, "Failed to find the block device driver\n");
+return -EINVAL;
+}
+}
+
+s->fvd_metadata = bdrv_new("");
+ret = bdrv_open(s->fvd_metadata, filename, flags, drv);
+if (ret < 0) {
+fprintf(stderr, "Failed to open %s\n", filename);
+return ret;
+}
+
+/* Initialize so that jumping to 'fail' would do cleanup properly. */
+s->stale_bitmap = NULL;
+s->fresh_bitmap = NULL;
+s->table = NULL;
+s->outstanding_copy_on_read_data = 0;
+QLIST_INIT(&s->write_locks);
+QLIST_INIT(&s->copy_locks);
+s->prefetch_acb = NULL;
+s->add_storage_cmd = NULL;
+#ifdef FVD_DEBUG
+s->total_copy_on_read_data = s->total_prefetch_data = 0;
+#endif
+
+if (bdrv_pread(s->fvd_metadata, 0, &header, sizeof(header)) !=
+sizeof(header)) {
+fprintf(stderr, "Failed to read the header of %s\n", filename);
+ret = -EIO;
+goto fail;
+}
+
+fvd_header_le_to_cpu(&header);
+
+if (header.magic != FVD_MAGIC) {
+fprintf(stderr, "Incorrect magic number in header: %0X\n",
+header.magic);
+ret = -EINVAL;
+goto fail;
+}
+
+/* Check incompatible features. */
+for (i = 0; i < INCOMPATIBLE_FEATURES_SPACE; i++) {
+if (header.incompatible_features[i] != 0) {
+fprintf(stderr, "The image was created by FVD version %d "
+" and uses features not supported by this FVD version %d\n",
+header.create_version, FVD_VERSION);
+ret = -ENOTSUP;
+}
+}
+
+if (header.virtual_disk_size % 512 != 0) {
+fprintf(stderr, "Disk size %" PRId64 " in the header of %s is not "
+"a multple of 512.\n", header.virtual_disk_size, filename);
+ret = -EINVAL;
+goto fail;
+}
+
+/* Initialize the fields of BDRVFvdState. */
+s->chunks_relocated = header.chunks_relocated;
+s->dirty_image = false;
+s->metadata_err_prohibit_write = false;
+s->block_size = header.block_size / 512;
+s->bitmap_size = header.bitmap_size;
+s->prefetch_timer = NULL;
+s->sectors_per_prefetch = (header.bytes_per_prefetch + 511) / 512;
+s->prefetch_throttle_time = header.prefetch_throttle_time;
+s->prefetch_read_throughput_measure_time =
+header.prefetch_read_throughput_measure_time;
+s->prefetch_write_throughput_measure_time =
+header.prefetch_write_throughput_measure_time;
+
+/* Convert KB/s to bytes/millisec. */
+s->prefetch_min_read_throughput =
+((double)header.prefetch_min_read_throughput) * 1024.0 / 1000.0;
+s->prefetch_min_write_throughput =
+((double)header.prefetch_min_write_throughput)

[Qemu-devel] [PATCH 20/26] FVD: add impl of interface bdrv_get_info()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_get_info() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |   98 +-
 1 files changed, 97 insertions(+), 1 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index a42bfac..c515d74 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -11,6 +11,7 @@
  *
  */
 
+static int read_fvd_header(BDRVFvdState * s, FvdHeader * header);
 static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb);
 static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb);
 static void fvd_aio_cancel_read(FvdAIOCB * acb);
@@ -95,7 +96,102 @@ static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
 
 static int fvd_get_info(BlockDriverState * bs, BlockDriverInfo * bdi)
 {
-return -ENOTSUP;
+BDRVFvdState *s = bs->opaque;
+FvdHeader header;
+
+if (read_fvd_header(s, &header) < 0) {
+return -1;
+}
+
+printf("= Begin of FVD specific information ==\n");
+printf("magic\t\t\t\t\t\t%0X\n", header.magic);
+printf("header_size\t\t\t\t\t%d\n", header.header_size);
+printf("create_version\t\t\t\t\t%d\n", header.create_version);
+printf("last_open_version\t\t\t\t%d\n", header.last_open_version);
+printf("virtual_disk_size (bytes)\t\t\t%" PRId64 "\n",
+   header.virtual_disk_size);
+printf("disk_metadata_size (bytes)\t\t\t%" PRId64 "\n", header.data_offset);
+if (header.data_file[0]) {
+printf("data_file\t\t\t\t\t%s\n", header.data_file);
+}
+if (header.data_file_fmt[0]) {
+printf("data_file_fmt\t\t\t\t\t%s\n", header.data_file_fmt);
+}
+
+if (header.table_offset > 0) {
+printf("table_size (bytes)\t\t\t\t%" PRId64 "\n", header.table_size);
+printf("avail_storage (bytes)\t\t\t\t%" PRId64 "\n",
+   s->avail_storage * 512);
+printf("chunk_size (bytes)\t\t\t\t%" PRId64 "\n", header.chunk_size);
+printf("used_chunks (bytes)\t\t\t\t%" PRId64 "\n",
+   s->used_storage * 512);
+printf("storage_grow_unit (bytes)\t\t\t%" PRId64 "\n",
+   header.storage_grow_unit);
+printf("table_offset (bytes)\t\t\t\t%" PRId64 "\n",
+   header.table_offset);
+printf("table_size (bytes)\t\t\t\t%" PRId64 "\n", s->table_size);
+printf("chunks_relocated\t\t\t\t%s\n", BOOL(s->chunks_relocated));
+
+if (header.add_storage_cmd[0] != 0) {
+printf("add_storage_cmd\t\t\t\t\t%s\n", header.add_storage_cmd);
+}
+}
+
+printf("clean_shutdown\t\t\t\t\t%s\n", BOOL(header.clean_shutdown));
+if (header.journal_size > 0) {
+printf("journal_offset\t\t\t\t\t%" PRId64 "\n", header.journal_offset);
+printf("journal_size\t\t\t\t\t%" PRId64 "\n", header.journal_size);
+printf("stable_journal_epoch\t\t\t\t%" PRId64 "\n",
+   header.stable_journal_epoch);
+printf("journal_buf_size (bytes)\t\t\t%" PRId64 "\n",
+   header.journal_buf_size);
+printf("journal_clean_buf_period (ms)\t\t\t%" PRId64 "\n",
+   header.journal_clean_buf_period);
+}
+
+if (header.base_img[0] != 0) {
+printf("base_img_fully_prefetched\t\t\t%s\n",
+   BOOL(header.base_img_fully_prefetched));
+printf("base_img\t\t\t\t\t%s\n", header.base_img);
+if (header.base_img_fmt[0]) {
+printf("base_img_fmt\t\t\t\t\t%s\n", header.base_img_fmt);
+}
+printf("base_img_size (bytes)\t\t\t\t%" PRId64 "\n",
+   header.base_img_size);
+printf("bitmap_offset (bytes)\t\t\t\t%" PRId64 "\n",
+   header.bitmap_offset);
+printf("bitmap_size (bytes)\t\t\t\t%" PRId64 "\n", header.bitmap_size);
+printf("block_size\t\t\t\t\t%" PRId64 "\n", header.block_size);
+printf("copy_on_read\t\t\t\t\t%s\n", BOOL(header.copy_on_read));
+printf("max_outstanding_copy_on_read_data (bytes)\t%" PRId64 "\n",
+   header.max_outstanding_copy_on_read_data);
+printf("need_zero_init\t\t\t\t\t%s\n", BOOL(header.need_zero_init));
+printf("prefetch_start_delay (sec)\t\t\t%" PRId64 "\n",
+   header.prefetch_start_delay);
+printf("num_prefetch_slots\t\t\t\t%d\n", header.num_prefetch_slots);
+printf("bytes_per_prefetch\t\t\t\t%" PRIu64 "\n",
+   header.bytes_per_prefetch);
+printf("prefetch_over_threshold_throttle_time (ms)\t%" PRIu64 "\n",
+   header.prefetch_throttle_time);
+printf("prefetch_read_throughput_measure_time (ms)\t%" PRIu64 "\n",
+   header.prefetch_read_throughput_measure_time);
+printf(prefetch_write_throughput_measure_time (ms)\t% PRIu64 \n,
+   header.prefetch_write_throughput_measure_time);
+printf(prefetch_min_read_throughput (KB/s)\t\t% PRIu64 \n,
+   header.prefetch_min_read_throughput);
+

[Qemu-devel] [PATCH 24/26] FVD: add impl of interface bdrv_has_zero_init()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_has_zero_init() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 766b62b..61e39bb 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -341,5 +341,12 @@ static int fvd_get_info(BlockDriverState * bs, 
BlockDriverInfo * bdi)
 
 static int fvd_has_zero_init(BlockDriverState * bs)
 {
-return 0;
+BDRVFvdState *s = bs->opaque;
+
+/* For a non-compact image, chunks_relocated is always false. For a
+ * compact image with chunks_relocated=true, it can no longer guarantee
+ * zero init even if the file system does that. This is because a partially
+ * written chunk X may be relocated to a location previously used by
+ * another chunk Y and some garbage data are left there by Y. */
+return s->chunks_relocated ? 0 : bdrv_has_zero_init(s->fvd_data);
 }
-- 
1.7.0.4
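
The decision made by fvd_has_zero_init() above can be restated as a small
standalone sketch. This is an illustration only, not FVD code: the struct
and function names below are hypothetical, and the data file's own answer
is passed in as a plain integer instead of calling bdrv_has_zero_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two facts the patch consults. */
struct zero_init_inputs {
    bool chunks_relocated;    /* true once any chunk has been moved */
    int data_file_zero_init;  /* what the underlying data file reports */
};

/* A relocated chunk may land where another chunk left stale data, so
 * zero initialization can only be promised while nothing has moved.
 * Otherwise the answer is delegated to the underlying data file. */
static int sketch_has_zero_init(const struct zero_init_inputs *in)
{
    assert(in != NULL);
    if (in->chunks_relocated) {
        return 0;
    }
    return in->data_file_zero_init;
}
```

With chunks_relocated set, the sketch reports 0 regardless of what the
data file says, which mirrors the conditional expression in the patch.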

[Qemu-devel] [PATCH 08/26] FVD: add debugging utilities

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds some debugging utilities to FVD.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/blksim.c  |7 +-
 block/fvd-debug.c   |  369 +++
 block/fvd-ext.h |   71 ++
 block/fvd-journal.c |   23 +++
 block/fvd.c |2 +
 block/fvd.h |1 +
 qemu-io-auto.c  |   17 ++-
 7 files changed, 478 insertions(+), 12 deletions(-)
 create mode 100644 block/fvd-debug.c
 create mode 100644 block/fvd-ext.h
 create mode 100644 block/fvd-journal.c

diff --git a/block/blksim.c b/block/blksim.c
index 5c7ef43..16e44ee 100644
--- a/block/blksim.c
+++ b/block/blksim.c
@@ -19,12 +19,7 @@
 #include qemu-queue.h
 #include qemu-common.h
 #include block/blksim.h
-
-#if 1
-# define QDEBUG(format,...) do {} while (0)
-#else
-# define QDEBUG printf
-#endif
+#include block/fvd-ext.h
 
 typedef enum
 {
diff --git a/block/fvd-debug.c b/block/fvd-debug.c
new file mode 100644
index 000..36b4c43
--- /dev/null
+++ b/block/fvd-debug.c
@@ -0,0 +1,369 @@
+/*
+ * QEMU Fast Virtual Disk Format Debugging Utilities
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef ENABLE_TRACE_IO
+#define TRACE_REQUEST(...) do {} while (0)
+#define TRACE_STORE_IN_FVD(...) do {} while (0)
+
+#else
+
+static void TRACE_REQUEST(int do_write, int64_t sector_num, int nb_sectors)
+{
+if (do_write) {
+QDEBUG(TRACE_REQUEST: write sector_num=% PRId64
+nb_sectors=%d[ , sector_num, nb_sectors);
+} else {
+QDEBUG(TRACE_REQUEST: read  sector_num=% PRId64  nb_sectors=%d
+   [ , sector_num, nb_sectors);
+}
+
+int64_t end = sector_num + nb_sectors;
+int64_t sec;
+for (sec = sector_num; sec  end; sec++) {
+QDEBUG(sec% PRId64  , sec);
+}
+QDEBUG( ]\n);
+}
+
+static void TRACE_STORE_IN_FVD(const char *str, int64_t sector_num,
+   int nb_sectors)
+{
+QDEBUG(TRACE_STORE: %s sector_num=% PRId64  nb_sectors=%d   [ ,
+   str, sector_num, nb_sectors);
+int64_t end = sector_num + nb_sectors;
+int64_t sec;
+for (sec = sector_num; sec  end; sec++) {
+QDEBUG(sec% PRId64  , sec);
+}
+QDEBUG( ]\n);
+}
+#endif
+
+#ifndef FVD_DEBUG
+#define my_qemu_malloc qemu_malloc
+#define my_qemu_mallocz qemu_mallocz
+#define my_qemu_blockalign qemu_blockalign
+#define my_qemu_free qemu_free
+#define my_qemu_vfree qemu_vfree
+#define my_qemu_aio_get qemu_aio_get
+#define my_qemu_aio_release qemu_aio_release
+#define COPY_UUID(to,from) do {} while (0)
+
+#else
+FILE *__fvd_debug_fp;
+static unsigned long long int fvd_uuid = 1;
+static int64_t pending_qemu_malloc = 0;
+static int64_t pending_qemu_aio_get = 0;
+static int64_t pending_local_writes = 0;
+static const char *alloc_file;
+static int alloc_line;
+
+#define my_qemu_malloc(size) \
+((void*)(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_malloc(size)))
+
+#define my_qemu_mallocz(size) \
+((void*)(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_mallocz(size)))
+
+#define my_qemu_blockalign(bs,size) \
+((void*)(alloc_file=__FILE__, \
+ alloc_line=__LINE__, \
+ _my_qemu_blockalign(bs,size)))
+
+#define my_qemu_aio_get(pool,bs,cb,op) \
+((void*)(alloc_file=__FILE__, \
+ alloc_line=__LINE__, \
+ _my_qemu_aio_get(pool,bs,cb,op)))
+
+#define my_qemu_free(p) \
+(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_free(p))
+
+#define my_qemu_vfree(p) \
+(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_vfree(p))
+
+static void COPY_UUID(FvdAIOCB * to, FvdAIOCB * from)
+{
+if (from) {
+to-uuid = from-uuid;
+FVD_DEBUG_ACB(to);
+}
+}
+
+#ifdef DEBUG_MEMORY_LEAK
+#define MAX_TRACER 10485760
+static int alloc_tracer_used = 1;   /* slot 0 is not used. */
+static void **alloc_tracers = NULL;
+
+static void __attribute__ ((constructor)) init_mem_alloc_tracers(void)
+{
+if (!alloc_tracers) {
+alloc_tracers = qemu_mallocz(sizeof(void *) * MAX_TRACER);
+}
+}
+
+static void trace_alloc(void *p, size_t size)
+{
+alloc_tracer_t *t = p;
+t-magic = FVD_ALLOC_MAGIC;
+t-alloc_file = alloc_file;
+t-alloc_line = alloc_line;
+t-size = size;
+
+if (alloc_tracer_used  MAX_TRACER) {
+t-alloc_tracer = alloc_tracer_used++;
+alloc_tracers[t-alloc_tracer] = t;
+QDEBUG(Allocate memory using tracer%d in %s on line %d.\n,
+   t-alloc_tracer, alloc_file, alloc_line);
+} else {
+t-alloc_tracer = 0;
+}
+
+/* Set header and footer to detect out-of-range writes. */
+if (size != (size_t) - 1) {
+uint8_t *q = (uint8_t *) p;
+

[Qemu-devel] [PATCH 16/26] FVD: add impl for buffered journal updates

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch enhances FVD's journal with the capability of buffering
multiple metadata updates and sending them to the journal in a single write.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-journal-buf.c |  336 ++-
 1 files changed, 333 insertions(+), 3 deletions(-)

diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c
index 3efdd47..b4077ce 100644
--- a/block/fvd-journal-buf.c
+++ b/block/fvd-journal-buf.c
@@ -20,15 +20,345 @@
  * case for cache!=writethrough.
  
**/
 
+static inline int bjnl_write_buf(FvdAIOCB *acb);
+static void bjnl_send_current_buf_to_write_queue(BlockDriverState *bs);
+
+static inline void bjnl_finish_write_buf(FvdAIOCB *acb, int ret)
+{
+ASSERT (acb-type == OP_BJNL_BUF_WRITE);
+BlockDriverState *bs = acb-common.bs;
+BDRVFvdState *s = bs-opaque;
+
+QDEBUG(JOURNAL: bjnl_finish_write_buf acb%llu-%p\n, acb-uuid, acb);
+
+my_qemu_vfree(acb-jcb.iov.iov_base);
+QTAILQ_REMOVE(s-bjnl.queued_bufs, acb, jcb.bjnl_next_queued_buf);
+my_qemu_aio_release(acb);
+
+if (ret != 0) {
+s-metadata_err_prohibit_write = true;
+}
+}
+
+static inline void bjnl_write_next_buf(BDRVFvdState *s)
+{
+FvdAIOCB *acb;
+while ((acb = QTAILQ_FIRST(s-bjnl.queued_bufs))) {
+if (bjnl_write_buf(acb) == 0) {
+return;
+}
+}
+}
+
+static inline void bjnl_aio_flush_cb(void *opaque, int ret)
+{
+FvdAIOCB *acb = (FvdAIOCB *) opaque;
+
+if (acb-cancel_in_progress) {
+return;
+}
+
+QDEBUG(JOURNAL: bjnl_aio_flush_cb acb%llu-%p\n, acb-uuid, acb);
+
+/* Invoke the callback initially provided to fvd_aio_flush(). */
+acb-common.cb(acb-common.opaque, ret);
+my_qemu_aio_release(acb);
+}
+
+static inline void bjnl_write_buf_cb(void *opaque, int ret)
+{
+FvdAIOCB *acb = (FvdAIOCB *) opaque;
+BlockDriverState *bs = acb-common.bs;
+BDRVFvdState *s = bs-opaque;
+
+if (acb-cancel_in_progress) {
+return;
+}
+
+QDEBUG(JOURNAL: bjnl_write_buf_cb acb%llu-%p\n, acb-uuid, acb);
+bjnl_finish_write_buf(acb, ret);
+bjnl_write_next_buf(s);
+}
+
+#ifndef ENABLE_QDEBUG
+#  define PRINT_JRECORDS(buf,len) do{}while(0)
+#else
+static void print_jrecords(const uint8_t *buf, size_t len);
+#  define PRINT_JRECORDS print_jrecords
+#endif
+
+static int bjnl_write_buf_start(FvdAIOCB *acb)
+{
+BlockDriverState *bs = acb->common.bs;
+BDRVFvdState *s = bs->opaque;
+int64_t journal_sec;
+int nb_sectors = acb->jcb.iov.iov_len / 512;
+int ret;
+
+ASSERT (nb_sectors <= s->journal_size);
+QDEBUG("JOURNAL: bjnl_write_buf_start acb%llu-%p\n", acb->uuid, acb);
+
+if (s->next_journal_sector + nb_sectors <= s->journal_size) {
+journal_sec = s->next_journal_sector;
+s->next_journal_sector += nb_sectors;
+} else {
+if ((ret = recycle_journal(bs))) {
+goto fail;
+}
+journal_sec = 0;
+s->next_journal_sector = nb_sectors;
+}
+
+PRINT_JRECORDS(acb->jcb.iov.iov_base, acb->jcb.iov.iov_len);
+
+acb->jcb.hd_acb = bdrv_aio_writev(s->fvd_metadata,
+  s->journal_offset + journal_sec,
+  &acb->jcb.qiov, nb_sectors,
+  bjnl_write_buf_cb, acb);
+if (acb->jcb.hd_acb) {
+return 0;
+} else {
+ret = -EIO;
+}
+
+fail:
+bjnl_finish_write_buf(acb, ret);
+return ret;
+}
+
+static void bjnl_flush_data_before_update_bitmap_cb(void *opaque, int ret)
+{
+FvdAIOCB *acb = opaque;
+
+if (acb-cancel_in_progress) {
+return;
+}
+
+QDEBUG(JOURNAL: bjnl_flush_data_before_update_bitmap_cb acb%llu-%p\n,
+   acb-uuid, acb);
+
+if (ret != 0) {
+bjnl_finish_write_buf(acb, ret);
+} else if (bjnl_write_buf_start(acb) == 0) {
+return;
+}
+
+bjnl_write_next_buf(acb-common.bs-opaque);
+}
+
+static inline int bjnl_write_buf(FvdAIOCB *acb)
+{
+BlockDriverState *bs = acb-common.bs;
+BDRVFvdState *s = bs-opaque;
+
+QDEBUG(JOURNAL: bjnl_write_buf acb%llu-%p\n, acb-uuid, acb);
+
+if (!acb-jcb.bitmap_updated) {
+return bjnl_write_buf_start(acb);
+}
+
+/* If bitmap_updated, fvd_data need be flushed first before bitmap changes
+ * can be committed. Otherwise, a host crashes after bitmap metadata are
+ * updated but before the corresponding data are persisted on disk, the VM
+ * will get corrupted data, as correct data may be in the base image. */
+acb-jcb.hd_acb = bdrv_aio_flush(s-fvd_data,
+ bjnl_flush_data_before_update_bitmap_cb,
+ acb);
+if (acb-jcb.hd_acb) {
+return 0;
+} else {
+
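
The sector reservation performed at the top of bjnl_write_buf_start() can
be sketched in isolation as follows. This is a simplified illustration
under stated assumptions: struct journal_cursor and journal_reserve() are
hypothetical names, and the counter increment merely stands in for
recycle_journal(), whose real behavior is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified journal cursor; not the FVD state struct. */
struct journal_cursor {
    int64_t journal_size;  /* total journal length, in sectors */
    int64_t next_sector;   /* first sector not yet handed out */
    int recycle_count;     /* how many times the journal wrapped */
};

/* Reserve nb_sectors contiguous sectors and return the first one.
 * If the request does not fit in the remaining tail, the journal is
 * recycled and the reservation restarts at sector 0. */
static int64_t journal_reserve(struct journal_cursor *j, int64_t nb_sectors)
{
    assert(nb_sectors > 0 && nb_sectors <= j->journal_size);

    if (j->next_sector + nb_sectors <= j->journal_size) {
        int64_t start = j->next_sector;
        j->next_sector += nb_sectors;
        return start;
    }

    /* Stand-in for recycle_journal(); assumed to always succeed here. */
    j->recycle_count++;
    j->next_sector = nb_sectors;
    return 0;
}
```

A reservation never straddles the end of the journal: a request that does
not fit in the tail leaves the tail unused and starts over from sector 0.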

[Qemu-devel] [PATCH 17/26] FVD: add impl of bdrv_flush() and bdrv_aio_flush()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_flush() and bdrv_aio_flush()
interfaces.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-flush.c   |  176 +-
 block/fvd-journal-buf.c |  218 +++
 2 files changed, 390 insertions(+), 4 deletions(-)

diff --git a/block/fvd-flush.c b/block/fvd-flush.c
index 34bd5cb..6658d27 100644
--- a/block/fvd-flush.c
+++ b/block/fvd-flush.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Fast Virtual Disk Format bdrv_flush() and bdrv_aio_flush()
+ * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface
  *
  * Copyright IBM, Corp. 2010
  *
@@ -11,14 +11,182 @@
  *
  */
 
+static void aio_wrapper_bh(void *opaque);
+static int bjnl_sync_flush(BlockDriverState * bs);
+static bool bjnl_clean_buf_on_aio_flush(BlockDriverState *bs,
+  BlockDriverCompletionFunc * cb,
+  void *opaque, BlockDriverAIOCB **p_acb);
+static BlockDriverAIOCB *fvd_aio_flush_start(BlockDriverState * bs,
+  BlockDriverCompletionFunc * cb,
+  void *opaque, FvdAIOCB *parent_acb);
+
+static int fvd_flush(BlockDriverState * bs)
+{
+BDRVFvdState *s = bs->opaque;
+int ret;
+
+QDEBUG("fvd_flush() invoked\n");
+
+if (s->metadata_err_prohibit_write) {
+return -EIO;
+}
+
+if (!s->fvd_metadata->enable_write_cache) {
+/* No need to flush since it uses O_DSYNC. */
+return 0;
+}
+
+if (s->use_bjnl) {
+return bjnl_sync_flush(bs);
+}
+
+/* Simply flush for unbuffered journal update. */
+if ((ret = bdrv_flush(s->fvd_data))) {
+return ret;
+}
+if (s->fvd_metadata == s->fvd_data) {
+return 0;
+}
+return bdrv_flush(s->fvd_metadata);
+}
+
 static BlockDriverAIOCB *fvd_aio_flush(BlockDriverState * bs,
BlockDriverCompletionFunc * cb,
void *opaque)
 {
-return NULL;
+BDRVFvdState *s = bs->opaque;
+BlockDriverAIOCB * pacb;
+FvdAIOCB  *acb;
+
+QDEBUG("fvd_aio_flush() invoked\n");
+
+if (s->metadata_err_prohibit_write) {
+return NULL;
+}
+
+if (!s->fvd_data->enable_write_cache) {
+/* No need to flush since it uses O_DSYNC. Use a QEMUBH to invoke the
+ * callback. */
+
+if (!(acb = my_qemu_aio_get(&fvd_aio_pool, bs, cb, opaque))) {
+return NULL;
+}
+
+acb->type = OP_WRAPPER;
+acb->cancel_in_progress = false;
+acb->wrapper.bh = qemu_bh_new(aio_wrapper_bh, acb);
+qemu_bh_schedule(acb->wrapper.bh);
+return &acb->common;
+}
+
+if (!s->use_bjnl) {
+QDEBUG("FLUSH: start now for unbuffered journal update");
+return fvd_aio_flush_start(bs, cb, opaque, NULL);
+}
+
+if (bjnl_clean_buf_on_aio_flush(bs, cb, opaque, &pacb)) {
+/* Waiting for the journal buffer to be cleaned first. */
+return pacb;
+}
+
+/* No buffered journal data. Start flush now. */
+QDEBUG("FLUSH: start now as no buffered journal data");
+return fvd_aio_flush_start(bs, cb, opaque, NULL);
+}
+
+static inline void finish_flush(FvdAIOCB * acb)
+{
+QDEBUG(FLUSH: acb%llu-%p  finish_flush ret=%d\n,
+   acb-uuid, acb, acb-flush.ret);
+acb-common.cb(acb-common.opaque, acb-flush.ret);
+my_qemu_aio_release(acb);
 }
 
-static int fvd_flush(BlockDriverState * bs)
+static void flush_data_cb(void *opaque, int ret)
 {
-return -ENOTSUP;
+FvdAIOCB *acb = opaque;
+
+if (acb-cancel_in_progress) {
+return;
+}
+
+QDEBUG(FLUSH: acb%llu-%p  flush_data_cb ret=%d\n, acb-uuid, acb, ret);
+
+if (acb-flush.ret == 0) {
+acb-flush.ret = ret;
+}
+
+acb-flush.data_acb = NULL;
+acb-flush.num_finished++;
+if (acb-flush.num_finished == 2) {
+finish_flush(acb);
+}
+}
+
+static void flush_metadata_cb(void *opaque, int ret)
+{
+FvdAIOCB *acb = opaque;
+
+if (acb-cancel_in_progress) {
+return;
+}
+
+QDEBUG(FLUSH: acb%llu-%p  flush_metadata_cb ret=%d\n,
+   acb-uuid, acb, ret);
+
+if (acb-flush.ret == 0) {
+acb-flush.ret = ret;
+}
+
+acb-flush.metadata_acb = NULL;
+acb-flush.num_finished++;
+if (acb-flush.num_finished == 2) {
+finish_flush(acb);
+}
+}
+
+static BlockDriverAIOCB *fvd_aio_flush_start(BlockDriverState * bs,
+   BlockDriverCompletionFunc * cb,
+   void *opaque, FvdAIOCB *parent_acb)
+{
+BDRVFvdState *s = bs-opaque;
+FvdAIOCB  *acb;
+
+if (s-fvd_data == s-fvd_metadata) {
+if (parent_acb) {
+QDEBUG(FLUSH: acb%llu-%p  
started.\n,parent_acb-uuid,parent_acb);
+}
+return

[Qemu-devel] [PATCH 23/26] FVD: add impl of interface bdrv_is_allocated()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_is_allocated() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |   67 ++
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 63ed168..766b62b 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -169,6 +169,73 @@ static int fvd_probe(const uint8_t * buf, int buf_size, 
const char *filename)
 static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
 int nb_sectors, int *pnum)
 {
+BDRVFvdState *s = bs->opaque;
+
+if (s->prefetch_state == PREFETCH_STATE_FINISHED ||
+sector_num >= s->base_img_sectors ||
+!fresh_bitmap_show_sector_in_base_img(sector_num, s)) {
+/* For the three cases that data may be saved in the FVD data file, we
+ * still need to check the underlying storage because those data could
+ * be holes in a sparse image, due to the optimization of free write
+ * to zero-filled blocks. See Section 3.3.3 of the FVD-cow paper.
+ * This also covers the case of no base image. */
+
+if (!s->table) {
+return bdrv_is_allocated(s->fvd_data, s->data_offset + sector_num,
+ nb_sectors, pnum);
+}
+
+/* Use the table to figure it out. */
+int64_t first_chunk = sector_num / s->chunk_size;
+int64_t last_chunk = (sector_num + nb_sectors - 1) / s->chunk_size;
+int allocated = !IS_EMPTY(s->table[first_chunk]);
+int count;
+
+if (first_chunk == last_chunk) {
+/* All data in one chunk. */
+*pnum = nb_sectors;
+return allocated;
+}
+
+/* Data in the first chunk. */
+count = s->chunk_size - (sector_num % s->chunk_size);
+
+/* Full chunks. */
+first_chunk++;
+while (first_chunk < last_chunk) {
+if ((allocated && IS_EMPTY(s->table[first_chunk]))
+|| (!allocated && !IS_EMPTY(s->table[first_chunk]))) {
+*pnum = count;
+return allocated;
+}
+
+count += s->chunk_size;
+first_chunk++;
+}
+
+/* Data in the last chunk. */
+if ((allocated && !IS_EMPTY(s->table[last_chunk]))
+|| (!allocated && IS_EMPTY(s->table[last_chunk]))) {
+int nb = (sector_num + nb_sectors) % s->chunk_size;
+count += nb ? nb : s->chunk_size;
+}
+
+*pnum = count;
+return allocated;
+}
+
+/* Use the FVD metadata to find out sectors in the base image. */
+int64_t end = sector_num + nb_sectors;
+if (end > s->base_img_sectors) {
+end = s->base_img_sectors;
+}
+
+int64_t next = sector_num + 1;
+while (next < end && fresh_bitmap_show_sector_in_base_img(next, s)) {
+next++;
+}
+
+*pnum = next - sector_num;
 }
 
-- 
1.7.0.4
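
The chunk-table walk in fvd_is_allocated() above can be illustrated with a
self-contained sketch. It is a simplification under stated assumptions:
run_is_allocated() is a hypothetical name, and the table is modeled as one
bool per chunk rather than FVD's table entries tested with IS_EMPTY().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Report whether sector_num lies in an allocated chunk and store in *pnum
 * how many of the next nb_sectors sectors share that same state. */
static bool run_is_allocated(const bool *chunk_used, int64_t chunk_size,
                             int64_t sector_num, int nb_sectors, int *pnum)
{
    assert(chunk_size > 0 && nb_sectors > 0);

    int64_t first = sector_num / chunk_size;
    int64_t last = (sector_num + nb_sectors - 1) / chunk_size;
    bool allocated = chunk_used[first];

    if (first == last) {
        /* The whole request sits inside one chunk. */
        *pnum = nb_sectors;
        return allocated;
    }

    /* Sectors from sector_num to the end of the first chunk. */
    int64_t count = chunk_size - (sector_num % chunk_size);

    /* Full chunks strictly between the first and the last one. */
    for (int64_t c = first + 1; c < last; c++) {
        if (chunk_used[c] != allocated) {
            *pnum = (int)count;
            return allocated;
        }
        count += chunk_size;
    }

    /* Partial (or full) last chunk, counted only if its state matches. */
    if (chunk_used[last] == allocated) {
        int64_t tail = (sector_num + nb_sectors) % chunk_size;
        count += tail ? tail : chunk_size;
    }

    *pnum = (int)count;
    return allocated;
}
```

For example, with 4-sector chunks marked used, used, empty, used, a request
for 10 sectors starting at sector 2 stops at the empty third chunk and
reports 6 allocated sectors.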

[Qemu-devel] [PATCH 03/26] FVD: add fully automated test-qcow2.sh

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

test-qcow2.sh drives 'qemu-io --auto' to perform fully automated testing for
QCOW2.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 test-qcow2.sh |   89 +
 1 files changed, 89 insertions(+), 0 deletions(-)
 create mode 100755 test-qcow2.sh

diff --git a/test-qcow2.sh b/test-qcow2.sh
new file mode 100755
index 000..d1e4dc0
--- /dev/null
+++ b/test-qcow2.sh
@@ -0,0 +1,89 @@
+#!/bin/bash
+
+# Drive 'qemu-io --auto' to test the QCOW2 image format.
+#
+# Copyright IBM, Corp. 2010
+#
+# Authors:
+# Chunqiang Tang ct...@us.ibm.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+if [ $USER != root ]; then
+echo This command must be run by root in order to mount tmpfs.
+exit 1
+fi
+
+QEMU_DIR=.
+QEMU_IMG=$QEMU_DIR/qemu-img
+QEMU_IO=$QEMU_DIR/qemu-io
+
+if [ ! -e $QEMU_IMG ]; then
+echo $QEMU_IMG does not exist.
+exit 1;
+fi
+
+if [ ! -e $QEMU_IO ]; then
+echo $QEMU_IO does not exist.
+exit 1;
+fi
+
+DATA_DIR=/var/ramdisk
+TRUTH_IMG=$DATA_DIR/truth.raw
+TEST_IMG=$DATA_DIR/test.qcow2
+TEST_BASE=$DATA_DIR/zero-500M.raw
+CMD_LOG=./test-qcow2.log
+
+parallel=100
+round=1
+fail_prob=0.1
+cancel_prob=0
+instant_qemubh=true
+seed=$RANDOM$RANDOM
+count=0
+
+function invoke() {
+echo $*  $CMD_LOG
+$*
+ret=$?
+if [ $ret -ne 0 ]; then
+echo Exit with error code $ret: $*
+exit $ret
+fi
+}
+
+mount | grep $DATA_DIR  /dev/null
+if [ $? -ne 0 ]; then
+echo Create tmpfs at $DATA_DIR to store testing images.
+if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi
+invoke mount -t tmpfs none $DATA_DIR -o size=4G
+if [ $? -ne 0 ]; then exit 1; fi
+fi
+
+/bin/rm -f $CMD_LOG $DATA_DIR/*
+touch $CMD_LOG
+
+while [ -t ]; do
+for cache in none writethrough writeback; do
+for cluster_size in 65536 ; do
+for io_size in 1048576 ; do
+count=$[$count + 1]
+echo Round $count  $CMD_LOG
+
+# QCOW2 image is about 1G
+img_size=$[(1073741824 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512]
+
+# base image is about 500MB
+base_size=$[(536870912 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512]
+
+invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG $TEST_BASE
+invoke $QEMU_IO --auto --create=$TEST_BASE --seed=$seed 
--block_size=1048576 --empty_block_prob=0 --empty_block_chain=1 
--file_size=$base_size
+invoke cp --sparse=always $TEST_BASE $TRUTH_IMG
+invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+invoke $QEMU_IMG create -f qcow2 
-ocluster_size=$cluster_size,backing_fmt=blksim -b $TEST_BASE $TEST_IMG 
$img_size
+
+invoke $QEMU_IO --auto --cache=$cache --seed=$seed --truth=$TRUTH_IMG 
--format=qcow2 --test=blksim:$TEST_IMG --verify_write=true 
--compare_before=false --compare_after=true --round=$round --parallel=$parallel 
--io_size=$io_size --fail_prob=$fail_prob --cancel_prob=$cancel_prob 
--instant_qemubh=$instant_qemubh
+
+seed=$[$seed + 1]
+done; done; done; done
-- 
1.7.0.4

[Qemu-devel] [PATCH 26/26] FVD: add fully automated test-fvd.sh

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

test-fvd.sh drives 'qemu-io --auto' to perform fully automated testing for FVD.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 test-fvd.sh |  161 +++
 1 files changed, 161 insertions(+), 0 deletions(-)
 create mode 100755 test-fvd.sh

diff --git a/test-fvd.sh b/test-fvd.sh
new file mode 100755
index 000..3d67c3f
--- /dev/null
+++ b/test-fvd.sh
@@ -0,0 +1,161 @@
+#!/bin/bash
+
+# Drive 'qemu-io --auto' to test the FVD image format.
+#
+# Copyright IBM, Corp. 2010
+#
+# Authors:
+# Chunqiang Tang ct...@us.ibm.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+if [ $USER != root ]; then
+echo This command must be run by root in order to mount tmpfs.
+exit 1
+fi
+
+QEMU_DIR=.
+QEMU_IMG=$QEMU_DIR/qemu-img
+QEMU_IO=$QEMU_DIR/qemu-io
+
+if [ ! -e $QEMU_IMG ]; then
+echo $QEMU_IMG does not exist.
+exit 1;
+fi
+
+if [ ! -e $QEMU_IO ]; then
+echo $QEMU_IO does not exist.
+exit 1;
+fi
+
+DATA_DIR=/var/ramdisk
+TRUTH_IMG=$DATA_DIR/truth.raw
+TEST_IMG=$DATA_DIR/test.fvd
+TEST_BASE=$DATA_DIR/zero-500M.raw
+TEST_IMG_DATA=$DATA_DIR/test.dat
+CMD_LOG=./test-fvd.log
+
+G1=1073741824
+MAX_MEM=536870912
+MAX_ROUND=100
+MAX_IO_SIZE=1
+fail_prob=0.1
+cancel_prob=0.1
+flush_prob_base=0.05
+aio_flush_prob_base=0.1
+seed=$RANDOM$RANDOM
+count=0
+
+function invoke() {
+echo $*  $CMD_LOG
+sync
+$*
+ret=$?
+if [ $ret -ne 0 ]; then
+echo $Exit with error code $ret: $*
+exit $ret
+fi
+}
+
+mount | grep $DATA_DIR  /dev/null
+if [ $? -ne 0 ]; then
+echo Create tmpfs at $DATA_DIR to store testing images.
+if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi
+invoke mount -t tmpfs none $DATA_DIR -o size=4G
+if [ $? -ne 0 ]; then exit 1; fi
+fi
+
+/bin/rm -f $CMD_LOG $DATA_DIR/*
+touch $CMD_LOG
+
+while [ -t ]; do
+for block_size in 7680 512 1024 15872 65536 65024 1048576 1048064; do
+for chunk_mult in 5 1 2 3 7 9 12 16 33 99 ; do
+for cache in writeback writethrough ; do
+#for compact_image in on off ; do
+for compact_image in on ; do
+for prefetch_delay in 1 0; do
+for copy_on_read in on off; do
+for base_img in -b $TEST_BASE  ; do
+chunk_size=$[$block_size * $chunk_mult]
+large_io_size=$[$chunk_size * 5]
+if [ $large_io_size -gt $MAX_IO_SIZE ]; then 
large_io_size=$MAX_IO_SIZE; fi
+for io_size in $large_io_size 1048576 ; do
+for use_data_file in  data_file=$TEST_IMG_DATA, ; do
+
+if [ cache == writethrough ]; then
+JOURNAL_BUF_SIZE=0
+JOURNAL_CLEAN_BUF_PERIOD=0
+else
+JOURNAL_BUF_SIZE=512 1024 65536
+JOURNAL_CLEAN_BUF_PERIOD=5000 1000 6
+fi
+
+for journal_buf_size in $JOURNAL_BUF_SIZE ; do
+for journal_clean_buf_period in $JOURNAL_CLEAN_BUF_PERIOD ; do
+/bin/rm -rf /tmp/fvd.log*
+
+# FVD image is about 1G
+img_size=$[(1073741824 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 
512]
+
+# base image is about 500MB
+base_size=$[(536870912 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 
512]
+
+count=$[$count + 1]
+echo Round $count  $CMD_LOG
+
+invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG $TEST_BASE $TEST_IMG_DATA
+
+if [ -z $base_img ]; then
+# Use zero-filled empty images.
+invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+else
+# Use images with random contents.
+invoke $QEMU_IO --auto --create=$TEST_BASE --seed=$seed 
--block_size=$block_size --empty_block_prob=0.2 --empty_block_chain=10 
--file_size=$base_size
+invoke cp --sparse=always $TEST_BASE $TRUTH_IMG
+invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+fi
+
+if [ ! -z $use_data_file ]; then invoke touch $TEST_IMG_DATA; fi
+
+# Ensure the journal is large enough to hold at least one write.
+mixed_records_per_journal_sector=119
+if [ cache == writethrough ]; then
+journal_size_factor=1000
+else
+journal_size_factor=100
+fi
+journal_size=$[$io_size / $chunk_size ) + 1 ) / 
$mixed_records_per_journal_sector ) + 1) * 512 * (1 + $RANDOM$RANDOM % 
$journal_size_factor) ]
+
+invoke $QEMU_IMG create -f fvd $base_img 
-ojournal_buf_size=$journal_buf_size,journal_clean_buf_period=$journal_clean_buf_period,${use_data_file}data_file_fmt=blksim,backing_fmt=blksim,compact_image=$compact_image,copy_on_read=$copy_on_read,block_size=$block_size,chunk_size=$chunk_size,journal_size=$journal_size,prefetch_start_delay=$prefetch_delay
 $TEST_IMG $img_size
+invoke $QEMU_IMG update -oinit_data_region=on $TEST_IMG
+if [ $prefetch_delay -eq 1 ]; then

[Qemu-devel] [PATCH 21/26] FVD: add impl of interface bdrv_close()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_close() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |   78 ++
 1 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index c515d74..63ed168 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -81,6 +81,84 @@ static void fvd_aio_cancel(BlockDriverAIOCB * blockacb)
 
 static void fvd_close(BlockDriverState * bs)
 {
+BDRVFvdState *s = bs->opaque;
+FvdAIOCB *acb;
+int i;
+
+if (s->prefetch_state == PREFETCH_STATE_RUNNING) {
+s->prefetch_state = PREFETCH_STATE_DISABLED;
+}
+if (s->prefetch_timer) {
+qemu_del_timer(s->prefetch_timer);
+qemu_free_timer(s->prefetch_timer);
+s->prefetch_timer = NULL;
+}
+
+if (s->prefetch_acb) {
+/* Clean up prefetch operations. */
+for (i = 0; i < s->num_prefetch_slots; i++) {
+if (s->prefetch_acb[i] != NULL) {
+fvd_aio_cancel_copy(s->prefetch_acb[i]);
+s->prefetch_acb[i] = NULL;
+}
+}
+my_qemu_free(s->prefetch_acb);
+s->prefetch_acb = NULL;
+}
+
+if (s->use_bjnl) {
+/* Clean up buffered journal update. */
+bjnl_sync_flush(bs);
+if (s->bjnl.timer_scheduled) {
+qemu_del_timer(s->bjnl.clean_buf_timer);
+}
+qemu_free_timer(s->bjnl.clean_buf_timer);
+}
+
+/* Clean up unfinished copy_on_read operations. */
+QLIST_FOREACH(acb, &s->copy_locks, copy_lock.next) {
+fvd_aio_cancel_copy(acb);
+}
+
+flush_metadata_to_disk_on_exit(bs);
+
+if (s->stale_bitmap) {
+my_qemu_vfree(s->stale_bitmap);
+if (s->fresh_bitmap != s->stale_bitmap) {
+my_qemu_vfree(s->fresh_bitmap);
+}
+s->stale_bitmap = NULL;
+s->fresh_bitmap = NULL;
+}
+
+if (s->table) {
+my_qemu_vfree(s->table);
+s->table = NULL;
+}
+
+if (s->fvd_metadata) {
+if (s->fvd_metadata != s->fvd_data) {
+bdrv_delete(s->fvd_metadata);
+}
+s->fvd_metadata = NULL;
+}
+if (s->fvd_data) {
+bdrv_delete(s->fvd_data);
+s->fvd_data = NULL;
+}
+
+if (s->add_storage_cmd) {
+my_qemu_free(s->add_storage_cmd);
+s->add_storage_cmd = NULL;
+}
+
+if (s->leaked_chunks) {
+my_qemu_free(s->leaked_chunks);
+s->leaked_chunks = NULL;
+}
+}
+#ifdef FVD_DEBUG
+dump_resource_summary(s);
+#endif
 }
 
 static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
-- 
1.7.0.4
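
fvd_close() above has to cope with pointers that may alias each other
(fresh_bitmap versus stale_bitmap, fvd_metadata versus fvd_data), so each
resource is released exactly once. A minimal sketch of that pattern, using
plain malloc()/free() and hypothetical names rather than QEMU's allocators:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair of buffers that may or may not be the same block. */
struct bitmaps {
    unsigned char *stale;
    unsigned char *fresh;
};

/* Free both bitmaps, freeing a shared buffer only once, and clear the
 * pointers so a second call is harmless. Returns the number of distinct
 * buffers released. As in the patch, nothing is done when stale is NULL. */
static int release_bitmaps(struct bitmaps *b)
{
    int freed = 0;

    assert(b != NULL);
    if (b->stale != NULL) {
        if (b->fresh != NULL && b->fresh != b->stale) {
            free(b->fresh);
            freed++;
        }
        free(b->stale);
        freed++;
        b->stale = NULL;
        b->fresh = NULL;
    }
    return freed;
}
```

Clearing the pointers after the release is what makes a repeated close
safe; the aliasing check is what prevents a double free when one buffer
serves both roles.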

[Qemu-devel] [PATCH 22/26] FVD: add impl of interface bdrv_update()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_update() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-update.c |  274 +++-
 1 files changed, 272 insertions(+), 2 deletions(-)

diff --git a/block/fvd-update.c b/block/fvd-update.c
index 2498618..4ef4969 100644
--- a/block/fvd-update.c
+++ b/block/fvd-update.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Fast Virtual Disk Format bdrv_update
+ * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface
  *
  * Copyright IBM, Corp. 2010
  *
@@ -13,9 +13,279 @@
 
 static int fvd_update(BlockDriverState * bs, QEMUOptionParameter * options)
 {
-    return -ENOTSUP;
+    BDRVFvdState *s = bs->opaque;
+    FvdHeader header;
+    int ret;
+
+    read_fvd_header(s, &header);
+
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            if (header.table_offset > 0) {
+                fprintf(stderr, "Cannot resize a compact FVD image.\n");
+                return -EINVAL;
+            }
+            if (options->value.n < header.virtual_disk_size) {
+                printf("Warning: image's new size %" PRId64
+                       " is smaller than the original size %" PRId64
+                       ". Some image data will be truncated.\n",
+                       options->value.n, header.virtual_disk_size);
+            }
+            header.virtual_disk_size = options->value.n;
+            printf("Image resized to %" PRId64 " bytes.\n", options->value.n);
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+            if (strlen(options->value.s) > 1023) {
+                fprintf(stderr, "Error: the new base image name is longer "
+                        "than 1023, which is not allowed.\n");
+                return -EINVAL;
+            }
+            memset(header.base_img, 0, 1024);
+            pstrcpy(header.base_img, 1024, options->value.s);
+            printf("Backing file updated to '%s'.\n", options->value.s);
+        } else if (!strcmp(options->name, "data_file")) {
+            if (strlen(options->value.s) > 1023) {
+                fprintf(stderr, "Error: the new data file name is longer "
+                        "than 1023, which is not allowed.\n");
+                return -EINVAL;
+            }
+
+            memset(header.data_file, 0, 1024);
+            pstrcpy(header.data_file, 1024, options->value.s);
+            printf("Data file updated to '%s'.\n", options->value.s);
+        } else if (!strcmp(options->name, "need_zero_init")) {
+            header.need_zero_init = options->value.n;
+            if (header.need_zero_init) {
+                printf("need_zero_init is turned on.\n");
+            } else {
+                printf("need_zero_init is turned off.\n");
+            }
+        } else if (!strcmp(options->name, "copy_on_read")) {
+            header.copy_on_read = options->value.n;
+            if (header.copy_on_read) {
+                printf("Copy on read is enabled for this disk.\n");
+            } else {
+                printf("Copy on read is disabled for this disk.\n");
+            }
+        } else if (!strcmp(options->name, "clean_shutdown")) {
+            header.clean_shutdown = options->value.n;
+            if (header.clean_shutdown) {
+                printf("clean_shutdown is manually set to true\n");
+            } else {
+                printf("clean_shutdown is manually set to false\n");
+            }
+        } else if (!strcmp(options->name, "journal_buf_size")) {
+            header.journal_buf_size = options->value.n;
+            printf("journal_buf_size is updated to %" PRIu64 " bytes.\n",
+                   header.journal_buf_size);
+        } else if (!strcmp(options->name, "journal_clean_buf_period")) {
+            header.journal_clean_buf_period = options->value.n;
+            printf("journal_clean_buf_period is updated to %" PRIu64
+                   " milliseconds.\n",
+                   header.journal_clean_buf_period);
+        } else if (!strcmp(options->name, "max_outstanding_copy_on_read_data")) {
+            header.max_outstanding_copy_on_read_data = options->value.n;
+            if (header.max_outstanding_copy_on_read_data <= 0) {
+                fprintf(stderr, "Error: max_outstanding_copy_on_read_data "
+                        "must be positive.\n");
+                return -EINVAL;
+            }
+            printf("max_outstanding_copy_on_read_data updated to %" PRId64
+                   ".\n", header.max_outstanding_copy_on_read_data);
+        } else if (!strcmp(options->name, "init_data_region")) {
+            if (options->value.n && !s->data_region_prepared) {
+                init_data_region(s);
+            }
+        } else if (!strcmp(options->name, "prefetch_start_delay")) {
+            if (options->value.n <= 0) {
+                header.prefetch_start_delay = -1;
+            } else {
+                header.prefetch_start_delay =

[Qemu-devel] [PATCH 15/26] FVD: add basic journal functionality

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the basic journal functionality to FVD. The journal provides
several benefits. First, updating both the bitmap and the lookup table
requires only a single write to journal. Second, K concurrent updates to any
portions of the bitmap or the lookup table are converted to K sequential writes
in the journal, which can be merged into a single write by the host Linux
kernel. Third, it increases concurrency by avoiding locking the bitmap or the
lookup table. For example, updating one bit in the bitmap requires writing a
512-byte sector to the on-disk bitmap. This bitmap sector covers a total of
512*8*64K=256MB data, and any two writes to that same bitmap sector cannot
proceed concurrently. The journal solves this problem and eliminates
unnecessary locking.
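
As a rough illustration of the arithmetic above (a sketch only, not code from
this patch; the function names are made up for the example), the amount of
data covered by one bitmap sector can be computed as follows:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of virtual-disk data tracked by one on-disk bitmap sector.
 * Both parameters are illustrative; they are not FVD field names. */
static uint64_t bitmap_sector_coverage(uint64_t bitmap_sector_bytes,
                                       uint64_t block_bytes)
{
    assert(bitmap_sector_bytes > 0 && block_bytes > 0);
    /* One bit per block, eight bits per byte. */
    return bitmap_sector_bytes * 8 * block_bytes;
}

/* Index of the bitmap sector holding the bit for a given disk byte offset. */
static uint64_t bitmap_sector_of(uint64_t disk_offset,
                                 uint64_t bitmap_sector_bytes,
                                 uint64_t block_bytes)
{
    return disk_offset /
           bitmap_sector_coverage(bitmap_sector_bytes, block_bytes);
}
```

With a 512-byte bitmap sector and 64KB blocks this gives 256MB, so two writes
anywhere inside the same 256MB region would update the same on-disk bitmap
sector if the bitmap were written directly.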

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block.c |2 +-
 block/fvd-bitmap.c  |   57 
 block/fvd-journal-buf.c |   34 ++
 block/fvd-journal.c |  814 ++-
 block/fvd-write.c   |1 +
 block/fvd.c |   19 ++
 6 files changed, 920 insertions(+), 7 deletions(-)
 create mode 100644 block/fvd-journal-buf.c

diff --git a/block.c b/block.c
index f7d91a2..8b3083d 100644
--- a/block.c
+++ b/block.c
@@ -58,7 +58,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
 
-static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
+QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c
index 30e4a4b..06d7912 100644
--- a/block/fvd-bitmap.c
+++ b/block/fvd-bitmap.c
@@ -66,6 +66,63 @@ static inline void update_fresh_bitmap(int64_t sector_num, int nb_sectors,
 }
 }
 
+static void update_stale_bitmap(BDRVFvdState * s, int64_t sector_num,
+                                int nb_sectors)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return;
+    }
+
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    const int64_t block_end = (end - 1) / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->stale_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            ASSERT(s->stale_bitmap == s->fresh_bitmap ||
+                   (s->fresh_bitmap[bitmap_byte_offset] & mask));
+            b |= mask;
+            s->stale_bitmap[bitmap_byte_offset] = b;
+        }
+    }
+}
+
+static void update_both_bitmaps(BDRVFvdState * s, int64_t sector_num,
+                                int nb_sectors)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return;
+    }
+
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    const int64_t block_end = (end - 1) / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->fresh_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            b |= mask;
+            s->fresh_bitmap[bitmap_byte_offset] =
+                s->stale_bitmap[bitmap_byte_offset] = b;
+        }
+    }
+}
+
 static inline bool bitmap_show_sector_in_base_img(int64_t sector_num,
   const BDRVFvdState * s,
   int bitmap_offset,
diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c
new file mode 100644
index 000..3efdd47
--- /dev/null
+++ b/block/fvd-journal-buf.c
@@ -0,0 +1,34 @@
+/*
+ * QEMU Fast Virtual Disk Format Metadata Journal
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+/*=
+ * There are two different ways of writing metadata changes to the journal:
+ * immediate write or buffered write. If cache=writethrough, metadata changes
+ * are written to the journal immediately. If cache!=writethrough, metadata
+ * changes are buffered in memory and later written to the journal either
+ * triggered by bdrv_aio_flush() or by a timeout. This module implements the

[Qemu-devel] [PATCH 14/26] FVD: add impl of loading data from compact image

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the implementation of loading data from a compact image. This
capability is to support fvd_aio_readv() when FVD is configured to use its
one-level lookup table to do storage allocation.
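
For readers unfamiliar with the idea, a one-level lookup table translation
might look roughly like the sketch below. This is a simplified illustration
and not the code in this patch: EMPTY_CHUNK and lookup_physical_sector() are
hypothetical names, and the sketch ignores the dirty bit and the little-endian
table encoding that the real code handles through READ_TABLE().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker meaning "no storage allocated for this chunk". */
#define EMPTY_CHUNK UINT32_MAX

/* Translate a virtual sector into a physical sector inside the data file.
 * Returns -1 if the chunk is unallocated (such a region reads as zeros). */
static int64_t lookup_physical_sector(const uint32_t *table,
                                      uint64_t chunk_size, /* in sectors */
                                      int64_t sector_num)
{
    assert(chunk_size > 0 && sector_num >= 0);
    uint64_t chunk = (uint64_t)sector_num / chunk_size;
    uint32_t entry = table[chunk];

    if (entry == EMPTY_CHUNK) {
        return -1;
    }
    /* Physical chunk start plus the offset within the chunk. */
    return (int64_t)((uint64_t)entry * chunk_size +
                     (uint64_t)sector_num % chunk_size);
}
```

A request that spans chunks mapped to non-adjacent physical chunks cannot be
served by one read, which is why the patch splits such requests into several
child requests.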

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-load.c  |  448 +
 block/fvd-utils.c |   40 +
 2 files changed, 488 insertions(+), 0 deletions(-)

diff --git a/block/fvd-load.c b/block/fvd-load.c
index 80ab32c..88e5fb4 100644
--- a/block/fvd-load.c
+++ b/block/fvd-load.c
@@ -11,10 +11,458 @@
  *
  */
 
+static void load_data_from_compact_image_cb(void *opaque, int ret);
+static BlockDriverAIOCB *load_data_from_compact_image(FvdAIOCB *parent_acb,
+BlockDriverState * bs, int64_t sector_num,
+QEMUIOVector * qiov, int nb_sectors,
+BlockDriverCompletionFunc * cb, void *opaque);
+static inline FvdAIOCB *init_load_acb(FvdAIOCB * parent_acb,
+BlockDriverState * bs, int64_t sector_num,
+QEMUIOVector * orig_qiov, int nb_sectors,
+BlockDriverCompletionFunc * cb, void *opaque);
+static int load_create_child_requests(bool count_only, BDRVFvdState *s,
+QEMUIOVector * orig_qiov, int64_t sector_num,
+int nb_sectors, int *p_nziov, int *p_niov, int *p_nqiov,
+FvdAIOCB *acb,  QEMUIOVector *q, struct iovec *v);
+
 static inline BlockDriverAIOCB *load_data(FvdAIOCB * parent_acb,
 BlockDriverState * bs, int64_t sector_num,
 QEMUIOVector * orig_qiov, int nb_sectors,
 BlockDriverCompletionFunc * cb, void *opaque)
 {
+    BDRVFvdState *s = bs->opaque;
+
+    if (!s->table) {
+        /* Load directly since it is not a compact image. */
+        return bdrv_aio_readv(s->fvd_data, s->data_offset + sector_num,
+                              orig_qiov, nb_sectors, cb, opaque);
+    } else {
+        return load_data_from_compact_image(parent_acb, bs, sector_num,
+                                            orig_qiov, nb_sectors, cb, opaque);
+    }
+}
+
+static BlockDriverAIOCB *load_data_from_compact_image(FvdAIOCB * parent_acb,
+                        BlockDriverState * bs, int64_t sector_num,
+                        QEMUIOVector * orig_qiov, int nb_sectors,
+                        BlockDriverCompletionFunc * cb, void *opaque)
+{
+    BDRVFvdState *s = bs->opaque;
+    FvdAIOCB * acb;
+    int64_t start_sec = -1;
+    int nziov = 0;
+    int nqiov = 0;
+    int niov = 0;
+    int i;
+
+    /* Count the number of qiov and iov needed to cover the continuous regions
+     * of the compact image. */
+    load_create_child_requests(true/*count_only*/, s, orig_qiov, sector_num,
+                               nb_sectors, &nziov, &niov, &nqiov,
+                               NULL, NULL, NULL);
+
+    if (nqiov + nziov == 1) {
+        /* All data can be read in one qiov. Reuse orig_qiov. */
+        if (nziov == 1) {
+            /* This is a zero-filled region. */
+            for (i = 0; i < orig_qiov->niov; i++) {
+                memset(orig_qiov->iov[i].iov_base,
+                       0, orig_qiov->iov[i].iov_len);
+            }
+
+            /* Use a bh to invoke the callback. */
+            if (!(acb = my_qemu_aio_get(&fvd_aio_pool, bs, cb, opaque))) {
+                return NULL;
+            }
+            COPY_UUID(acb, parent_acb);
+            QDEBUG("LOAD: acb%llu-%p  load_fill_all_with_zeros\n",
+                   acb->uuid, acb);
+            acb->type = OP_WRAPPER;
+            acb->cancel_in_progress = false;
+            acb->wrapper.bh = qemu_bh_new(aio_wrapper_bh, acb);
+            qemu_bh_schedule(acb->wrapper.bh);
+            return &acb->common;
+        } else {
+            /* A non-empty region. */
+            const uint32_t first_chunk = sector_num / s->chunk_size;
+            start_sec = READ_TABLE(s->table[first_chunk]) * s->chunk_size +
+                        (sector_num % s->chunk_size);
+            if (parent_acb) {
+                QDEBUG("LOAD: acb%llu-%p  "
+                       "load_directly_as_one_continuous_region\n",
+                       parent_acb->uuid, parent_acb);
+            }
+            return bdrv_aio_readv(s->fvd_data, s->data_offset + start_sec,
+                                  orig_qiov, nb_sectors, cb, opaque);
+        }
+    }
+
+    /* Need to submit multiple requests to the lower layer. Initialize acb. */
+    if (!(acb = init_load_acb(parent_acb, bs, sector_num, orig_qiov,
+                              nb_sectors, cb, opaque))) {
+        return NULL;
+    }
+    acb->load.num_children = nqiov;
+
+    /* Allocate memory and create multiple requests. */
+    acb->load.children = my_qemu_malloc((sizeof(CompactChildCB) +
+                                         sizeof(QEMUIOVector)) * nqiov +
+

[Qemu-devel] [PATCH 13/26] FVD: add impl of storing data in compact image

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the implementation of storing data in a compact image. This
capability is needed for both copy-on-write (see fvd_aio_writev()) and
copy-on-read (see fvd_aio_readv()).
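
The allocate-on-first-write behaviour can be pictured with the following
sketch. It is only an illustration under simplifying assumptions: CompactMap,
EMPTY_CHUNK and ensure_allocated() are names invented here, chunks are handed
out sequentially, and the dirty-bit and journal handling done by the real
store_data_in_compact_image() is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker meaning "no storage allocated for this chunk". */
#define EMPTY_CHUNK UINT32_MAX

typedef struct {
    uint32_t *table;    /* virtual chunk -> physical chunk */
    uint32_t next_free; /* next unused physical chunk */
    uint32_t capacity;  /* number of physical chunks available */
} CompactMap;

/* Make sure every chunk touched by the write has storage behind it.
 * Returns the number of newly allocated chunks, or -1 if storage runs out
 * (chunks allocated before the failure stay allocated). */
static int ensure_allocated(CompactMap *m, uint64_t chunk_size,
                            uint64_t sector_num, uint64_t nb_sectors)
{
    assert(chunk_size > 0 && nb_sectors > 0);
    uint64_t first = sector_num / chunk_size;
    uint64_t last = (sector_num + nb_sectors - 1) / chunk_size;
    int allocated = 0;

    for (uint64_t c = first; c <= last; c++) {
        if (m->table[c] != EMPTY_CHUNK) {
            continue; /* already has storage */
        }
        if (m->next_free >= m->capacity) {
            return -1;
        }
        m->table[c] = m->next_free++;
        allocated++;
    }
    return allocated;
}
```

A non-zero return value corresponds to the case where the table became dirty
and the mapping change has to reach the journal at some point.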

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-store.c |  459 +
 block/fvd-utils.c |   65 
 2 files changed, 524 insertions(+), 0 deletions(-)

diff --git a/block/fvd-store.c b/block/fvd-store.c
index 85e45d4..fe670eb 100644
--- a/block/fvd-store.c
+++ b/block/fvd-store.c
@@ -11,10 +11,469 @@
  *
  */
 
+static uint32_t allocate_chunk(BlockDriverState * bs);
+static inline FvdAIOCB *init_store_acb(int soft_write,
+QEMUIOVector * orig_qiov, BlockDriverState * bs,
+int64_t sector_num, int nb_sectors, FvdAIOCB * parent_acb,
+BlockDriverCompletionFunc * cb, void *opaque);
+static BlockDriverAIOCB *store_data_in_compact_image(int soft_write,
+struct FvdAIOCB *parent_acb, BlockDriverState * bs,
+int64_t sector_num, QEMUIOVector * qiov, int nb_sectors,
+BlockDriverCompletionFunc * cb, void *opaque);
+static void store_data_in_compact_image_cb(void *opaque, int ret);
+
 static inline BlockDriverAIOCB *store_data(int soft_write,
 FvdAIOCB * parent_acb, BlockDriverState * bs,
 int64_t sector_num, QEMUIOVector * orig_qiov, int nb_sectors,
 BlockDriverCompletionFunc * cb, void *opaque)
 {
+    BDRVFvdState *s = bs->opaque;
+
+    TRACE_STORE_IN_FVD("store_data", sector_num, nb_sectors);
+
+    if (!s->table) {
+        /* Write directly since it is not a compact image. */
+        return bdrv_aio_writev(s->fvd_data, s->data_offset + sector_num,
+                               orig_qiov, nb_sectors, cb, opaque);
+    } else {
+        return store_data_in_compact_image(soft_write, parent_acb, bs,
+                                           sector_num, orig_qiov, nb_sectors,
+                                           cb, opaque);
+    }
+}
+
+/* Store data in the compact image. The argument 'soft_write' means
+ * the store was caused by copy-on-read or prefetching, which need not
+ * update metadata immediately. */
+static BlockDriverAIOCB *store_data_in_compact_image(int soft_write,
+                                                     FvdAIOCB * parent_acb,
+                                                     BlockDriverState * bs,
+                                                     int64_t sector_num,
+                                                     QEMUIOVector * orig_qiov,
+                                                     const int nb_sectors,
+                                                     BlockDriverCompletionFunc
+                                                     * cb, void *opaque)
+{
+    BDRVFvdState *s = bs->opaque;
+    FvdAIOCB *acb;
+    const uint32_t first_chunk = sector_num / s->chunk_size;
+    const uint32_t last_chunk = (sector_num + nb_sectors - 1) / s->chunk_size;
+    int table_dirty = false;
+    uint32_t chunk;
+    int64_t start_sec;
+
+    /* Check if storage space is allocated. */
+    for (chunk = first_chunk; chunk <= last_chunk; chunk++) {
+        if (IS_EMPTY(s->table[chunk])) {
+            uint32_t id = allocate_chunk(bs);
+            if (IS_EMPTY(id)) {
+                return NULL;
+            }
+            QDEBUG("STORE: map chunk %u to %u\n", chunk, id);
+            id |= DIRTY_TABLE;
+            WRITE_TABLE(s->table[chunk], id);
+            table_dirty = true;
+        } else if (IS_DIRTY(s->table[chunk])) {
+/* This is possible in several cases. 1) If a previous soft-write
+ * allocated the storage space but did not flush the table entry
+ * change to the journal and hence did not clean the dirty bit. 2)
+ * This is possible if a previous hard-write was canceled before
+ * it could write the table entry to disk. 3) Finally, this is
+ * also possible with two concurrent hard-writes. The first
+ * hard-write allocated the storage space but has not flushed the
+ * table entry change to the journal yet and hence the table entry
+ * remains dirty. In this case, the second hard-write will also
+ * try to flush this dirty table entry to the journal. The outcome
+ * is correct since they store the same metadata change in the
+ * journal (although twice). For this race condition, we prefer to
+ * have two writes to the journal rather than introducing a
+ * locking mechanism, because this happens rarely and those two
+ * writes to the journal are likely to be merged by the kernel
+ * into a single write since they are likely to update
+ * back-to-back sectors in the journal.  A locking

[Qemu-devel] FVD latest patches with your review comments addressed

2011-02-25 Thread Chunqiang Tang

Hi Andreas, Anthony, Stefan H., and Stefan W.,

I just posted the latest series of FVD patches to the mailing list, which 
addressed the review comments you previously made on FVD. Thank you for 
the feedback. Off the mailing list, Stefan Weil provided guidance on 
porting FVD to win32 and also sent me some patches, which have also been 
incorporated - many thanks.

This new release addressed the following review comments.
- Formatting issues like white space and empty lines are fixed.
- Code style is made consistent with what is described in CODING_STYLE.
- Non-portable header files are removed.
- Non-portable code is rewritten.
- File header is fixed with copyright and license information, and now 
includes more descriptive information.
- 'qemu-img update' is modified to use QEMUOptionParameter, like that in 
qemu-img create.
- Make FVD's testing tools part of qemu-io.
- Patches are broken into smaller, coherent pieces. 
- No patch breaks the build or bisect.
- FVD has been ported to win32 on Cygwin, 32-bit Linux on i686, and 64-bit 
Linux on x86_64.

Your further comments and feedback are more than welcome. Thanks.

Regards,
ChunQiang (CQ) Tang
Homepage: http://www.research.ibm.com/people/c/ctang

[Qemu-devel] [PATCH 07/26] FVD: extend FVD header fvd.h to be more complete

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch makes FVD's header file fvd.h more complete, by adding type
definition for BDRVFvdState, FvdAIOCB, etc.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd.h |  337 +++
 1 files changed, 337 insertions(+), 0 deletions(-)

diff --git a/block/fvd.h b/block/fvd.h
index f2da330..b83b7aa 100644
--- a/block/fvd.h
+++ b/block/fvd.h
@@ -168,4 +168,341 @@ typedef struct __attribute__ ((__packed__)) FvdHeader {
 } FvdHeader;
 
 typedef struct BDRVFvdState {
+BlockDriverState *fvd_metadata;
+BlockDriverState *fvd_data;
+uint64_t virtual_disk_size;  /*in bytes. */
+uint64_t bitmap_offset;  /* in sectors */
+uint64_t bitmap_size;/* in bytes. */
+uint64_t data_offset;/* in sectors. Begin of real data. */
+uint64_t base_img_sectors;
+uint64_t block_size; /* in sectors. */
+bool copy_on_read;
+uint64_t max_outstanding_copy_on_read_data;/* in bytes. */
+uint64_t outstanding_copy_on_read_data;/* in bytes. */
+bool data_region_prepared;
+QLIST_HEAD(WriteLocks, FvdAIOCB) write_locks; /* All writes. */
+QLIST_HEAD(CopyLocks, FvdAIOCB) copy_locks; /* copy-on-read and CoW. */
+
+/* Keep two copies of bitmap to reduce the overhead of updating the
+ * on-disk bitmap, i.e., copy-on-read and prefetching do not update the
+ * on-disk bitmap. See Section 3.3.4 of the FVD-cow paper. */
+uint8_t *fresh_bitmap;
+uint8_t *stale_bitmap;
+
+/* Begin: for compact image. */
+uint32_t *table;/* Mapping table stored in memory in little endian. */
+uint64_t table_size;/* in bytes. */
+uint64_t used_storage;/* in sectors. */
+uint64_t avail_storage;/* in sectors. */
+uint64_t chunk_size;  /* in sectors. */
+uint64_t storage_grow_unit;   /* in sectors. */
+uint64_t table_offset;/* in sectors. */
+char *add_storage_cmd;
+uint32_t *leaked_chunks;
+uint32_t num_leaked_chunks;
+uint32_t next_avail_leaked_chunk;
+uint32_t chunks_relocated;/* Affect bdrv_has_zero_init(). */
+/* End: for compact image. */
+
+/* Begin: for journal. */
+uint64_t journal_offset;   /* in sectors. */
+uint64_t journal_size; /* in sectors. */
+uint64_t journal_epoch;
+uint64_t next_journal_sector;  /* in sector. */
+bool dirty_image;
+bool metadata_err_prohibit_write;
+
+/* There are two different ways of writing metadata changes to the
+ * journal. If cache=writethrough, metadata changes are written to the
+ * journal immediately. If (cache!=writethrough||IN_QEMU_TOOL), metadata
+ * changes are buffered in memory (bjnl.journal_buf below), and later
+ * written to the journal either triggered by bdrv_aio_flush() or by a
+ * timeout (bjnl.clean_buf_timer below). */
+bool use_bjnl;  /* 'bjnl' stands for buffered journal update. */
+union {
+/* 'ujnl' stands for unbuffered journal update. */
+struct {
+int active_writes;
+/* Journal writes waiting for journal recycle to finish.
+ * See JournalCB.ujnl_next_wait4_recycle. */
+QLIST_HEAD(JournalRecycle, FvdAIOCB) wait4_recycle;
+} ujnl;
+
+/* 'bjnl' stands for buffered journal update. */
+struct {
+uint8_t *buf;
+size_t buf_size;
+size_t def_buf_size;
+size_t buf_used;
+bool buf_contains_bitmap_update;
+QEMUTimer *clean_buf_timer;
+bool timer_scheduled;
+uint64_t clean_buf_period;
+/* See JournalCB.bjnl_next_queued_buf. */
+QTAILQ_HEAD(CleanBuf, FvdAIOCB) queued_bufs;
+} bjnl;
+};
+/* End: for journal. */
+
+/* Begin: for prefetching. */
+struct FvdAIOCB **prefetch_acb;
+int prefetch_state;/* PREFETCH_STATE_RUNNING, FINISHED, or DISABLED. */
+int num_prefetch_slots;
+int num_filled_prefetch_slots;
+int next_prefetch_read_slot;
+bool prefetch_read_active;
+bool pause_prefetch_requested;
+int64_t prefetch_start_delay;  /* in seconds  */
+uint64_t unclaimed_prefetch_region_start;
+uint64_t prefetch_read_time; /* in milliseconds. */
+uint64_t prefetch_write_time;/* in milliseconds. */
+uint64_t prefetch_data_read; /* in bytes. */
+uint64_t prefetch_data_written;  /* in bytes. */
+double prefetch_read_throughput; /* in bytes/millisecond. */
+double

[Qemu-devel] [PATCH 25/26] FVD: add impl of interface bdrv_probe()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_probe() interface.
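
In outline, a probe of this kind compares a little-endian magic number at the
start of the buffer and returns a score. The sketch below is illustrative
only: EXAMPLE_MAGIC is a made-up value (not FVD_MAGIC), and load_le32() and
example_probe() are hypothetical helpers standing in for le32_to_cpu() and
fvd_probe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up magic value for the example; not the real FVD magic. */
#define EXAMPLE_MAGIC 0x4A9B6E2FU

/* Decode a 32-bit little-endian value regardless of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 100 if the buffer starts with the magic number, 0 otherwise. */
static int example_probe(const uint8_t *buf, size_t buf_size)
{
    assert(buf != NULL || buf_size == 0);
    if (buf_size >= sizeof(uint32_t) && load_le32(buf) == EXAMPLE_MAGIC) {
        return 100;
    }
    return 0;
}
```

The size check comes first so that a buffer shorter than the magic field is
never read.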

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 61e39bb..6315218 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -163,7 +163,14 @@ static void fvd_close(BlockDriverState * bs)
 
 static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
 {
-    return 0;
+    const FvdHeader *header = (const void *)buf;
+
+    if (buf_size >= sizeof(uint32_t) &&
+        le32_to_cpu(header->magic) == FVD_MAGIC) {
+        return 100;
+    } else {
+        return 0;
+    }
 }
 
 static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
-- 
1.7.0.4

[Qemu-devel] [PATCH 05/26] FVD: add the 'qemu-img update' command

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the 'update' command to qemu-img. It is a general interface
that allows various image format specific manipulations. For example,
'qemu-img rebase' and 'qemu-img resize' can be considered as two special cases
of update.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block_int.h  |3 +
 qemu-img-cmds.hx |6 +++
 qemu-img.c   |  125 +++---
 qemu-option.c|   79 ++
 qemu-option.h|4 ++
 5 files changed, 201 insertions(+), 16 deletions(-)

diff --git a/block_int.h b/block_int.h
index 545ad11..8f6b6d0 100644
--- a/block_int.h
+++ b/block_int.h
@@ -98,6 +98,7 @@ struct BlockDriver {
 int (*bdrv_snapshot_load_tmp)(BlockDriverState *bs,
   const char *snapshot_name);
 int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
+int (*bdrv_update)(BlockDriverState *bs, QEMUOptionParameter *options);
 
 int (*bdrv_save_vmstate)(BlockDriverState *bs, const uint8_t *buf,
  int64_t pos, int size);
@@ -122,6 +123,8 @@ struct BlockDriver {
 /* List of options for creating images, terminated by name == NULL */
 QEMUOptionParameter *create_options;
 
+/* List of options for updating images, terminated by name == NULL */
+QEMUOptionParameter *update_options;
 
 /*
  * Returns 0 for completed check, -errno for internal errors.
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 6c7176f..a7ed395 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -39,6 +39,12 @@ STEXI
 @item info [-f @var{fmt}] @var{filename}
 ETEXI
 
+DEF("update", img_update,
+    "update [-f fmt] [-o options] filename")
+STEXI
+@item update [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+ETEXI
+
 DEF(snapshot, img_snapshot,
 snapshot [-l | -a snapshot | -c snapshot | -d snapshot] filename)
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 7e3cc4c..215e7b9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -179,10 +179,11 @@ static int read_password(char *buf, int buf_size)
 }
 #endif
 
-static int print_block_option_help(const char *filename, const char *fmt)
+static int print_block_option_help(const char *filename, const char *fmt,
+                                   bool create_options)
 {
     BlockDriver *drv, *proto_drv;
-    QEMUOptionParameter *create_options = NULL;
+    QEMUOptionParameter *options = NULL;
 
     /* Find driver and parse its options */
     drv = bdrv_find_format(fmt);
@@ -197,12 +198,15 @@ static int print_block_option_help(const char *filename, const char *fmt)
         return 1;
     }
 
-    create_options = append_option_parameters(create_options,
-                                              drv->create_options);
-    create_options = append_option_parameters(create_options,
-                                              proto_drv->create_options);
-    print_option_help(create_options);
-    free_option_parameters(create_options);
+    if (create_options) {
+        options = append_option_parameters(options, drv->create_options);
+        options = append_option_parameters(options, proto_drv->create_options);
+    } else {
+        options = append_option_parameters(options, drv->update_options);
+        options = append_option_parameters(options, proto_drv->update_options);
+    }
+    print_option_help(options);
+    free_option_parameters(options);
     return 0;
 }
 
@@ -337,7 +341,7 @@ static int img_create(int argc, char **argv)
 }
 
     if (options && !strcmp(options, "?")) {
-        ret = print_block_option_help(filename, fmt);
+        ret = print_block_option_help(filename, fmt, true /*create*/);
         goto out;
 }
 
@@ -631,7 +635,7 @@ static int img_convert(int argc, char **argv)
 out_filename = argv[argc - 1];
 
     if (options && !strcmp(options, "?")) {
-        ret = print_block_option_help(out_filename, out_fmt);
+        ret = print_block_option_help(out_filename, out_fmt, true /*create*/);
         goto out;
 }
 
@@ -869,7 +873,7 @@ static int img_convert(int argc, char **argv)
assume that sectors which are unallocated in the input image
are present in both the output's and input's base images (no
need to copy them). */
-            if (out_baseimg) {
+            if (out_baseimg || bs[bs_i]->backing_file[0]==0) {
                 if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
                                        n, &n1)) {
                     sector_num += n1;
@@ -1040,11 +1044,6 @@ static int img_info(int argc, char **argv)
     if (bdrv_is_encrypted(bs)) {
         printf("encrypted: yes\n");
     }
-    if (bdrv_get_info(bs, &bdi) >= 0) {
-        if (bdi.cluster_size != 0) {
-            printf("cluster_size: %d\n", bdi.cluster_size);
-        }
-    }

[Qemu-devel] [PATCH 18/26] FVD: add support for base image prefetching

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds adaptive prefetching of base image to FVD.  FVD supports both
copy-on-write and copy-on-read of base image. Adaptive prefetching is similar
to copy-on-read except that it is initiated by the FVD driver rather than
triggered by the VM's read requests. FVD's prefetching is conservative in
that, if it detects resource contention, it will back off and temporarily
pause prefetching.
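
The back-off can be pictured with the sketch below. The randomized pause
length follows the formula used by pause_prefetch() in this patch; the
contention test in should_pause() is only an assumed policy for illustration
(comparing measured throughput against a minimum), and both function names
are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pause length in milliseconds: 1 plus a random fraction of the throttle
 * time. unit_random is expected to be rand() / (double)RAND_MAX. */
static int64_t pause_duration_ms(double unit_random, int64_t throttle_time_ms)
{
    assert(unit_random >= 0.0 && unit_random <= 1.0);
    return 1 + (int64_t)(unit_random * (double)throttle_time_ms);
}

/* Assumed policy: treat throughput below a configured minimum as a sign of
 * resource contention and request a pause. */
static int should_pause(double current_throughput, double min_throughput)
{
    return current_throughput < min_throughput;
}
```

Randomizing the pause presumably helps keep several VMs that prefetch from
the same storage from resuming at the same moment.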

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-prefetch.c |  600 +-
 block/fvd-read.c |1 +
 qemu-io-sim.c|   13 +
 3 files changed, 613 insertions(+), 1 deletions(-)

diff --git a/block/fvd-prefetch.c b/block/fvd-prefetch.c
index 5844aa7..b8be98c 100644
--- a/block/fvd-prefetch.c
+++ b/block/fvd-prefetch.c
@@ -11,7 +11,605 @@
  *
  */
 
+static void prefetch_read_cb(void *opaque, int ret);
+static void resume_prefetch(BlockDriverState * bs);
+static void do_next_prefetch_read(BlockDriverState * bs, int64_t current_time);
+
 void fvd_init_prefetch(void *opaque)
 {
-    /* To be implemented. */
+    BlockDriverState *bs = opaque;
+    BDRVFvdState *s = bs->opaque;
+    FvdAIOCB *acb;
+    int i;
+
+    QDEBUG("Start prefetching\n");
+
+    if (!s->data_region_prepared) {
+        init_data_region(s);
+    }
+
+    s->prefetch_acb = my_qemu_malloc(sizeof(FvdAIOCB *)*s->num_prefetch_slots);
+
+    for (i = 0; i < s->num_prefetch_slots; i++) {
+        acb = my_qemu_aio_get(&fvd_aio_pool, bs, prefetch_null_cb, NULL);
+        s->prefetch_acb[i] = acb;
+        if (!acb) {
+            int j;
+            for (j = 0; j < i; j++) {
+                my_qemu_aio_release(s->prefetch_acb[j]);
+                s->prefetch_acb[j] = NULL;
+            }
+
+            my_qemu_free(s->prefetch_acb);
+            s->prefetch_acb = NULL;
+            fprintf(stderr, "No acb and cannot start prefetching.\n");
+            return;
+        }
+
+        acb->type = OP_COPY;
+        acb->cancel_in_progress = false;
+    }
+
+    s->prefetch_state = PREFETCH_STATE_RUNNING;
+
+    for (i = 0; i < s->num_prefetch_slots; i++) {
+        acb = s->prefetch_acb[i];
+        acb->copy.buffered_sector_begin = acb->copy.buffered_sector_end = 0;
+        QLIST_INIT(&acb->copy_lock.dependent_writes);
+        acb->copy_lock.next.le_prev = NULL;
+        acb->copy.hd_acb = NULL;
+        acb->sector_num = 0;
+        acb->nb_sectors = 0;
+        acb->copy.iov.iov_len = s->sectors_per_prefetch * 512;
+        acb->copy.buf = acb->copy.iov.iov_base =
+            my_qemu_blockalign(bs->backing_hd, acb->copy.iov.iov_len);
+        qemu_iovec_init_external(&acb->copy.qiov, &acb->copy.iov, 1);
+    }
+
+    if (s->prefetch_timer) {
+        qemu_free_timer(s->prefetch_timer);
+        s->prefetch_timer =
+            qemu_new_timer(rt_clock, (QEMUTimerCB *) resume_prefetch, bs);
+    }
+
+    s->pause_prefetch_requested = false;
+    s->unclaimed_prefetch_region_start = 0;
+    s->prefetch_read_throughput = -1;   /* Indicate not initialized. */
+    s->prefetch_write_throughput = -1;  /* Indicate not initialized. */
+    s->prefetch_read_time = 0;
+    s->prefetch_write_time = 0;
+    s->prefetch_data_read = 0;
+    s->prefetch_data_written = 0;
+    s->next_prefetch_read_slot = 0;
+    s->num_filled_prefetch_slots = 0;
+    s->prefetch_read_active = false;
+
+    do_next_prefetch_read(bs, qemu_get_clock(rt_clock));
+}
+
+static void pause_prefetch(BDRVFvdState * s)
+{
+    int64_t ms = 1 + (int64_t) ((rand() / ((double)RAND_MAX))
+                                * s->prefetch_throttle_time);
+    QDEBUG("Pause prefetch for %" PRId64 " milliseconds\n", ms);
+    /* When the timer expires, it goes to resume_prefetch(). */
+    qemu_mod_timer(s->prefetch_timer, qemu_get_clock(rt_clock) + ms);
+}
+
+/* Return true if every bit of fresh_bitmap is set to 1. */
+static bool all_data_prefetched(BDRVFvdState *s)
+{
+    uint64_t n = s->base_img_sectors / s->block_size / sizeof(uint64_t) / 8;
+    uint64_t *p = (uint64_t*)s->fresh_bitmap;
+    uint64_t i;
+
+    for (i = 0; i < n; i++, p++) {
+        if (*p != UINT64_C(0xFFFFFFFFFFFFFFFF)) {
+            return false;
+        }
+    }
+
+    uint64_t sec = n * sizeof(uint64_t) * 8 * s->block_size;
+    while (sec < s->base_img_sectors) {
+        if (fresh_bitmap_show_sector_in_base_img(sec, s)) {
+            return false;
+        }
+        sec += s->block_size;
+    }
+
+    return true;
+}
+
+static void terminate_prefetch(BlockDriverState * bs, int final_state)
+{
+    BDRVFvdState *s = bs->opaque;
+    int i;
+
+    ASSERT(!s->prefetch_read_active && s->num_filled_prefetch_slots == 0);
+
+    for (i = 0; i < s->num_prefetch_slots; i++) {
+        if (s->prefetch_acb) {
+            my_qemu_vfree(s->prefetch_acb[i]->copy.buf);
+            my_qemu_aio_release(s->prefetch_acb[i]);
+            s->prefetch_acb[i] = NULL;
+        }
+    }
+    my_qemu_free(s->prefetch_acb);
+    s->prefetch_acb = NULL;
+
+

Re: [Qemu-devel] [RESENT][PATCH] HACKING: Update status of format checking

2011-02-25 Thread Anthony Liguori


On 02/25/2011 04:20 PM, Stefan Weil wrote:

This patch was already sent on 2011-01-24:

Hopefully all functions with printf like arguments now use format 
checking.


This was tested with default build configuration on linux
and windows hosts (including some cross compilations),
so chances are good that there remain few (if any) functions
without format checking.

Therefore the last comment in HACKING is no longer valid but misleading.

Cc: Blue Swirl blauwir...@gmail.com
Signed-off-by: Stefan Weil w...@mail.berlios.de


Applied.  Thanks.

Regards,

Anthony Liguori


---
 HACKING |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
 This makes it so gcc's -Wformat and -Wformat-security options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.

[Qemu-devel] [PATCH 11/26] FVD: add impl of interface bdrv_aio_writev()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_aio_writev() interface. It
supports copy-on-write in FVD.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-bitmap.c  |  150 
 block/fvd-journal.c |4 +
 block/fvd-store.c   |   20 +++
 block/fvd-write.c   |  468 ++-
 block/fvd.c |4 +-
 block/fvd.h |1 +
 6 files changed, 645 insertions(+), 2 deletions(-)
 create mode 100644 block/fvd-bitmap.c
 create mode 100644 block/fvd-store.c

diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c
new file mode 100644
index 000..7e96201
--- /dev/null
+++ b/block/fvd-bitmap.c
@@ -0,0 +1,150 @@
+/*
+ * QEMU Fast Virtual Disk Format Utility Functions for Bitmap
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static inline bool stale_bitmap_show_sector_in_base_img(int64_t sector_num,
+const BDRVFvdState * s)
+{
+if (sector_num = s-base_img_sectors) {
+return false;
+}
+
+int64_t block_num = sector_num / s-block_size;
+int64_t bitmap_byte_offset = block_num / 8;
+uint8_t bitmap_bit_offset = block_num % 8;
+uint8_t b = s-stale_bitmap[bitmap_byte_offset];
+return 0 == (int)((b  bitmap_bit_offset)  0x01);
+}
+
+static inline bool fresh_bitmap_show_sector_in_base_img(int64_t sector_num,
+const BDRVFvdState * s)
+{
+if (sector_num = s-base_img_sectors) {
+return false;
+}
+
+int64_t block_num = sector_num / s-block_size;
+int64_t bitmap_byte_offset = block_num / 8;
+uint8_t bitmap_bit_offset = block_num % 8;
+uint8_t b = s-fresh_bitmap[bitmap_byte_offset];
+return 0 == (int)((b  bitmap_bit_offset)  0x01);
+}
+
+static inline void update_fresh_bitmap(int64_t sector_num, int nb_sectors,
+                                       const BDRVFvdState * s)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return;
+    }
+
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    int64_t block_end = (end - 1) / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->fresh_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            b |= mask;
+            s->fresh_bitmap[bitmap_byte_offset] = b;
+        }
+    }
+}
+
+static inline bool bitmap_show_sector_in_base_img(int64_t sector_num,
+                                                  const BDRVFvdState * s,
+                                                  int bitmap_offset,
+                                                  uint8_t * bitmap)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return false;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    int64_t bitmap_byte_offset = block_num / 8 - bitmap_offset;
+    uint8_t bitmap_bit_offset = block_num % 8;
+    uint8_t b = bitmap[bitmap_byte_offset];
+    return 0 == (int)((b >> bitmap_bit_offset) & 0x01);
+}
+
+static inline bool stale_bitmap_need_update(FvdAIOCB * acb)
+{
+    BlockDriverState *bs = acb->common.bs;
+    BDRVFvdState *s = bs->opaque;
+    int64_t end = acb->sector_num + acb->nb_sectors;
+
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+    int64_t block_end = (end - 1) / s->block_size;
+    int64_t block_num = acb->sector_num / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->stale_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/* Return true if stale_bitmap needs update. */
+static bool update_fresh_bitmap_and_check_stale_bitmap(FvdAIOCB * acb)
+{
+    BlockDriverState *bs = acb->common.bs;
+    BDRVFvdState *s = bs->opaque;
+
+    if (acb->sector_num >= s->base_img_sectors) {
+        return false;
+    }
+
+    bool need_update = false;
+    int64_t end = acb->sector_num + acb->nb_sectors;
+
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_end = (end - 1) / s->block_size;
+    int64_t block_num = acb->sector_num / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;

[Qemu-devel] [PATCH 01/26] FVD: add simulated block driver 'blksim'

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the 'blksim' block device driver, which is a tool to
facilitate testing and debugging. blksim operates on a RAW image, but it uses
neither AIO nor posix threads to perform actual I/Os.  blksim functions like an
event-driven disk simulator, and allows a block device driver developer to
fully control the order of disk I/Os, the order of callbacks, and the return
code of every I/O operation. The purpose is to extensively test a block device
driver under failures and race conditions.  Bugs found by blksim under rare
race conditions are guaranteed to be precisely reproducible.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 Makefile.objs  |1 +
 block/blksim.c |  757 
 block/blksim.h |   35 +++
 3 files changed, 793 insertions(+), 0 deletions(-)
 create mode 100644 block/blksim.c
 create mode 100644 block/blksim.h

diff --git a/Makefile.objs b/Makefile.objs
index 9e98a66..264aab3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,6 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
qcow2-snapshot.o qcow
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += blksim.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blksim.c b/block/blksim.c
new file mode 100644
index 000..5c7ef43
--- /dev/null
+++ b/block/blksim.c
@@ -0,0 +1,757 @@
+/*
+ * QEMU Simulated Block Device to Facilitate Testing and Debugging
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "block_int.h"
+#include "osdep.h"
+#include "qemu-option.h"
+#include "qemu-timer.h"
+#include "block.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+#include "block/blksim.h"
+
+#if 1
+# define QDEBUG(format,...) do {} while (0)
+#else
+# define QDEBUG printf
+#endif
+
+typedef enum
+{
+SIM_NULL,
+SIM_READ,
+SIM_WRITE,
+SIM_FLUSH,
+SIM_READ_CALLBACK,
+SIM_WRITE_CALLBACK,
+SIM_FLUSH_CALLBACK,
+SIM_TIMER
+} sim_op_t;
+
+static void sim_aio_cancel(BlockDriverAIOCB * acb);
+static int64_t sim_uuid = 0;
+static int64_t current_time = 0;
+static int64_t rand_time = 0;
+static int interactive_print = true;
+static int blksim_invoked = false;
+static bool instant_qemubh = true;
+struct SimAIOCB;
+
+/*
+ * Note: disk_io_return_code, set_disk_io_return_code(), and insert_task() work
+ * together to ensure that multiple subrequests triggered by the same
+ * outermost request either succeed together or fail together. This behavior
+ * is required by qemu-test.  Here is one example of problems caused by
+ * departing from this behavior.  Consider a write request that generates
+ * two subrequests, w1 and w2. If w1 succeeds but w2 fails, the data will not
+ * be written into qemu-test's truth image but the part of the data handled
+ * by w1 will be written into qemu-test's test image. As a result, their
+ * contents diverge and automated testing cannot continue.
+ */
+static int disk_io_return_code = 0;
+
+typedef struct BDRVSimState
+{
+int fd;
+} BDRVSimState;
+
+typedef struct SimAIOCB
+{
+BlockDriverAIOCB common;
+int64_t uuid;
+sim_op_t op;
+int64_t sector_num;
+QEMUIOVector *qiov;
+int nb_sectors;
+int ret;
+int64_t time;
+struct SimAIOCB *next;
+struct SimAIOCB *prev;
+
+} SimAIOCB;
+
+static AIOPool sim_aio_pool = {
+.aiocb_size = sizeof(SimAIOCB),
+.cancel = sim_aio_cancel,
+};
+
+static SimAIOCB head = {
+    .uuid = -1,
+    .time = (int64_t) (9223372036854775807ULL),
+    .op = SIM_NULL,
+    .next = &head,
+    .prev = &head,
+};
+
+/* Debug a specific task.*/
+#if 0
+static inline void CHECK_TASK(int64_t uuid)
+{
+    if (uuid == 19LL) {
+        printf("CHECK_TASK pause for task %" PRId64 "\n", uuid);
+    }
+}
+#else
+#  define CHECK_TASK(acb) do { } while (0)
+#endif
+
+/* do_io() should never fail. A failure indicates a bug in the upper layer
+ * block device driver, or failure in the real hardware. */
+static int do_io(BlockDriverState * bs, int64_t sector_num, uint8_t * buf,
+                 int nb_sectors, int do_read)
+{
+    BDRVSimState *s = bs->opaque;
+    size_t size = nb_sectors * 512;
+    uint8_t *new_buf, *p;
+    int ret;
+
+    if (interactive_print) {
+        printf ("Do %s %s sector_num=%" PRId64 " nb_sectors=%d\n",
+                do_read ? "READ" : "WRITE", bs->filename,
+                sector_num, nb_sectors);
+    }
+
+    if ((ret=lseek(s->fd, sector_num * 512, SEEK_SET)) < 0) {
+        fprintf(stderr, "Error: lseek %s sector_num=%" PRId64 "\n",
+
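(The patch above is cut off in this archive.) The 'head' sentinel shown earlier, with its time set to the largest int64_t value and next/prev pointing back at itself, suggests a circular, time-ordered task list. The following is only a sketch of how such a list could be kept sorted. The Task type and the bodies of insert_task() and pop_earliest() are assumptions written for illustration and are not taken from blksim.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for SimAIOCB. */
typedef struct Task {
    int64_t uuid;
    int64_t time;
    struct Task *next;
    struct Task *prev;
} Task;

/* Sentinel node: an empty list is a head that points at itself. */
static Task head = {
    .uuid = -1,
    .time = INT64_MAX,
    .next = &head,
    .prev = &head,
};

/* Insert t so the list stays sorted by time; tasks with equal time keep
 * their insertion order. */
static void insert_task(Task *t)
{
    Task *p = head.next;

    while (p != &head && p->time <= t->time) {
        p = p->next;
    }

    /* Link t immediately before p. */
    t->next = p;
    t->prev = p->prev;
    p->prev->next = t;
    p->prev = t;
}

/* Remove and return the task with the smallest time, or NULL if empty. */
static Task *pop_earliest(void)
{
    Task *t = head.next;

    if (t == &head) {
        return NULL;
    }
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->next = t->prev = NULL;
    return t;
}
```

Because the order depends only on the recorded times and the insertion order, replaying the same sequence of inserts yields the same sequence of callbacks, which is what makes a simulated run reproducible.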

[Qemu-devel] [PATCH 04/26] FVD: add fully automated test-vdi.sh

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

test-vdi.sh drives 'qemu-io --auto' to perform fully automated testing for VDI.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 test-vdi.sh |   83 +++
 1 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100755 test-vdi.sh

diff --git a/test-vdi.sh b/test-vdi.sh
new file mode 100755
index 000..b0bfe65
--- /dev/null
+++ b/test-vdi.sh
@@ -0,0 +1,83 @@
+#!/bin/bash
+
+# Drive 'qemu-io --auto' to test the VDI image format.
+#
+# Copyright IBM, Corp. 2010
+#
+# Authors:
+# Chunqiang Tang ct...@us.ibm.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+if [ $USER != root ]; then
+echo This command must be run by root in order to mount tmpfs.
+exit 1
+fi
+
+QEMU_DIR=.
+QEMU_IMG=$QEMU_DIR/qemu-img
+QEMU_IO=$QEMU_DIR/qemu-io
+
+if [ ! -e $QEMU_IMG ]; then
+echo $QEMU_IMG does not exist.
+exit 1;
+fi
+
+if [ ! -e $QEMU_IO ]; then
+echo $QEMU_IO does not exist.
+exit 1;
+fi
+
+DATA_DIR=/var/ramdisk
+TRUTH_IMG=$DATA_DIR/truth.raw
+TEST_IMG=$DATA_DIR/test.vdi
+CMD_LOG=./test-vdi.log
+
+parallel=10
+round=1000
+fail_prob=0.1
+cancel_prob=0
+flush_prob=0
+aio_flush_prob=0
+instant_qemubh=true
+seed=$RANDOM$RANDOM
+count=0
+
+function invoke() {
+    echo "$*" >> $CMD_LOG
+    $*
+    ret=$?
+    if [ $ret -ne 0 ]; then
+        echo "Exit with error code $ret: $*"
+        exit $ret
+    fi
+}
+
+mount | grep $DATA_DIR > /dev/null
+if [ $? -ne 0 ]; then
+echo Create tmpfs at $DATA_DIR to store testing images.
+if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi
+invoke mount -t tmpfs none $DATA_DIR -o size=400M
+if [ $? -ne 0 ]; then exit 1; fi
+fi
+
+/bin/rm -f $CMD_LOG
+touch $CMD_LOG
+
+while [ -t ]; do
+for io_size in 3145728; do
+count=$[$count + 1]
+echo "Round $count" >> $CMD_LOG
+
+# VDI image is about 100M
+img_size=$[(104857600 + ($RANDOM$RANDOM$RANDOM % 10485760)) / 512 * 512]
+
+invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG
+invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+invoke $QEMU_IMG create -f vdi $TEST_IMG $img_size
+
+invoke $QEMU_IO --auto --seed=$seed --truth=$TRUTH_IMG --format=vdi \
+    --test=blksim:$TEST_IMG --verify_write=true --compare_before=false \
+    --compare_after=true --round=$round --parallel=$parallel --io_size=$io_size \
+    --fail_prob=$fail_prob --cancel_prob=$cancel_prob \
+    --aio_flush_prob=$aio_flush_prob --flush_prob=$flush_prob \
+    --instant_qemubh=$instant_qemubh
+
+seed=$[$seed + 1]
+done; done
-- 
1.7.0.4
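
The img_size expression in the script picks a size of roughly 100 MB plus a random amount and rounds it down to a multiple of 512 bytes, so the image always holds a whole number of sectors. The arithmetic can be sketched in C as follows. The function names are hypothetical, and r stands for a non-negative random number that the caller supplies.

```c
#include <stdint.h>

/* Round a byte count down to a multiple of the 512-byte sector size. */
static int64_t round_down_to_sector(int64_t bytes)
{
    return bytes / 512 * 512;
}

/* base plus a bounded random offset, aligned down to a sector boundary.
 * Assumes r >= 0 and jitter_range > 0. */
static int64_t pick_image_size(int64_t base, int64_t jitter_range, int64_t r)
{
    return round_down_to_sector(base + r % jitter_range);
}
```

The script's values correspond to pick_image_size(104857600, 10485760, r).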

Re: [Qemu-devel] [patch 2/3] Add support for live block copy

2011-02-25 Thread Marcelo Tosatti

On Wed, Feb 23, 2011 at 01:06:46PM -0600, Anthony Liguori wrote:
 On 02/22/2011 11:00 AM, Marcelo Tosatti wrote:
 Index: qemu/qerror.h
 ===
 --- qemu.orig/qerror.h
 +++ qemu/qerror.h
 @@ -171,4 +171,13 @@ QError *qobject_to_qerror(const QObject
   #define QERR_VNC_SERVER_FAILED \
   { 'class': 'VNCServerFailed', 'data': { 'target': %s } }
 
 +#define QERR_BLOCKCOPY_IN_PROGRESS \
 +{ 'class': 'BlockCopyInProgress', 'data': { 'device': %s } }
 
 The caller already knows the device name by virtue of issuing the
 command so this is redundant.
 
 I think a better approach would be a QERR_IN_PROGRESS 'data': {
 'operation': %s }
 
 For block copy, we'd say QERR_IN_PROGRESS(block copy).
 
 +
 +#define QERR_BLOCKCOPY_IMAGE_SIZE_DIFFERS \
 +{ 'class': 'BlockCopyImageSizeDiffers', 'data': {} }
 +
 +#define QERR_MIGRATION_IN_PROGRESS \
 +{ 'class': 'MigrationInProgress', 'data': {} }
 
 Then QERR_IN_PROGRESS(live migration)

Can the error format change like that? What about applications that make
use of it? If it can change, sure. (libvirt.git does not seem to be
aware of MigrationInProgress).

   #endif /* QERROR_H */
 Index: qemu/qmp-commands.hx
 ===
 --- qemu.orig/qmp-commands.hx
 +++ qemu/qmp-commands.hx
 @@ -581,6 +581,75 @@ Example:
   EQMP
 
   {
 +.name   = block_copy,
 +.args_type  = device:s,filename:s,commit_filename:s?,inc:-i,
 +.params = device filename [commit_filename] [-i],
 +.help   = live block copy device to image
 +  \n\t\t\t optional commit filename 
 +  \n\t\t\t -i for incremental copy 
 +  (base image shared between src and destination),
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_bdrv_copy,
 +},
 +
 +SQMP
 +block-copy
 +---
 +
 +Live block copy.
 
 I'm not sure copy really describes what we're doing here.  Maybe
 migrate-block?

The problem is that it's easy to confuse migrate-block with block migration. I
could not come up with a better, non-confusing name than live block
copy.

 +Arguments:
 +
 +- device: device name (json-string)
 +- filename: target image filename (json-string)
 
 Is this a created image?  Is this an image to create?

A previously created image.

 To future proof for blockdev, we should make this argument optional
 and if it's not specified throw an error about missing argument.
 This let's us introduce an optional blockdev argument such that we
 can use a blockdev name.

What do you mean by blockdev?

 
 +- commit_filename: target commit filename (json-string, optional)
 
 I think we should drop this.

Why? Sorry, but this can't wait for non-config persistent storage. This
mistake was made in the past with irqchip, for example; let's not repeat
it.

It's OK to deprecate commit_filename in favour of its location in
non-config persistent storage. 

It's not the end of the world for a mgmt app to handle a change such as
this (not saying it's not a good principle).

 +- inc: incremental disk copy (json-bool, optional)
 
 Let's use the full name (incremental) and we need to describe in
 detail what the semantics of this are.  Will it scan the target
 block device to identify identical blocks?

No, it does not attempt to identify identical blocks, yet.

You are right, I'll write a document to describe these details.

 
 +Example:
 +
 +-  { execute: block_copy,
 +arguments: { device: ide0-hd1,
 +   filename: /mnt/new-disk.img,
 +   commit_filename: /mnt/commit-new-disk.img
 + } }
 +
 +- { return: {} }
 +
 +Notes:
 +
 +(1) The 'query-block-copy' command should be used to check block copy 
 progress
 +and final result (this information is provided by the 'status' member)
 +(2) Boolean argument inc defaults to false
 
 We should also document error semantics.  What errors are expected and why?

Fair.

 +EQMP
 +
 +{
 +.name   = block_copy_cancel,
 +.args_type  = device:s,
 +.params = device,
 +.help   = cancel live block copy,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_bdrv_copy_cancel,
 +},
 +
 +SQMP
 +block_copy_cancel
 +--
 +
 +Cancel live block copy.
 +
 +Arguments:
 +
 +- device: device name (json-string)
 +
 +Example:
 +
 +-  { execute: block_copy_cancel, arguments: { device: ide0-hd1 } 
 }
 +- { return: {} }
 
 cancel-block-migration?

Again, conflicts with block migration from live migration.

 What happens if:
  - No block copy is active anymore (it's completed)

cancel succeeds.

  - A block copy was never started

qerror_report(QERR_DEVICE_NOT_FOUND, device);

  - device refers to a device that no longer exists

qerror_report(QERR_DEVICE_NOT_FOUND, device);

  - device refers to a device with no

[Qemu-devel] [PATCH 09/26] FVD: add impl of interface bdrv_create()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_create() interface. It
supports FVD image creation.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-create.c  |  702 ++-
 block/fvd-journal.c |5 +
 block/fvd.c |2 +-
 3 files changed, 707 insertions(+), 2 deletions(-)

diff --git a/block/fvd-create.c b/block/fvd-create.c
index 5593cea..c8912aa 100644
--- a/block/fvd-create.c
+++ b/block/fvd-create.c
@@ -11,11 +11,711 @@
  *
  */
 
+static void fvd_header_cpu_to_le(FvdHeader * header);
+static inline int64_t calc_min_journal_size(int64_t table_entries);
+static inline int search_empty_blocks(int fd, uint8_t * bitmap,
+  BlockDriverState * bs,
+  int64_t nb_sectors,
+  int32_t hole_size,
+  int32_t block_size);
+
 static int fvd_create(const char *filename, QEMUOptionParameter * options)
 {
-return -ENOTSUP;
+int fd, ret = 0;
+FvdHeader *header;
+int64_t virtual_disk_size = DEF_PAGE_SIZE;
+int32_t header_size;
+const char *base_img = NULL;
+const char *base_img_fmt = NULL;
+const char *data_file = NULL;
+const char *data_file_fmt = NULL;
+int32_t hole_size = 0;
+int copy_on_read = false;
+int prefetch_start_delay = -1;
+BlockDriverState *bs = NULL;
+int bitmap_size = 0;
+int64_t base_img_size = 0;
+int64_t table_size = 0;
+int64_t journal_size = 0;
+int32_t block_size = 0;
+int compact_image = false;
+uint64_t max_copy_on_read = MAX_OUTSTANDING_COPY_ON_READ_DATA;
+uint32_t num_prefetch_slots = NUM_PREFETCH_SLOTS;
+uint64_t bytes_per_prefetch = BYTES_PER_PREFETCH;
+uint64_t prefetch_throttle_time = PREFETCH_THROTTLING_TIME;
+uint64_t prefetch_read_measure_time = PREFETCH_MIN_MEASURE_READ_TIME;
+uint64_t prefetch_write_measure_time = PREFETCH_MIN_MEASURE_WRITE_TIME;
+uint64_t prefetch_min_read_throughput = PREFETCH_MIN_READ_THROUGHPUT;
+uint64_t prefetch_min_write_throughput = PREFETCH_MIN_WRITE_THROUGHPUT;
+uint64_t prefetch_max_read_throughput = PREFETCH_MAX_READ_THROUGHPUT;
+uint64_t prefetch_max_write_throughput = PREFETCH_MAX_WRITE_THROUGHPUT;
+
+    header_size = sizeof(FvdHeader);
+    header_size = ROUND_UP(header_size, DEF_PAGE_SIZE);
+    header = my_qemu_mallocz(header_size);
+    header->header_size = header_size;
+
+    /* Read out options */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            virtual_disk_size = options->value.n;
+        } else if (!strcmp(options->name, "prefetch_start_delay")) {
+            if (options->value.n <= 0) {
+                prefetch_start_delay = -1;
+            } else {
+                prefetch_start_delay = options->value.n;
+            }
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+            base_img = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+            base_img_fmt = options->value.s;
+        } else if (!strcmp(options->name, "copy_on_read")) {
+            copy_on_read = options->value.n;
+        } else if (!strcmp(options->name, "data_file")) {
+            data_file = options->value.s;
+        } else if (!strcmp(options->name, "data_file_fmt")) {
+            data_file_fmt = options->value.s;
+        } else if (!strcmp(options->name, "optimize_empty_block")) {
+            hole_size = options->value.n;
+        } else if (!strcmp(options->name, "compact_image")) {
+            compact_image = options->value.n;
+        } else if (!strcmp(options->name, "block_size")) {
+            block_size = options->value.n;
+        } else if (!strcmp(options->name, "chunk_size")) {
+            header->chunk_size = options->value.n;
+        } else if (!strcmp(options->name, "journal_size")) {
+            journal_size = options->value.n;
+        } else if (!strcmp(options->name, "journal_buf_size")) {
+            header->journal_buf_size = options->value.n;
+        } else if (!strcmp(options->name, "journal_clean_buf_period")) {
+            header->journal_clean_buf_period = options->value.n;
+        } else if (!strcmp(options->name, "storage_grow_unit")) {
+            header->storage_grow_unit = options->value.n;
+        } else if (!strcmp(options->name, "add_storage_cmd") &&
+                   options->value.s) {
+            pstrcpy(header->add_storage_cmd, sizeof(header->add_storage_cmd),
+                    options->value.s);
+        } else if (!strcmp(options->name, "num_prefetch_slots") &&
+                   options->value.n > 0) {
+            num_prefetch_slots = options->value.n;
+        } else if (!strcmp(options->name, "bytes_per_prefetch") &&
+                   options->value.n > 0) {
+            bytes_per_prefetch = options->value.n;
+        } else if
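(The patch above is cut off in this archive.) The option loop in fvd_create() walks an array that is terminated by an entry with a NULL name and dispatches on each name with strcmp(). A reduced, self-contained sketch of that loop follows. The Option and CreateConfig types are hypothetical simplifications of QEMUOptionParameter and of the local variables in fvd_create(), and the literal strings "size" and "backing_file" are assumed values for BLOCK_OPT_SIZE and BLOCK_OPT_BACKING_FILE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMUOptionParameter. */
typedef struct Option {
    const char *name;   /* NULL marks the end of the list */
    int64_t n;          /* numeric value */
    const char *s;      /* string value */
} Option;

/* Hypothetical subset of the settings collected by fvd_create(). */
typedef struct CreateConfig {
    int64_t virtual_disk_size;
    int prefetch_start_delay;
    const char *base_img;
} CreateConfig;

static void parse_options(const Option *options, CreateConfig *cfg)
{
    while (options && options->name) {
        if (!strcmp(options->name, "size")) {
            cfg->virtual_disk_size = options->n;
        } else if (!strcmp(options->name, "prefetch_start_delay")) {
            /* Non-positive values disable prefetching, as in the patch. */
            cfg->prefetch_start_delay = options->n <= 0 ? -1 : (int) options->n;
        } else if (!strcmp(options->name, "backing_file")) {
            cfg->base_img = options->s;
        }
        /* Unknown names are ignored in this sketch. */
        options++;
    }
}
```

Options that are absent from the array leave the corresponding field at whatever default the caller stored in it beforehand, which matches how the patch initializes its locals before the loop.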

[Qemu-devel] [PATCH 19/26] FVD: add support for aio_cancel

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the support for aio_cancel into FVD. FVD faithfully cleans up
all resources upon aio_cancel.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-journal-buf.c |   16 +++
 block/fvd-load.c|   24 +
 block/fvd-misc.c|   67 +++
 block/fvd-read.c|   37 ++
 block/fvd-store.c   |   31 +
 block/fvd-write.c   |   23 +++-
 block/fvd.c |   25 +
 7 files changed, 222 insertions(+), 1 deletions(-)

diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c
index e99a585..c6b60f9 100644
--- a/block/fvd-journal-buf.c
+++ b/block/fvd-journal-buf.c
@@ -360,6 +360,22 @@ use_current_buf:
 return s-bjnl.buf;
 }
 
+static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb)
+{
+    BlockDriverState *bs = acb->common.bs;
+    BDRVFvdState *s = bs->opaque;
+    QTAILQ_REMOVE(&s->bjnl.queued_bufs, acb, jcb.bjnl_next_queued_buf);
+    my_qemu_aio_release(acb);
+}
+
+static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb)
+{
+/* OP_BJNL_BUF_WRITE is never exposed to any external entity, and this
+ * should not be invoked. Internal cancellation of OP_BJNL_BUF_WRITE
+ * is handled by bjnl_sync_flush(). */
+abort();
+}
+
 static void bjnl_clean_buf_timer_cb(BlockDriverState * bs)
 {
 BDRVFvdState *s = bs-opaque;
diff --git a/block/fvd-load.c b/block/fvd-load.c
index 88e5fb4..9789cc5 100644
--- a/block/fvd-load.c
+++ b/block/fvd-load.c
@@ -188,6 +188,30 @@ static inline FvdAIOCB *init_load_acb(FvdAIOCB * 
parent_acb,
 return acb;
 }
 
+static void fvd_aio_cancel_wrapper(FvdAIOCB * acb)
+{
+    qemu_bh_cancel(acb->wrapper.bh);
+    qemu_bh_delete(acb->wrapper.bh);
+    my_qemu_aio_release(acb);
+}
+
+static void fvd_aio_cancel_load_compact(FvdAIOCB * acb)
+{
+    if (acb->load.children) {
+        int i;
+        for (i = 0; i < acb->load.num_children; i++) {
+            if (acb->load.children[i].hd_acb) {
+                bdrv_aio_cancel(acb->load.children[i].hd_acb);
+            }
+        }
+        my_qemu_free(acb->load.children);
+    }
+    if (acb->load.one_child.hd_acb) {
+        bdrv_aio_cancel(acb->load.one_child.hd_acb);
+    }
+    my_qemu_aio_release(acb);
+}
+
 static inline int load_create_one_child(bool count_only, bool empty,
 QEMUIOVector * orig_qiov, int *iov_index, size_t *iov_left,
 uint8_t **iov_buf, int64_t start_sec, int 
sectors_in_region,
diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index f4e1038..a42bfac 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -11,6 +11,73 @@
  *
  */
 
+static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb);
+static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb);
+static void fvd_aio_cancel_read(FvdAIOCB * acb);
+static void fvd_aio_cancel_write(FvdAIOCB * acb);
+static void fvd_aio_cancel_copy(FvdAIOCB * acb);
+static void fvd_aio_cancel_load_compact(FvdAIOCB * acb);
+static void fvd_aio_cancel_store_compact(FvdAIOCB * acb);
+static void fvd_aio_cancel_wrapper(FvdAIOCB * acb);
+static void flush_metadata_to_disk_on_exit (BlockDriverState *bs);
+
+static void fvd_aio_cancel_flush(FvdAIOCB * acb)
+{
+    if (acb->flush.data_acb) {
+        bdrv_aio_cancel(acb->flush.data_acb);
+    }
+    if (acb->flush.metadata_acb) {
+        bdrv_aio_cancel(acb->flush.metadata_acb);
+    }
+    my_qemu_aio_release(acb);
+}
+
+static void fvd_aio_cancel(BlockDriverAIOCB * blockacb)
+{
+    FvdAIOCB *acb = container_of(blockacb, FvdAIOCB, common);
+
+    QDEBUG("CANCEL: acb%llu-%p\n", acb->uuid, acb);
+    acb->cancel_in_progress = true;
+
+    switch (acb->type) {
+    case OP_READ:
+        fvd_aio_cancel_read(acb);
+        break;
+
+    case OP_WRITE:
+        fvd_aio_cancel_write(acb);
+        break;
+
+    case OP_COPY:
+        fvd_aio_cancel_copy(acb);
+        break;
+
+    case OP_LOAD_COMPACT:
+        fvd_aio_cancel_load_compact(acb);
+        break;
+
+    case OP_STORE_COMPACT:
+        fvd_aio_cancel_store_compact(acb);
+        break;
+
+    case OP_WRAPPER:
+        fvd_aio_cancel_wrapper(acb);
+        break;
+
+    case OP_FLUSH:
+        fvd_aio_cancel_flush(acb);
+        break;
+
+    case OP_BJNL_BUF_WRITE:
+        fvd_aio_cancel_bjnl_buf_write(acb);
+        break;
+
+    case OP_BJNL_FLUSH:
+        fvd_aio_cancel_bjnl_flush(acb);
+        break;
+    }
+}
+
 static void fvd_close(BlockDriverState * bs)
 {
 }
diff --git a/block/fvd-read.c b/block/fvd-read.c
index 675af9e..b18fdf2 100644
--- a/block/fvd-read.c
+++ b/block/fvd-read.c
@@ -502,3 +502,40 @@ static inline void calc_read_region(BDRVFvdState * s, 
int64_t sector_num,
 *p_first_sec_in_backing = first_sec_in_backing;
 *p_last_sec_in_backing = last_sec_in_backing;
 }
+
+static void fvd_aio_cancel_read(FvdAIOCB * acb)
+{
+if

Re: [Qemu-devel] [PATCH] vnc: fix a memory leak in threaded vnc server

2011-02-25 Thread Anthony Liguori


On 02/25/2011 03:54 PM, Corentin Chary wrote:

VncJobQueue's buffer is intended to be used as the
output buffer for all operations in this queue,
but unfortunately it was not.

vnc_async_encoding_start() is in charge of setting this
buffer as the current output buffer, but
vnc_async_encoding_end() was not writing the changes back
to VncJobQueue, resulting in a big and ugly memleak.

Signed-off-by: Corentin Chary corenti...@iksaif.net
   


Applied.  Thanks.

Regards,

Anthony Liguori


---
I believe this is a (slightly) better patch than Bruce's one, because
it reduces memory allocations by always using the same buffer.

  ui/vnc-jobs-async.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 1d4c5e7..f596247 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -186,6 +186,8 @@ static void vnc_async_encoding_end(VncState *orig, VncState *local)
     orig->hextile = local->hextile;
     orig->zrle = local->zrle;
     orig->lossy_rect = local->lossy_rect;
+
+    queue->buffer = local->output;
 }

  static int vnc_worker_thread_loop(VncJobQueue *queue)
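
To illustrate why the missing write-back leaks: the worker copies the queue's buffer descriptor by value, the copy may then be grown with realloc(), and unless the descriptor is copied back the queue never learns about the new allocation. The sketch below uses simplified, hypothetical types and function names (Buffer, Queue, State, encoding_start, encoding_end) and is not the code in ui/vnc-jobs-async.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified buffer descriptor. */
typedef struct Buffer {
    size_t capacity;
    size_t offset;
    unsigned char *data;
} Buffer;

typedef struct Queue { Buffer buffer; } Queue;
typedef struct State { Buffer output; } State;

/* Grow the buffer so that at least len more bytes fit. */
static void buffer_reserve(Buffer *b, size_t len)
{
    if (b->capacity - b->offset < len) {
        b->capacity += len + 1024;
        b->data = realloc(b->data, b->capacity);
        if (!b->data) {
            abort();
        }
    }
}

/* The local state borrows the queue's buffer (copied by value). */
static void encoding_start(Queue *q, State *local)
{
    local->output = q->buffer;
    local->output.offset = 0;
}

/* Write the possibly reallocated descriptor back. Without this step the
 * queue keeps its old descriptor and each job allocates a fresh block. */
static void encoding_end(Queue *q, State *local)
{
    q->buffer = local->output;
}
```

With the write-back in place, a second job whose output fits within the capacity left by the first reuses the same allocation.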

[Qemu-devel] [PATCH 02/26] FVD: extend qemu-io to do fully automated testing

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch extends qemu-io in two ways. First, it adds the 'sim' command to
work with the simulated block device driver 'blksim', which allows a developer
to fully control the order of disk I/Os, the order of callbacks, and the
return code of every I/O operation. Second, it adds a fully automated testing
mode, 'qemu-io --auto'. This mode can, e.g., simulate 1,000 threads
concurrently submitting overlapping disk I/O requests to QEMU block drivers,
use blksim to inject I/O errors and race conditions, and automatically verify
the correctness of I/O results. This tool can run unattended to exercise an
unlimited number of randomized test cases. Once it finds a bug, the bug is
precisely repeatable with the help of blksim, even if it is a rare race
condition bug. This makes debugging much easier.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 qemu-io-auto.c |  947 
 qemu-io-sim.c  |  127 
 qemu-io.c  |   50 +++-
 qemu-tool.c|  107 ++-
 4 files changed, 1209 insertions(+), 22 deletions(-)
 create mode 100644 qemu-io-auto.c
 create mode 100644 qemu-io-sim.c

diff --git a/qemu-io-auto.c b/qemu-io-auto.c
new file mode 100644
index 000..73d79c7
--- /dev/null
+++ b/qemu-io-auto.c
@@ -0,0 +1,947 @@
+/*
+ * Extension of qemu-io to perform automated random tests
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+/*=
+ *  This module implements a fully automated testing tool for block device
+ *  drivers. It works with block/blksim.c to test race conditions by
+ *  randomizing event timing. It is recommended to perform automated testing
+ *  on a ramdisk or tmpfs, which stores files in memory and avoids wearing out
+ *  the disk. Below is one example of using qemu-io to perform a fully
+ *  automated testing.
+
+# mount -t tmpfs none /var/tmpfs -o size=4G
+# dd if=/dev/zero of=/var/tmpfs/truth.raw count=0 bs=1 seek=1G
+# dd if=/dev/zero of=/var/tmpfs/zero-500M.raw count=0 bs=1 seek=500M
+# qemu-img create -f qcow2 -obacking_fmt=blksim -b /var/tmpfs/zero-500M.raw \
+/var/tmpfs/test.qcow2 1G
+# qemu-io --auto --seed=1 --truth=/var/tmpfs/truth.raw --format=qcow2 \
+--test=blksim:/var/tmpfs/test.qcow2 --verify_write=true \
+--compare_before=false --compare_after=true --round=10 \
+--parallel=1000 --io_size=10485760 --fail_prob=0 --cancel_prob=0 \
+--instant_qemubh=true
+ *=
+ */
+
+#include "qemu-timer.h"
+#include "qemu-common.h"
+#include "block_int.h"
+#include "block/blksim.h"
+
+#if 1
+# define QDEBUG(format,...) do {} while (0)
+#else
+# define QDEBUG printf
+#endif
+
+#define die(format,...) \
+    do { \
+        fprintf (stderr, "%s:%d --- ", __FILE__, __LINE__); \
+        fprintf (stderr, format, ##__VA_ARGS__); \
+        abort(); \
+    } while(0)
+
+typedef enum { OP_NULL = 0, OP_READ, OP_WRITE, OP_FLUSH,
+    OP_AIO_FLUSH } op_type_t;
+const char *op_type_str[] = { "NULL", "READ", "WRITE", "FLUSH", "AIO_FLUSH"};
+
+typedef struct CompareFullCB
+{
+QEMUIOVector qiov;
+struct iovec iov;
+int64_t sector_num;
+int nb_sectors;
+int max_nb_sectors;
+uint8_t *truth_buf;
+} CompareFullCB;
+
+typedef struct RandomIO
+{
+QEMUIOVector qiov;
+int64_t sector_num;
+int nb_sectors;
+uint8_t *truth_buf;
+uint8_t *test_buf;
+op_type_t type;
+int tester;
+int64_t uuid;
+int allow_cancel;
+BlockDriverAIOCB *acb;
+} RandomIO;
+
+static int fd;
+static int64_t total_sectors;
+static int64_t io_size = 262144;
+static bool verify_write = false;
+static int parallel = 1;
+static int max_iov = 10;
+static int64_t round = 10;
+static int64_t finished_round = 0;
+static RandomIO *testers = NULL;
+static double fail_prob = 0;
+static double cancel_prob = 0;
+static double aio_flush_prob = 0;
+static double flush_prob = 0;
+static int64_t rand_time = 1000;
+static int64_t test_uuid = 0;
+static int finished_testers = 0;
+
+static void rand_io_cb(void *opaque, int ret);
+static void perform_next_io(RandomIO * r);
+
+static void auto_test_usage(void)
+{
+    printf("%s --auto [--help]\n"
+           "\t[--truth=truth_img]\n"
+           "\t[--test=img_to_test]\n"
+           "\t[--seed=#d]\n"
+           "\t[--format=test_img_fmt]\n"
+           "\t[--round=#d]\n"
+           "\t[--instant_qemubh=true|false]\n"
+           "\t[--fail_prob=#f]\n"
+           "\t[--cancel_prob=#f]\n"
+           "\t[--aio_flush_prob=#f]\n"
+           "\t[--flush_prob=#f]\n"
+           "\t[--io_size=#d]\n"
+           "\t[--verify_write=[true|false]]\n"
+           "\t[--parallel=[#d]\n"
+
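(The patch above is cut off in this archive.) The automated mode verifies results by keeping a plain "truth" image next to the image under test and comparing the two. A minimal sketch of such a sector-by-sector comparison follows. The function name first_mismatched_sector and the in-memory buffers are assumptions made for illustration; the actual tool works on image files through the block layer.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Return the index of the first sector whose contents differ between the
 * truth buffer and the test buffer, or -1 if all nb_sectors match. */
static int64_t first_mismatched_sector(const uint8_t *truth,
                                       const uint8_t *test,
                                       int64_t nb_sectors)
{
    for (int64_t i = 0; i < nb_sectors; i++) {
        if (memcmp(truth + i * SECTOR_SIZE, test + i * SECTOR_SIZE,
                   SECTOR_SIZE) != 0) {
            return i;
        }
    }
    return -1;
}
```

Reporting the first differing sector, rather than only a pass/fail result, gives a starting point when replaying the failing seed under blksim.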

[Qemu-devel] [PATCH 06/26] FVD: skeleton of Fast Virtual Disk

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the skeleton of the block device driver for
Fast Virtual Disk (FVD).

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 Makefile.objs  |2 +-
 block/fvd-create.c |   21 +++
 block/fvd-flush.c  |   24 +++
 block/fvd-misc.c   |   37 +++
 block/fvd-open.c   |   17 +
 block/fvd-read.c   |   21 +++
 block/fvd-update.c |   21 +++
 block/fvd-write.c  |   21 +++
 block/fvd.c|   60 ++
 block/fvd.h|  171 
 10 files changed, 394 insertions(+), 1 deletions(-)
 create mode 100644 block/fvd-create.c
 create mode 100644 block/fvd-flush.c
 create mode 100644 block/fvd-misc.c
 create mode 100644 block/fvd-open.c
 create mode 100644 block/fvd-read.c
 create mode 100644 block/fvd-update.c
 create mode 100644 block/fvd-write.c
 create mode 100644 block/fvd.c
 create mode 100644 block/fvd.h

diff --git a/Makefile.objs b/Makefile.objs
index 264aab3..9185d3e 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
qcow2-snapshot.o qcow
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
-block-nested-y += blksim.o
+block-nested-y += blksim.o fvd.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/fvd-create.c b/block/fvd-create.c
new file mode 100644
index 000..5593cea
--- /dev/null
+++ b/block/fvd-create.c
@@ -0,0 +1,21 @@
+/*
+ * QEMU Fast Virtual Disk Format bdrv_create()
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static int fvd_create(const char *filename, QEMUOptionParameter * options)
+{
+return -ENOTSUP;
+}
+
+static QEMUOptionParameter fvd_create_options[] = {
+{NULL}
+};
diff --git a/block/fvd-flush.c b/block/fvd-flush.c
new file mode 100644
index 000..34bd5cb
--- /dev/null
+++ b/block/fvd-flush.c
@@ -0,0 +1,24 @@
+/*
+ * QEMU Fast Virtual Disk Format bdrv_flush() and bdrv_aio_flush()
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static BlockDriverAIOCB *fvd_aio_flush(BlockDriverState * bs,
+   BlockDriverCompletionFunc * cb,
+   void *opaque)
+{
+return NULL;
+}
+
+static int fvd_flush(BlockDriverState * bs)
+{
+return -ENOTSUP;
+}
diff --git a/block/fvd-misc.c b/block/fvd-misc.c
new file mode 100644
index 000..f4e1038
--- /dev/null
+++ b/block/fvd-misc.c
@@ -0,0 +1,37 @@
+/*
+ * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static void fvd_close(BlockDriverState * bs)
+{
+}
+
+static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
+{
+return 0;
+}
+
+static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
+int nb_sectors, int *pnum)
+{
+return 0;
+}
+
+static int fvd_get_info(BlockDriverState * bs, BlockDriverInfo * bdi)
+{
+return -ENOTSUP;
+}
+
+static int fvd_has_zero_init(BlockDriverState * bs)
+{
+return 0;
+}
diff --git a/block/fvd-open.c b/block/fvd-open.c
new file mode 100644
index 000..056b994
--- /dev/null
+++ b/block/fvd-open.c
@@ -0,0 +1,17 @@
+/*
+ * QEMU Fast Virtual Disk Format bdrv_file_open()
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static int fvd_open(BlockDriverState * bs, const char *filename, int flags)
+{
+return -ENOTSUP;
+}
diff --git a/block/fvd-read.c b/block/fvd-read.c
new file mode 100644
index 000..b9f3ac9
--- /dev/null
+++ b/block/fvd-read.c
@@ -0,0 +1,21 @@
+/*
+ * QEMU Fast Virtual Disk Format bdrv_aio_readv()
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static BlockDriverAIOCB *fvd_aio_readv(BlockDriverState * bs,
+

[Qemu-devel] [PATCH 12/26] FVD: add impl of interface bdrv_aio_readv()

2011-02-25 Thread Chunqiang Tang

This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_aio_readv() interface. It
supports read and copy-on-read in FVD.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-bitmap.c |   88 ++
 block/fvd-load.c   |   20 +++
 block/fvd-read.c   |  484 +++-
 block/fvd-utils.c  |   44 +
 block/fvd.c|2 +
 5 files changed, 637 insertions(+), 1 deletions(-)
 create mode 100644 block/fvd-load.c
 create mode 100644 block/fvd-utils.c

diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c
index 7e96201..30e4a4b 100644
--- a/block/fvd-bitmap.c
+++ b/block/fvd-bitmap.c
@@ -148,3 +148,91 @@ static bool update_fresh_bitmap_and_check_stale_bitmap(FvdAIOCB * acb)
 
 return need_update;
 }
+
+/* Return true if a valid region is found. */
+static bool find_region_in_base_img(BDRVFvdState * s, int64_t * from,
+int64_t * to)
+{
+int64_t sec = *from;
+int64_t region_end = *to;
+
+if (region_end > s->base_img_sectors) {
+region_end = s->base_img_sectors;
+}
+
+check_next_region:
+if (sec >= region_end) {
+return false;
+}
+
+if (!fresh_bitmap_show_sector_in_base_img(sec, s)) {
+/* Find the first sector in the base image. */
+
+sec = ROUND_UP(sec + 1, s->block_size); /* Begin of next block. */
+while (1) {
+if (sec >= region_end) {
+return false;
+}
+if (fresh_bitmap_show_sector_in_base_img(sec, s)) {
+break;
+}
+sec += s->block_size;   /* Begin of the next block. */
+}
+}
+
+/* Find the end of the region in the base image. */
+int64_t first_sec = sec;
+sec = ROUND_UP(sec + 1, s->block_size); /* Begin of next block. */
+while (1) {
+if (sec >= region_end) {
+sec = region_end;
+break;
+}
+if (!fresh_bitmap_show_sector_in_base_img(sec, s)) {
+break;
+}
+sec += s->block_size;   /* Begin of the next block. */
+}
+int64_t last_sec = sec;
+
+/* Check conflicting writes. */
+FvdAIOCB *old;
+QLIST_FOREACH(old, &s->write_locks, write.next_write_lock) {
+int64_t old_begin = ROUND_DOWN(old->sector_num, s->block_size);
+int64_t old_end = old->sector_num + old->nb_sectors;
+old_end = ROUND_UP(old_end, s->block_size);
+if (old_begin <= first_sec && first_sec < old_end) {
+first_sec = old_end;
+}
+if (old_begin < last_sec && last_sec <= old_end) {
+last_sec = old_begin;
+}
+}
+
+if (first_sec >= last_sec) {
+/* The region in [first_sec, sec) is fully covered. */
+goto check_next_region;
+}
+
+/* This loop cannot be merged with the loop above. Otherwise, the logic
+ * would be incorrect.  This loop covers the case that an old request
+ * spans over a subset of the region being checked. */
+QLIST_FOREACH(old, &s->write_locks, write.next_write_lock) {
+int64_t old_begin = ROUND_DOWN(old->sector_num, s->block_size);
+if (first_sec <= old_begin && old_begin < last_sec) {
+last_sec = old_begin;
+}
+}
+
+if (first_sec >= last_sec) {
+/* The region in [first_sec, sec) is fully covered. */
+goto check_next_region;
+}
+
+ASSERT(first_sec % s->block_size == 0 && (last_sec % s->block_size == 0 ||
+   last_sec == s->base_img_sectors));
+
+*from = first_sec;
+*to = last_sec;
+return true;
+}
diff --git a/block/fvd-load.c b/block/fvd-load.c
new file mode 100644
index 000..80ab32c
--- /dev/null
+++ b/block/fvd-load.c
@@ -0,0 +1,20 @@
+/*
+ * QEMU Fast Virtual Disk Format Load Data from Compact Image
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+static inline BlockDriverAIOCB *load_data(FvdAIOCB * parent_acb,
+BlockDriverState * bs, int64_t sector_num,
+QEMUIOVector * orig_qiov, int nb_sectors,
+BlockDriverCompletionFunc * cb, void *opaque)
+{
+return NULL;
+}
diff --git a/block/fvd-read.c b/block/fvd-read.c
index b9f3ac9..cd041e5 100644
--- a/block/fvd-read.c
+++ b/block/fvd-read.c
@@ -11,11 +11,493 @@
  *
  */
 
+static void read_backing_for_copy_on_read_cb(void *opaque, int ret);
+static void read_fvd_cb(void *opaque, int ret);
+static inline void calc_read_region(BDRVFvdState * s, int64_t sector_num,
+int nb_sectors, int64_t * p_first_sec_in_fvd,
+int64_t * p_last_sec_in_fvd,
+int64_t * p_first_sec_in_backing,
+int64_t * p_last_sec_in_backing);
+static

[Qemu-devel] [PATCH v3] rtl8139: add vlan support

2011-02-25 Thread Benjamin Poirier

I posted v2 of these patches back in November:
http://article.gmane.org/gmane.comp.emulators.qemu/84252

Changes since v2:

insertion:
* moved insertion later in the process, to handle tso
* use qemu_sendv_packet() to insert the tag for us
* added dot1q_buf parameter to rtl8139_do_receive() to avoid some
  memcpy() in loopback mode. Note that the code path through that
  function is unchanged when dot1q_buf is NULL.

extraction:
* reduced the amount of copying by moving the frame-too-short logic
  after the removal of the vlan tag (as is done in e1000.c for
  example). Unfortunately, that logic can no longer be shared between
  C+ and C mode.
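
As a rough illustration of the insertion approach listed above (not the
patch's actual code), a tagged frame can be described as three
scatter-gather segments so that the payload is never copied: the two MAC
addresses, the 4-byte 802.1Q tag, and the rest of the frame. The names
make_dot1q_tag(), build_tagged_iov() and iov_flatten() are hypothetical
helpers made up for this sketch; in QEMU the resulting iovec would be
handed to qemu_sendv_packet() rather than flattened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define MAC_ADDRS_LEN 12   /* destination + source MAC addresses */
#define DOT1Q_TAG_LEN 4    /* TPID (0x8100) + TCI */

/* Fill a 4-byte 802.1Q tag: TPID 0x8100 followed by the TCI, big-endian. */
static void make_dot1q_tag(uint8_t tag[DOT1Q_TAG_LEN], uint16_t tci)
{
    tag[0] = 0x81;
    tag[1] = 0x00;
    tag[2] = (uint8_t)(tci >> 8);
    tag[3] = (uint8_t)(tci & 0xff);
}

/*
 * Describe a tagged frame as [MAC addresses][tag][ethertype + payload]
 * without copying. Returns the number of iovec entries filled (3), or 0
 * if the frame is too short to contain both MAC addresses.
 */
static int build_tagged_iov(struct iovec iov[3], const uint8_t *frame,
                            size_t len, const uint8_t *tag)
{
    assert(iov && frame && tag);
    if (len < MAC_ADDRS_LEN) {
        return 0;
    }
    iov[0].iov_base = (void *)frame;
    iov[0].iov_len = MAC_ADDRS_LEN;
    iov[1].iov_base = (void *)tag;
    iov[1].iov_len = DOT1Q_TAG_LEN;
    iov[2].iov_base = (void *)(frame + MAC_ADDRS_LEN);
    iov[2].iov_len = len - MAC_ADDRS_LEN;
    return 3;
}

/* Example-only helper: linearize the segments; returns bytes copied,
 * or 0 if the output buffer is too small. */
static size_t iov_flatten(uint8_t *out, size_t out_len,
                          const struct iovec *iov, int cnt)
{
    size_t off = 0;
    for (int i = 0; i < cnt; i++) {
        if (off + iov[i].iov_len > out_len) {
            return 0;
        }
        memcpy(out + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return off;
}
```

The segmented layout is the point of the sketch: only the 4-byte tag is
new data, and the original frame buffer is referenced twice instead of
being copied.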

I've tested on the following combinations of guest and hosts:
host: x86_64, guest: x86_64
host: x86_64, guest: ppc32
host: ppc32, guest: ppc32

Testing on the x86_64 host used '-net tap' and consisted of:
* making an HTTP transfer on the untagged interface.
* ping -s 0-1472 to another host on a vlan.
* making an scp upload to another host on a vlan.

Testing on the ppc32 host used '-net socket' connected to an x86_64 qemu-kvm
running the virtio nic and consisted of:
* establishing an ssh connection between the two using an untagged interface.
* ping -s 0-1472 to the ppc32 using a vlan.
* making an scp transfer in both directions using a vlan.

All of these tests passed. They do not exercise every code path, however, so
careful review is still in order.

Please note that the lack of vlan support in rtl8139 has taken a few people
aback:
https://bugzilla.redhat.com/show_bug.cgi?id=516587
http://article.gmane.org/gmane.linux.network.general/14266

Thanks,
-Ben

[Qemu-devel] [PATCH v3 1/2] rtl8139: add vlan tag insertion

2011-02-25 Thread Benjamin Poirier

Add support to the emulated hardware to insert vlan tags in packets
going from the guest to the network.

Signed-off-by: Benjamin Poirier benjamin.poir...@gmail.com
Cc: Igor V. Kovalenko igor.v.kovale...@gmail.com
Cc: Jason Wang jasow...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
---
 hw/rtl8139.c |  123 +-
 1 files changed, 96 insertions(+), 27 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..35ccd3d 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -47,6 +47,8 @@
  *  Darwin)
  */
 
+#include <net/ethernet.h>
+
 #include "hw.h"
 #include "pci.h"
 #include "qemu-timer.h"
@@ -68,6 +70,16 @@
 #if defined(RTL8139_CALCULATE_RXCRC)
 /* For crc32 */
 #include <zlib.h>
+
+static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len)
+{
+return crc32(crc, buf, len);
+}
+#else
+static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len)
+{
+return 0;
+}
 #endif
 
 #define SET_MASKED(input, mask, curr) \
@@ -77,6 +89,9 @@
 #define MOD2(input, size) \
 ( ( input ) & ( size - 1 )  )
 
+#define VLAN_TCI_LEN 2
+#define VLAN_HDR_LEN (ETHER_TYPE_LEN + VLAN_TCI_LEN)
+
 #if defined (DEBUG_RTL8139)
 #  define DEBUG_PRINT(x) do { printf x ; } while (0)
 #else
@@ -814,9 +829,11 @@ static int rtl8139_can_receive(VLANClientState *nc)
 }
 }
 
-static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
+static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
+size_t buf_size, int do_interrupt, const uint8_t *dot1q_buf)
 {
 RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+int size_ = buf_size + (dot1q_buf ? VLAN_HDR_LEN : 0);
 int size = size_;
 
 uint32_t packet_header = 0;
@@ -935,7 +952,14 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 /* if too small buffer, then expand it */
 if (size < MIN_BUF_SIZE) {
-memcpy(buf1, buf, size);
+if (unlikely(dot1q_buf)) {
+memcpy(buf1, buf, 2 * ETHER_ADDR_LEN);
+memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN);
+memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 *
+ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN);
+} else {
+memcpy(buf1, buf, size);
+}
 memset(buf1 + size, 0, MIN_BUF_SIZE - size);
 buf = buf1;
 size = MIN_BUF_SIZE;
@@ -1022,7 +1046,21 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
 /* receive/copy to target memory */
-cpu_physical_memory_write( rx_addr, buf, size );
+if (unlikely(dot1q_buf)) {
+cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN);
+val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
+cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, dot1q_buf,
+VLAN_HDR_LEN);
+val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
+cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN +
+VLAN_HDR_LEN, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+} else {
+cpu_physical_memory_write(rx_addr, buf, size);
+val = rtl8139_crc32(0, buf, size);
+}
 
 if (s->CpCmd & CPlusRxChkSum)
 {
@@ -1031,9 +1069,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 /* write checksum */
 #if defined (RTL8139_CALCULATE_RXCRC)
-val = cpu_to_le32(crc32(0, buf, size));
-#else
-val = 0;
+val = cpu_to_le32(val);
 #endif
 cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
 
@@ -1133,13 +1169,24 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 rtl8139_write_buffer(s, (uint8_t *)&val, 4);
 
-rtl8139_write_buffer(s, buf, size);
+/* receive/copy to target memory */
+if (unlikely(dot1q_buf)) {
+rtl8139_write_buffer(s, buf, 2 * ETHER_ADDR_LEN);
+val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
+rtl8139_write_buffer(s, dot1q_buf, VLAN_HDR_LEN);
+val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
+rtl8139_write_buffer(s, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
+ETHER_ADDR_LEN);
+} else {
+rtl8139_write_buffer(s, buf, size);
+val = rtl8139_crc32(0, buf, size);
+}
 
 /* write checksum */
 #if defined (RTL8139_CALCULATE_RXCRC)
-val = cpu_to_le32(crc32(0, buf, size));
-#else
-

[Qemu-devel] [PATCH v3 2/2] rtl8139: add vlan tag extraction

2011-02-25 Thread Benjamin Poirier

Add support to the emulated hardware to extract vlan tags in packets
going from the network to the guest.

Signed-off-by: Benjamin Poirier benjamin.poir...@gmail.com
Cc: Igor V. Kovalenko igor.v.kovale...@gmail.com
Cc: Jason Wang jasow...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com

--

AFAIK, extraction is not required to get vlans working. The driver
requests rx detagging but should not assume that it was done. Under
Linux, the mac layer will catch the vlan ethertype. I only added this
part for completeness (to emulate the hardware more faithfully).
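
To make the extraction step concrete, here is a minimal sketch (not the
patch's code) of detecting and stripping an 802.1Q tag from a received
frame. The function name strip_dot1q_tag() and the constants are
hypothetical. Unlike the patch, which avoids copying by writing the frame
to guest memory in parts, this sketch simply shifts the payload with
memmove() for clarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12      /* destination + source MAC addresses */
#define DOT1Q_TAG_LEN 4       /* TPID + TCI */
#define TPID_8021Q    0x8100  /* ethertype that marks an 802.1Q tag */

/*
 * If the frame carries an 802.1Q tag, remove it in place, store the TCI
 * (host byte order) in *tci and set *tagged. Returns the resulting frame
 * length, which is unchanged when no tag is present.
 */
static size_t strip_dot1q_tag(uint8_t *frame, size_t len,
                              bool *tagged, uint16_t *tci)
{
    assert(frame && tagged && tci);
    *tagged = false;

    if (len < MAC_ADDRS_LEN + DOT1Q_TAG_LEN) {
        return len;                       /* too short to hold a tag */
    }

    /* The TPID sits where the ethertype normally is, in network order. */
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q) {
        return len;
    }

    *tci = (uint16_t)((frame[14] << 8) | frame[15]);
    *tagged = true;

    /* Close the 4-byte gap so the real ethertype follows the addresses. */
    memmove(frame + MAC_ADDRS_LEN,
            frame + MAC_ADDRS_LEN + DOT1Q_TAG_LEN,
            len - MAC_ADDRS_LEN - DOT1Q_TAG_LEN);
    return len - DOT1Q_TAG_LEN;
}
```

An emulated NIC would then report the TCI to the guest out of band (in
the receive descriptor) instead of leaving it in the frame data.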
---
 hw/rtl8139.c |   89 +-
 1 files changed, 63 insertions(+), 26 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 35ccd3d..f3aaebc 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -835,10 +835,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
 RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
 int size_ = buf_size + (dot1q_buf ? VLAN_HDR_LEN : 0);
 int size = size_;
+const uint8_t *next_part;
+size_t next_part_size;
 
 uint32_t packet_header = 0;
 
-uint8_t buf1[60];
 static const uint8_t broadcast_macaddr[6] =
 { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
@@ -950,21 +951,6 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
 }
 }
 
-/* if too small buffer, then expand it */
-if (size < MIN_BUF_SIZE) {
-if (unlikely(dot1q_buf)) {
-memcpy(buf1, buf, 2 * ETHER_ADDR_LEN);
-memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN);
-memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 *
-ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN);
-} else {
-memcpy(buf1, buf, size);
-}
-memset(buf1 + size, 0, MIN_BUF_SIZE - size);
-buf = buf1;
-size = MIN_BUF_SIZE;
-}
-
 if (rtl8139_cp_receiver_enabled(s))
 {
 DEBUG_PRINT(("RTL8139: in C+ Rx mode \n"));
@@ -1025,6 +1011,44 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
 
 uint32_t rx_space = rxdw0 & CP_RX_BUFFER_SIZE_MASK;
 
+/* write VLAN info to descriptor variables */
+/* next_part starts right after the vlan header (if any), at the
+ * ethertype for the payload */
+next_part = &buf[ETHER_ADDR_LEN * 2];
+if (s->CpCmd & CPlusRxVLAN && (dot1q_buf || be16_to_cpup((uint16_t *)
+&buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN)) {
+if (!dot1q_buf) {
+/* the tag is in the buffer */
+dot1q_buf = &buf[ETHER_ADDR_LEN * 2];
+next_part += VLAN_HDR_LEN;
+}
+size -= VLAN_HDR_LEN;
+
+rxdw1 &= ~CP_RX_VLAN_TAG_MASK;
+/* BE + ~le_to_cpu()~ + cpu_to_le() = BE */
+rxdw1 |= CP_RX_TAVA | le16_to_cpup((uint16_t *)
+&buf[ETHER_HDR_LEN]);
+
+DEBUG_PRINT(("RTL8139: C+ Rx mode : extracted vlan tag with tci: %u\n", be16_to_cpup((uint16_t *) &buf[ETHER_HDR_LEN])));
+} else {
+/* reset VLAN tag flag */
+rxdw1 &= ~CP_RX_TAVA;
+}
+next_part_size = buf + buf_size - next_part;
+
+/* if too small buffer, then expand it */
+if (size < MIN_BUF_SIZE) {
+size_t tmp_size = MIN_BUF_SIZE - ETHER_ADDR_LEN * 2;
+uint8_t *tmp = alloca(tmp_size);
+
+memcpy(tmp, next_part, next_part_size);
+memset(tmp + next_part_size, 0, tmp_size - next_part_size);
+next_part = tmp;
+next_part_size = tmp_size;
+size = MIN_BUF_SIZE;
+}
+
 /* TODO: scatter the packet over available receive ring descriptors space */
 
 if (size+4 > rx_space)
@@ -1049,14 +1073,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf,
 if (unlikely(dot1q_buf)) {
 cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN);
 val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN);
-cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, dot1q_buf,
-VLAN_HDR_LEN);
 val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN);
-cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN +
-VLAN_HDR_LEN, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
-ETHER_ADDR_LEN);
-val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 *
-ETHER_ADDR_LEN);
+cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, next_part,
+next_part_size);
+val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN,
+next_part_size);
 } else {
 cpu_physical_memory_write(rx_addr, buf, size);
 val = rtl8139_crc32(0, buf, size);
@@ -1115,9 +1136,6 @@ static ssize_t
