date:20100714

[Qemu-devel] Re: [PATCH 0/8] vlan cleanup

2010-07-14 Thread Markus Armbruster

Miguel Di Ciurcio Filho  writes:

> On Tue, Jul 13, 2010 at 3:16 AM, Jan Kiszka  wrote:
>> Miguel Di Ciurcio Filho wrote:
>>> This series removes the vlan stuff without mercy. I've tried to make the 
>>> steps
>>> as small as possible, but the last one is huge. I did some basic tests and
>>> networking is still working, so reviews are welcome :-D
>>
>> Sorry, this is a bit too rude. This not only removes the vlan model,
>> something one may talk about, but also the innocent socket back-ends and
>> the useful pcap dump support.
>>
>> Socket back-ends allow quick and easy unprivileged inter-VM network
>> setups. Nothing for production systems, but useful for testing purposes
>> on boxes where taps are not allowed or unhandy to configure.
>>
>
> I agree that it might be handy sometimes, but one could use VDE for
> that too. Runs on user-space and can be tunneled over SSH or netcat
> [1].
> Another option would be to make the socket backend properly work as a
> netdev, so one could directly connect guest NICs on different hosts,
> but on a 1:1 relationship.
>
>> The dump client helps to debug user mode guest networks, namely slirp
>> which you did not remove. If that should become the only use case for
>> vlans with more than 2 nodes, we could think about making it a special
>> feature of backend devices.
>>
>
> socket and dump are only used when the vlan backends are concerned, so
> they don't have any useful meaning outside of that.
>
> How about add dump hooks on backends? I don't think network backends
> need to be stackable like block devices, thought.

Yes, add a dump hook in net.c for netdev in all the places where a dump
backend on a VLAN gets invoked.

>> I'm open for cleanups here, but they do require a bit mercy - and should
>> also mention the reason.
>>
>
> Well, basically there is a lot of "if (vlan) else if (peer)". While
> discussing the query-netdev QMP command, no one has shown any love
> about the vlan stuff at all, quite the contrary and it was kept out of
> the protocol.

Others are more knowledgable about that than I am, but here's my
understanding.  VLANs can't be accelerated.  1:1 connection (netdev) is
almost always just fine.  If you need a virtual LAN, there are better
tools to build it than QEMU.

> Regards,
>
> Miguel
>
> [1] http://wiki.virtualsquare.org/index.php/VDE#vde_plug

[Qemu-devel] Re: [PATCH 8/8] vlan cleanup: remove usage of VLANState

2010-07-14 Thread Jan Kiszka

Markus Armbruster wrote:
> Miguel Di Ciurcio Filho  writes:
>> diff --git a/qemu-monitor.hx b/qemu-monitor.hx
>> index 275f3bc..e2c2a75 100644
>> --- a/qemu-monitor.hx
>> +++ b/qemu-monitor.hx
>> @@ -1224,8 +1224,8 @@ EQMP
>>  #ifdef CONFIG_SLIRP
>>  {
>>  .name   = "hostfwd_add",
>> -.args_type  = "arg1:s,arg2:s?,arg3:s?",
>> -.params = "[vlan_id name] 
>> [tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport",
>> +.args_type  = "arg1:s,arg2:s?",
>> +.params = "[tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport",
>>  .help   = "redirect TCP or UDP connections from host to guest 
>> (requires -net user)",
>>  .mhandler.cmd = net_slirp_hostfwd_add,
>>  },
> 
> params is inconsistent with args_type and net_slirp_hostfwd_add().

Instead of the vlan ID, we need the netdev ID here - actually already
now because we can set up multiple slirp netdevs but cannot address them
via hostfwd_add/remove.

Jan



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] Re: [PATCH 0/8] vlan cleanup

2010-07-14 Thread Jan Kiszka

Markus Armbruster wrote:
> Miguel Di Ciurcio Filho  writes:
> 
>> On Tue, Jul 13, 2010 at 3:16 AM, Jan Kiszka  wrote:
>>> Miguel Di Ciurcio Filho wrote:
 This series removes the vlan stuff without mercy. I've tried to make the 
 steps
 as small as possible, but the last one is huge. I did some basic tests and
 networking is still working, so reviews are welcome :-D
>>> Sorry, this is a bit too rude. This not only removes the vlan model,
>>> something one may talk about, but also the innocent socket back-ends and
>>> the useful pcap dump support.
>>>
>>> Socket back-ends allow quick and easy unprivileged inter-VM network
>>> setups. Nothing for production systems, but useful for testing purposes
>>> on boxes where taps are not allowed or unhandy to configure.
>>>
>> I agree that it might be handy sometimes, but one could use VDE for
>> that too. Runs on user-space and can be tunneled over SSH or netcat
>> [1].
>> Another option would be to make the socket backend properly work as a
>> netdev, so one could directly connect guest NICs on different hosts,
>> but on a 1:1 relationship.
>>
>>> The dump client helps to debug user mode guest networks, namely slirp
>>> which you did not remove. If that should become the only use case for
>>> vlans with more than 2 nodes, we could think about making it a special
>>> feature of backend devices.
>>>
>> socket and dump are only used when the vlan backends are concerned, so
>> they don't have any useful meaning outside of that.
>>
>> How about add dump hooks on backends? I don't think network backends
>> need to be stackable like block devices, thought.
> 
> Yes, add a dump hook in net.c for netdev in all the places where a dump
> backend on a VLAN gets invoked.

So far dump backends can be hot-added and removed. Once we make them a
property of a netdev backend, monitor support to restore this would be
good. Something like netdev_dump file=FILE[,len=n], empty filename to
disable.

> 
>>> I'm open for cleanups here, but they do require a bit mercy - and should
>>> also mention the reason.
>>>
>> Well, basically there is a lot of "if (vlan) else if (peer)". While
>> discussing the query-netdev QMP command, no one has shown any love
>> about the vlan stuff at all, quite the contrary and it was kept out of
>> the protocol.
> 
> Others are more knowledgable about that than I am, but here's my
> understanding.  VLANs can't be accelerated.  1:1 connection (netdev) is
> almost always just fine.  If you need a virtual LAN, there are better
> tools to build it than QEMU.

That's my understanding as well. But the patch series should state this
to make the decision traceable.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks

2010-07-14 Thread Isaku Yamahata

On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:
> Memory accesses must go through the IOMMU layer.
> 
> Signed-off-by: Eduard - Gabriel Munteanu 
> ---
>  hw/pci.c |   21 +
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 6871728..9c5d706 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "hw.h"
> +#include "iommu.h"
>  #include "pci.h"
>  #include "monitor.h"
>  #include "net.h"
> @@ -733,12 +734,25 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
>  pci_config_free(pci_dev);
>  }
>  
> +#ifdef CONFIG_IOMMU
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +return iommu_register_device(bus->qbus.iommu, &dev->qdev);
> +}
> +#else
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +return 0;
> +}
> +#endif
> +
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> int instance_size, int devfn,
> PCIConfigReadFunc *config_read,
> PCIConfigWriteFunc *config_write)
>  {
>  PCIDevice *pci_dev;
> +int err;
>  
>  pci_dev = qemu_mallocz(instance_size);
>  pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char 
> *name,
>  if (pci_dev == NULL) {
>  hw_error("PCI: can't register device\n");
>  }
> +
> +err = pci_iommu_register_device(bus, pci_dev);
> +if (err) {
> +hw_error("PCI: can't register device with IOMMU\n");
> +return NULL;
> +}
> +
>  return pci_dev;
>  }

pci_register_device() is pre-qdev api.
qdev'fied device doesn't call pci_register_device().
So please move the initialization hook into do_pci_register_device()
which are commonly used by pci_register_device() and pci_qdev_init().
-- 
yamahata

[Qemu-devel] [Bug 544527] Re: usbfs is bugged with >2.6.32.9 and <=2.6.33 (breaks VMWare, Qemu, sane scanners, ...)

2010-07-14 Thread David Kühling

I'm not very experienced with kernel development, so didn't try to
create a patch for now. It wasn't even immediately obvious were to send
patches to.  If nobody else wants to work on the issue I'll try to
allocate some time for it. Pretty amazing how the bug got in the kernel
in the first place.  I mean, even the simplest USB testcase could have
caught it.

cheers,

David

-- 
usbfs is bugged with >2.6.32.9 and <=2.6.33 (breaks VMWare, Qemu, sane 
scanners, ...)
https://bugs.launchpad.net/bugs/544527
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Fix Committed
Status in SANE-backends - Backends for SANE: Fix Committed
Status in Tv Time: Fix Committed
Status in Virtualbox: Fix Committed
Status in “linux” package in Ubuntu: Fix Committed

Bug description:
Binary package hint: tvtime

There's a problem with isochronous and usbfs, suse tried to improve usbfs but 
it end up that it broke usbfs.
For isochronous the entire packet needs to be copied and not only a part of it.

http://lkml.org/lkml/2010/2/26/490  (Report)
http://lkml.org/lkml/2010/2/27/226 (Bugfix)

please merge this bugfix asap.

ProblemType: Bug
Architecture: amd64
Date: Mon Mar 22 21:09:00 2010
DistroRelease: Ubuntu 10.04
LiveMediaBuild: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100322)
Package: tvtime 1.0.2-5ubuntu2
ProcEnviron:
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
SourcePackage: tvtime
Uname: Linux 2.6.32-16-generic x86_64

[Qemu-devel] [STABLE][PATCH 02/14] qcow2: Fix creation of large images

2010-07-14 Thread Kevin Wolf

qcow_create2 assumes that the new image will only need one cluster for its
refcount table initially. Obviously that's not true any more when the image is
big enough (exact value depends on the cluster size).

This patch calculates the refcount table size dynamically.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 4768fa902c3860f2fe34403e6e1c83bfca6da034)

Conflicts:

block/qcow2.c

Signed-off-by: Kevin Wolf 
---
 block/qcow2.c |   41 -
 1 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5d33d6c..35c05e0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -747,10 +747,11 @@ static int qcow_create2(const char *filename, int64_t 
total_size,
 {
 
 int fd, header_size, backing_filename_len, l1_size, i, shift, l2_bits;
-int ref_clusters, backing_format_len = 0;
+int ref_clusters, reftable_clusters, backing_format_len = 0;
 int rounded_ext_bf_len = 0;
 QCowHeader header;
 uint64_t tmp, offset;
+uint64_t old_ref_clusters;
 QCowCreateState s1, *s = &s1;
 QCowExtension ext_bf = {0, 0};
 
@@ -809,17 +810,37 @@ static int qcow_create2(const char *filename, int64_t 
total_size,
 header.l1_size = cpu_to_be32(l1_size);
 offset += align_offset(l1_size * sizeof(uint64_t), s->cluster_size);
 
-s->refcount_table = qemu_mallocz(s->cluster_size);
+/* count how many refcount blocks needed */
+
+#define NUM_CLUSTERS(bytes) \
+(((bytes) + (s->cluster_size) - 1) / (s->cluster_size))
+
+ref_clusters = NUM_CLUSTERS(NUM_CLUSTERS(offset) * sizeof(uint16_t));
+
+do {
+uint64_t image_clusters;
+old_ref_clusters = ref_clusters;
+
+/* Number of clusters used for the refcount table */
+reftable_clusters = NUM_CLUSTERS(ref_clusters * sizeof(uint64_t));
+
+/* Number of clusters that the whole image will have */
+image_clusters = NUM_CLUSTERS(offset) + ref_clusters
++ reftable_clusters;
+
+/* Number of refcount blocks needed for the image */
+ref_clusters = NUM_CLUSTERS(image_clusters * sizeof(uint16_t));
+
+} while (ref_clusters != old_ref_clusters);
+
+s->refcount_table = qemu_mallocz(reftable_clusters * s->cluster_size);
 
 s->refcount_table_offset = offset;
 header.refcount_table_offset = cpu_to_be64(offset);
-header.refcount_table_clusters = cpu_to_be32(1);
-offset += s->cluster_size;
+header.refcount_table_clusters = cpu_to_be32(reftable_clusters);
+offset += (reftable_clusters * s->cluster_size);
 s->refcount_block_offset = offset;
 
-/* count how many refcount blocks needed */
-tmp = offset >> s->cluster_bits;
-ref_clusters = (tmp >> (s->cluster_bits - REFCOUNT_SHIFT)) + 1;
 for (i=0; i < ref_clusters; i++) {
 s->refcount_table[i] = cpu_to_be64(offset);
 offset += s->cluster_size;
@@ -831,7 +852,8 @@ static int qcow_create2(const char *filename, int64_t 
total_size,
 qcow2_create_refcount_update(s, 0, header_size);
 qcow2_create_refcount_update(s, s->l1_table_offset,
 l1_size * sizeof(uint64_t));
-qcow2_create_refcount_update(s, s->refcount_table_offset, s->cluster_size);
+qcow2_create_refcount_update(s, s->refcount_table_offset,
+reftable_clusters * s->cluster_size);
 qcow2_create_refcount_update(s, s->refcount_block_offset,
 ref_clusters * s->cluster_size);
 
@@ -859,7 +881,8 @@ static int qcow_create2(const char *filename, int64_t 
total_size,
 write(fd, &tmp, sizeof(tmp));
 }
 lseek(fd, s->refcount_table_offset, SEEK_SET);
-write(fd, s->refcount_table, s->cluster_size);
+write(fd, s->refcount_table,
+reftable_clusters * s->cluster_size);
 
 lseek(fd, s->refcount_block_offset, SEEK_SET);
 write(fd, s->refcount_block, ref_clusters * s->cluster_size);
-- 
1.7.1.1

[Qemu-devel] [STABLE][PULL 00/14] Block patches for 0.12.5

2010-07-14 Thread Kevin Wolf

Hi Aurelien,

these are the block related patches that have been applied to git master since
0.12.4 was released and that I consider relevant for stable-0.12.

Kevin


The following changes since commit 0c0f53e25cded4b3b1909391f9bf22d334ed8155:

  qemu-options: add documentation for stdio signal=on|off (2010-07-13 21:18:13 
+0200)

are available in the git repository at:
  git://repo.or.cz/qemu/kevin.git for-stable-0.12

Kevin Wolf (13):
  vmdk: fix double free
  qcow2: Fix creation of large images
  vmdk: Fix COW
  qcow2: Remove abort on free_clusters failure
  block/vdi: Fix image opening and creation for odd disk sizes
  qcow2: Restore L1 entry on l2_allocate failure
  block: Add bdrv_(p)write_sync
  qcow: Use bdrv_(p)write_sync for metadata writes
  qcow2: Use bdrv_(p)write_sync for metadata writes
  vmdk: Use bdrv_(p)write_sync for metadata writes
  vpc: Use bdrv_(p)write_sync for metadata writes
  block: Fix early failure in multiwrite
  block: Handle multiwrite errors only when all requests have completed

Stefan Weil (1):
  block/vpc: Fix conversion from size to disk geometry

 block.c|   81 +--
 block.h|4 ++
 block/qcow.c   |   18 ++-
 block/qcow2-cluster.c  |   46 +++
 block/qcow2-refcount.c |   33 ++-
 block/qcow2-snapshot.c |   23 ++---
 block/qcow2.c  |   41 +++-
 block/vdi.c|   18 +--
 block/vmdk.c   |   47 ++--
 block/vpc.c|   30 ++
 block_int.h|1 +
 11 files changed, 219 insertions(+), 123 deletions(-)

[Qemu-devel] [STABLE][PATCH 03/14] vmdk: Fix COW

2010-07-14 Thread Kevin Wolf

When trying to do COW, VMDK wrote the data back to the backing file. This
problem was revealed by the patch that made backing files read-only. This patch
does not only fix the problem, but also simplifies the VMDK code a bit.

This fixes the backing file qemu-iotests cases for VMDK.

Signed-off-by: Kevin Wolf 
(cherry picked from commit c336500df5bf08492f4e7796b2193cd4976f3548)
---
 block/vmdk.c |   35 +++
 1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 765e95a..d52904a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -87,14 +87,6 @@ typedef struct VmdkMetaData {
 int valid;
 } VmdkMetaData;
 
-typedef struct ActiveBDRVState{
-BlockDriverState *hd;// active image handler
-uint64_t cluster_offset; // current write offset
-}ActiveBDRVState;
-
-static ActiveBDRVState activeBDRV;
-
-
 static int vmdk_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
 uint32_t magic;
@@ -458,30 +450,28 @@ static uint64_t get_cluster_offset(BlockDriverState *bs, 
VmdkMetaData *m_data,
 static int get_whole_cluster(BlockDriverState *bs, uint64_t cluster_offset,
  uint64_t offset, int allocate)
 {
-uint64_t parent_cluster_offset;
 BDRVVmdkState *s = bs->opaque;
 uint8_t  whole_grain[s->cluster_sectors*512];// 128 sectors * 512 
bytes each = grain size 64KB
 
 // we will be here if it's first write on non-exist grain(cluster).
 // try to read from parent image, if exist
 if (bs->backing_hd) {
-BDRVVmdkState *ps = bs->backing_hd->opaque;
+int ret;
 
 if (!vmdk_is_cid_valid(bs))
 return -1;
 
-parent_cluster_offset = get_cluster_offset(bs->backing_hd, NULL,
-offset, allocate);
-
-if (parent_cluster_offset) {
-BDRVVmdkState *act_s = activeBDRV.hd->opaque;
-
-if (bdrv_pread(ps->hd, parent_cluster_offset, whole_grain, 
ps->cluster_sectors*512) != ps->cluster_sectors*512)
-return -1;
+ret = bdrv_read(bs->backing_hd, offset >> 9, whole_grain,
+s->cluster_sectors);
+if (ret < 0) {
+return -1;
+}
 
-//Write grain only into the active image
-if (bdrv_pwrite(act_s->hd, activeBDRV.cluster_offset << 9, 
whole_grain, sizeof(whole_grain)) != sizeof(whole_grain))
-return -1;
+//Write grain only into the active image
+ret = bdrv_write(s->hd, cluster_offset, whole_grain,
+s->cluster_sectors);
+if (ret < 0) {
+return -1;
 }
 }
 return 0;
@@ -567,9 +557,6 @@ static uint64_t get_cluster_offset(BlockDriverState *bs, 
VmdkMetaData *m_data,
 cluster_offset >>= 9;
 tmp = cpu_to_le32(cluster_offset);
 l2_table[l2_index] = tmp;
-// Save the active image state
-activeBDRV.cluster_offset = cluster_offset;
-activeBDRV.hd = bs;
 }
 /* First of all we write grain itself, to avoid race condition
  * that may to corrupt the image.
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 01/14] vmdk: fix double free

2010-07-14 Thread Kevin Wolf

fail_gd error case would also free rgd_buf that was already freed

Signed-off-by: Juan Quintela 
Signed-off-by: Anthony Liguori 
(cherry picked from commit a161329b61106ab093aab6d3227ac85e0b8251a9)

Conflicts:

block/vmdk.c

Signed-off-by: Kevin Wolf 
---
 block/vmdk.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 4e48622..765e95a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -285,7 +285,6 @@ static int vmdk_snapshot_create(const char *filename, const 
char *backing_file)
 goto fail_rgd;
 if (write(snp_fd, rgd_buf, gd_size) == -1)
 goto fail_rgd;
-qemu_free(rgd_buf);
 
 /* write GD */
 gd_buf = qemu_malloc(gd_size);
@@ -298,6 +297,7 @@ static int vmdk_snapshot_create(const char *filename, const 
char *backing_file)
 if (write(snp_fd, gd_buf, gd_size) == -1)
 goto fail_gd;
 qemu_free(gd_buf);
+qemu_free(rgd_buf);
 
 close(p_fd);
 close(snp_fd);
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 04/14] qcow2: Remove abort on free_clusters failure

2010-07-14 Thread Kevin Wolf

While it's true that during regular operation free_clusters failure would be a
bug, an I/O error can always happen. There's no need to kill the VM, the worst
thing that can happen (and it will) is that we leak some clusters.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 003fad6e2cae5311d3aea996388c90e3ab17de90)
---
 block/qcow2-refcount.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 465d5d3..ff2cf6d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -631,7 +631,7 @@ void qcow2_free_clusters(BlockDriverState *bs,
 ret = update_refcount(bs, offset, size, -1);
 if (ret < 0) {
 fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
-abort();
+/* TODO Remember the clusters to free them later and avoid leaking */
 }
 }
 
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 10/14] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-07-14 Thread Kevin Wolf

Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 8b3b720620a1137a1b794fc3ed64734236f94e06)

Conflicts:

block/qcow2-cluster.c
block/qcow2-refcount.c
block/qcow2-snapshot.c
block/qcow2.c

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c  |   45 -
 block/qcow2-refcount.c |   31 ---
 block/qcow2-snapshot.c |   23 +++
 3 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8c67e3c..0a555dc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -62,8 +62,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 
 for(i = 0; i < s->l1_size; i++)
 new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-ret = bdrv_pwrite(s->hd, new_l1_table_offset, new_l1_table, new_l1_size2);
-if (ret != new_l1_size2)
+ret = bdrv_pwrite_sync(s->hd, new_l1_table_offset, new_l1_table, 
new_l1_size2);
+if (ret < 0)
 goto fail;
 for(i = 0; i < s->l1_size; i++)
 new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
@@ -71,8 +71,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 /* set new table */
 cpu_to_be32w((uint32_t*)data, new_l1_size);
 cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
-ret = bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,sizeof(data));
-if (ret != sizeof(data)) {
+ret = bdrv_pwrite_sync(s->hd, offsetof(QCowHeader, l1_size), 
data,sizeof(data));
+if (ret < 0) {
 goto fail;
 }
 qemu_free(s->l1_table);
@@ -84,7 +84,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
  fail:
 qemu_free(new_l1_table);
 qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2);
-return ret < 0 ? ret : -EIO;
+return ret;
 }
 
 void qcow2_l2_cache_reset(BlockDriverState *bs)
@@ -188,17 +188,17 @@ static int write_l1_entry(BDRVQcowState *s, int l1_index)
 {
 uint64_t buf[L1_ENTRIES_PER_SECTOR];
 int l1_start_index;
-int i;
+int i, ret;
 
 l1_start_index = l1_index & ~(L1_ENTRIES_PER_SECTOR - 1);
 for (i = 0; i < L1_ENTRIES_PER_SECTOR; i++) {
 buf[i] = cpu_to_be64(s->l1_table[l1_start_index + i]);
 }
 
-if (bdrv_pwrite(s->hd, s->l1_table_offset + 8 * l1_start_index,
-buf, sizeof(buf)) != sizeof(buf))
-{
-return -1;
+ret = bdrv_pwrite_sync(s->hd, s->l1_table_offset + 8 * l1_start_index,
+buf, sizeof(buf));
+if (ret < 0) {
+return ret;
 }
 
 return 0;
@@ -221,6 +221,7 @@ static uint64_t *l2_allocate(BlockDriverState *bs, int 
l1_index)
 uint64_t old_l2_offset;
 uint64_t *l2_table;
 int64_t l2_offset;
+int ret;
 
 old_l2_offset = s->l1_table[l1_index];
 
@@ -247,10 +248,11 @@ static uint64_t *l2_allocate(BlockDriverState *bs, int 
l1_index)
 goto fail;
 }
 /* write the l2 table to the file */
-if (bdrv_pwrite(s->hd, l2_offset,
-l2_table, s->l2_size * sizeof(uint64_t)) !=
-s->l2_size * sizeof(uint64_t))
+ret = bdrv_pwrite_sync(s->hd, l2_offset, l2_table,
+s->l2_size * sizeof(uint64_t));
+if (ret < 0) {
 goto fail;
+}
 
 /* update the L1 entry */
 s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
@@ -384,8 +386,8 @@ static int copy_sectors(BlockDriverState *bs, uint64_t 
start_sect,
 s->cluster_data, n, 1,
 &s->aes_encrypt_key);
 }
-ret = bdrv_write(s->hd, (cluster_offset >> 9) + n_start,
- s->cluster_data, n);
+ret = bdrv_write_sync(s->hd, (cluster_offset >> 9) + n_start,
+s->cluster_data, n);
 if (ret < 0)
 return ret;
 return 0;
@@ -597,10 +599,10 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 /* compressed clusters never have the copied flag */
 
 l2_table[l2_index] = cpu_to_be64(cluster_offset);
-if (bdrv_pwrite(s->hd,
+if (bdrv_pwrite_sync(s->hd,
 l2_offset + l2_index * sizeof(uint64_t),
 l2_table + l2_index,
-sizeof(uint64_t)) != sizeof(uint64_t))
+sizeof(uint64_t)) < 0)
 return 0;
 
 return cluster_offset;
@@ -618,11 +620,12 @@ static int write_l2_entries(BDRVQcowState *s, uint64_t 
*l2_table,
 int start_offset = (8 * l2_index) & ~511;
 int end_offset = (8 * (l2_index + num) + 511) & ~511;
 size_t len = end_offset - start_offset;
+int ret;
 
-if (bdrv_pwrite(s->hd, l2_offset + start_offset, &l2_table[l2_start_index],
-len) != len)
-{
-return -1;
+ret = bdrv_pwrite_sync(s->hd, l2_offset + start_offset,
+&l2_table[l2_start_index], len);
+if (ret < 0) {
+return ret;
 }
 
 return 0;
diff --git a/block/qcow2-refco

[Qemu-devel] [STABLE][PATCH 05/14] block/vpc: Fix conversion from size to disk geometry

2010-07-14 Thread Kevin Wolf

From: Stefan Weil 

The VHD algorithm calculates a disk geometry
which is usually smaller than the requested size.

QEMU tried to round up but failed for certain sizes:

qemu-img create -f vpc disk.vpc 9437184
would create an image with 9435136 bytes
(which is too small for qemu-img convert).

Instead of hacking the geometry algorithm, the patch
increases the number of sectors until we get enough
sectors.

Cc: Kevin Wolf 
Signed-off-by: Stefan Weil 
Signed-off-by: Kevin Wolf 
(cherry picked from commit dede4188cc817a039154ed2ecd7f3285f6b94056)
---
 block/vpc.c |   21 -
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 950ad58..afe6f1a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -470,9 +470,7 @@ static int calculate_geometry(int64_t total_sectors, 
uint16_t* cyls,
 }
 }
 
-// Note: Rounding up deviates from the Virtual PC behaviour
-// However, we need this to avoid truncating images in qemu-img convert
-*cyls = (cyls_times_heads + *heads - 1) / *heads;
+*cyls = cyls_times_heads / *heads;
 
 return 0;
 }
@@ -484,9 +482,9 @@ static int vpc_create(const char *filename, 
QEMUOptionParameter *options)
 struct vhd_dyndisk_header* dyndisk_header =
 (struct vhd_dyndisk_header*) buf;
 int fd, i;
-uint16_t cyls;
-uint8_t heads;
-uint8_t secs_per_cyl;
+uint16_t cyls = 0;
+uint8_t heads = 0;
+uint8_t secs_per_cyl = 0;
 size_t block_size, num_bat_entries;
 int64_t total_sectors = 0;
 
@@ -503,9 +501,14 @@ static int vpc_create(const char *filename, 
QEMUOptionParameter *options)
 if (fd < 0)
 return -EIO;
 
-// Calculate matching total_size and geometry
-if (calculate_geometry(total_sectors, &cyls, &heads, &secs_per_cyl))
-return -EFBIG;
+/* Calculate matching total_size and geometry. Increase the number of
+   sectors requested until we get enough (or fail). */
+for (i = 0; total_sectors > (int64_t)cyls * heads * secs_per_cyl; i++) {
+if (calculate_geometry(total_sectors + i,
+   &cyls, &heads, &secs_per_cyl)) {
+return -EFBIG;
+}
+}
 total_sectors = (int64_t) cyls * heads * secs_per_cyl;
 
 // Prepare the Hard Disk Footer
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 07/14] qcow2: Restore L1 entry on l2_allocate failure

2010-07-14 Thread Kevin Wolf

If writing the L1 table to disk failed, we need to restore its old content in
memory to avoid inconsistencies.

Reported-by: Juan Quintela 
Signed-off-by: Kevin Wolf 
(cherry picked from commit 68dba0bf455e60061bb3c9c40ef0d82916372664)
---
 block/qcow2-cluster.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b7a5b35..8c67e3c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -266,6 +266,7 @@ static uint64_t *l2_allocate(BlockDriverState *bs, int 
l1_index)
 return l2_table;
 
 fail:
+s->l1_table[l1_index] = old_l2_offset;
 qcow2_l2_cache_reset(bs);
 return NULL;
 }
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 08/14] block: Add bdrv_(p)write_sync

2010-07-14 Thread Kevin Wolf

Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes is what is
implemented in this patch - we can try to change that later)

Signed-off-by: Kevin Wolf 
(cherry picked from commit f08145fe16470aca09304099888f68cfbc5d1de7)
---
 block.c |   39 +++
 block.h |4 
 block_int.h |1 +
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 298414c..e3b3a55 100644
--- a/block.c
+++ b/block.c
@@ -452,6 +452,8 @@ int bdrv_open2(BlockDriverState *bs, const char *filename, 
int flags,
 (flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
 else
 open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
+
+bs->open_flags = open_flags;
 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv))
 ret = -ENOTSUP;
 else
@@ -779,6 +781,43 @@ int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 return count1;
 }
 
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count)
+{
+int ret;
+
+ret = bdrv_pwrite(bs, offset, buf, count);
+if (ret < 0) {
+return ret;
+}
+
+/* No flush needed for cache=writethrough, it uses O_DSYNC */
+if ((bs->open_flags & BDRV_O_CACHE_MASK) != 0) {
+bdrv_flush(bs);
+}
+
+return 0;
+}
+
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors)
+{
+return bdrv_pwrite_sync(bs, BDRV_SECTOR_SIZE * sector_num,
+buf, BDRV_SECTOR_SIZE * nb_sectors);
+}
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
diff --git a/block.h b/block.h
index fa51ddf..762d88a 100644
--- a/block.h
+++ b/block.h
@@ -77,6 +77,10 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset,
void *buf, int count);
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count);
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
diff --git a/block_int.h b/block_int.h
index 9a3b2e0..631e8c5 100644
--- a/block_int.h
+++ b/block_int.h
@@ -127,6 +127,7 @@ struct BlockDriverState {
 int64_t total_sectors; /* if we are reading a disk image, give its
   size in sectors */
 int read_only; /* if true, the media is read only */
+int open_flags; /* flags used to open the file, re-used for re-open */
 int removable; /* if true, the media can be removed */
 int locked;/* if true, the media cannot temporarily be ejected */
 int encrypted; /* if true, the media is encrypted */
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 12/14] vpc: Use bdrv_(p)write_sync for metadata writes

2010-07-14 Thread Kevin Wolf

Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 078a458e077d6b0db262c4b05fee51d01de2d1d2)

Conflicts:

block/vpc.c

Signed-off-by: Kevin Wolf 
---
 block/vpc.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index afe6f1a..9e0acf4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -266,7 +266,7 @@ static inline int64_t get_sector_offset(BlockDriverState 
*bs,
 
 s->last_bitmap_offset = bitmap_offset;
 memset(bitmap, 0xff, s->bitmap_size);
-bdrv_pwrite(s->hd, bitmap_offset, bitmap, s->bitmap_size);
+bdrv_pwrite_sync(s->hd, bitmap_offset, bitmap, s->bitmap_size);
 }
 
 //printf("sector: %" PRIx64 ", index: %x, offset: %x, bioff: %" PRIx64 ", 
bloff: %" PRIx64 "\n",
@@ -316,7 +316,7 @@ static int rewrite_footer(BlockDriverState* bs)
 BDRVVPCState *s = bs->opaque;
 int64_t offset = s->free_data_block_offset;
 
-ret = bdrv_pwrite(s->hd, offset, s->footer_buf, HEADER_SIZE);
+ret = bdrv_pwrite_sync(s->hd, offset, s->footer_buf, HEADER_SIZE);
 if (ret < 0)
 return ret;
 
@@ -351,7 +351,8 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t 
sector_num)
 
 // Initialize the block's bitmap
 memset(bitmap, 0xff, s->bitmap_size);
-bdrv_pwrite(s->hd, s->free_data_block_offset, bitmap, s->bitmap_size);
+bdrv_pwrite_sync(s->hd, s->free_data_block_offset, bitmap,
+s->bitmap_size);
 
 // Write new footer (the old one will be overwritten)
 s->free_data_block_offset += s->block_size + s->bitmap_size;
@@ -362,7 +363,7 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t 
sector_num)
 // Write BAT entry to disk
 bat_offset = s->bat_offset + (4 * index);
 bat_value = be32_to_cpu(s->pagetable[index]);
-ret = bdrv_pwrite(s->hd, bat_offset, &bat_value, 4);
+ret = bdrv_pwrite_sync(s->hd, bat_offset, &bat_value, 4);
 if (ret < 0)
 goto fail;
 
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 06/14] block/vdi: Fix image opening and creation for odd disk sizes

2010-07-14 Thread Kevin Wolf

The fix is based on a patch from Kevin Wolf. Here his comment:

"The number of blocks needs to be rounded up to cover all of the virtual hard
disk. Without this fix, we can't even open our own images if their size is not
a multiple of the block size."

While Kevin's patch addressed vdi_create, my modification also fixes
vdi_open which now accepts images with odd disk sizes.

v3:
Don't allow reading of disk images with too large disk sizes.
Neither VBoxManage nor old versions of qemu-img read such images.
This change requires rounding of odd disk sizes before we do the checks.

Cc: Kevin Wolf 
Cc: François Revol 
Signed-off-by: Stefan Weil 
Signed-off-by: Kevin Wolf 
(cherry picked from commit f21dc3a4652eeb82117d7d55d975278fe1444b26)

Conflicts:

block/vdi.c

Signed-off-by: Kevin Wolf 
---
 block/vdi.c |   18 +++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 45aa81c..64bf66f 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -399,6 +399,15 @@ static int vdi_open(BlockDriverState *bs, const char 
*filename, int flags)
 vdi_header_print(&header);
 #endif
 
+if (header.disk_size % SECTOR_SIZE != 0) {
+/* 'VBoxManage convertfromraw' can create images with odd disk sizes.
+   We accept them but round the disk size to the next multiple of
+   SECTOR_SIZE. */
+logout("odd disk size %" PRIu64 " B, round up\n", header.disk_size);
+header.disk_size += SECTOR_SIZE - 1;
+header.disk_size &= ~(SECTOR_SIZE - 1);
+}
+
 if (header.version != VDI_VERSION_1_1) {
 logout("unsupported version %u.%u\n",
header.version >> 16, header.version & 0x);
@@ -417,9 +426,9 @@ static int vdi_open(BlockDriverState *bs, const char 
*filename, int flags)
 } else if (header.block_size != 1 * MiB) {
 logout("unsupported block size %u B\n", header.block_size);
 goto fail;
-} else if (header.disk_size !=
+} else if (header.disk_size >
(uint64_t)header.blocks_in_image * header.block_size) {
-logout("unexpected block number %u B\n", header.blocks_in_image);
+logout("unsupported disk size %" PRIu64 " B\n", header.disk_size);
 goto fail;
 } else if (!uuid_is_null(header.uuid_link)) {
 logout("link uuid != 0, unsupported\n");
@@ -831,7 +840,10 @@ static int vdi_create(const char *filename, 
QEMUOptionParameter *options)
 return -errno;
 }
 
-blocks = bytes / block_size;
+/* We need enough blocks to store the given disk size,
+   so always round up. */
+blocks = (bytes + block_size - 1) / block_size;
+
 bmap_size = blocks * sizeof(uint32_t);
 bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
 
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 13/14] block: Fix early failure in multiwrite

2010-07-14 Thread Kevin Wolf

bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 453f9a1652629e5805995b165be2e634c8487139)

Conflicts:

block.c

Signed-off-by: Kevin Wolf 
---
 block.c |   37 ++---
 1 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index e3b3a55..80f2fae 100644
--- a/block.c
+++ b/block.c
@@ -1802,8 +1802,29 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // Check for mergable requests
 num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
 
-// Run the aio requests
+/*
+ * Run the aio requests. As soon as one request can't be submitted
+ * successfully, fail all requests that are not yet submitted (we must
+ * return failure for all requests anyway)
+ *
+ * num_requests cannot be set to the right value immediately: If
+ * bdrv_aio_writev fails for some request, num_requests would be too high
+ * and therefore multiwrite_cb() would never recognize the multiwrite
+ * request as completed. We also cannot use the loop variable i to set it
+ * when the first request fails because the callback may already have been
+ * called for previously submitted requests. Thus, num_requests must be
+ * incremented for each request that is submitted.
+ *
+ * The problem that callbacks may be called early also means that we need
+ * to take care that num_requests doesn't become 0 before all requests are
+ * submitted - multiwrite_cb() would consider the multiwrite request
+ * completed. A dummy request that is "completed" by a manual call to
+ * multiwrite_cb() takes care of this.
+ */
+mcb->num_requests = 1;
+
 for (i = 0; i < num_reqs; i++) {
+mcb->num_requests++;
 acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
 reqs[i].nb_sectors, multiwrite_cb, mcb);
 
@@ -1811,23 +1832,25 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // We can only fail the whole thing if no request has been
 // submitted yet. Otherwise we'll wait for the submitted AIOs to
 // complete and report the error in the callback.
-if (mcb->num_requests == 0) {
-reqs[i].error = -EIO;
+if (i == 0) {
 goto fail;
 } else {
-mcb->num_requests++;
 multiwrite_cb(mcb, -EIO);
 break;
 }
-} else {
-mcb->num_requests++;
 }
 }
 
+/* Complete the dummy request */
+multiwrite_cb(mcb, 0);
+
 return 0;
 
 fail:
-free(mcb);
+for (i = 0; i < mcb->num_callbacks; i++) {
+reqs[i].error = -EIO;
+}
+qemu_free(mcb);
 return -1;
 }
 
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 14/14] block: Handle multiwrite errors only when all requests have completed

2010-07-14 Thread Kevin Wolf

Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:

* The temporary data that is freed (qiov, possibly zero buffer) is still used
  by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
  that it may free buffers etc. which are still in use.

Just remember the error value and do the cleanup when all requests have
completed.

Signed-off-by: Kevin Wolf 
(cherry picked from commit de189a1b4a471d37a2909e97646654fc9751b52f)
---
 block.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 80f2fae..1694780 100644
--- a/block.c
+++ b/block.c
@@ -1661,14 +1661,11 @@ static void multiwrite_cb(void *opaque, int ret)
 
 if (ret < 0 && !mcb->error) {
 mcb->error = ret;
-multiwrite_user_cb(mcb);
 }
 
 mcb->num_requests--;
 if (mcb->num_requests == 0) {
-if (mcb->error == 0) {
-multiwrite_user_cb(mcb);
-}
+multiwrite_user_cb(mcb);
 qemu_free(mcb);
 }
 }
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 09/14] qcow: Use bdrv_(p)write_sync for metadata writes

2010-07-14 Thread Kevin Wolf

Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf 
(cherry picked from commit 5e5557d97026d1d3325e0e7b0ba593366da2f3dc)

Conflicts:

block/qcow.c

Signed-off-by: Kevin Wolf 
---
 block/qcow.c |   18 ++
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 7fc85ae..6a4c30f 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -277,8 +277,9 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 /* update the L1 entry */
 s->l1_table[l1_index] = l2_offset;
 tmp = cpu_to_be64(l2_offset);
-if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
-&tmp, sizeof(tmp)) != sizeof(tmp))
+if (bdrv_pwrite_sync(s->hd,
+s->l1_table_offset + l1_index * sizeof(tmp),
+&tmp, sizeof(tmp)) < 0)
 return 0;
 new_l2_table = 1;
 }
@@ -306,8 +307,8 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 l2_table = s->l2_cache + (min_index << s->l2_bits);
 if (new_l2_table) {
 memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
-if (bdrv_pwrite(s->hd, l2_offset, l2_table, s->l2_size * 
sizeof(uint64_t)) !=
-s->l2_size * sizeof(uint64_t))
+if (bdrv_pwrite_sync(s->hd, l2_offset, l2_table,
+s->l2_size * sizeof(uint64_t)) < 0)
 return 0;
 } else {
 if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * 
sizeof(uint64_t)) !=
@@ -372,8 +373,8 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 /* update L2 table */
 tmp = cpu_to_be64(cluster_offset);
 l2_table[l2_index] = tmp;
-if (bdrv_pwrite(s->hd,
-l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) 
!= sizeof(tmp))
+if (bdrv_pwrite_sync(s->hd, l2_offset + l2_index * sizeof(tmp),
+&tmp, sizeof(tmp)) < 0)
 return 0;
 }
 return cluster_offset;
@@ -821,8 +822,9 @@ static int qcow_make_empty(BlockDriverState *bs)
 int ret;
 
 memset(s->l1_table, 0, l1_length);
-if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
-   return -1;
+if (bdrv_pwrite_sync(s->hd, s->l1_table_offset, s->l1_table,
+l1_length) < 0)
+return -1;
 ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
 if (ret < 0)
 return ret;
-- 
1.7.1.1

[Qemu-devel] [STABLE][PATCH 11/14] vmdk: Use bdrv_(p)write_sync for metadata writes

2010-07-14 Thread Kevin Wolf

Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf 
(cherry picked from commit b8852e87d9d113096342c3e0977266cda0fe9ee5)

Conflicts:

block/vmdk.c

Signed-off-by: Kevin Wolf 
---
 block/vmdk.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index d52904a..cd87279 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -153,7 +153,7 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t 
cid)
 pstrcat(desc, sizeof(desc), tmp_desc);
 }
 
-if (bdrv_pwrite(s->hd, 0x200, desc, DESC_SIZE) != DESC_SIZE)
+if (bdrv_pwrite_sync(s->hd, 0x200, desc, DESC_SIZE) < 0)
 return -1;
 return 0;
 }
@@ -482,14 +482,14 @@ static int vmdk_L2update(BlockDriverState *bs, 
VmdkMetaData *m_data)
 BDRVVmdkState *s = bs->opaque;
 
 /* update L2 table */
-if (bdrv_pwrite(s->hd, ((int64_t)m_data->l2_offset * 512) + 
(m_data->l2_index * sizeof(m_data->offset)),
-&(m_data->offset), sizeof(m_data->offset)) != 
sizeof(m_data->offset))
+if (bdrv_pwrite_sync(s->hd, ((int64_t)m_data->l2_offset * 512) + 
(m_data->l2_index * sizeof(m_data->offset)),
+&(m_data->offset), sizeof(m_data->offset)) < 0)
 return -1;
 /* update backup L2 table */
 if (s->l1_backup_table_offset != 0) {
 m_data->l2_offset = s->l1_backup_table[m_data->l1_index];
-if (bdrv_pwrite(s->hd, ((int64_t)m_data->l2_offset * 512) + 
(m_data->l2_index * sizeof(m_data->offset)),
-&(m_data->offset), sizeof(m_data->offset)) != 
sizeof(m_data->offset))
+if (bdrv_pwrite_sync(s->hd, ((int64_t)m_data->l2_offset * 512) + 
(m_data->l2_index * sizeof(m_data->offset)),
+&(m_data->offset), sizeof(m_data->offset)) < 0)
 return -1;
 }
 
-- 
1.7.1.1

[Qemu-devel] [PATCH] Remove incorrect pci_mem_base setting in bonito northbridge

2010-07-14 Thread Huacai Chen

This mistake makes PCI devices can't work correctly.

Signed-off-by: Huacai Chen 
---
 hw/bonito.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/hw/bonito.c b/hw/bonito.c
index 8b81032..dcf0311 100644
--- a/hw/bonito.c
+++ b/hw/bonito.c
@@ -775,7 +775,6 @@ PCIBus *bonito_init(qemu_irq *pic)
  pci_bonito_map_irq, pic, 0x28, 32);
 pcihost->bus = b;
 qdev_init_nofail(dev);
-pci_bus_set_mem_base(pcihost->bus, 0x1000);
 
 d = pci_create_simple(b, PCI_DEVFN(0, 0), "Bonito");
 s = DO_UPCAST(PCIBonitoState, dev, d);
-- 
1.7.0.4

Re: [Qemu-devel] [PATCH 1/2] [virtio-9p] Cleanup legacy 'dotu' variable.

2010-07-14 Thread Venkateswararao Jujjuri (JV)

Arun R Bharadwaj wrote:
> Hi,
> 
> This patch cleans up the legacy 'dotu' variable which is always set to
> 1 by default, since qemu doesnt support legacy 9p clients.

Do we have 2/2 in this series? Also please check the thread-id.
It is showing part of your other patch, TLERROR/RLERROR.

- JV

> 
> Signed-off-by: Arun R Bharadwaj 
> ---
>  hw/virtio-9p-debug.c |   26 +++
>  hw/virtio-9p-debug.h |1 
>  hw/virtio-9p.c   |  119 
> +++
>  3 files changed, 63 insertions(+), 83 deletions(-)
> 
> Index: qemu/hw/virtio-9p-debug.h
> ===
> --- qemu.orig/hw/virtio-9p-debug.h
> +++ qemu/hw/virtio-9p-debug.h
> @@ -1,7 +1,6 @@
>  #ifndef _QEMU_VIRTIO_9P_DEBUG_H
>  #define _QEMU_VIRTIO_9P_DEBUG_H
> 
> -extern int dotu;
>  void pprint_pdu(V9fsPDU *pdu);
> 
>  #endif
> Index: qemu/hw/virtio-9p-debug.c
> ===
> --- qemu.orig/hw/virtio-9p-debug.c
> +++ qemu/hw/virtio-9p-debug.c
> @@ -169,12 +169,10 @@ static void pprint_stat(V9fsPDU *pdu, in
>  pprint_str(pdu, rx, offsetp, ", uid");
>  pprint_str(pdu, rx, offsetp, ", gid");
>  pprint_str(pdu, rx, offsetp, ", muid");
> -if (dotu) {
> -pprint_str(pdu, rx, offsetp, ", extension");
> -pprint_int32(pdu, rx, offsetp, ", uid");
> -pprint_int32(pdu, rx, offsetp, ", gid");
> -pprint_int32(pdu, rx, offsetp, ", muid");
> -}
> +pprint_str(pdu, rx, offsetp, ", extension");
> +pprint_int32(pdu, rx, offsetp, ", uid");
> +pprint_int32(pdu, rx, offsetp, ", gid");
> +pprint_int32(pdu, rx, offsetp, ", muid");
>  fprintf(llogfile, "}");
>  }
> 
> @@ -343,9 +341,7 @@ void pprint_pdu(V9fsPDU *pdu)
>  pprint_int32(pdu, 0, &offset, "afid");
>  pprint_str(pdu, 0, &offset, ", uname");
>  pprint_str(pdu, 0, &offset, ", aname");
> -if (dotu) {
> -pprint_int32(pdu, 0, &offset, ", n_uname");
> -}
> +pprint_int32(pdu, 0, &offset, ", n_uname");
>  break;
>  case P9_RAUTH:
>  fprintf(llogfile, "RAUTH: (");
> @@ -357,9 +353,7 @@ void pprint_pdu(V9fsPDU *pdu)
>  pprint_int32(pdu, 0, &offset, ", afid");
>  pprint_str(pdu, 0, &offset, ", uname");
>  pprint_str(pdu, 0, &offset, ", aname");
> -if (dotu) {
> -pprint_int32(pdu, 0, &offset, ", n_uname");
> -}
> +pprint_int32(pdu, 0, &offset, ", n_uname");
>  break;
>  case P9_RATTACH:
>  fprintf(llogfile, "RATTACH: (");
> @@ -371,9 +365,7 @@ void pprint_pdu(V9fsPDU *pdu)
>  case P9_RERROR:
>  fprintf(llogfile, "RERROR: (");
>  pprint_str(pdu, 1, &offset, "ename");
> -if (dotu) {
> -pprint_int32(pdu, 1, &offset, ", ecode");
> -}
> +pprint_int32(pdu, 1, &offset, ", ecode");
>  break;
>  case P9_TFLUSH:
>  fprintf(llogfile, "TFLUSH: (");
> @@ -408,9 +400,7 @@ void pprint_pdu(V9fsPDU *pdu)
>  pprint_str(pdu, 0, &offset, ", name");
>  pprint_int32(pdu, 0, &offset, ", perm");
>  pprint_int8(pdu, 0, &offset, ", mode");
> -if (dotu) {
> -pprint_str(pdu, 0, &offset, ", extension");
> -}
> +pprint_str(pdu, 0, &offset, ", extension");
>  break;
>  case P9_RCREATE:
>  fprintf(llogfile, "RCREATE: (");
> Index: qemu/hw/virtio-9p.c
> ===
> --- qemu.orig/hw/virtio-9p.c
> +++ qemu/hw/virtio-9p.c
> @@ -18,7 +18,6 @@
>  #include "fsdev/qemu-fsdev.h"
>  #include "virtio-9p-debug.h"
> 
> -int dotu = 1;
>  int debug_9p_pdu;
> 
>  enum {
> @@ -753,9 +752,7 @@ static void complete_pdu(V9fsState *s, V
> 
>  len = 7;
>  len += pdu_marshal(pdu, len, "s", &str);
> -if (dotu) {
> -len += pdu_marshal(pdu, len, "d", err);
> -}
> +len += pdu_marshal(pdu, len, "d", err);
> 
>  id = P9_RERROR;
>  }
> @@ -785,22 +782,20 @@ static mode_t v9mode_to_mode(uint32_t mo
>  ret |= S_IFDIR;
>  }
> 
> -if (dotu) {
> -if (mode & P9_STAT_MODE_SYMLINK) {
> -ret |= S_IFLNK;
> -}
> -if (mode & P9_STAT_MODE_SOCKET) {
> -ret |= S_IFSOCK;
> -}
> -if (mode & P9_STAT_MODE_NAMED_PIPE) {
> -ret |= S_IFIFO;
> -}
> -if (mode & P9_STAT_MODE_DEVICE) {
> -if (extension && extension->data[0] == 'c') {
> -ret |= S_IFCHR;
> -} else {
> -ret |= S_IFBLK;
> -}
> +if (mode & P9_STAT_MODE_SYMLINK) {
> +ret |= S_IFLNK;
> +}
> +if (mode & P9_STAT_MODE_SOCKET) {
> +ret |= S_IFSOCK;
> +}
> +if (mode & P9_STAT_MODE_NAMED_PIPE) {
> +ret |= S_IFIFO;
> +}
> +if (mode & P9_STAT_MODE_DEVICE) {
> +if (extension && exten

Re: [Qemu-devel] [PATCH] Remove incorrect pci_mem_base setting in bonito northbridge

2010-07-14 Thread Aurelien Jarno

On Wed, Jul 14, 2010 at 07:51:49PM +0800, Huacai Chen wrote:
> This mistake makes PCI devices can't work correctly.
> 
> Signed-off-by: Huacai Chen 
> ---
>  hw/bonito.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)

Thanks, applied.

> diff --git a/hw/bonito.c b/hw/bonito.c
> index 8b81032..dcf0311 100644
> --- a/hw/bonito.c
> +++ b/hw/bonito.c
> @@ -775,7 +775,6 @@ PCIBus *bonito_init(qemu_irq *pic)
>   pci_bonito_map_irq, pic, 0x28, 32);
>  pcihost->bus = b;
>  qdev_init_nofail(dev);
> -pci_bus_set_mem_base(pcihost->bus, 0x1000);
>  
>  d = pci_create_simple(b, PCI_DEVFN(0, 0), "Bonito");
>  s = DO_UPCAST(PCIBonitoState, dev, d);
> -- 
> 1.7.0.4
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [STABLE][PULL 00/14] Block patches for 0.12.5

2010-07-14 Thread Aurelien Jarno

On Wed, Jul 14, 2010 at 01:23:59PM +0200, Kevin Wolf wrote:
> Hi Aurelien,
> 
> these are the block related patches that have been applied to git master since
> 0.12.4 was released and that I consider relevant for stable-0.12.
> 

Thanks, pulled.

> 
> 
> The following changes since commit 0c0f53e25cded4b3b1909391f9bf22d334ed8155:
> 
>   qemu-options: add documentation for stdio signal=on|off (2010-07-13 
> 21:18:13 +0200)
> 
> are available in the git repository at:
>   git://repo.or.cz/qemu/kevin.git for-stable-0.12
> 
> Kevin Wolf (13):
>   vmdk: fix double free
>   qcow2: Fix creation of large images
>   vmdk: Fix COW
>   qcow2: Remove abort on free_clusters failure
>   block/vdi: Fix image opening and creation for odd disk sizes
>   qcow2: Restore L1 entry on l2_allocate failure
>   block: Add bdrv_(p)write_sync
>   qcow: Use bdrv_(p)write_sync for metadata writes
>   qcow2: Use bdrv_(p)write_sync for metadata writes
>   vmdk: Use bdrv_(p)write_sync for metadata writes
>   vpc: Use bdrv_(p)write_sync for metadata writes
>   block: Fix early failure in multiwrite
>   block: Handle multiwrite errors only when all requests have completed
> 
> Stefan Weil (1):
>   block/vpc: Fix conversion from size to disk geometry
> 
>  block.c|   81 +--
>  block.h|4 ++
>  block/qcow.c   |   18 ++-
>  block/qcow2-cluster.c  |   46 +++
>  block/qcow2-refcount.c |   33 ++-
>  block/qcow2-snapshot.c |   23 ++---
>  block/qcow2.c  |   41 +++-
>  block/vdi.c|   18 +--
>  block/vmdk.c   |   47 ++--
>  block/vpc.c|   30 ++
>  block_int.h|1 +
>  11 files changed, 219 insertions(+), 123 deletions(-)
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 1/2] virtio-serial: Check if virtio queue is ready before consuming data

2010-07-14 Thread Amit Shah

On (Thu) Jul 01 2010 [14:58:16], Amit Shah wrote:
> If a virtio-serial port is removed before the guest comes up and
> initialises the virtqueues, qemu exits with the message
> 
> Guest moved used index from 0 to 61440
> 
> This happens because we try to clear any pending buffers from the
> virtqueue.
> 
> Ensure the virtqueue is initialised before calling any virtqueue
> operations.
> 
> Signed-off-by: Amit Shah 

Ping (for patches 1 and 2)

Amit

Re: [Qemu-devel] [PATCH] virtio-serial: Fix compat property name

2010-07-14 Thread Amit Shah

On (Wed) Jun 23 2010 [22:49:20], Amit Shah wrote:
> Starting with qemu -M pc-0.12 -device virtio-serial
> 
> results in
> 
> -device virtio-serial: Property 'virtio-serial-pci.max_nr_ports' not found
> 
> The property name 'max_ports' is incorrectly named 'max_nr_ports'. Fix
> that.
> 
> Also fix the ppc440 machine type bamboo-0.12 which has this typo.
> 
> Reported-by: Daniel P. Berrange 
> Signed-off-by: Amit Shah 

Ping

Amit

Re: [Qemu-devel] [PATCH v2] rtc: Remove TARGET_I386 from qemu-config.c, enables driftfix

2010-07-14 Thread Amit Shah

On (Wed) Jun 23 2010 [20:14:04], Amit Shah wrote:
> qemu-config.c doesn't contain any target-specific code, and the
> TARGET_I386 conditional code didn't get compiled as a result. Removing
> this enables the driftfix parameter for rtc.
> 
> Signed-off-by: Amit Shah 

Ping

Amit

[Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Paul Brook

> Memory accesses must go through the IOMMU layer.

No. Devices should not know or care whether an IOMMU is present.

You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
This should then handle IOMMU translation transparently.

You also need to accomodate the the case where multiple IOMMU are present.

Paul

Re: [Qemu-devel] [PATCH 1/2] virtio-serial: Check if virtio queue is ready before consuming data

2010-07-14 Thread Alon Levy


- "Amit Shah"  wrote:

> If a virtio-serial port is removed before the guest comes up and
> initialises the virtqueues, qemu exits with the message
> 
> Guest moved used index from 0 to 61440
> 
> This happens because we try to clear any pending buffers from the
> virtqueue.
> 
> Ensure the virtqueue is initialised before calling any virtqueue
> operations.
> 
> Signed-off-by: Amit Shah 
> ---
>  hw/virtio-serial-bus.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> index 7f9d28f..b89daa6 100644
> --- a/hw/virtio-serial-bus.c
> +++ b/hw/virtio-serial-bus.c
> @@ -139,6 +139,9 @@ static void flush_queued_data(VirtIOSerialPort
> *port, bool discard)
>  {
>  assert(port);
>  
> +if (!virtio_queue_ready(port->ovq)) {
> +return;
> +}
>  do_flush_queued_data(port, port->ovq, &port->vser->vdev,
> discard);
>  }
>  
> -- 
> 1.7.0.1

ACK series

Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-07-14 Thread Cam Macdonell

On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata  wrote:
> On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote:
>> On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata  
>> wrote:
>> > On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote:
>> >> >> > Seabios completely ignore the 64-bitness of the BAR. ?Looks like it 
>> >> >> > also
>> >> >> > thinks the second half of the BAR is an I/O region instead of memory 
>> >> >> > (hence
>> >> >> > the c200, that's part of the pci portio region.
>> >> >
>> >> > I've sent the patches to address it. But they haven't been merged yet.
>> >> > seabios doesn't map BARs beyond 4GB.
>> >> > If bar is mapped beyond 4GB, guest BIOS does it.
>> >>
>> >> Have those patches been merged yet?
>> >
>> > They have been merged into seabios upstream now.
>> > qemu seabios fork hasn't pulled for a while, though.
>> >
>> >
>> >> > To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL
>> >> > in config.h of seabios
>> >>
>> >> Where does the output from seabios end up? ?Inside dmesg?
>> >
>> > It outputs them to the serial console which qemu emulates.
>> > seabios is out of kernel control, so dmesg doesn't show it.
>> >
>> >
>> >> >> pci_read_config: (val) 0x0 <- 0x1c (addr)
>> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> >> >> pci_read_config: (val) 0x <- 0x1c (addr)
>> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> >> >> pci_read_config: (val) 0x0 <- 0x1c (addr)
>> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> >> >
>> >> > seabios BAR3. Not sure how it is mapped from this
>> >> > message.
>> >>
>> >> Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and
>> >> BAR3 to store all 64-bits?
>> >
>> > Yes. Seabios misbehaves. 64bit bar is(was) a missing feature.
>> > --
>> > yamahata
>> >
>> >
>>
>> With the latest seabios git passed via -bios, I no longer see the
>> 48-bit address, but instead a 32-bit address and then
>> .  This guest has 1gb of RAM so the address isn't be
>> mapped beyond 4g.
>
> Can I see the debug log like before?
> (hopefully seabios with CONFIG_DEBUG_LEVEL enabled.)

Here's the dump from SeaBIOS in the region related to the PCI devices.
 The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit.

PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8
region 0: 0xf000
region 1: 0xf200
region 6: 0xf201
PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000
region 0: 0xc020
region 1: 0xf202
region 6: 0xf203
PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
region 0: 0xf204
region 1: 0xf2041000
region 2: 0x

>
> Do you know who sets the BAR to ?

Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM'
lines are from the map function passed to pci_register_bar().  It
looks like SeaBIOS sets the address to 0 and then the potentially
useful e000 address gets mangled into 00.

IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 536870912

...snip...

pci_read_config: (val) 0x4 <- 0x18 (addr)
pci_write_config: (val) 0x0 -> 0x18 (addr)
IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000
pci_read_config: (val) 0xe004 <- 0x18 (addr)
pci_write_config: (val) 0x0 -> 0x18 (addr)
pci_read_config: (val) 0x0 <- 0x1c (addr)
pci_write_config: (val) 0x0 -> 0x1c (addr)
IVSHMEM: guest pci addr = , guest h/w addr =
1090912256, size = 2000
pci_read_config: (val) 0x <- 0x1c (addr)
pci_write_config: (val) 0x0 -> 0x1c (addr)

and with the 64-bit guest I get this error as well (recall the guest
fails to boot on 64-bit)

BUG: kvm_dirty_pages_log_change: invalid parameters
f000-f0ff

> If it's seabios, does the following patch help?

The patch doesn't make a difference.  Perhaps it's Qemu then?

>
> diff --git a/src/pciinit.c b/src/pciinit.c
> index b110531..ce07709 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -117,11 +117,7 @@ static int pci_bios_allocate_region(u16 bdf, int 
> region_num)
>     int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
>         (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == 
> PCI_BASE_ADDRESS_MEM_TYPE_64;
>     if (is_64bit) {
> -        if (size > 0) {
> -            pci_config_writel(bdf, ofs + 4, 0);
> -        } else {
> -            pci_config_writel(bdf, ofs + 4, ~0);
> -        }
> +        pci_config_writel(bdf, ofs + 4, 0);
>     }
>     return is_64bit;
>  }
>
>
>
>> IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 
>> 2000
>> IVSHMEM: guest pci addr = , guest h/w addr =
>> 1090912256, size = 2000
>>
>> However, booting fails when I use a 64-bit BAR.  Booting is fine with
>> a 32-bit BAR.
>>
>> HPET: 1 timers in total, 0 timers will be used for per-cpu timer
>> divide error:  [#1] SMP last sysfs file:
>> CPU 0
>> Modules linked in:
>>
>> Pid: 1, comm: swapper Not tainted 2.6.35-rc1 #280 /Bochs
>> RIP: 0010:[]  [] h

[Qemu-devel] Re: Virtualization at Plumbers 2010 - Time to submit your proposals!

2010-07-14 Thread Jes Sorensen

Hi,

I would like to remind everybody that the deadline for submitting a
proposal to Linux Plumbers 2010 is nearing, so this is the time to get
your proposals in. The deadline is July 19th!

There has been some confusion about the submission process and I have
only just found out myself what is the correct way to do it. You need to
submit your proposal via the following link:

http://www.linuxplumbersconf.org/2010/ocw/events/LPC2010MC/proposals

If you have already submitted a Virtualization related proposal please
check that it went in to the mini conference submission. If it didn't
please re-submit it. I apologize for this confusion!

Hope to see plenty of proposals showing up before the deadling on the 19th!

Best regards,
Jes




On 06/18/10 14:00, Jes Sorensen wrote:
> Hi,
> 
> I would like to remind people about the Virtualization track at Linux
> Plumbers Conference 2010, held in Cambridge, MA, November 3-5, 2010.
> 
> Please note the deadline for submissions is July 19, 2010.
> 
> LPC is particular well suited for technical presentations, work in
> progress and subjects that needs discussion and collaboration between
> communities (kernel, desktop/gfx, virtualization, etc.), so if you have
> a contentious issue you would like to bring to a wider audience, this is
> the ideal place to do it!
> 
> Note that this track is focusing on general Linux Virtualization, it
> is not hypervisor specific. Submissions related to Xen, KVM, VMware,
> containers, etc. are encouraged. Subjects could include:
> 
>  - Linux Kernel Virtualization enhancements
>  - QEMU
>  - IO performance work
>  - Device management, hotplug
>  - NUMA awareness
>  - Live migration
>  - Support for new hardware features, and/or provide guest access to
>these features.
>  - Device emulation
>  - Para-virtual enhancements: special filesystems, PMU, Windows
>drivers, etc.
>  - Debugging and analysis tools
>  - Containers
>  - QMP and Spice
>  - Virtualization management, user interfaces, and desktop integration
>(GNOME, KDE, etc)
> 
> If you have a subject you would like to present, please submit it as
> soon as possible, and no later than July 19th. Please see the full Call
> For Papers at http://www.linuxplumbersconf.org/2010/ for how to submit.
> 
> You may also want to list your ideas at the LPC Virtualization session
> wiki page at http://wiki.linuxplumbersconf.org/2010:virtualization
> 
> Hope to see you in Cambridge in November!
> 
> Jes

[Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Anthony Liguori

CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori 
---
 block.c |4 ++
 block/raw.c |  115 +++
 block_int.h |1 +
 3 files changed, 120 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 65cf4dc..f837876 100644
--- a/block.c
+++ b/block.c
@@ -511,6 +511,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
   BlockDriver *drv)
 {
 int ret;
+int probed = 0;
 
 if (flags & BDRV_O_SNAPSHOT) {
 BlockDriverState *bs1;
@@ -571,6 +572,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 /* Find the right image format driver */
 if (!drv) {
 drv = find_image_format(filename);
+probed = 1;
 }
 
 if (!drv) {
@@ -584,6 +586,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 goto unlink_and_fail;
 }
 
+bs->probed = probed;
+
 /* If there is a backing file, use it */
 if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
 char backing_filename[PATH_MAX];
diff --git a/block/raw.c b/block/raw.c
index 4406b8c..92e41d8 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -9,15 +9,82 @@ static int raw_open(BlockDriverState *bs, int flags)
 return 0;
 }
 
+/* check for the user attempting to write something that looks like a
+   block format header to the beginning of the image and fail out.
+*/
+static int check_for_block_signature(BlockDriverState *bs, const uint8_t *buf)
+{
+static const uint8_t signatures[][4] = {
+{ 'Q', 'F', 'I', 0xfb }, /* qcow/qcow2 */
+{ 'C', 'O', 'W', 'D' }, /* VMDK3 */
+{ 'V', 'M', 'D', 'K' }, /* VMDK4 */
+{ 'O', 'O', 'O', 'M' }, /* UML COW */
+{}
+};
+int i;
+
+for (i = 0; signatures[i][0] != 0; i++) {
+if (memcmp(buf, signatures[i], 4) == 0) {
+return 1;
+}
+}
+
+return 0;
+}
+
+static int check_write_unsafe(BlockDriverState *bs, int64_t sector_num,
+  const uint8_t *buf, int nb_sectors)
+{
+/* assume that if the user specifies the format explicitly, then assume
+   that they will continue to do so and provide no safety net */
+if (!bs->probed) {
+return 0;
+}
+
+if (sector_num == 0 && nb_sectors > 0) {
+return check_for_block_signature(bs, buf);
+}
+
+return 0;
+}
+
 static int raw_read(BlockDriverState *bs, int64_t sector_num,
 uint8_t *buf, int nb_sectors)
 {
 return bdrv_read(bs->file, sector_num, buf, nb_sectors);
 }
 
+static int raw_write_scrubbed_bootsect(BlockDriverState *bs,
+   const uint8_t *buf)
+{
+uint8_t bootsect[512];
+
+/* scrub the dangerous signature */
+memcpy(bootsect, buf, 512);
+memset(bootsect, 0, 4);
+
+return bdrv_write(bs->file, 0, bootsect, 1);
+}
+
 static int raw_write(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors)
 {
+if (check_write_unsafe(bs, sector_num, buf, nb_sectors)) {
+int ret;
+
+ret = raw_write_scrubbed_bootsect(bs, buf);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_write(bs->file, 1, buf + 512, nb_sectors - 1);
+if (ret < 0) {
+return ret;
+}
+
+return ret + 512;
+}
+
 return bdrv_write(bs->file, sector_num, buf, nb_sectors);
 }
 
@@ -28,10 +95,58 @@ static BlockDriverAIOCB *raw_aio_readv(BlockDriverState

[Qemu-devel] Re: [PATCH] set proper migration status on ->write error (v5)

2010-07-14 Thread Luiz Capitulino

On Tue, 13 Jul 2010 21:01:33 -0300
Marcelo Tosatti  wrote:

> 
> If ->write fails, declare migration status as MIG_STATE_ERROR.
> 
> Also, in buffered_file.c, ->close the object in case of an
> error.
> 
> Fixes "migrate -d "exec:dd of=file", where dd fails to open file.
> 
> Signed-off-by: Marcelo Tosatti 

Looks good now, I guess it's good for stable too, isn't it?

Reviewed-by: Luiz Capitulino 

> 
> diff --git a/buffered_file.c b/buffered_file.c
> index 54dc6c2..be147d6 100644
> --- a/buffered_file.c
> +++ b/buffered_file.c
> @@ -222,8 +222,10 @@ static void buffered_rate_tick(void *opaque)
>  {
>  QEMUFileBuffered *s = opaque;
>  
> -if (s->has_error)
> +if (s->has_error) {
> +buffered_close(s);
>  return;
> +}
>  
>  qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
>  
> diff --git a/migration.c b/migration.c
> index b49964c..f8e6325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -316,8 +316,14 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
> *data, size_t size)
>  if (ret == -1)
>  ret = -(s->get_error(s));
>  
> -if (ret == -EAGAIN)
> +if (ret == -EAGAIN) {
>  qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
> +} else if (ret < 0) {
> +if (s->mon) {
> +monitor_resume(s->mon);
> +}
> +s->state = MIG_STATE_ERROR;
> +}
>  
>  return ret;
>  }
>

[Qemu-devel] Re: [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Kevin Wolf

Am 14.07.2010 18:12, schrieb Anthony Liguori:
> CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
> trick the block probing code into accessing arbitrary files in a guest.  To
> mitigate this, we added an explicit format parameter to -drive which disabling
> block probing.
> 
> Fast forward to today, and the vast majority of users do not use this 
> parameter.
> libvirt does not use this by default nor does virt-manager.
> 
> Most users want block probing so we should try to make it safer.
> 
> This patch adds some logic to the raw device which attempts to detect a write
> operation to the beginning of a raw device.  If the first 4 bytes happen to
> match an image file that has a backing file that we support, it scrubs the
> signature to all zeros.  If a user specifies an explicit format parameter, 
> this
> behavior is disabled.
> 
> I contend that while a legitimate guest could write such a signature to the
> header, we would behave incorrectly anyway upon the next invocation of QEMU.
> This simply changes the incorrect behavior to not involve a security
> vulnerability.
> 
> I've tested this pretty extensively both in the positive and negative case.  
> I'm
> not 100% confident in the block layer's ability to deal with zero sized writes
> particularly with respect to the aio functions so some additional eyes would 
> be
> appreciated.
> 
> Even in the case of a single sector write, we have to make sure to invoked the
> completion from a bottom half so just removing the zero sized write is not an
> option.
> 
> Signed-off-by: Anthony Liguori 

I guess something like this makes sense, and the approach looks okay in
general. With the check that we have really probed the format, we still
allow legitimate use cases (whatever they might be).

However, I wonder why you even bother with adjusting buffers and
requests and stuff instead of just returning a straight -EIO. Doing so
would have the additional advantage that the expectation of the guest OS
matches what is really on the disk (garbage) instead of silently
corrupting things.

>  static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs,
>  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>  BlockDriverCompletionFunc *cb, void *opaque)
>  {
> +if (check_write_unsafe(bs, sector_num, qiov->iov[0].iov_base, 
> nb_sectors)) {

Have you checked that the bad value is always in iov[0]? Could a guest
construct a zero-length iov[0] and do the bad access in iov[1]? Or use
two two-byte buffers to write the magic number?

I'm not saying that any of these work, I honestly don't know, but did
you consider them?

Kevin

Re: [Qemu-devel] Re: [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Anthony Liguori


On 07/14/2010 11:42 AM, Kevin Wolf wrote:

Am 14.07.2010 18:12, schrieb Anthony Liguori:
   

CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori
 

I guess something like this makes sense, and the approach looks okay in
general. With the check that we have really probed the format, we still
allow legitimate use cases (whatever they might be).

However, I wonder why you even bother with adjusting buffers and
requests and stuff instead of just returning a straight -EIO. Doing so
would have the additional advantage that the expectation of the guest OS
matches what is really on the disk (garbage) instead of silently
corrupting things.
   


I started with that approach.  My concern is that it would trigger the 
stop-on-error behavior and the result would be far too difficult for a 
management tool/person to deal with.


Scrubbing seemed like a easier-to-use solution.


  static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs,
  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
  BlockDriverCompletionFunc *cb, void *opaque)
  {
+if (check_write_unsafe(bs, sector_num, qiov->iov[0].iov_base, nb_sectors)) 
{
 

Have you checked that the bad value is always in iov[0]? Could a guest
construct a zero-length iov[0] and do the bad access in iov[1]? Or use
two two-byte buffers to write the magic number?

I'm not saying that any of these work, I honestly don't know, but did
you consider them?
   


No, and it's certainly worth being a bit more paranoid.

Regards,

Anthony Liguori


Kevin

[Qemu-devel] [PATCH] Make default invocation of block drivers safer (v2)

2010-07-14 Thread Anthony Liguori

CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - be more paranoid about empty iovecs
---
 block.c |4 ++
 block/raw.c |  129 +++
 block_int.h |1 +
 3 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 65cf4dc..f837876 100644
--- a/block.c
+++ b/block.c
@@ -511,6 +511,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
   BlockDriver *drv)
 {
 int ret;
+int probed = 0;
 
 if (flags & BDRV_O_SNAPSHOT) {
 BlockDriverState *bs1;
@@ -571,6 +572,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 /* Find the right image format driver */
 if (!drv) {
 drv = find_image_format(filename);
+probed = 1;
 }
 
 if (!drv) {
@@ -584,6 +586,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 goto unlink_and_fail;
 }
 
+bs->probed = probed;
+
 /* If there is a backing file, use it */
 if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
 char backing_filename[PATH_MAX];
diff --git a/block/raw.c b/block/raw.c
index 4406b8c..d250c0d 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -9,15 +9,82 @@ static int raw_open(BlockDriverState *bs, int flags)
 return 0;
 }
 
+/* check for the user attempting to write something that looks like a
+   block format header to the beginning of the image and fail out.
+*/
+static int check_for_block_signature(BlockDriverState *bs, const uint8_t *buf)
+{
+static const uint8_t signatures[][4] = {
+{ 'Q', 'F', 'I', 0xfb }, /* qcow/qcow2 */
+{ 'C', 'O', 'W', 'D' }, /* VMDK3 */
+{ 'V', 'M', 'D', 'K' }, /* VMDK4 */
+{ 'O', 'O', 'O', 'M' }, /* UML COW */
+{}
+};
+int i;
+
+for (i = 0; signatures[i][0] != 0; i++) {
+if (memcmp(buf, signatures[i], 4) == 0) {
+return 1;
+}
+}
+
+return 0;
+}
+
+static int check_write_unsafe(BlockDriverState *bs, int64_t sector_num,
+  const uint8_t *buf, int nb_sectors)
+{
+/* assume that if the user specifies the format explicitly, then assume
+   that they will continue to do so and provide no safety net */
+if (!bs->probed) {
+return 0;
+}
+
+if (sector_num == 0 && nb_sectors > 0) {
+return check_for_block_signature(bs, buf);
+}
+
+return 0;
+}
+
 static int raw_read(BlockDriverState *bs, int64_t sector_num,
 uint8_t *buf, int nb_sectors)
 {
 return bdrv_read(bs->file, sector_num, buf, nb_sectors);
 }
 
+static int raw_write_scrubbed_bootsect(BlockDriverState *bs,
+   const uint8_t *buf)
+{
+uint8_t bootsect[512];
+
+/* scrub the dangerous signature */
+memcpy(bootsect, buf, 512);
+memset(bootsect, 0, 4);
+
+return bdrv_write(bs->file, 0, bootsect, 1);
+}
+
 static int raw_write(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors)
 {
+if (check_write_unsafe(bs, sector_num, buf, nb_sectors)) {
+int ret;
+
+ret = raw_write_scrubbed_bootsect(bs, buf);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_write(bs->file, 1, buf + 512, nb_sectors - 1);
+if (ret < 0) {
+return ret;
+}
+
+return ret + 512;
+}
+
 return bdrv_write(bs->file, sector_num, buf, nb_sectors);
 }
 
@@ -28,10 +95,72 @@ sta

[Qemu-devel] [PATCH] Create USB buses and devices based on USB version.

2010-07-14 Thread David Ahern

Create USB buses and devices based on USB version.

Signed-off-by: David Ahern 
---
 hw/usb-bus.c|   70 ---
 hw/usb-msd.c|2 +-
 hw/usb-net.c|2 +-
 hw/usb-ohci.c   |2 +-
 hw/usb-serial.c |4 +-
 hw/usb-uhci.c   |2 +-
 hw/usb.h|7 +++--
 usb-bsd.c   |2 +-
 usb-linux.c |2 +-
 9 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index b692503..f4849f8 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -4,6 +4,12 @@
 #include "sysemu.h"
 #include "monitor.h"
 
+enum {
+USB_VERSION_NONE,
+USB_VERSION_1_1,
+};
+
+
 static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);
 
 static struct BusInfo usb_bus_info = {
@@ -14,27 +20,46 @@ static struct BusInfo usb_bus_info = {
 static int next_usb_bus = 0;
 static QTAILQ_HEAD(, USBBus) busses = QTAILQ_HEAD_INITIALIZER(busses);
 
-void usb_bus_new(USBBus *bus, DeviceState *host)
+static void usb_bus_new(USBBus *bus, int version, DeviceState *host)
 {
 qbus_create_inplace(&bus->qbus, &usb_bus_info, host, NULL);
 bus->busnr = next_usb_bus++;
 bus->qbus.allow_hotplug = 1; /* Yes, we can */
+bus->version = version;
 QTAILQ_INIT(&bus->free);
 QTAILQ_INIT(&bus->used);
 QTAILQ_INSERT_TAIL(&busses, bus, next);
 }
 
-USBBus *usb_bus_find(int busnr)
+void usb_bus_new_v1(USBBus *bus, DeviceState *host)
+{
+usb_bus_new(bus, USB_VERSION_1_1, host);
+}
+
+static USBBus *usb_bus_find(int busnr)
+{
+USBBus *bus;
+
+QTAILQ_FOREACH(bus, &busses, next) {
+if (bus->busnr == busnr) {
+break;
+}
+}
+
+return bus;
+}
+
+/* device creation should be using this one */
+USBBus *usb_bus_find_version(int version)
 {
 USBBus *bus;
 
-if (-1 == busnr)
-return QTAILQ_FIRST(&busses);
 QTAILQ_FOREACH(bus, &busses, next) {
-if (bus->busnr == busnr)
-return bus;
+if (bus->version == version) {
+break;
+}
 }
-return NULL;
+return bus;
 }
 
 static int usb_qdev_init(DeviceState *qdev, DeviceInfo *base)
@@ -80,20 +105,15 @@ void usb_qdev_register_many(USBDeviceInfo *info)
 }
 }
 
-USBDevice *usb_create(USBBus *bus, const char *name)
+static USBDevice *usb_create(USBBus *bus, const char *name)
 {
 DeviceState *dev;
 
-#if 1
-/* temporary stopgap until all usb is properly qdev-ified */
 if (!bus) {
-bus = usb_bus_find(-1);
-if (!bus)
-return NULL;
-fprintf(stderr, "%s: no bus specified, using \"%s\" for \"%s\"\n",
-__FUNCTION__, bus->qbus.name, name);
+fprintf(stderr, "%s: no bus specified for \"%s\"\n",
+__FUNCTION__, name);
+return NULL;
 }
-#endif
 
 dev = qdev_create(&bus->qbus, name);
 return DO_UPCAST(USBDevice, qdev, dev);
@@ -101,7 +121,13 @@ USBDevice *usb_create(USBBus *bus, const char *name)
 
 USBDevice *usb_create_simple(USBBus *bus, const char *name)
 {
-USBDevice *dev = usb_create(bus, name);
+USBDevice *dev;
+
+/* if bus not given default to USB 1.1 */
+if (!bus)
+bus = usb_bus_find_version(USB_VERSION_1_1);
+
+dev = usb_create(bus, name);
 if (!dev) {
 hw_error("Failed to create USB device '%s'\n", name);
 }
@@ -109,6 +135,13 @@ USBDevice *usb_create_simple(USBBus *bus, const char *name)
 return dev;
 }
 
+/* create USB device attached to USB 1.1 controller */
+USBDevice *usb_create_v1(const char *name)
+{
+USBBus *bus = usb_bus_find_version(USB_VERSION_1_1);
+return usb_create(bus, name);
+}
+
 void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
usb_attachfn attach)
 {
@@ -260,7 +293,6 @@ void usb_info(Monitor *mon)
 /* handle legacy -usbdevice cmd line option */
 USBDevice *usbdevice_create(const char *cmdline)
 {
-USBBus *bus = usb_bus_find(-1 /* any */);
 DeviceInfo *info;
 USBDeviceInfo *usb;
 char driver[32];
@@ -302,7 +334,7 @@ USBDevice *usbdevice_create(const char *cmdline)
 error_report("usbdevice %s accepts no params", driver);
 return NULL;
 }
-return usb_create_simple(bus, usb->qdev.name);
+return usb_create_simple(NULL, usb->qdev.name);
 }
 return usb->usbdevice_init(params);
 }
diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 65e9624..d8b68f7 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -606,7 +606,7 @@ static USBDevice *usb_msd_init(const char *filename)
 }
 
 /* create guest device */
-dev = usb_create(NULL /* FIXME */, "usb-storage");
+dev = usb_create_v1("usb-storage");
 if (!dev) {
 return NULL;
 }
diff --git a/hw/usb-net.c b/hw/usb-net.c
index a43bd17..b8936c5 100644
--- a/hw/usb-net.c
+++ b/hw/usb-net.c
@@ -1484,7 +1484,7 @@ static USBDevice *usb_net_init(const char *cmdline)
 return NULL;
 }
 
-dev = usb_create(NULL

[Qemu-devel] [PATCH] loadvm: improve tests before bdrv_snapshot_goto()

2010-07-14 Thread Miguel Di Ciurcio Filho

This patch improves the resilience of the load_vmstate() function, doing
further and better ordered tests.

In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the
error is on VM state device, load_vmstate() will return zero and the VM will be
started with major corruption chances.

The current process:
- test if there is any writable device without snapshot support
- if exists return -error
- get the device that saves the VM state, possible return -error but unlikely
because it was tested earlier
- flush I/O
- run bdrv_snapshot_goto() on devices
- if fails, give an warning and goes to the next (not good!)
- if fails on the VM state device, return zero (not good!)
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero

New behavior:
- get the device that saves the VM state
- if fails return -error
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- test if there is any writable device without snapshot support
- if exists return -error
- test if the devices with snapshot support have the requested snapshot
- if anyone fails, return -error
- flush I/O
- run snapshot_goto() on devices
- if anyone fails, return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero

do_loadvm must not call vm_start if any error has occurred in load_vmstate.

Signed-off-by: Miguel Di Ciurcio Filho 
---
 monitor.c |3 +-
 savevm.c  |   71 
 2 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/monitor.c b/monitor.c
index 45fd482..aa60cfa 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2270,8 +2270,9 @@ static void do_loadvm(Monitor *mon, const QDict *qdict)
 
 vm_stop(0);
 
-if (load_vmstate(name) >= 0 && saved_vm_running)
+if (load_vmstate(name) == 0 && saved_vm_running) {
 vm_start();
+}
 }
 
 int monitor_get_fd(Monitor *mon, const char *fdname)
diff --git a/savevm.c b/savevm.c
index ee27989..9f29cb0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1804,12 +1804,25 @@ void do_savevm(Monitor *mon, const QDict *qdict)
 
 int load_vmstate(const char *name)
 {
-BlockDriverState *bs, *bs1;
+BlockDriverState *bs, *bs_vm_state;
 QEMUSnapshotInfo sn;
 QEMUFile *f;
 int ret;
 
-/* Verify if there is a device that doesn't support snapshots and is 
writable */
+bs_vm_state = bdrv_snapshots();
+if (!bs_vm_state) {
+error_report("No block device supports snapshots");
+return -EINVAL;
+}
+
+/* Don't even try to load empty VM states */
+ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
+if ((ret >= 0) && (sn.vm_state_size == 0)) {
+return -EINVAL;
+}
+
+/* Verify if there is any device that doesn't support snapshots and is
+writable and check if the requested snapshot is available too. */
 bs = NULL;
 while ((bs = bdrv_next(bs))) {
 
@@ -1821,64 +1834,46 @@ int load_vmstate(const char *name)
 error_report("Device '%s' is writable but does not support 
snapshots.",
bdrv_get_device_name(bs));
 return -ENOTSUP;
+} else {
+ret = bdrv_snapshot_find(bs, &sn, name);
+if (ret < 0) {
+error_report("Device '%s' does not have the requested snapshot 
'%s'",
+   bdrv_get_device_name(bs), name);
+return ret;
+}
 }
 }
 
-bs = bdrv_snapshots();
-if (!bs) {
-error_report("No block device supports snapshots");
-return -EINVAL;
-}
-
 /* Flush all IO requests so they don't interfere with the new state.  */
 qemu_aio_flush();
 
-bs1 = NULL;
-while ((bs1 = bdrv_next(bs1))) {
-if (bdrv_can_snapshot(bs1)) {
-ret = bdrv_snapshot_goto(bs1, name);
+bs = NULL;
+while ((bs = bdrv_next(bs))) {
+if (bdrv_can_snapshot(bs)) {
+ret = bdrv_snapshot_goto(bs, name);
 if (ret < 0) {
-switch(ret) {
-case -ENOTSUP:
-error_report("%sSnapshots not supported on device '%s'",
- bs != bs1 ? "Warning: " : "",
- bdrv_get_device_name(bs1));
-break;
-case -ENOENT:
-error_report("%sCould not find snapshot '%s' on device 
'%s'",
- bs != bs1 ? "Warning: " : "",
- name, bdrv_get_device_name(bs1));
-break;
-default:
-error_report("%sError %d while activ

[Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Joerg Roedel

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

There are real devices that care very much about an IOMMU. Basically all
devices supporting ATS care about that. So I don't see a problem if the
device emulation code of qemu also cares about present IOMMUs.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.

That's not a good idea imho. With an IOMMU the device no longer accesses
cpu physical memory. It accesses device virtual memory. Using
cpu_physical_memory* functions in device code becomes misleading when
the device virtual address space differs from cpu physical. So different
functions for devices make a lot of sense here. Another reason for
seperate functions is that we can extend them later to support emulation
of ATS devices.

> You also need to accomodate the the case where multiple IOMMU are present.

This, indeed, is something transparent to the device. This should be
handled inside the iommu emulation code.

Joerg

Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Christoph Hellwig

Err, strong NACK.  Please don't start messing with the contents of the
data plane, we're getting into real trouble there.  It's perfectly
valid for a guest to create an image inside an image, and with hardware
support for nested virtualization I guess this use case will become
rather common, just as it already is on S/390 with VM.

Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Anthony Liguori


On 07/14/2010 01:43 PM, Christoph Hellwig wrote:

Err, strong NACK.  Please don't start messing with the contents of the
data plane, we're getting into real trouble there.  It's perfectly
valid for a guest to create an image inside an image, and with hardware
support for nested virtualization I guess this use case will become
rather common, just as it already is on S/390 with VM.
   


Then we have to remove block format probing.

The two things are fundamentally incompatible.

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Anthony Liguori


On 07/14/2010 01:43 PM, Christoph Hellwig wrote:

Err, strong NACK.  Please don't start messing with the contents of the
data plane, we're getting into real trouble there.  It's perfectly
valid for a guest to create an image inside an image, and with hardware
support for nested virtualization I guess this use case will become
rather common, just as it already is on S/390 with VM.
   


Note, this is limited to raw.

If you use qcow2 in L0, the guest can do whatever it wants to do.

But if we autoprobe a raw image, we simply cannot let a guest turn the 
raw image into something else.


A user can still use raw images and let them contain anything, but they 
have to explicitly specify that it's a raw image.


Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Aurelien Jarno

On Wed, Jul 14, 2010 at 08:43:11PM +0200, Christoph Hellwig wrote:
> Err, strong NACK.  Please don't start messing with the contents of the
> data plane, we're getting into real trouble there.  It's perfectly
> valid for a guest to create an image inside an image, and with hardware
> support for nested virtualization I guess this use case will become
> rather common, just as it already is on S/390 with VM.
> 

Maybe it should only be done on the hard drive used to boot?


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer

2010-07-14 Thread Anthony Liguori


On 07/14/2010 01:54 PM, Aurelien Jarno wrote:

On Wed, Jul 14, 2010 at 08:43:11PM +0200, Christoph Hellwig wrote:
   

Err, strong NACK.  Please don't start messing with the contents of the
data plane, we're getting into real trouble there.  It's perfectly
valid for a guest to create an image inside an image, and with hardware
support for nested virtualization I guess this use case will become
rather common, just as it already is on S/390 with VM.

 

Maybe it should only be done on the hard drive used to boot?
   


It's just as dangerous on any other disk.  My use of "bootsector" in the 
code is probably misleading.


Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH v2] QMP: Introduce the documentation for query-qdm

2010-07-14 Thread Luiz Capitulino

On Tue, 13 Jul 2010 18:20:20 +0200
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Tue, 13 Jul 2010 13:49:46 +0200
> > Markus Armbruster  wrote:
> >
> >> [cc: kraxel]
> >> 
> >> I didn't get around to review v1.  Sorry.
> >> 
> >> Miguel Di Ciurcio Filho  writes:
> >> 
> >> > Changelog from v1
> >> > -
> >> > - renamed "props" to "properties"
> >> > - updated the examples
> >> > - reworded the explanations of "name" and "description"
> >> > - split "type" into a json-object, adding "qmp" and "qdev"
> >> > - list all possible values for "bus"
> >> > - list all possible values for "qdev" on "type"
> >> > - list all possible values for "qmp" on "type"
> >> > ---
> >> >  qemu-monitor.hx |   88 
> >> > +++
> >> >  1 files changed, 88 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/qemu-monitor.hx b/qemu-monitor.hx
> >> > index 2af3de6..0055d0a 100644
> >> > --- a/qemu-monitor.hx
> >> > +++ b/qemu-monitor.hx
> >> > @@ -2490,6 +2490,94 @@ STEXI
> >> >  show device tree
> >> >  @item info qdm
> >> >  show qdev device model list
> >> > +ETEXI
> >> > +SQMP
> >> > +query-qdm
> >> > +-
> >> > +
> >> > +Describe the capabilities of all devices registered with qdev.
> >> > +
> >> > +The returned output is a json-array, each element is a json-object 
> >> > describing
> >> > +a single device type.
> >> > +
> >> > +Each json-object contains the following:
> >> > +
> >> > +- "name": name of the device (json-string)
> >> > +- "bus": the name of the bus type for the device (json-string)
> >> > +- Possible values: PCI, SCSI, I2C, ISA, SSI, USB, 
> >> > virtio-serial-bus, System
> >> 
> >> Missing: IDE (hw/ide/qdev.c) and s390-virtio (hw/s390-virtio-bus.c).
> >
> > What about "System", is it ok to expose it?
> 
> What makes it special?

Not sure I got what you meant.

> 
> >> > +- "alias": an alias by which the device is also known (json-string, 
> >> > optional)
> >> > +- "description": description of the device  (json-string, optional)
> >> > +- "creatable": whether this device can be created on command line 
> >> > (json-boolean)
> >> 
> >> "on the command line" is misleading; it applies to monitor (human & QMP)
> >> as well.
> >> 
> >> "by the user"?
> >
> > When is a device not "creatable"?
> 
> Stuff like i440FX: if your board comes with it, you already got it, if
> not, you can't have it.

Ok, so it's about devices that can be dynamic created and hot plugged, right?
In this case I think the text should read something like:

 "the device can be dynamically created"

> >> > +- "properties": a list where each element is an json-object that 
> >> > describes a
> >> > +  property of the device. Each json-object contains the following:
> >> > + - "name": the name of the property (json-string)
> >> > + - "type": a json-object that contains the following:
> >> > +- "qdev": the internal name of the type of the property 
> >> > (json-string)
> >> > +- Possible values: uint8, uint16, uint32, uint64, int32, 
> >> > macaddr,
> >> > +  drive, chr, string, netdev, bit, taddr
> >
> > Didn't see this before, but we should not use too short names like "chr" and
> > "taddr".
> >
> >> > +- "qmp": the json equivalent type of the internal type 
> >> > (json-string)
> >> > +- Possible values: integer, string, boolean
> >> 
> >> Fairly close to JSON Schema, but there are differences.
> >> 
> >> Do we need "qdev"?  Is exposing it wise?  Smells a bit too much of
> >> internal detail for comfort...
> >
> > Agreed, although this was a suggestion from me.
> >
> >> Could we use "type" just like JSON Schema?  Drop "qdev" or move it out
> >> of "type", then make "type" what its member "qmp" is now.
> >
> > The most important point here is to try to figure out how clients are
> > going to use this. Actually, the fact that libvirt doesn't actually
> > need this makes me wonder if we should have it in the first place, at
> > the same time listing the property name w/o saying what type it accepts
> > doesn't seem to be that useful either.
> 
> Adding stuff we need when we need it is easier and less painfull than
> taking out stuff when we realize we shouldn't have added it :)
> 
> > Let's look at all possibilities:
> >
> >  1. Drop "qmp" key and rename "qdev" to "type" (ie. use the qdev type only)
> >
> > This was the original proposal. I don't like it because its values
> > are quite low-level and have no meaning in json.
> >
> > If this is needed, then I think it needs a complement specifying the
> > json type for it.
> >
> >  2. Drop "qdev" key and rename "qmp" to "type" (ie. use the json type only)
> 
> Let's start with this, and add more information as needed.

I would be ok with that, I guess.

> > This is what makes sense when self-documenting the protocol, however,
> > given the original proposal, I wonder if this is going to be enough.
> >
> > That is, say

[Qemu-devel] [PATCH] kvm: Don't walk memory_size == 0 slots in kvm_client_migration_log

2010-07-14 Thread Alex Williamson

If we've unregistered a memory area, we should avoid calling
qemu_get_ram_ptr() on the left over phys_offset cruft in the
slot array.  Now that we support removing ramblocks, the
phys_offset ram_addr_t can go away and cause a lookup fault
and abort.

Signed-off-by: Alex Williamson 
---

 kvm-all.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index fec6d05..2922292 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -282,6 +282,9 @@ static int kvm_set_migration_log(int enable)
 for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
 mem = &s->slots[i];
 
+if (!mem->memory_size) {
+continue;
+}
 if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
 continue;
 }

Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Paul Brook

> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > > Memory accesses must go through the IOMMU layer.
> > 
> > No. Devices should not know or care whether an IOMMU is present.
> 
> There are real devices that care very much about an IOMMU. Basically all
> devices supporting ATS care about that. So I don't see a problem if the
> device emulation code of qemu also cares about present IOMMUs.
> 
> > You should be adding a DeviceState argument to
> > cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> > transparently.
> 
> That's not a good idea imho. With an IOMMU the device no longer accesses
> cpu physical memory. It accesses device virtual memory. Using
> cpu_physical_memory* functions in device code becomes misleading when
> the device virtual address space differs from cpu physical. 

Well, ok, the function name needs fixing too.  However I think the only thing 
missing from the current API is that it does not provide a way to determine 
which device is performing the access.

Depending how the we decide to handle IOMMU invalidation, it may also be 
necessary to augment the memory_map API to allow the system to request a 
mapping be revoked.  However this issue is not specific to the IOMMU 
implementation. Such bugs are already present on any system that allows 
dynamic reconfiguration of the address space, e.g. by changing PCI BARs.

> So different
> functions for devices make a lot of sense here. Another reason for
> seperate functions is that we can extend them later to support emulation
> of ATS devices.

I disagree. ATS should be an independent feature, and is inherently bus 
specific.  As usual the PCI spec is not publicly available, but based on the 
AMD IOMMU docs I'd say that ATS is completely independent of memory accesses - 
the convention being that you trust an ATS capable device to DTRT, and 
configure the bus IOMMU to apply a flat mapping for accesses from such 
devices.

> > You also need to accomodate the the case where multiple IOMMU are
> > present.
> 
> This, indeed, is something transparent to the device. This should be
> handled inside the iommu emulation code.

I think you've got the abstraction boundaries all wrong.

A device performs a memory access on its local bus. It has no knowledge of how 
that access is routed to its destination.  The device should not be aware of 
any IOMMUs, in the same way that it doesn't know whether it happens to be 
accessing RAM or memory mapped peripherals on another device.

Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
between a PCI bus and the system bus. It provides a address mapping from one 
bus to another. 

There should be no direct interaction between an IOMMU and a device (ignoring 
ATS, which is effectively a separate data channel).  Everything should be done 
via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
IOMMUs there should be no direct interatcion between these. 
cpu_physical_memory_* should walk the device/bus tree to determine where the 
access terminates, applying mappings appropriately.

Paul

Re: [Qemu-devel] [RFC PATCH 2/7] AMD IOMMU emulation

2010-07-14 Thread Paul Brook

> +  --enable-amd-iommu-emul) amd_iommu="yes"

> +#ifdef CONFIG_AMD_IOMMU
> +amd_iommu_init(pci_bus);
> +#endif

This should not be a configure option.

Paul

Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Anthony Liguori


On 07/14/2010 03:13 PM, Paul Brook wrote:

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
 

Memory accesses must go through the IOMMU layer.
 

No. Devices should not know or care whether an IOMMU is present.
   

There are real devices that care very much about an IOMMU. Basically all
devices supporting ATS care about that. So I don't see a problem if the
device emulation code of qemu also cares about present IOMMUs.

 

You should be adding a DeviceState argument to
cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
transparently.
   

That's not a good idea imho. With an IOMMU the device no longer accesses
cpu physical memory. It accesses device virtual memory. Using
cpu_physical_memory* functions in device code becomes misleading when
the device virtual address space differs from cpu physical.
 

Well, ok, the function name needs fixing too.  However I think the only thing
missing from the current API is that it does not provide a way to determine
which device is performing the access.
   


I agree with Paul.

The right approach IMHO is to convert devices to use bus-specific 
functions to access memory.  The bus specific functions should have a 
device argument as the first parameter.


For PCI-based IOMMUs, the implementation exists solely within the PCI 
bus.  For platforms (like SPARC) that have lower level IOMMUs, we would 
need to probably introduce a sysbus memory access layer and then provide 
a hook to implement an IOMMU there.



Depending how the we decide to handle IOMMU invalidation, it may also be
necessary to augment the memory_map API to allow the system to request a
mapping be revoked.  However this issue is not specific to the IOMMU
implementation. Such bugs are already present on any system that allows
dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
   


That's why the memory_map API today does not allow mappings to persist 
after trips back to the main loop.


Regards,

Anthony Liguori


So different
functions for devices make a lot of sense here. Another reason for
seperate functions is that we can extend them later to support emulation
of ATS devices.
 

I disagree. ATS should be an independent feature, and is inherently bus
specific.  As usual the PCI spec is not publicly available, but based on the
AMD IOMMU docs I'd say that ATS is completely independent of memory accesses -
the convention being that you trust an ATS capable device to DTRT, and
configure the bus IOMMU to apply a flat mapping for accesses from such
devices.

   

You also need to accomodate the the case where multiple IOMMU are
present.
   

This, indeed, is something transparent to the device. This should be
handled inside the iommu emulation code.
 

I think you've got the abstraction boundaries all wrong.

A device performs a memory access on its local bus. It has no knowledge of how
that access is routed to its destination.  The device should not be aware of
any IOMMUs, in the same way that it doesn't know whether it happens to be
accessing RAM or memory mapped peripherals on another device.

Each IOMMU is fundamentally part of a bus bridge. For example the bridge
between a PCI bus and the system bus. It provides a address mapping from one
bus to another.

There should be no direct interaction between an IOMMU and a device (ignoring
ATS, which is effectively a separate data channel).  Everything should be done
via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested
IOMMUs there should be no direct interatcion between these.
cpu_physical_memory_* should walk the device/bus tree to determine where the
access terminates, applying mappings appropriately.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Chris Wright

* Anthony Liguori (anth...@codemonkey.ws) wrote:
> On 07/14/2010 03:13 PM, Paul Brook wrote:
> >>On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> Memory accesses must go through the IOMMU layer.
> >>>No. Devices should not know or care whether an IOMMU is present.
> >>There are real devices that care very much about an IOMMU. Basically all
> >>devices supporting ATS care about that. So I don't see a problem if the
> >>device emulation code of qemu also cares about present IOMMUs.
> >>
> >>>You should be adding a DeviceState argument to
> >>>cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> >>>transparently.
> >>That's not a good idea imho. With an IOMMU the device no longer accesses
> >>cpu physical memory. It accesses device virtual memory. Using
> >>cpu_physical_memory* functions in device code becomes misleading when
> >>the device virtual address space differs from cpu physical.
> >Well, ok, the function name needs fixing too.  However I think the only thing
> >missing from the current API is that it does not provide a way to determine
> >which device is performing the access.
> 
> I agree with Paul.

I do too.

> The right approach IMHO is to convert devices to use bus-specific
> functions to access memory.  The bus specific functions should have
> a device argument as the first parameter.

As for ATS, the internal api to handle the device's dma reqeust needs
a notion of a translated vs. an untranslated request.  IOW, if qemu ever
had a device with ATS support, the device would use its local cache to
translate the dma address and then submit a translated request to the
pci bus (effectively doing a raw cpu physical memory* in that case).

thanks,
-chris

Re: [Qemu-devel] [RFC PATCH 1/7] Generic IOMMU layer

2010-07-14 Thread Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 10:07:20AM +0400, malc wrote:
> On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:
> 
> > This provides an API for abstracting IOMMU functions. Hardware emulation
> > code can use it to request address translation and access checking. In
> > the absence of an emulated IOMMU, no translation/checking happens and
> > I/O goes through as before.
> > 
> > IOMMU emulation code must provide implementation-specific hooks for this
> > layer.
> > 
> 
> [..snip..]
> 
> > +int __iommu_rw(struct iommu *iommu,
> > +   DeviceState *dev,
> > +   target_phys_addr_t addr,
> > +   uint8_t *buf,
> > +   int len,
> > +   int is_write)
> 
> Do not use leading double underscore.
> 
> [..snip..]
> 
> -- 
> mailto:av1...@comtv.ru

Thanks, will fix it.


Eduard

Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks

2010-07-14 Thread Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 04:37:39PM +0900, Isaku Yamahata wrote:
> On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:

[snip]

> >  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> > int instance_size, int devfn,
> > PCIConfigReadFunc *config_read,
> > PCIConfigWriteFunc *config_write)
> >  {
> >  PCIDevice *pci_dev;
> > +int err;
> >  
> >  pci_dev = qemu_mallocz(instance_size);
> >  pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> > @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char 
> > *name,
> >  if (pci_dev == NULL) {
> >  hw_error("PCI: can't register device\n");
> >  }
> > +
> > +err = pci_iommu_register_device(bus, pci_dev);
> > +if (err) {
> > +hw_error("PCI: can't register device with IOMMU\n");
> > +return NULL;
> > +}
> > +
> >  return pci_dev;
> >  }
> 
> pci_register_device() is pre-qdev api.
> qdev'fied device doesn't call pci_register_device().
> So please move the initialization hook into do_pci_register_device()
> which are commonly used by pci_register_device() and pci_qdev_init().
> -- 
> yamahata

Thanks, I didn't need the functionality and missed this.


Eduard

[Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

They don't really care. iommu_get() et al. are convenience functions
which can and do return NULL when there's no IOMMU and device code can
pass that NULL around without checking. I could've probably made the r/w
functions take only the DeviceState in addition to normal args, but
wanted to avoid looking up the related structures on each I/O operation.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.
> 
> You also need to accomodate the the case where multiple IOMMU are present.
> 
> Paul

We don't assume there's a single IOMMU in the generic layer. The
callbacks within 'struct iommu' could very well dispatch the request to
one of multiple, coexisting IOMMUs.

Eduard

Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:

> Well, ok, the function name needs fixing too.  However I think the only thing 
> missing from the current API is that it does not provide a way to determine 
> which device is performing the access.
> 
> Depending how the we decide to handle IOMMU invalidation, it may also be 
> necessary to augment the memory_map API to allow the system to request a 
> mapping be revoked.  However this issue is not specific to the IOMMU 
> implementation. Such bugs are already present on any system that allows 
> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.

Yeah, having a way to alter existing maps would be good. Basically it's
the only place where we truly care about the existence of an IOMMU, and
that's due to AIO.

But it's really tricky to do, unfortunately. I also think having an
abort_doing_io_on_map() kind of notifier would be sufficient. It should
notify back when I/O has been completely stopped.

This should be enough, since we don't expect the results of IDE-like DMA
to be recoverable in case a mapping change occurs, even on real hardware.

> I disagree. ATS should be an independent feature, and is inherently bus 
> specific.  As usual the PCI spec is not publicly available, but based on the 
> AMD IOMMU docs I'd say that ATS is completely independent of memory accesses 
> - 
> the convention being that you trust an ATS capable device to DTRT, and 
> configure the bus IOMMU to apply a flat mapping for accesses from such 
> devices.

ATS is documented in the Hypertransport specs which are publicly
available.

[snip]

> A device performs a memory access on its local bus. It has no knowledge of 
> how 
> that access is routed to its destination.  The device should not be aware of 
> any IOMMUs, in the same way that it doesn't know whether it happens to be 
> accessing RAM or memory mapped peripherals on another device.
> 
> Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
> between a PCI bus and the system bus. It provides a address mapping from one 
> bus to another. 
> 
> There should be no direct interaction between an IOMMU and a device (ignoring 
> ATS, which is effectively a separate data channel).  Everything should be 
> done 
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple 
> nested 
> IOMMUs there should be no direct interatcion between these. 
> cpu_physical_memory_* should walk the device/bus tree to determine where the 
> access terminates, applying mappings appropriately.
> 
> Paul

Admittedly I could make __iommu_rw() repeateadly call itself instead of
doing cpu_physical_memory_rw(). That's what I actually intended, and it
should handle IOMMU nesting. It's a trivial change to do so.

Note that emulating hardware realistically defeats some performance
purposes. It'd make AIO impossible, if we imagine some sort of message
passing scenario (which would be just like the real thing, but a lot
slower).

Eduard

[Qemu-devel] Prestiti Veloci

2010-07-14 Thread Soluzioni finanziarie



Clicca qui per accedere alle soluzioni finanziarie


Novità: Prestito personale Veloce ( delibera entro 60 minuti )
 prestito con cessione del quinto e prestito con delega
Prestito Ipotecario Vitalizio: prendi i soldi e non li rendi 

Aziende in crisi: gestione dei debiti e rilancio dell'attività 
Finanziamenti aziendali garantiti dal fondo di garanzia statale 
Prestito per casalinghe e studenti 
Finanziamenti per aziende e privati
Commercialista On-line 
Hai ricevuto questa e-mail perché hai richiesto l'invio della news
Elimina indirizzo dalla news

[Qemu-devel] [Bug 604872] Re: qemu-system-arm segfaults emulating versatile machine after running debootstrap --second-stage inside vm

2010-07-14 Thread Serge Hallyn

** Changed in: qemu-kvm (Ubuntu)
   Importance: Undecided => Medium

-- 
qemu-system-arm segfaults emulating versatile machine after running debootstrap 
--second-stage inside vm
https://bugs.launchpad.net/bugs/604872
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New
Status in “qemu-kvm” package in Ubuntu: New

Bug description:
Binary package hint: qemu-kvm

As I'm now implementing the support for creating a rootstock rootfs without 
requiring root, I need to run the deboostrap' second stage inside a VM, to 
correctly install the packages into the rootfs.

qemu-system-arm fails right after debootstrap finish the second stage, giving a 
segmentation fault.

Command:
qemu-system-arm -M versatilepb -cpu cortex-a8 -kernel vmlinuz -no-reboot 
-nographic -drive file=qemu-armel-201007122016.img,aio=native,cache=none -m 256 
-append 'console=ttyAMA0,115200n8 root=/dev/sda rw mem=256M devtmpfs.mount=0 
init=/bin/installer'
Uncompressing 
Linux.
 done, booting the kernel.
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-21-versatile (bui...@cushaw) (gcc version 
4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #32-Ubuntu Fri Apr 16 08:14:53 UTC 2010 (Ubuntu 
2.6.32-21.32-versatile 2.6.32.11+drm33.2)
...
I: Base system installed successfully.
I: Starting basic services in VM
Segmentation fault (core dumped)

[492816.197352] qemu-system-arm[16024]: segfault at cf6ba8fc ip 
cf6ba8fc sp 7fffd0e68680 error 14

Image:
 * rootfs: http://rsalveti.net/pub/ubuntu/rootstock/qemu-armel-201007122016.img 
(md5 1d063ac8a65c798bb004cd1c4c7970c5)
 * kernel: 
http://ports.ubuntu.com/ubuntu-ports/dists/lucid/main/installer-armel/current/images/versatile/netboot/vmlinuz

I'm able to reproduce the bug on Maverick (amd64) and Lucid (x86).

Maverick qemu-kvm-extras: 0.12.4+noroms-0ubuntu4
Lucid qemu-kvm-extras: 0.12.3+noroms-0ubuntu9.2

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: qemu-kvm-extras 0.12.4+noroms-0ubuntu4
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
Architecture: amd64
Date: Mon Jul 12 18:55:35 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
KvmCmdLine: Error: command ['ps', '-C', 'kvm', '-F'] failed with exit code 1: 
UIDPID  PPID  CSZ   RSS PSR STIME TTY  TIME CMD
MachineType: LENOVO 2764CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.35-6-generic root=/dev/mapper/primary-root 
ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcEnviron:
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: qemu-kvm
dmi.bios.date: 04/19/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 7UET86WW (3.16 )
dmi.board.name: 2764CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: 
dmi:bvnLENOVO:bvr7UET86WW(3.16):bd04/19/2010:svnLENOVO:pn2764CTO:pvrThinkPadT400:rvnLENOVO:rn2764CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2764CTO
dmi.product.version: ThinkPad T400
dmi.sys.vendor: LENOVO

[Qemu-devel] [Bug 604872] Re: qemu-system-arm segfaults emulating versatile machine after running debootstrap --second-stage inside vm

2010-07-14 Thread Serge Hallyn

** Changed in: qemu-kvm (Ubuntu)
   Importance: Medium => Undecided

-- 
qemu-system-arm segfaults emulating versatile machine after running debootstrap 
--second-stage inside vm
https://bugs.launchpad.net/bugs/604872
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New
Status in “qemu-kvm” package in Ubuntu: New

Bug description:
Binary package hint: qemu-kvm

As I'm now implementing the support for creating a rootstock rootfs without 
requiring root, I need to run the deboostrap' second stage inside a VM, to 
correctly install the packages into the rootfs.

qemu-system-arm fails right after debootstrap finish the second stage, giving a 
segmentation fault.

Command:
qemu-system-arm -M versatilepb -cpu cortex-a8 -kernel vmlinuz -no-reboot 
-nographic -drive file=qemu-armel-201007122016.img,aio=native,cache=none -m 256 
-append 'console=ttyAMA0,115200n8 root=/dev/sda rw mem=256M devtmpfs.mount=0 
init=/bin/installer'
Uncompressing 
Linux.
 done, booting the kernel.
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-21-versatile (bui...@cushaw) (gcc version 
4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #32-Ubuntu Fri Apr 16 08:14:53 UTC 2010 (Ubuntu 
2.6.32-21.32-versatile 2.6.32.11+drm33.2)
...
I: Base system installed successfully.
I: Starting basic services in VM
Segmentation fault (core dumped)

[492816.197352] qemu-system-arm[16024]: segfault at cf6ba8fc ip 
cf6ba8fc sp 7fffd0e68680 error 14

Image:
 * rootfs: http://rsalveti.net/pub/ubuntu/rootstock/qemu-armel-201007122016.img 
(md5 1d063ac8a65c798bb004cd1c4c7970c5)
 * kernel: 
http://ports.ubuntu.com/ubuntu-ports/dists/lucid/main/installer-armel/current/images/versatile/netboot/vmlinuz

I'm able to reproduce the bug on Maverick (amd64) and Lucid (x86).

Maverick qemu-kvm-extras: 0.12.4+noroms-0ubuntu4
Lucid qemu-kvm-extras: 0.12.3+noroms-0ubuntu9.2

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: qemu-kvm-extras 0.12.4+noroms-0ubuntu4
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
Architecture: amd64
Date: Mon Jul 12 18:55:35 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
KvmCmdLine: Error: command ['ps', '-C', 'kvm', '-F'] failed with exit code 1: 
UIDPID  PPID  CSZ   RSS PSR STIME TTY  TIME CMD
MachineType: LENOVO 2764CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.35-6-generic root=/dev/mapper/primary-root 
ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcEnviron:
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: qemu-kvm
dmi.bios.date: 04/19/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 7UET86WW (3.16 )
dmi.board.name: 2764CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: 
dmi:bvnLENOVO:bvr7UET86WW(3.16):bd04/19/2010:svnLENOVO:pn2764CTO:pvrThinkPadT400:rvnLENOVO:rn2764CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2764CTO
dmi.product.version: ThinkPad T400
dmi.sys.vendor: LENOVO

Re: [Qemu-devel] [PATCH 1/2] [virtio-9p] Cleanup legacy 'dotu' variable.

2010-07-14 Thread Arun R Bharadwaj

* Venkateswararao Jujjuri (JV)  [2010-07-14 04:25:21]:

> Arun R Bharadwaj wrote:
> > Hi,
> > 
> > This patch cleans up the legacy 'dotu' variable which is always set to
> > 1 by default, since qemu doesnt support legacy 9p clients.
> 
> Do we have 2/2 in this series? Also please check the thread-id.
> It is showing part of your other patch, TLERROR/RLERROR.
> 
> - JV
> 

Hi JV,

The cleanup patch is the 1/2 and the TLERROR/RLERROR patch is the 2/2.
I have put them this way because the TLERROR patch needs to be applied
after the cleanup patch is applied. Hence the same thread-id.

-arun

58 matches

Mail list logo