Re: [Qemu-devel] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-register RTAS call

2014-11-14 Thread Aravinda Prasad


On Friday 14 November 2014 06:12 AM, David Gibson wrote:
 On Thu, Nov 13, 2014 at 08:06:55PM +0530, Aravinda Prasad wrote:


 On Thursday 13 November 2014 06:14 PM, David Gibson wrote:
 On Thu, Nov 13, 2014 at 05:18:16PM +0530, Aravinda Prasad wrote:
 On Thursday 13 November 2014 04:02 PM, David Gibson wrote:
 On Thu, Nov 13, 2014 at 11:28:30AM +0530, Aravinda Prasad wrote:
 [snip]
 Having to retry the hcall from here seems very awkward.  This is a
 private hcall, so you can define it to do whatever retries are
 necessary internally (and I don't think your current implementation
 can fail anyway).

 Retrying is required in the cases when multi-processors experience
 machine check at or about the same time. As per PAPR, subsequent
 processors should serialize and wait for the first processor to issue
 the ibm,nmi-interlock call. The second processor retries if the first
 processor which received a machine check is still reading the error log
 and is yet to issue ibm,nmi-interlock call.

 Hmm.. ok.  But I don't see any mechanism in the patches by which
 H_REPORT_MC_ERR will report failure if another CPU has an MC in
 progress.

 h_report_mc_err returns 0 if another VCPU is processing machine check
 and in that case we retry. h_report_mc_err returns error log address if
 no other VCPU is processing machine check.

 Uh.. how?  I'm only seeing one return statement in the implementation
 in 3/4.

 This part is in 4/4 which handles ibm,nmi-interlock call in
 h_report_mc_err()

 +if (mc_in_progress == 1) {
 +return 0;
 +}

 Ah, right, missed the change to h_report_mc_err() in the later patch.

 Retrying cannot be done internally in h_report_mc_err hcall: only one
 thread can succeed entering qemu upon parallel hcall and hence retrying
 inside the hcall will not allow the ibm,nmi-interlock from first CPU to
 succeed.

 It's possible, but would require some fiddling inside the h_call to
 unlock and wait for the other CPUs to finish, so yes, it might be more
 trouble than it's worth.



 +mtsprg  2,4

 Um.. doesn't this clobber the value of r3 you saved in SPRG2 just 
 above.

 The r3 saved in SPRG2 is moved to rtas area in the private hcall and
 hence it is fine to clobber r3 here

 Ok, if you're going to do some magic register saving inside the HCALL,
 why not do the SRR[01] and CR restoration inside there as well.

 SRR0/1 is clobbered while returning from HCALL and hence cannot be
 restored in HCALL. For CR, we need to do the restoration here as we
 clobber CR after returning from HCALL (the instruction checking the
 return value of hcall clobbers CR).

 Hrm.  AFAICT SRR0/1 shouldn't be clobbered when returning from an

 As hcall is an interrupt, SRR0 is set to nip and SRR1 to msr just before
 executing rfid.

 AFAICT the return path from the hypervisor - including for hcalls -
 uses HSSR0/1 and hrfid, so ordinary SRR0/SRR1 should be ok.

 I see SRR0 and SRR1 clobbered when the HCALL from guest returns.
 Previous discussions on this is in the link below:

 http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg01148.html
 
 Hrm.  Well, I guess if it happened it happened, but Alex's explanation
 for why doesn't make sense to me.
 
 Did you execute cpu_synchronize_state() *before* attempting to set
 SRR0/1 in the hcall?

Yes I did.

 
 Further I searched QEMU source code but could not find whether it is
 using rfid/hrfid. However, ISA for sc instruction mentions that SRR0 and
 SRR1 are modified.
 
 Well of course it isn't in the qemu source, the low-level return to
 guest is within the host kernel, specifically fast_guest_return in
 arch/powerpc/kvm/book3s_hv_rmhandlers.S which uses hrfid.
 
 If I'm reading the ISA correctly then yes, SRR0/1 are clobbered on
 entry, but that's on *entry* so can be overwritten by the hcall
 handler itself.

Hmm.. ok. I need to take a look into it in detail.

 
 

-- 
Regards,
Aravinda




Re: [Qemu-devel] [PATCH] gtk: fix possible memory leak about local_err

2014-11-14 Thread Gerd Hoffmann
On Fr, 2014-11-14 at 11:25 +0800, zhanghailiang wrote:
 local_err in gd_vc_gfx_init() is not freed, and we don't use it,
 so remove it.

Added to gtk patch queue.

thanks,
  Gerd





Re: [Qemu-devel] [PATCH] vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info

2014-11-14 Thread Max Reitz

On 2014-11-14 at 05:09, Fam Zheng wrote:

When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.

Signed-off-by: Fam Zheng f...@redhat.com
---
  block/vmdk.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] iothread object hotplug ?

2014-11-14 Thread Paolo Bonzini


On 14/11/2014 08:25, Alexandre DERUMIER wrote:
 Hi,
 
 I would like to known if it's possible to hot-add|hot-plug an iothread object 
 on a  running guest ?

Yes, there is an object_add/object-add command (respectively HMP and
QMP), but I don't think libvirt has bindings already.

Paolo



Re: [Qemu-devel] [PATCH] l2tpv3: fix possible double free

2014-11-14 Thread Paolo Bonzini


On 14/11/2014 02:39, zhanghailiang wrote:
 freeaddrinfo(result) does not assign result = NULL, after frees it.
 There will be a double free when it goes error case.
 It is reported by covertiy.
 
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 ---
  net/l2tpv3.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/net/l2tpv3.c b/net/l2tpv3.c
 index 528d95b..f9e0c98 100644
 --- a/net/l2tpv3.c
 +++ b/net/l2tpv3.c
 @@ -661,6 +661,7 @@ int net_init_l2tpv3(const NetClientOptions *opts,
  fd = -errno;
  error_report(l2tpv3_open : socket creation failed, errno = %d, 
 -fd);
  freeaddrinfo(result);
 +result = NULL;

You can just remove the call to freeaddrinfo().  I made the change and
applied the patch.

Paolo

  goto outerr;
  }
  if (bind(fd, (struct sockaddr *) result-ai_addr, result-ai_addrlen)) {
 



[Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-14 Thread Leen Keus
Under Ubuntu 14.04 the USB devices where attached to the virtual
machines. After upgrading to Ubuntu 14.10 the USB devices are not
visible in the virtual machines anymore. Same issue appeard after
upgrading from Ubuntu 13.04 to 13.10, but I can not find that bug report
anymore.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1392504

Title:
  USB Passthrough is not working anymore

Status in QEMU:
  New

Bug description:
  After upgrading from Ubuntu 14.04 to Ubuntu 14.10 USB passthrough in
  QEMU (version is now 2.1.0 - Debian2.1+dfsg-4ubuntu6.1) is not working
  any more. Already tried to remove and attach the USB devices. I use 1
  USB2 HDD  +  1 USB3 HDD to a virtual linux machine; 1 USB2 HDD to a
  virual FNAS machine and a USB 2camera to virtual Win7 machine. All
  these devices are not working any more.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1392504/+subscriptions



[Qemu-devel] [PATCH v2 1/3] chardev: Add -qmp-pretty

2014-11-14 Thread Max Reitz
Add a command line option for adding a QMP monitor using pretty JSON
formatting.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 qemu-options.hx |  8 
 vl.c| 15 ++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index da9851d..bc7b4c2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2788,6 +2788,14 @@ STEXI
 @findex -qmp
 Like -monitor but opens in 'control' mode.
 ETEXI
+DEF(qmp-pretty, HAS_ARG, QEMU_OPTION_qmp_pretty, \
+-qmp-pretty dev like -qmp but uses pretty JSON formatting\n,
+QEMU_ARCH_ALL)
+STEXI
+@item -qmp-pretty @var{dev}
+@findex -qmp-pretty
+Like -qmp but uses pretty JSON formatting.
+ETEXI
 
 DEF(mon, HAS_ARG, QEMU_OPTION_mon, \
 -mon [chardev=]name[,mode=readline|control][,default]\n, QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index f4a6e5e..c7bebad 100644
--- a/vl.c
+++ b/vl.c
@@ -2284,7 +2284,7 @@ static int mon_init_func(QemuOpts *opts, void *opaque)
 return 0;
 }
 
-static void monitor_parse(const char *optarg, const char *mode)
+static void monitor_parse(const char *optarg, const char *mode, bool pretty)
 {
 static int monitor_device_index = 0;
 QemuOpts *opts;
@@ -2314,6 +2314,7 @@ static void monitor_parse(const char *optarg, const char 
*mode)
 }
 qemu_opt_set(opts, mode, mode);
 qemu_opt_set(opts, chardev, label);
+qemu_opt_set_bool(opts, pretty, pretty);
 if (def)
 qemu_opt_set(opts, default, on);
 monitor_device_index++;
@@ -3292,11 +3293,15 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_monitor:
 default_monitor = 0;
 if (strncmp(optarg, none, 4)) {
-monitor_parse(optarg, readline);
+monitor_parse(optarg, readline, false);
 }
 break;
 case QEMU_OPTION_qmp:
-monitor_parse(optarg, control);
+monitor_parse(optarg, control, false);
+default_monitor = 0;
+break;
+case QEMU_OPTION_qmp_pretty:
+monitor_parse(optarg, control, true);
 default_monitor = 0;
 break;
 case QEMU_OPTION_mon:
@@ -3994,7 +3999,7 @@ int main(int argc, char **argv, char **envp)
 add_device_config(DEV_SCLP, stdio);
 }
 if (default_monitor)
-monitor_parse(stdio, readline);
+monitor_parse(stdio, readline, false);
 }
 } else {
 if (default_serial)
@@ -4002,7 +4007,7 @@ int main(int argc, char **argv, char **envp)
 if (default_parallel)
 add_device_config(DEV_PARALLEL, vc:80Cx24C);
 if (default_monitor)
-monitor_parse(vc:80Cx24C, readline);
+monitor_parse(vc:80Cx24C, readline, false);
 if (default_virtcon)
 add_device_config(DEV_VIRTCON, vc:80Cx24C);
 if (default_sclp) {
-- 
1.9.3




[Qemu-devel] [PATCH v2 3/3] iotests: Use -qmp-pretty in 067

2014-11-14 Thread Max Reitz
067 invokes query-block, resulting in a reference output with really
long lines (which may pose a problem in email patches and always poses a
problem when the output changes, because it is hard to see what has
actually changed). Use -qmp-pretty to mitigate this issue.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 tests/qemu-iotests/067 |   2 +-
 tests/qemu-iotests/067.out | 779 +
 2 files changed, 723 insertions(+), 58 deletions(-)

diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index d025192..29cd6b5 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -39,7 +39,7 @@ _supported_os Linux
 function do_run_qemu()
 {
 echo Testing: $@
-$QEMU -nographic -qmp stdio -serial none $@
+$QEMU -nographic -qmp-pretty stdio -serial none $@
 echo
 }
 
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 0f72dcf..929dc74 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -4,77 +4,742 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 === -drive/-device and device_del ===
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device 
virtio-blk-pci,drive=disk,id=virtio0
-QMP_VERSION
-{return: {}}
-{return: [{io-status: ok, device: disk, locked: false, 
removable: false, inserted: {iops_rd: 0, detect_zeroes: off, image: 
{virtual-size: 134217728, filename: TEST_DIR/t.qcow2, cluster-size: 
65536, format: qcow2, actual-size: SIZE, format-specific: {type: 
qcow2, data: {compat: 1.1, lazy-refcounts: false, corrupt: false}}, 
dirty-flag: false}, iops_wr: 0, ro: false, backing_file_depth: 0, 
drv: qcow2, iops: 0, bps_wr: 0, encrypted: false, bps: 0, bps_rd: 
0, file: TEST_DIR/t.qcow2, encryption_key_missing: false}, type: 
unknown}, {io-status: ok, device: ide1-cd0, locked: false, 
removable: true, tray_open: false, type: unknown}, {device: 
floppy0, locked: false, removable: true, tray_open: false, type: 
unknown}, {device: sd0, locked: false, removable: true, tray_open: 
false, type: unknown}]}
-{return: {}}
-{return: {}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_DELETED, data: {path: 
/machine/peripheral/virtio0/virtio-backend}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_DELETED, data: {device: virtio0, path: 
/machine/peripheral/virtio0}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
RESET}
-{return: [{io-status: ok, device: ide1-cd0, locked: false, 
removable: true, tray_open: false, type: unknown}, {device: 
floppy0, locked: false, removable: true, tray_open: false, type: 
unknown}, {device: sd0, locked: false, removable: true, tray_open: 
false, type: unknown}]}
-{return: {}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
SHUTDOWN}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_TRAY_MOVED, data: {device: ide1-cd0, tray-open: true}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_TRAY_MOVED, data: {device: floppy0, tray-open: true}}
+{
+QMP_VERSION
+}
+{
+return: {
+}
+}
+{
+return: [
+{
+io-status: ok,
+device: disk,
+locked: false,
+removable: false,
+inserted: {
+iops_rd: 0,
+detect_zeroes: off,
+image: {
+virtual-size: 134217728,
+filename: TEST_DIR/t.qcow2,
+cluster-size: 65536,
+format: qcow2,
+actual-size: SIZE,
+format-specific: {
+type: qcow2,
+data: {
+compat: 1.1,
+lazy-refcounts: false,
+corrupt: false
+}
+},
+dirty-flag: false
+},
+iops_wr: 0,
+ro: false,
+backing_file_depth: 0,
+drv: qcow2,
+iops: 0,
+bps_wr: 0,
+encrypted: false,
+bps: 0,
+bps_rd: 0,
+file: TEST_DIR/t.qcow2,
+encryption_key_missing: false
+},
+type: unknown
+},
+{
+io-status: ok,
+device: ide1-cd0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+},
+{
+device: floppy0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+},
+{
+device: sd0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+}
+]
+}
+{
+

[Qemu-devel] [PATCH v2 0/3] chardev: Add -qmp-pretty

2014-11-14 Thread Max Reitz
This series does not add new functionality. Adding a QMP monitor with
prettily formatted JSON output can be done as follows:

$ qemu -chardev stdio,id=mon0 -mon chardev=mon0,mode=control,pretty=on

However, this is rather cumbersome, so this series (its first patch)
adds a shortcut in the form of the new command line option -qmp-pretty.

Since the argument given to a monitor command line option (such as -qmp)
is parsed depending on its prefix and probably also depending on the
current phase of the moon, this is cleaner than trying to add a switch
to -qmp itself (in the form of -qmp stdio,pretty=on).


Patch 3 makes uses of the new option in qemu-iotest 067 to greatly
increase maintainability of its reference output. Patch 2 extends the
QMP filter for qemu-iotests so it is able to filter out the QMP version
object in pretty mode.


v2:
- Patch 2: Replaced the multi-line QMP_VERSION replacement written in
  bash by a nice sed script [Eric]


Max Reitz (3):
  chardev: Add -qmp-pretty
  iotests: _filter_qmp for pretty JSON output
  iotests: Use -qmp-pretty in 067

 qemu-options.hx  |   8 +
 tests/qemu-iotests/067   |   2 +-
 tests/qemu-iotests/067.out   | 779 ---
 tests/qemu-iotests/common.filter |   5 +-
 vl.c |  15 +-
 5 files changed, 745 insertions(+), 64 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v2 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-14 Thread Max Reitz
_filter_qmp should be able to correctly filter out the QMP version
object for pretty JSON output.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/common.filter | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3acdb30..63d15b3 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -165,9 +165,12 @@ _filter_qemu()
 # replace problematic QMP output like timestamps
 _filter_qmp()
 {
+discard=0
+
 _filter_win32 | \
 sed -e 's#\(\(micro\)\?seconds: \)[0-9]\+#\1 TIMESTAMP#g' \
--e 's#^{QMP:.*}$#QMP_VERSION#'
+-e 's#^{QMP:.*}$#QMP_VERSION#' \
+-e '/^QMP: {\s*$/, /^}\s*$/ c\QMP_VERSION'
 }
 
 # replace driver-specific options in the Formatting... line
-- 
1.9.3




Re: [Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-14 Thread Gonglei
On 2014/11/14 16:47, Leen Keus wrote:

 Under Ubuntu 14.04 the USB devices where attached to the virtual
 machines. After upgrading to Ubuntu 14.10 the USB devices are not
 visible in the virtual machines anymore. Same issue appeard after
 upgrading from Ubuntu 13.04 to 13.10, but I can not find that bug report
 anymore.
 

Please show your command line booting VM.
Or you can get the latest Qemu master code to reproduce your
problem. Maybe it is a resolved issue. :)

Best regards,
-Gonglei




Re: [Qemu-devel] [PATCH v2 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-14 Thread Max Reitz

On 2014-11-14 at 10:02, Max Reitz wrote:

_filter_qmp should be able to correctly filter out the QMP version
object for pretty JSON output.

Signed-off-by: Max Reitz mre...@redhat.com
---
  tests/qemu-iotests/common.filter | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3acdb30..63d15b3 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -165,9 +165,12 @@ _filter_qemu()
  # replace problematic QMP output like timestamps
  _filter_qmp()
  {
+discard=0
+


Urgh, self-NACK. We don't need that anymore.

Max


  _filter_win32 | \
  sed -e 's#\(\(micro\)\?seconds: \)[0-9]\+#\1 TIMESTAMP#g' \
--e 's#^{QMP:.*}$#QMP_VERSION#'
+-e 's#^{QMP:.*}$#QMP_VERSION#' \
+-e '/^QMP: {\s*$/, /^}\s*$/ c\QMP_VERSION'
  }
  
  # replace driver-specific options in the Formatting... line





[Qemu-devel] [PATCH v3 1/3] chardev: Add -qmp-pretty

2014-11-14 Thread Max Reitz
Add a command line option for adding a QMP monitor using pretty JSON
formatting.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 qemu-options.hx |  8 
 vl.c| 15 ++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index da9851d..bc7b4c2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2788,6 +2788,14 @@ STEXI
 @findex -qmp
 Like -monitor but opens in 'control' mode.
 ETEXI
+DEF(qmp-pretty, HAS_ARG, QEMU_OPTION_qmp_pretty, \
+-qmp-pretty dev like -qmp but uses pretty JSON formatting\n,
+QEMU_ARCH_ALL)
+STEXI
+@item -qmp-pretty @var{dev}
+@findex -qmp-pretty
+Like -qmp but uses pretty JSON formatting.
+ETEXI
 
 DEF(mon, HAS_ARG, QEMU_OPTION_mon, \
 -mon [chardev=]name[,mode=readline|control][,default]\n, QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index f4a6e5e..c7bebad 100644
--- a/vl.c
+++ b/vl.c
@@ -2284,7 +2284,7 @@ static int mon_init_func(QemuOpts *opts, void *opaque)
 return 0;
 }
 
-static void monitor_parse(const char *optarg, const char *mode)
+static void monitor_parse(const char *optarg, const char *mode, bool pretty)
 {
 static int monitor_device_index = 0;
 QemuOpts *opts;
@@ -2314,6 +2314,7 @@ static void monitor_parse(const char *optarg, const char 
*mode)
 }
 qemu_opt_set(opts, mode, mode);
 qemu_opt_set(opts, chardev, label);
+qemu_opt_set_bool(opts, pretty, pretty);
 if (def)
 qemu_opt_set(opts, default, on);
 monitor_device_index++;
@@ -3292,11 +3293,15 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_monitor:
 default_monitor = 0;
 if (strncmp(optarg, none, 4)) {
-monitor_parse(optarg, readline);
+monitor_parse(optarg, readline, false);
 }
 break;
 case QEMU_OPTION_qmp:
-monitor_parse(optarg, control);
+monitor_parse(optarg, control, false);
+default_monitor = 0;
+break;
+case QEMU_OPTION_qmp_pretty:
+monitor_parse(optarg, control, true);
 default_monitor = 0;
 break;
 case QEMU_OPTION_mon:
@@ -3994,7 +3999,7 @@ int main(int argc, char **argv, char **envp)
 add_device_config(DEV_SCLP, stdio);
 }
 if (default_monitor)
-monitor_parse(stdio, readline);
+monitor_parse(stdio, readline, false);
 }
 } else {
 if (default_serial)
@@ -4002,7 +4007,7 @@ int main(int argc, char **argv, char **envp)
 if (default_parallel)
 add_device_config(DEV_PARALLEL, vc:80Cx24C);
 if (default_monitor)
-monitor_parse(vc:80Cx24C, readline);
+monitor_parse(vc:80Cx24C, readline, false);
 if (default_virtcon)
 add_device_config(DEV_VIRTCON, vc:80Cx24C);
 if (default_sclp) {
-- 
1.9.3




[Qemu-devel] [PATCH v3 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-14 Thread Max Reitz
_filter_qmp should be able to correctly filter out the QMP version
object for pretty JSON output.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/common.filter | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3acdb30..93642f3 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -167,7 +167,8 @@ _filter_qmp()
 {
 _filter_win32 | \
 sed -e 's#\(\(micro\)\?seconds: \)[0-9]\+#\1 TIMESTAMP#g' \
--e 's#^{QMP:.*}$#QMP_VERSION#'
+-e 's#^{QMP:.*}$#QMP_VERSION#' \
+-e '/^QMP: {\s*$/, /^}\s*$/ c\QMP_VERSION'
 }
 
 # replace driver-specific options in the Formatting... line
-- 
1.9.3




[Qemu-devel] [PATCH v3 0/3] chardev: Add -qmp-pretty

2014-11-14 Thread Max Reitz
This series does not add new functionality. Adding a QMP monitor with
prettily formatted JSON output can be done as follows:

$ qemu -chardev stdio,id=mon0 -mon chardev=mon0,mode=control,pretty=on

However, this is rather cumbersome, so this series (its first patch)
adds a shortcut in the form of the new command line option -qmp-pretty.

Since the argument given to a monitor command line option (such as -qmp)
is parsed depending on its prefix and probably also depending on the
current phase of the moon, this is cleaner than trying to add a switch
to -qmp itself (in the form of -qmp stdio,pretty=on).


Patch 3 makes uses of the new option in qemu-iotest 067 to greatly
increase maintainability of its reference output. Patch 2 extends the
QMP filter for qemu-iotests so it is able to filter out the QMP version
object in pretty mode.


v3:
- Patch 2: Cull useless discard=0

v2:
- Patch 2: Replaced the multi-line QMP_VERSION replacement written in
  bash by a nice sed script [Eric]


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[] [--] 'chardev: Add -qmp-pretty'
002/3:[0015] [FC] 'iotests: _filter_qmp for pretty JSON output'
003/3:[] [--] 'iotests: Use -qmp-pretty in 067'


Max Reitz (3):
  chardev: Add -qmp-pretty
  iotests: _filter_qmp for pretty JSON output
  iotests: Use -qmp-pretty in 067

 qemu-options.hx  |   8 +
 tests/qemu-iotests/067   |   2 +-
 tests/qemu-iotests/067.out   | 779 ---
 tests/qemu-iotests/common.filter |   3 +-
 vl.c |  15 +-
 5 files changed, 743 insertions(+), 64 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v3 3/3] iotests: Use -qmp-pretty in 067

2014-11-14 Thread Max Reitz
067 invokes query-block, resulting in a reference output with really
long lines (which may pose a problem in email patches and always poses a
problem when the output changes, because it is hard to see what has
actually changed). Use -qmp-pretty to mitigate this issue.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 tests/qemu-iotests/067 |   2 +-
 tests/qemu-iotests/067.out | 779 +
 2 files changed, 723 insertions(+), 58 deletions(-)

diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index d025192..29cd6b5 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -39,7 +39,7 @@ _supported_os Linux
 function do_run_qemu()
 {
 echo Testing: $@
-$QEMU -nographic -qmp stdio -serial none $@
+$QEMU -nographic -qmp-pretty stdio -serial none $@
 echo
 }
 
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 0f72dcf..929dc74 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -4,77 +4,742 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 === -drive/-device and device_del ===
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device 
virtio-blk-pci,drive=disk,id=virtio0
-QMP_VERSION
-{return: {}}
-{return: [{io-status: ok, device: disk, locked: false, 
removable: false, inserted: {iops_rd: 0, detect_zeroes: off, image: 
{virtual-size: 134217728, filename: TEST_DIR/t.qcow2, cluster-size: 
65536, format: qcow2, actual-size: SIZE, format-specific: {type: 
qcow2, data: {compat: 1.1, lazy-refcounts: false, corrupt: false}}, 
dirty-flag: false}, iops_wr: 0, ro: false, backing_file_depth: 0, 
drv: qcow2, iops: 0, bps_wr: 0, encrypted: false, bps: 0, bps_rd: 
0, file: TEST_DIR/t.qcow2, encryption_key_missing: false}, type: 
unknown}, {io-status: ok, device: ide1-cd0, locked: false, 
removable: true, tray_open: false, type: unknown}, {device: 
floppy0, locked: false, removable: true, tray_open: false, type: 
unknown}, {device: sd0, locked: false, removable: true, tray_open: 
false, type: unknown}]}
-{return: {}}
-{return: {}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_DELETED, data: {path: 
/machine/peripheral/virtio0/virtio-backend}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_DELETED, data: {device: virtio0, path: 
/machine/peripheral/virtio0}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
RESET}
-{return: [{io-status: ok, device: ide1-cd0, locked: false, 
removable: true, tray_open: false, type: unknown}, {device: 
floppy0, locked: false, removable: true, tray_open: false, type: 
unknown}, {device: sd0, locked: false, removable: true, tray_open: 
false, type: unknown}]}
-{return: {}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
SHUTDOWN}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_TRAY_MOVED, data: {device: ide1-cd0, tray-open: true}}
-{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
DEVICE_TRAY_MOVED, data: {device: floppy0, tray-open: true}}
+{
+QMP_VERSION
+}
+{
+return: {
+}
+}
+{
+return: [
+{
+io-status: ok,
+device: disk,
+locked: false,
+removable: false,
+inserted: {
+iops_rd: 0,
+detect_zeroes: off,
+image: {
+virtual-size: 134217728,
+filename: TEST_DIR/t.qcow2,
+cluster-size: 65536,
+format: qcow2,
+actual-size: SIZE,
+format-specific: {
+type: qcow2,
+data: {
+compat: 1.1,
+lazy-refcounts: false,
+corrupt: false
+}
+},
+dirty-flag: false
+},
+iops_wr: 0,
+ro: false,
+backing_file_depth: 0,
+drv: qcow2,
+iops: 0,
+bps_wr: 0,
+encrypted: false,
+bps: 0,
+bps_rd: 0,
+file: TEST_DIR/t.qcow2,
+encryption_key_missing: false
+},
+type: unknown
+},
+{
+io-status: ok,
+device: ide1-cd0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+},
+{
+device: floppy0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+},
+{
+device: sd0,
+locked: false,
+removable: true,
+tray_open: false,
+type: unknown
+}
+]
+}
+{
+

Re: [Qemu-devel] iothread object hotplug ?

2014-11-14 Thread Alexandre DERUMIER
Yes, there is an object_add/object-add command (respectively HMP and
QMP), but I don't think libvirt has bindings already.

Thanks Paolo ! I'm currently implementing iothread on proxmox, so no libvirt.

Alexandre.


- Mail original -

De: Paolo Bonzini pbonz...@redhat.com
À: Alexandre DERUMIER aderum...@odiso.com, qemu-devel 
qemu-devel@nongnu.org
Envoyé: Vendredi 14 Novembre 2014 09:37:02
Objet: Re: iothread object hotplug ?



On 14/11/2014 08:25, Alexandre DERUMIER wrote:
 Hi,

 I would like to known if it's possible to hot-add|hot-plug an iothread object
 on a running guest ?

Yes, there is an object_add/object-add command (respectively HMP and
QMP), but I don't think libvirt has bindings already.

Paolo



Re: [Qemu-devel] [PATCH] mips/gdbstub: Correct the handling of register #72 on writes

2014-11-14 Thread Leon Alrae
On 03/11/2014 18:47, Maciej W. Rozycki wrote:
 Fix an off-by-one error in `mips_cpu_gdb_write_register' for register 
 #72 that is handled further down in that function rather than here, 
 matching how `mips_cpu_gdb_read_register' handles it.  This register 
 slot is a fake anyway, there's nothing in hardware that corresponds to 
 it.
 
 Signed-off-by: Maciej W. Rozycki ma...@codesourcery.com
 ---
  I have a further change down the queue to clean up 
 `mips_cpu_gdb_read_register' and `mips_cpu_gdb_write_register' and make 
 them more consistent with respect to each other as far as the handling 
 of FP registers is concerned.  For now please apply this obvious change.  
 Thanks.
 
   Maciej
 
 qemu-mips-fpreg72.diff
 Index: qemu-git-trunk/target-mips/gdbstub.c
 ===
 --- qemu-git-trunk.orig/target-mips/gdbstub.c 2013-07-29 11:23:07.048742983 
 +0100
 +++ qemu-git-trunk/target-mips/gdbstub.c  2014-10-27 04:17:19.159003270 
 +
 @@ -97,7 +97,7 @@ int mips_cpu_gdb_write_register(CPUState
  return sizeof(target_ulong);
  }
  if (env-CP0_Config1  (1  CP0C1_FP)
 - n = 38  n  73) {
 + n = 38  n  72) {
  if (n  70) {
  if (env-CP0_Status  (1  CP0St_FR)) {
  env-active_fpu.fpr[n - 38].d = tmp;
 

Reviewed-by: Leon Alrae leon.al...@imgtec.com




Re: [Qemu-devel] [PATCH] mips/gdbstub: Make CP1.FIR read-only here too

2014-11-14 Thread Leon Alrae
On 03/11/2014 18:51, Maciej W. Rozycki wrote:
 CP1.FIR is read-only in hardware so gdbstub must respect it.  We already 
 respect it for CTC1 instructions, so do it here too.
 
 Signed-off-by: Maciej W. Rozycki ma...@codesourcery.com
 ---
 Not much to say about it here.  Please apply.
 
   Maciej
 
 qemu-mips-fir.diff
 Index: qemu-git-trunk/target-mips/gdbstub.c
 ===
 --- qemu-git-trunk.orig/target-mips/gdbstub.c 2014-11-02 17:51:33.458968203 
 +
 +++ qemu-git-trunk/target-mips/gdbstub.c  2014-11-02 17:51:35.958924223 
 +
 @@ -112,7 +112,7 @@ int mips_cpu_gdb_write_register(CPUState
  RESTORE_ROUNDING_MODE;
  break;
  case 71:
 -env-active_fpu.fcr0 = tmp;
 +/* FIR is read-only.  Ignore writes.  */
  break;
  }
  return sizeof(target_ulong);
 

Reviewed-by: Leon Alrae leon.al...@imgtec.com




Re: [Qemu-devel] [PATCH] vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info

2014-11-14 Thread Stefan Hajnoczi
On Fri, Nov 14, 2014 at 12:09:21PM +0800, Fam Zheng wrote:
 When extent types don't match, we return -ENOTSUP. In this case, be
 polite to the caller and don't modify bdi.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block/vmdk.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)

At this stage of the release process it would be very helpful to include
some context for this patch: is it a pure cleanup or actually a bug fix
that needs to go into QEMU 2.2?

Since you are a regular contributor and I therefore trust you, I have
merged this patch.  But please be clear in commit descriptions about
whether or not a patch is a bug fix (how to reproduce the bug, if a
previous commit's regression is being fixed, etc).

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


pgpeDt8fK8gEK.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v2] libcacard: fix resource leak

2014-11-14 Thread Markus Armbruster
zhanghailiang zhang.zhanghaili...@huawei.com writes:

 In function connect_to_qemu(), getaddrinfo() will allocate memory
 that is stored into server, it should be freed by using freeaddrinfo()
 before connect_to_qemu() return.

 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 ---
 v2:
 - fix typo in title ;)
 ---
  libcacard/vscclient.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

 diff --git a/libcacard/vscclient.c b/libcacard/vscclient.c
 index 80111df..fa6041d 100644
 --- a/libcacard/vscclient.c
 +++ b/libcacard/vscclient.c
 @@ -597,7 +597,7 @@ connect_to_qemu(
  const char *port
  ) {
  struct addrinfo hints;
 -struct addrinfo *server;
 +struct addrinfo *server = NULL;
  int ret, sock;
  
  sock = socket(AF_INET, SOCK_STREAM, 0);
 @@ -629,9 +629,14 @@ connect_to_qemu(
  if (verbose) {
  printf(Connected (sizeof Header=%zd)!\n, sizeof(VSCMsgHeader));
  }
 +
 +freeaddrinfo(server);
  return sock;
  
  cleanup_socket:
 +if (server) {
 +freeaddrinfo(server);
 +}
  closesocket(sock);
  return -1;
  }

Reviewed-by: Markus Armbruster arm...@redhat.com

Aside: this code uses the first result from getaddrinfo(), and fails if
it can't connect.  This is a common misuse of getaddrinfo().

Consider a remote host with both an IPv4 and an IPv6 address, but only
one of them actually connects.  If getaddrinfo() puts the connectable
one first, we succeed.  Else, we fail.

We should try the addresses in order until connect() succeeds, like
qemu-sockets.c does.



Re: [Qemu-devel] iothread object hotplug ?

2014-11-14 Thread Stefan Hajnoczi
On Fri, Nov 14, 2014 at 09:37:02AM +0100, Paolo Bonzini wrote:
 On 14/11/2014 08:25, Alexandre DERUMIER wrote:
  Hi,
  
  I would like to known if it's possible to hot-add|hot-plug an iothread 
  object 
  on a  running guest ?
 
 Yes, there is an object_add/object-add command (respectively HMP and
 QMP), but I don't think libvirt has bindings already.

Right, libvirt does not support iothread hotplug at the moment.

I should add that there hasn't been much testing of iothread hotplug in
QEMU either yet, even though the object-add/object-del commands make it
possible.

Please let me know if you encounter any problems.

Stefan


pgpSLEdzgribF.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 for-xen-4.5] xen_disk: fix unmapping of persistent grants

2014-11-14 Thread Stefano Stabellini
On Thu, 13 Nov 2014, Roger Pau Monne wrote:
 This patch fixes two issues with persistent grants and the disk PV backend
 (Qdisk):
 
  - Keep track of memory regions where persistent grants have been mapped
since we need to unmap them as a whole. It is not possible to unmap a
single grant if it has been batch-mapped. A new check has also been added
to make sure persistent grants are only used if the whole mapped region
can be persistently mapped in the batch_maps case.
  - Unmap persistent grants before switching to the closed state, so the
frontend can also free them.
 
 Signed-off-by: Roger Pau Monné roger@citrix.com
 Reported-by: George Dunlap george.dun...@eu.citrix.com
 Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Cc: George Dunlap george.dun...@eu.citrix.com
 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Acked-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

I'll send a pull request and backport to qemu-xen.


 Xen release exception: this is obviously an important bug-fix that should be
 included in 4.5. Without this fix, trying to do a block-detach of a Qdisk
 from a running guest can lead to guest crash depending on the blkfront
 implementation.
 ---
 Changea since v2:
  - Expand the commit message to mention the newly added check.
  - Don't use persistent_regions in the !batch_maps case, we can use the
previous implementation which works fine in that case.
 
 Changes since v1:
  - Instead of disabling batch mappings when using persistent grants, keep
track of the persistently mapped regions and unmap them on disconnection.
This prevents unmapping single persistent grants, but the current
implementation doesn't require it.
 
 This new version is slightly faster than the previous one, the following
 test results are in IOPS from 20 runs of a block-attach, fio run,
 block-detach test loop:
 
 x batch
 + no_batch
 +--+
 |  +   + xx + + +  x xx x  |
 |+  +   x x+  *+++   *  +x* +++x*  x xx x *|
 |  |_A_M__||
 ||A_M___|  |
 +--+
 N   Min   MaxMedian   AvgStddev
 x  20 52941 63822 62396  61180.15 2673.6497
 +  20 49967 63849 60322   59116.9 3456.3455
 Difference at 95.0% confidence
   -2063.25 +/- 1977.66
   -3.37242% +/- 3.23252%
   (Student's t, pooled s = 3089.88)
 ---
  hw/block/xen_disk.c | 72 
 -
  1 file changed, 66 insertions(+), 6 deletions(-)
 
 diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
 index 231e9a7..21842a0 100644
 --- a/hw/block/xen_disk.c
 +++ b/hw/block/xen_disk.c
 @@ -59,6 +59,13 @@ struct PersistentGrant {
  
  typedef struct PersistentGrant PersistentGrant;
  
 +struct PersistentRegion {
 +void *addr;
 +int num;
 +};
 +
 +typedef struct PersistentRegion PersistentRegion;
 +
  struct ioreq {
  blkif_request_t req;
  int16_t status;
 @@ -118,6 +125,7 @@ struct XenBlkDev {
  gbooleanfeature_discard;
  gbooleanfeature_persistent;
  GTree   *persistent_gnts;
 +GSList  *persistent_regions;
  unsigned intpersistent_gnt_count;
  unsigned intmax_grants;
  
 @@ -177,6 +185,23 @@ static void destroy_grant(gpointer pgnt)
  g_free(grant);
  }
  
 +static void remove_persistent_region(gpointer data, gpointer dev)
 +{
 +PersistentRegion *region = data;
 +struct XenBlkDev *blkdev = dev;
 +XenGnttab gnt = blkdev-xendev.gnttabdev;
 +
 +if (xc_gnttab_munmap(gnt, region-addr, region-num) != 0) {
 +xen_be_printf(blkdev-xendev, 0,
 +  xc_gnttab_munmap region %p failed: %s\n,
 +  region-addr, strerror(errno));
 +}
 +xen_be_printf(blkdev-xendev, 3,
 +  unmapped grant region %p with %d pages\n,
 +  region-addr, region-num);
 +g_free(region);
 +}
 +
  static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
  {
  struct ioreq *ioreq = NULL;
 @@ -343,6 +368,7 @@ static int ioreq_map(struct ioreq *ioreq)
  void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
  int i, j, new_maps = 0;
  PersistentGrant *grant;
 +PersistentRegion *region;
  /* domids and refs variables will contain the information necessary
   * to map the grants that are needed to fulfill this request.
   *
 @@ -421,7 +447,22 @@ static int ioreq_map(struct ioreq *ioreq)
  }
  }
  }
 -if (ioreq-blkdev-feature_persistent) {
 +

[Qemu-devel] [PULL for-2.2 02/11] ide: repair PIO transfers for cases where nsector 1

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

Currently, for emulated PIO transfers through the AHCI device,
any attempt made to request more than a single sector's worth
of data will result in the same sector being transferred over
and over.

For example, if we request 8 sectors via PIO READ SECTORS, the
AHCI device will give us the same sector eight times.

This patch adds offset tracking into the PIO pathways so that
we can fulfill these requests appropriately.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1414785819-26209-2-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 2 +-
 hw/ide/core.c | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 1f3f951..dbd6773 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1093,7 +1093,7 @@ static void ahci_start_transfer(IDEDMA *dma)
 goto out;
 }
 
-if (!ahci_populate_sglist(ad, s-sg, 0)) {
+if (!ahci_populate_sglist(ad, s-sg, s-io_buffer_offset)) {
 has_sglist = 1;
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index d316ccf..dab21f0 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -592,6 +592,7 @@ static void ide_sector_read_cb(void *opaque, int ret)
 
 ide_set_sector(s, ide_get_sector(s) + n);
 s-nsector -= n;
+s-io_buffer_offset += 512 * n;
 }
 
 void ide_sector_read(IDEState *s)
@@ -832,6 +833,8 @@ static void ide_sector_write_cb(void *opaque, int ret)
 n = s-req_nb_sectors;
 }
 s-nsector -= n;
+s-io_buffer_offset += 512 * n;
+
 if (s-nsector == 0) {
 /* no more sectors to write */
 ide_transfer_stop(s);
@@ -1824,6 +1827,7 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 
 s-status = READY_STAT | BUSY_STAT;
 s-error = 0;
+s-io_buffer_offset = 0;
 
 complete = ide_cmd_table[val].handler(s, val);
 if (complete) {
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 04/11] ide: Correct handling of malformed/short PRDTs

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

This impacts both BMDMA and AHCI HBA interfaces for IDE.
Currently, we confuse the difference between a PRDT having
0 bytes and a PRDT having 0 complete sectors.

When we receive an incomplete sector, inconsistent error checking
leads to an infinite loop wherein the call succeeds, but it
didn't give us enough bytes -- leading us to re-call the
DMA chain over and over again. This leads to, in the BMDMA case,
leaked memory for short PRDTs, and infinite loops and resource
usage in the AHCI case.

The .prepare_buf() callback is reworked to return the number of
bytes that it successfully prepared. 0 is a valid, non-error
answer that means the table was empty and described no bytes.
-1 indicates an error.

Our current implementation uses the io_buffer in IDEState to
ultimately describe the size of a prepared scatter-gather list.
Even though the AHCI PRDT/SGList can be as large as 256GiB, the
AHCI command header limits transactions to just 4GiB. ATA8-ACS3,
however, defines the largest transaction to be an LBA48 command
that transfers 65,536 sectors. With a 512 byte sector size, this
is just 32MiB.

Since our current state structures use the int type to describe
the size of the buffer, and this state is migrated as int32, we
are limited to describing 2GiB buffer sizes unless we change the
migration protocol.

For this reason, this patch begins to unify the assertions in the
IDE pathways that the scatter-gather list provided by either the
AHCI PRDT or the PCI BMDMA PRDs can only describe, at a maximum,
2GiB. This should be resilient enough unless we need a sector
size that exceeds 32KiB.

Further, the likelihood of any guest operating system actually
attempting to transfer this much data in a single operation is
very slim.

To this end, the IDEState variables have been updated to more
explicitly clarify our maximum supported size. Callers to the
prepare_buf callback have been reworked to understand the new
return code, and all versions of the prepare_buf callback have
been adjusted accordingly.

Lastly, the ahci_populate_sglist helper, relied upon by the
AHCI implementation of .prepare_buf() as well as the PCI
implementation of the callback have had overflow assertions
added to help make clear the reasonings behind the various
type changes.

[Added %d - %PRId64 fix John sent because off_pos changed from int to
int64_t.
--Stefan]

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1414785819-26209-4-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 33 ++---
 hw/ide/core.c | 10 --
 hw/ide/internal.h | 13 +++--
 hw/ide/macio.c|  7 ++-
 hw/ide/pci.c  | 27 +--
 5 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 28aa105..9647d94 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -730,7 +730,8 @@ static int prdt_tbl_entry_size(const AHCI_SG *tbl)
 return (le32_to_cpu(tbl-flags_size)  AHCI_PRDT_SIZE_MASK) + 1;
 }
 
-static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist, int offset)
+static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist,
+int32_t offset)
 {
 AHCICmdHdr *cmd = ad-cur_cmd;
 uint32_t opts = le32_to_cpu(cmd-opts);
@@ -741,13 +742,21 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist, int offset)
 uint8_t *prdt;
 int i;
 int r = 0;
-int sum = 0;
+uint64_t sum = 0;
 int off_idx = -1;
-int off_pos = -1;
+int64_t off_pos = -1;
 int tbl_entry_size;
 IDEBus *bus = ad-port;
 BusState *qbus = BUS(bus);
 
+/*
+ * Note: AHCI PRDT can describe up to 256GiB. SATA/ATA only support
+ * transactions of up to 32MiB as of ATA8-ACS3 rev 1b, assuming a
+ * 512 byte sector size. We limit the PRDT in this implementation to
+ * a reasonably large 2GiB, which can accommodate the maximum transfer
+ * request for sector sizes up to 32K.
+ */
+
 if (!sglist_alloc_hint) {
 DPRINTF(ad-port_no, no sg list given by guest: 0x%08x\n, opts);
 return -1;
@@ -782,7 +791,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist, int offset)
 }
 if ((off_idx == -1) || (off_pos  0) || (off_pos  tbl_entry_size)) {
 DPRINTF(ad-port_no, %s: Incorrect offset! 
-off_idx: %d, off_pos: %d\n,
+off_idx: %d, off_pos: %PRId64\n,
 __func__, off_idx, off_pos);
 r = -1;
 goto out;
@@ -797,6 +806,13 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist, int offset)
 /* flags_size is zero-based */
 qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
 prdt_tbl_entry_size(tbl[i]));
+if 

[Qemu-devel] [PULL for-2.2 01/11] ahci: Fix byte count regression for ATAPI/PIO

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

This patch fixes a regression caused by commit
659142ecf71a0da240ab0ff7cf929ee25c32b9bc.
The problem occurs when we wish to return early
from the ahci_start_transfer function, but are now
updating the transferred byte count in the AHCI
command header via ahci_commit_buf.

This will cause problems in the Windows 8 installer.

Don't update the byte count in the command header
for the transmission of ATAPI packets: These commands
will distort the final byte count of the actual data
payload.

The call to ahci_commit_buf remains in the out
portion of the call in order to clean up the sglist.
The byte count is maintained by forcing size to be 0.

Signed-off-by: John Snow js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 61dbed1..1f3f951 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1089,6 +1089,7 @@ static void ahci_start_transfer(IDEDMA *dma)
 if (is_atapi  !ad-done_atapi_packet) {
 /* already prepopulated iobuffer */
 ad-done_atapi_packet = true;
+size = 0;
 goto out;
 }
 
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 00/11] Block patches

2014-11-14 Thread Stefan Hajnoczi
The following changes since commit c52e67924fbdadfa00668248f5c075542943c54c:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-13 15:44:16 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 5f58330790b72c4705b373ee0646fb3adf800b4e:

  vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info (2014-11-14 09:20:45 
+)





Fam Zheng (2):
  block: Fix max nb_sectors in bdrv_make_zero
  vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info

John Snow (9):
  ahci: Fix byte count regression for ATAPI/PIO
  ide: repair PIO transfers for cases where nsector  1
  ahci: unify sglist preparation
  ide: Correct handling of malformed/short PRDTs
  ahci: add is_ncq predicate helper
  ahci: Fix FIS decomposition
  ahci: Reorder error cases in handle_cmd
  ahci: Check cmd_fis[1] more explicitly
  ahci: factor out FIS decomposition from handle_cmd

 block.c   |   4 +-
 block/vmdk.c  |  20 ++--
 hw/ide/ahci.c | 269 ++
 hw/ide/ahci.h |   3 +
 hw/ide/core.c |  14 ++-
 hw/ide/internal.h |  13 +--
 hw/ide/macio.c|   7 +-
 hw/ide/pci.c  |  27 --
 8 files changed, 214 insertions(+), 143 deletions(-)

-- 
2.1.0




[Qemu-devel] [PULL for-2.2 08/11] ahci: Check cmd_fis[1] more explicitly

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

Instead of checking for a known byte, inspect the
fields of this byte explicitly to produce more meaningful
error messages and improve the readability of this section.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1415058979-16604-5-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 578a93b..d6b012c 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1004,17 +1004,18 @@ static int handle_cmd(AHCIState *s, int port, int slot)
 break;
 }
 
-switch (cmd_fis[1]) {
-case SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER:
-break;
-case 0:
-break;
-default:
-DPRINTF(port, unknown command cmd_fis[0]=%02x cmd_fis[1]=%02x 
-  cmd_fis[2]=%02x\n, cmd_fis[0], cmd_fis[1],
-  cmd_fis[2]);
-goto out;
-break;
+if (cmd_fis[1]  0x0F) {
+DPRINTF(port, Port Multiplier not supported.
+ cmd_fis[0]=%02x cmd_fis[1]=%02x cmd_fis[2]=%02x\n,
+cmd_fis[0], cmd_fis[1], cmd_fis[2]);
+goto out;
+}
+
+if (cmd_fis[1]  0x70) {
+DPRINTF(port, Reserved flags set in H2D Register FIS.
+ cmd_fis[0]=%02x cmd_fis[1]=%02x cmd_fis[2]=%02x\n,
+cmd_fis[0], cmd_fis[1], cmd_fis[2]);
+goto out;
 }
 
 if (!(cmd_fis[1]  SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER)) {
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 03/11] ahci: unify sglist preparation

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

The intent of this patch is to further unify the creation and
deletion of the sglist used for all AHCI transfers, including
emulated PIO, ATAPI R/W, and native DMA R/W.

By replacing ahci_start_transfer's call to ahci_populate_sglist
with ahci_dma_prepare_buf, we reduce the number of direct calls
where we manipulate the scatter-gather list in the AHCI code.

To make this switch, the constant 0 passed as an offset
in ahci_dma_prepare_buf is adjusted to use io_buffer_offset.

For DMA pathways, this has no effect: io_buffer_offset is always
updated to 0 at the beginning of a DMA transfer loop regardless.
DMA pathways through ide_dma_cb() update the io_buffer_offset
accordingly, and for circumstances where we might make several
trips through this loop, this may actually correct a design flaw.

For PIO pathways, the newly updated ahci_dma_prepare_buf will
now prepare the sglist at the correct offset. It will also set
io_buffer_size, but this is not used in the cmd_read_pio or
cmd_write_pio pathways.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1414785819-26209-3-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index dbd6773..28aa105 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1093,7 +1093,7 @@ static void ahci_start_transfer(IDEDMA *dma)
 goto out;
 }
 
-if (!ahci_populate_sglist(ad, s-sg, s-io_buffer_offset)) {
+if (ahci_dma_prepare_buf(dma, is_write)) {
 has_sglist = 1;
 }
 
@@ -1145,7 +1145,7 @@ static int ahci_dma_prepare_buf(IDEDMA *dma, int is_write)
 AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
 IDEState *s = ad-port.ifs[0];
 
-ahci_populate_sglist(ad, s-sg, 0);
+ahci_populate_sglist(ad, s-sg, s-io_buffer_offset);
 s-io_buffer_size = s-sg.size;
 
 DPRINTF(ad-port_no, len=%#x\n, s-io_buffer_size);
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 06/11] ahci: Fix FIS decomposition

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

This patch introduces a few changes to how FIS packets are
deciphered in the AHCI virtual device. The summary of
changes can be grouped into two pieces:

[A] Changes to how we apply a preliminary sieve to FISes,
[B] Changes in how we internalize a decomposed FIS.

== Changes to how we apply a preliminary sieve to FISes ==

(1) Packets may now either update the Control register or
the Command register, but not both. This is according
to the SATA 3.2 specification which states:
...the device either initiates processing of the command
indicated in the Command register or initiates processing
of the control request indicated [...] depending on the
state of the C bit in the FIS.

See SATA 3.2 section 10.5.5.4, Reception in the 10.5.5
Register Host to Device FIS section.

This change accounts for the first two regions of change
within the diff. All other changes belong to the following
changes.

== Changes in how we internalize a decomposed FIS ==

(2) Instead of trying to extract the sector number out of the
FIS from bytes 4-10 and setting it with ide_set_sector,
we set the appropriate IDEState registers and trust that
ide_get_sector can retrieve the correct sector later.

By constructing the sector for use with ide_set_sector,
we are duplicating the mechanisms of ide_get_sector.
This change makes the FIS decomposition more obvious.

SATA 3.2 as a specification does not make the legacy
register mapping with respect to the D2H FIS obvious.
However, SATA 3.2 section 10.5.5.1 Register Host to
Device FIS layout describes all of the cmd_fis
bytes:

0 - FIS Type (0x27)
1 - Port Multiplier Port and Command Update flag
2 - ATA Command
3 - Features_Low
4 - LBA 7:0
5 - LBA 15:8
6 - LBA 23:16
7 - Device, AKA Drive Select.
8 - LBA 31:24
9 - LBA 39:32
10 - LBA 47:40
11 - Features_High
12 - Count Low
13 - Count High
14 - ICC
15 - Control
16-19 - Auxiliary (for NCQ, defined per-command)

Most of these registers map to existing IDEState registers
in obvious ways, especially features, select, hob_features,
and nsector (count). ICC is reserved in older specifications
but is not supported in our implementation, and remains
unused here. The Control register is not valid for a command
that is trying to update the command register and is to be
considered reserved at this point.

What is not obvious is the LBA register mappings, but SATA 1.0
can help inform of us legacy device support, see SATA 1.0 section
8.5.2 Register - Host to Device.

LBA 7:0   - Sector Number(sector)
LBA 15:8  - Cyl Low  (lcyl)
LBA 23:16 - Cyl High (hcyl)
LBA 31:24 - Sector Num Exp.  (hob_sector)
LBA 39:32 - Cyl Low Exp. (hob_lcyl)
LBA 47:40 - Cyl High Exp.(hob_hcyl)

These mappings help guide which registers the FIS should be decomposed
into/towards for CHS, LBA28 and LBA48 commands.

As a note: The prior confusion that can be seen in the documentation
arises from the fact that CHS and LBA28 commands use the low nybble
of the drive select register to store LBA 27:24, whereas LNA48 commands
use the hob_sector, hob_lcyl and hob_hcyl registers as explained above.

The decomposition as it stands now will correctly decompose CHS, LBA28
and LBA48 commands into their appropriate registers where the core
IDE/ATAPI layers can deal with them correctly.

See the below point for more information.

(3) We save cmd_fis[7] as ide_state-select, which informs
decisions about if we are using LBA or CHS.
This corrects a bug in AHCI wherein we attempt to set and/or
retrieve the sector number by using ide_set_sector and
ide_get_sector, which depend on the select register to
determine if we are using LBA or CHS.

Without this adjustment, LBA48 read/writes are currently
broken. Thanks to Eniac Zheng @ HP for pointing this out.

(4) Save cmd_fis[11] as ide_state-hob_feature, as defined in SATA 3.2.

(5) For several ATA commands, the sector count register set to 0
is a magic number that means 256 sectors. For LBA48 commands,
this means 65,536 sectors. We drop the magic sector correction
here, and trust the ide core layer to handle the conversion
appropriately, in ide_cmd_lba48_transform(). As it stands,
the current AHCI code is only compliant with LBA28 commands.
By simply removing the magic, it will work with LBA28 and LBA48.

(6) We expand FIS decomposition to include both ATAPI and IDE devices.
We leave the logic of determining if the fields are valid or not
to the respective layers.

This change intends to make it clearer that AHCI is only a
composition mechanism for the FIS packets: the meanings of
the registers is best left to the implementation layers for
those devices.

(7) Forcefully 

[Qemu-devel] [PULL for-2.2 05/11] ahci: add is_ncq predicate helper

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

A small helper to determine which S/ATA commands
are destined to be routed to the NCQ pathways.

This references SATA 3.2 section 13.6,
Native Command Queueing. See sections 13.6.4,
13.6.5, 13.6.6, 13.6.7 and 13.6.8 for all
SATA commands considered to be part of the
NCQ feature set. This is summarized in a small
list in section 13.6.3.1 and again in 13.6.3.2.

Not all of these NCQ commands are currently supported,
so the error pathways are adjusted slightly to be more
informative in the case they are encountered.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1415058979-16604-2-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 28 
 hw/ide/ahci.h |  3 +++
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 9647d94..f2acddb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -854,6 +854,21 @@ static void ncq_cb(void *opaque, int ret)
 ncq_tfs-used = 0;
 }
 
+static int is_ncq(uint8_t ata_cmd)
+{
+/* Based on SATA 3.2 section 13.6.3.2 */
+switch (ata_cmd) {
+case READ_FPDMA_QUEUED:
+case WRITE_FPDMA_QUEUED:
+case NCQ_NON_DATA:
+case RECEIVE_FPDMA_QUEUED:
+case SEND_FPDMA_QUEUED:
+return 1;
+default:
+return 0;
+}
+}
+
 static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
 int slot)
 {
@@ -919,9 +934,15 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
ncq_cb, ncq_tfs);
 break;
 default:
-DPRINTF(port, error: tried to process non-NCQ command as NCQ\n);
+if (is_ncq(cmd_fis[2])) {
+DPRINTF(port,
+error: unsupported NCQ command (0x%02x) received\n,
+cmd_fis[2]);
+} else {
+DPRINTF(port,
+error: tried to process non-NCQ command as NCQ\n);
+}
 qemu_sglist_destroy(ncq_tfs-sglist);
-break;
 }
 }
 
@@ -1013,8 +1034,7 @@ static int handle_cmd(AHCIState *s, int port, int slot)
 if (cmd_fis[1] == SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER) {
 
 /* Check for NCQ command */
-if ((cmd_fis[2] == READ_FPDMA_QUEUED) ||
-(cmd_fis[2] == WRITE_FPDMA_QUEUED)) {
+if (is_ncq(cmd_fis[2])) {
 process_ncq_command(s, port, cmd_fis, slot);
 goto out;
 }
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index b123237..e0d2eb8 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -186,6 +186,9 @@
 
 #define READ_FPDMA_QUEUED  0x60
 #define WRITE_FPDMA_QUEUED 0x61
+#define NCQ_NON_DATA   0x63
+#define RECEIVE_FPDMA_QUEUED   0x65
+#define SEND_FPDMA_QUEUED  0x64
 
 #define RES_FIS_DSFIS  0x00
 #define RES_FIS_PSFIS  0x20
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 07/11] ahci: Reorder error cases in handle_cmd

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

Error checking in ahci's handle_cmd is re-ordered so that we
initialize as few things as possible before we've done our
sanity checking. This simplifies returning from this call
in case of an error.

A check to make sure the DMA memory map succeeds with the
correct size is also added, and the debug print of the
command fis is cleaned up with its size corrected.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1415058979-16604-4-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 43da363..578a93b 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -961,38 +961,37 @@ static int handle_cmd(AHCIState *s, int port, int slot)
 return -1;
 }
 
-cmd = ((AHCICmdHdr *)s-dev[port].lst)[slot];
-
 if (!s-dev[port].lst) {
 DPRINTF(port, error: lst not given but cmd handled);
 return -1;
 }
-
+cmd = ((AHCICmdHdr *)s-dev[port].lst)[slot];
 /* remember current slot handle for later */
 s-dev[port].cur_cmd = cmd;
 
+/* The device we are working for */
+ide_state = s-dev[port].port.ifs[0];
+if (!ide_state-blk) {
+DPRINTF(port, error: guest accessed unused port);
+return -1;
+}
+
 opts = le32_to_cpu(cmd-opts);
 tbl_addr = le64_to_cpu(cmd-tbl_addr);
-
 cmd_len = 0x80;
 cmd_fis = dma_memory_map(s-as, tbl_addr, cmd_len,
  DMA_DIRECTION_FROM_DEVICE);
-
 if (!cmd_fis) {
 DPRINTF(port, error: guest passed us an invalid cmd fis\n);
 return -1;
-}
-
-/* The device we are working for */
-ide_state = s-dev[port].port.ifs[0];
-
-if (!ide_state-blk) {
-DPRINTF(port, error: guest accessed unused port);
+} else if (cmd_len != 0x80) {
+ahci_trigger_irq(s, s-dev[port], PORT_IRQ_HBUS_ERR);
+DPRINTF(port, error: dma_memory_map failed: 
+(len(%02PRIx64) != 0x80)\n,
+cmd_len);
 goto out;
 }
-
-debug_print_fis(cmd_fis, 0x90);
-//debug_print_fis(cmd_fis, (opts  AHCI_CMD_HDR_CMD_FIS_LEN) * 4);
+debug_print_fis(cmd_fis, 0x80);
 
 switch (cmd_fis[0]) {
 case SATA_FIS_TYPE_REGISTER_H2D:
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 09/11] ahci: factor out FIS decomposition from handle_cmd

2014-11-14 Thread Stefan Hajnoczi
From: John Snow js...@redhat.com

In order to make handle_cmd more readable at the macro level,
the details of how to decompose particular types of FIS packets
are left to helper functions.

In our case, the only type of FIS packet we currently expect to
see is a Register H2D FIS packet, but the gory details of its
decomposition are of no particular interest in handle_cmd.

This patch keeps the receipt of FIS packets and the decomposition
thereof separated to two different functions.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Message-id: 1415058979-16604-6-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/ide/ahci.c | 169 ++
 1 file changed, 86 insertions(+), 83 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index d6b012c..94f28e6 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -946,10 +946,94 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 }
 }
 
+static void handle_reg_h2d_fis(AHCIState *s, int port,
+   int slot, uint8_t *cmd_fis)
+{
+IDEState *ide_state = s-dev[port].port.ifs[0];
+AHCICmdHdr *cmd = s-dev[port].cur_cmd;
+uint32_t opts = le32_to_cpu(cmd-opts);
+
+if (cmd_fis[1]  0x0F) {
+DPRINTF(port, Port Multiplier not supported.
+ cmd_fis[0]=%02x cmd_fis[1]=%02x cmd_fis[2]=%02x\n,
+cmd_fis[0], cmd_fis[1], cmd_fis[2]);
+return;
+}
+
+if (cmd_fis[1]  0x70) {
+DPRINTF(port, Reserved flags set in H2D Register FIS.
+ cmd_fis[0]=%02x cmd_fis[1]=%02x cmd_fis[2]=%02x\n,
+cmd_fis[0], cmd_fis[1], cmd_fis[2]);
+return;
+}
+
+if (!(cmd_fis[1]  SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER)) {
+switch (s-dev[port].port_state) {
+case STATE_RUN:
+if (cmd_fis[15]  ATA_SRST) {
+s-dev[port].port_state = STATE_RESET;
+}
+break;
+case STATE_RESET:
+if (!(cmd_fis[15]  ATA_SRST)) {
+ahci_reset_port(s, port);
+}
+break;
+}
+return;
+}
+
+/* Check for NCQ command */
+if (is_ncq(cmd_fis[2])) {
+process_ncq_command(s, port, cmd_fis, slot);
+return;
+}
+
+/* Decompose the FIS:
+ * AHCI does not interpret FIS packets, it only forwards them.
+ * SATA 1.0 describes how to decode LBA28 and CHS FIS packets.
+ * Later specifications, e.g, SATA 3.2, describe LBA48 FIS packets.
+ *
+ * ATA4 describes sector number for LBA28/CHS commands.
+ * ATA6 describes sector number for LBA48 commands.
+ * ATA8 deprecates CHS fully, describing only LBA28/48.
+ *
+ * We dutifully convert the FIS into IDE registers, and allow the
+ * core layer to interpret them as needed. */
+ide_state-feature = cmd_fis[3];
+ide_state-sector = cmd_fis[4];  /* LBA 7:0 */
+ide_state-lcyl = cmd_fis[5];/* LBA 15:8  */
+ide_state-hcyl = cmd_fis[6];/* LBA 23:16 */
+ide_state-select = cmd_fis[7];  /* LBA 27:24 (LBA28) */
+ide_state-hob_sector = cmd_fis[8];  /* LBA 31:24 */
+ide_state-hob_lcyl = cmd_fis[9];/* LBA 39:32 */
+ide_state-hob_hcyl = cmd_fis[10];   /* LBA 47:40 */
+ide_state-hob_feature = cmd_fis[11];
+ide_state-nsector = (int64_t)((cmd_fis[13]  8) | cmd_fis[12]);
+/* 14, 16, 17, 18, 19: Reserved (SATA 1.0) */
+/* 15: Only valid when UPDATE_COMMAND not set. */
+
+/* Copy the ACMD field (ATAPI packet, if any) from the AHCI command
+ * table to ide_state-io_buffer */
+if (opts  AHCI_CMD_ATAPI) {
+memcpy(ide_state-io_buffer, cmd_fis[AHCI_COMMAND_TABLE_ACMD], 0x10);
+debug_print_fis(ide_state-io_buffer, 0x10);
+s-dev[port].done_atapi_packet = false;
+/* XXX send PIO setup FIS */
+}
+
+ide_state-error = 0;
+
+/* Reset transferred byte counter */
+cmd-status = 0;
+
+/* We're ready to process the command in FIS byte 2. */
+ide_exec_cmd(s-dev[port].port, cmd_fis[2]);
+}
+
 static int handle_cmd(AHCIState *s, int port, int slot)
 {
 IDEState *ide_state;
-uint32_t opts;
 uint64_t tbl_addr;
 AHCICmdHdr *cmd;
 uint8_t *cmd_fis;
@@ -976,7 +1060,6 @@ static int handle_cmd(AHCIState *s, int port, int slot)
 return -1;
 }
 
-opts = le32_to_cpu(cmd-opts);
 tbl_addr = le64_to_cpu(cmd-tbl_addr);
 cmd_len = 0x80;
 cmd_fis = dma_memory_map(s-as, tbl_addr, cmd_len,
@@ -995,93 +1078,13 @@ static int handle_cmd(AHCIState *s, int port, int slot)
 
 switch (cmd_fis[0]) {
 case SATA_FIS_TYPE_REGISTER_H2D:
+handle_reg_h2d_fis(s, port, slot, cmd_fis);
 break;
 default:
 DPRINTF(port, unknown command cmd_fis[0]=%02x cmd_fis[1]=%02x 
   cmd_fis[2]=%02x\n, cmd_fis[0], 

[Qemu-devel] [PULL for-2.2 10/11] block: Fix max nb_sectors in bdrv_make_zero

2014-11-14 Thread Stefan Hajnoczi
From: Fam Zheng f...@redhat.com

In bdrv_rw_co we report -EINVAL for nb_sectors  INT_MAX /
BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Markus Armbruster arm...@redhat.com
Message-id: 1415603264-21497-1-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index c612826..a612594 100644
--- a/block.c
+++ b/block.c
@@ -2790,8 +2790,8 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags 
flags)
 if (nb_sectors = 0) {
 return 0;
 }
-if (nb_sectors  INT_MAX) {
-nb_sectors = INT_MAX;
+if (nb_sectors  INT_MAX / BDRV_SECTOR_SIZE) {
+nb_sectors = INT_MAX / BDRV_SECTOR_SIZE;
 }
 ret = bdrv_get_block_status(bs, sector_num, nb_sectors, n);
 if (ret  0) {
-- 
2.1.0




[Qemu-devel] [PULL for-2.2 11/11] vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info

2014-11-14 Thread Stefan Hajnoczi
From: Fam Zheng f...@redhat.com

When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Message-id: 1415938161-16217-1-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/vmdk.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 673d3f5..2cbfd3e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2137,23 +2137,29 @@ static ImageInfoSpecific 
*vmdk_get_specific_info(BlockDriverState *bs)
 return spec_info;
 }
 
+static bool vmdk_extents_type_eq(const VmdkExtent *a, const VmdkExtent *b)
+{
+return a-flat == b-flat 
+   a-compressed == b-compressed 
+   (a-flat || a-cluster_sectors == b-cluster_sectors);
+}
+
 static int vmdk_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 int i;
 BDRVVmdkState *s = bs-opaque;
 assert(s-num_extents);
-bdi-needs_compressed_writes = s-extents[0].compressed;
-if (!s-extents[0].flat) {
-bdi-cluster_size = s-extents[0].cluster_sectors  BDRV_SECTOR_BITS;
-}
+
 /* See if we have multiple extents but they have different cases */
 for (i = 1; i  s-num_extents; i++) {
-if (bdi-needs_compressed_writes != s-extents[i].compressed ||
-(bdi-cluster_size  bdi-cluster_size !=
-s-extents[i].cluster_sectors  BDRV_SECTOR_BITS)) {
+if (!vmdk_extents_type_eq(s-extents[0], s-extents[i])) {
 return -ENOTSUP;
 }
 }
+bdi-needs_compressed_writes = s-extents[0].compressed;
+if (!s-extents[0].flat) {
+bdi-cluster_size = s-extents[0].cluster_sectors  BDRV_SECTOR_BITS;
+}
 return 0;
 }
 
-- 
2.1.0




[Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-14 Thread Fabio Fantoni
dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid leaf 
with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches, and 
qemu 2.2 from spice git (spice/next commit 
e779fa0a715530311e6f59fc8adb0f6eca914a89):

https://github.com/Fantu/Xen/commits/rebase/m2r-staging

Qemu crash on fedora 20 lxde (with software updates of some days ago) 
boot with this backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x55689607 in vmport_ioport_read (opaque=0x56440a20, 
addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73

73  eax = env-regs[R_EAX];
(gdb) bt full
#0  0x55689607 in vmport_ioport_read (opaque=0x56440a20, 
addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73

s = 0x56440a20
cs = 0x0
cpu = 0x0
__func__ = vmport_ioport_read
env = 0x8250
command = 0 '\000'
eax = 0
#1  0x55655b9c in memory_region_read_accessor 
(mr=0x56440aa8, addr=0, value=0x7fffd8c0, size=4, shift=0, 
mask=4294967295)

at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
tmp = 0
#2  0x55655e8f in access_with_adjusted_size (addr=0, 
value=0x7fffd8c0, size=4, access_size_min=4, access_size_max=4,
access=0x55655b3a memory_region_read_accessor, 
mr=0x56440aa8) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480

access_mask = 4294967295
access_size = 4
i = 0
#3  0x55658cc1 in memory_region_dispatch_read1 
(mr=0x56440aa8, addr=0, size=4) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077

data = 0
#4  0x55658d89 in memory_region_dispatch_read 
(mr=0x56440aa8, addr=0, pval=0x7fffd998, size=4) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099

No locals.
#5  0x5565c794 in io_mem_read (mr=0x56440aa8, addr=0, 
pval=0x7fffd998, size=4) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962

No locals.
#6  0x55609fae in address_space_rw (as=0x55eae840, 
addr=22104, buf=0x7fffda40 \377\377\377\377, len=4, is_write=false)

at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2169
l = 4
ptr = 0x0
val = 7964229952888770560
addr1 = 0
mr = 0x56440aa8
error = false
#7  0x5560a173 in address_space_read (as=0x55eae840, 
addr=22104, buf=0x7fffda40 \377\377\377\377, len=4) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2207

No locals.
#8  0x5564fac7 in cpu_inl (addr=22104) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117

buf = \377\377\377\377
val = 21845
#9  0x5567084b in do_inp (addr=22104, size=4) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684

No locals.
#10 0x55670ab8 in cpu_ioreq_pio (req=0x77ff3000) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747

i = 0
#11 0x5567108b in handle_ioreq (state=0x563c1590, 
req=0x77ff3000) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853

---Type return to continue, or q return to quit---
No locals.
#12 0x556713fe in cpu_handle_ioreq (opaque=0x563c1590) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931

state = 0x563c1590
req = 0x77ff3000
#13 0x5596d874 in qemu_iohandler_poll (pollfds=0x56388a30, 
ret=1) at iohandler.c:143

revents = 1
pioh = 0x563f3c90
ioh = 0x56515f80
#14 0x5596d450 in main_loop_wait (nonblocking=0) at 
main-loop.c:495

ret = 1
timeout = 4294967295
timeout_ns = 3418165
#15 0x557567b7 in main_loop () at vl.c:1882
nonblocking = false
last_io = 1
#16 0x5575e4c1 in main (argc=62, argv=0x7fffe038, 
envp=0x7fffe230) at vl.c:4400

i = 128
snapshot = 0
linux_boot = 0
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = 0x55a485c6 
boot_order = 0x563864e0 dc
ds = 0x564c71b0
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = 0x0
opts = 0x56386430
machine_opts = 0x56388090
icount_opts = 0x0
olist = 0x55e56da0
optind = 62
optarg = 0x7fffe914 
file=/mnt/vm/disks/FEDORA19.disk1.xm,if=ide,index=0,media=disk,format=raw,cache=writeback

loadvm = 0x0
machine_class = 0x5637c5c0
cpu_model = 0x0
vga_model = 0x0
qtest_chrdev = 0x0
---Type return to continue, or q return to quit---
qtest_log = 0x0
pid_file = 0x0
incoming = 0x0
show_vnc_port = 0
defconfig = true
userconfig = true
log_mask = 0x0
log_file = 0x0
mem_trace = {malloc = 0x55759e7a malloc_and_trace, 
realloc = 0x55759ed2 realloc_and_trace, free = 0x55759f39 
free_and_trace, calloc = 0,

  try_malloc = 0, try_realloc = 0}
trace_events = 0x0
  

[Qemu-devel] [PULL for-2.2 0/2] Xen tree 2014-11-14

2014-11-14 Thread Stefano Stabellini
The following changes since commit c52e67924fbdadfa00668248f5c075542943c54c:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-13 15:44:16 +)

are available in the git repository at:


  git://xenbits.xen.org/people/sstabellini/qemu-dm.git xen-2014-11-14

for you to fetch changes up to 2f01dfacb56bc7a0d4639adc9dff9aae131e6216:

  xen_disk: fix unmapping of persistent grants (2014-11-14 11:12:38 +)


Igor Mammedov (1):
  pc: piix4_pm: init legacy PCI hotplug when running on Xen

Roger Pau Monne (1):
  xen_disk: fix unmapping of persistent grants

 hw/acpi/piix4.c |4 +++
 hw/block/xen_disk.c |   72 ++-
 hw/i386/pc_piix.c   |   11 
 3 files changed, 70 insertions(+), 17 deletions(-)



[Qemu-devel] [PULL for-2.2 1/2] pc: piix4_pm: init legacy PCI hotplug when running on Xen

2014-11-14 Thread Stefano Stabellini
From: Igor Mammedov imamm...@redhat.com

If user starts QEMU with -machine pc,accel=xen, then
compat property in xenfv won't work and it would cause error:
Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set
when PCI device is added with -device on QEMU CLI.

From: Igor Mammedov imamm...@redhat.com

In case of Xen instead of using compat property, just use the fact
that xen doesn't use QEMU's fw_cfg/acpi tables to switch piix4_pm
into legacy PCI hotplug mode when Xen is enabled.

Signed-off-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Li Liang liang.z...@intel.com
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/acpi/piix4.c   |4 
 hw/i386/pc_piix.c |   11 ---
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 78c0a6d..481a16c 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -36,6 +36,7 @@
 #include hw/mem/pc-dimm.h
 #include hw/acpi/memory_hotplug.h
 #include hw/acpi/acpi_dev_interface.h
+#include hw/xen/xen.h
 
 //#define DEBUG
 
@@ -501,6 +502,9 @@ I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t 
smb_io_base,
 s-irq = sci_irq;
 s-smi_irq = smi_irq;
 s-kvm_enabled = kvm_enabled;
+if (xen_enabled()) {
+s-use_acpi_pci_hotplug = false;
+}
 
 qdev_init_nofail(dev);
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b559181..7bb97a4 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -916,17 +916,6 @@ static QEMUMachine xenfv_machine = {
 .max_cpus = HVM_MAX_VCPUS,
 .default_machine_opts = accel=xen,
 .hot_add_cpu = pc_hot_add_cpu,
-.compat_props = (GlobalProperty[]) {
-/* xenfv has no fwcfg and so does not load acpi from QEMU.
- * as such new acpi features don't work.
- */
-{
-.driver   = PIIX4_PM,
-.property = acpi-pci-hotplug-with-bridge-support,
-.value= off,
-},
-{ /* end of list */ }
-},
 };
 #endif
 
-- 
1.7.10.4




[Qemu-devel] [PULL for-2.2 2/2] xen_disk: fix unmapping of persistent grants

2014-11-14 Thread Stefano Stabellini
From: Roger Pau Monne roger@citrix.com

This patch fixes two issues with persistent grants and the disk PV backend
(Qdisk):

 - Keep track of memory regions where persistent grants have been mapped
   since we need to unmap them as a whole. It is not possible to unmap a
   single grant if it has been batch-mapped. A new check has also been added
   to make sure persistent grants are only used if the whole mapped region
   can be persistently mapped in the batch_maps case.
 - Unmap persistent grants before switching to the closed state, so the
   frontend can also free them.

Signed-off-by: Roger Pau Monné roger@citrix.com
Reported-by: George Dunlap george.dun...@eu.citrix.com
Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
Cc: Kevin Wolf kw...@redhat.com
Cc: Stefan Hajnoczi stefa...@redhat.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 hw/block/xen_disk.c |   72 ++-
 1 file changed, 66 insertions(+), 6 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 231e9a7..21842a0 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -59,6 +59,13 @@ struct PersistentGrant {
 
 typedef struct PersistentGrant PersistentGrant;
 
+struct PersistentRegion {
+void *addr;
+int num;
+};
+
+typedef struct PersistentRegion PersistentRegion;
+
 struct ioreq {
 blkif_request_t req;
 int16_t status;
@@ -118,6 +125,7 @@ struct XenBlkDev {
 gbooleanfeature_discard;
 gbooleanfeature_persistent;
 GTree   *persistent_gnts;
+GSList  *persistent_regions;
 unsigned intpersistent_gnt_count;
 unsigned intmax_grants;
 
@@ -177,6 +185,23 @@ static void destroy_grant(gpointer pgnt)
 g_free(grant);
 }
 
+static void remove_persistent_region(gpointer data, gpointer dev)
+{
+PersistentRegion *region = data;
+struct XenBlkDev *blkdev = dev;
+XenGnttab gnt = blkdev-xendev.gnttabdev;
+
+if (xc_gnttab_munmap(gnt, region-addr, region-num) != 0) {
+xen_be_printf(blkdev-xendev, 0,
+  xc_gnttab_munmap region %p failed: %s\n,
+  region-addr, strerror(errno));
+}
+xen_be_printf(blkdev-xendev, 3,
+  unmapped grant region %p with %d pages\n,
+  region-addr, region-num);
+g_free(region);
+}
+
 static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 {
 struct ioreq *ioreq = NULL;
@@ -343,6 +368,7 @@ static int ioreq_map(struct ioreq *ioreq)
 void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 int i, j, new_maps = 0;
 PersistentGrant *grant;
+PersistentRegion *region;
 /* domids and refs variables will contain the information necessary
  * to map the grants that are needed to fulfill this request.
  *
@@ -421,7 +447,22 @@ static int ioreq_map(struct ioreq *ioreq)
 }
 }
 }
-if (ioreq-blkdev-feature_persistent) {
+if (ioreq-blkdev-feature_persistent  new_maps != 0 
+(!batch_maps || (ioreq-blkdev-persistent_gnt_count + new_maps =
+ioreq-blkdev-max_grants))) {
+/*
+ * If we are using persistent grants and batch mappings only
+ * add the new maps to the list of persistent grants if the whole
+ * area can be persistently mapped.
+ */
+if (batch_maps) {
+region = g_malloc0(sizeof(*region));
+region-addr = ioreq-pages;
+region-num = new_maps;
+ioreq-blkdev-persistent_regions = g_slist_append(
+ioreq-blkdev-persistent_regions,
+region);
+}
 while ((ioreq-blkdev-persistent_gnt_count  
ioreq-blkdev-max_grants)
new_maps) {
 /* Go through the list of newly mapped grants and add as many
@@ -447,6 +488,7 @@ static int ioreq_map(struct ioreq *ioreq)
   grant);
 ioreq-blkdev-persistent_gnt_count++;
 }
+assert(!batch_maps || new_maps == 0);
 }
 for (i = 0; i  ioreq-v.niov; i++) {
 ioreq-v.iov[i].iov_base += (uintptr_t)page[i];
@@ -971,7 +1013,10 @@ static int blk_connect(struct XenDevice *xendev)
 blkdev-max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
 blkdev-persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
  NULL, NULL,
+ batch_maps ?
+ (GDestroyNotify)g_free :
  (GDestroyNotify)destroy_grant);
+blkdev-persistent_regions = NULL;
 blkdev-persistent_gnt_count = 0;
 }
 
@@ -1000,6 +1045,26 @@ static void blk_disconnect(struct XenDevice *xendev)
 blkdev-cnt_map--;
 blkdev-sring = NULL;
 

Re: [Qemu-devel] [PULL for-2.2 00/11] Block patches

2014-11-14 Thread Peter Maydell
On 14 November 2014 11:05, Stefan Hajnoczi stefa...@redhat.com wrote:
 The following changes since commit c52e67924fbdadfa00668248f5c075542943c54c:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-11-13 15:44:16 +)

 are available in the git repository at:

   git://github.com/stefanha/qemu.git tags/block-pull-request

 for you to fetch changes up to 5f58330790b72c4705b373ee0646fb3adf800b4e:

   vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info (2014-11-14 09:20:45 
 +)

 

 

Applied, thanks.

-- PMM



[Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-14 Thread Leen Keus
Command lines generated by virsh from the libvirt xml files (let me know if you 
also need the xml configuration files):
 
qemu-system-x86_64 -enable-kvm -name fsg -S -machine pc-1.2,accel=kvm,usb=off 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
30619811-d285-944a-026a-cdd4a753b77a -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/fsg.monitor,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot 
strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x5 -drive 
file=/vm2/fsgorg/disk-sda2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/vm2/fsgorg/disk-sdb.qcow2,if=none,id=drive-virtio-disk1,format=qcow2 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device 
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 -netdev 
tap,fd=23,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:76:fd:b6,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -vnc 0.0.0.0:1 -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
usb-host,hostbus=1,hostaddr=4,id=hostdev0 -device 
usb-host,hostbus=4,hostaddr=2,id=hostdev1 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on


qemu-system-x86_64 -enable-kvm -name fnas -S -machine pc-
i440fx-1.4,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp
2,sockets=2,cores=1,threads=1 -uuid 7c2703e5-b8b8-acd9-df8b-6284382fb58b
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/fnas.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-
shutdown -boot menu=off,strict=on -device ich9-usb-
ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-
uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
-device ich9-usb-
uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9
-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -drive
file=/vm2/FreeNAS/fnas.qcow2,if=none,id=drive-ide0-0-0,format=qcow2
-device ide-hd,bus=ide.0,unit=0,drive=drive-
ide0-0-0,id=ide0-0-0,bootindex=1 -drive
file=/vm2/FreeNAS/fnas2.qcow2,if=none,id=drive-ide0-0-1,format=qcow2
-device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive
if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-
cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=23,id=hostnet0 -device
rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:d8:31:b5,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-
serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:5 -device cirrus-
vga,id=video0,bus=pci.0,addr=0x2 -device intel-
hda,id=sound0,bus=pci.0,addr=0x4 -device hda-
duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device usb-
host,hostbus=2,hostaddr=3,id=hostdev0 -device virtio-balloon-
pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on


qemu-system-x86_64 -enable-kvm -name win7 -S -machine pc-1.2,accel=kvm,usb=off 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
717d41d2-cfc9-5d70-e2f5-817c2ed1a90a -nouser-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 
-drive file=/vm/Windows7/win7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/vm/Windows7/joespb.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=23,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:7b:7f,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -vnc 0.0.0.0:2 -device 
VGA,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
usb-host,hostbus=1,hostaddr=3,id=hostdev0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -usb -msg timestamp=on

Regards,
Leen

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1392504

Title:
  USB Passthrough is 

[Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info

2014-11-14 Thread Max Reitz
Add the bit width of every refcount entry to the format-specific
information.

In contrast to lazy_refcounts and the corrupt flag, this should be
always emitted, even for compat=0.10 although it does not support any
refcount width other than 16 bits. This is because if a boolean is
optional, one normally assumes it to be false when omitted; but if an
integer is not specified, it is rather difficult to guess its value.

This new field breaks some test outputs, fix them.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2.c  |  4 +++-
 qapi/block-core.json   |  5 -
 tests/qemu-iotests/060.out |  1 +
 tests/qemu-iotests/065 | 23 +++
 tests/qemu-iotests/067.out |  5 +
 tests/qemu-iotests/082.out |  7 +++
 tests/qemu-iotests/089.out |  2 ++
 7 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f57aff9..d70e927 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2475,7 +2475,8 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 };
 if (s-qcow_version == 2) {
 *spec_info-qcow2 = (ImageInfoSpecificQCow2){
-.compat = g_strdup(0.10),
+.compat = g_strdup(0.10),
+.refcount_width = s-refcount_bits,
 };
 } else if (s-qcow_version == 3) {
 *spec_info-qcow2 = (ImageInfoSpecificQCow2){
@@ -2486,6 +2487,7 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 .corrupt= s-incompatible_features 
   QCOW2_INCOMPAT_CORRUPT,
 .has_corrupt= true,
+.refcount_width = s-refcount_bits,
 };
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b7083fb..e3a3cb7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,13 +41,16 @@
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #   compat = 1.1 (since 2.2)
 #
+# @refcount-width: width of a refcount entry in bits (since 2.3)
+#
 # Since: 1.7
 ##
 { 'type': 'ImageInfoSpecificQCow2',
   'data': {
   'compat': 'str',
   '*lazy-refcounts': 'bool',
-  '*corrupt': 'bool'
+  '*corrupt': 'bool',
+  'refcount-width': 'int'
   } }
 
 ##
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 9419da1..17b3eaf 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -19,6 +19,7 @@ cluster_size: 65536
 Format specific information:
 compat: 1.1
 lazy refcounts: false
+refcount width: 16
 corrupt: true
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot 
be opened read/write
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 8d3a9c9..8539aeb 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
 class TestQCow2(TestQemuImgInfo):
 '''Testing a qcow2 version 2 image'''
 img_options = 'compat=0.10'
-json_compare = { 'compat': '0.10' }
-human_compare = [ 'compat: 0.10' ]
+json_compare = { 'compat': '0.10', 'refcount-width': 16 }
+human_compare = [ 'compat: 0.10', 'refcount width: 16' ]
 
 class TestQCow3NotLazy(TestQemuImgInfo):
 '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
 img_options = 'compat=1.1,lazy_refcounts=off'
-json_compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': 
False }
-human_compare = [ 'compat: 1.1', 'lazy refcounts: false', 'corrupt: false' 
]
+json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
+ 'refcount-width': 16, 'corrupt': False }
+human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
+  'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
 '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
 img_options = 'compat=1.1,lazy_refcounts=on'
-json_compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False 
}
-human_compare = [ 'compat: 1.1', 'lazy refcounts: true', 'corrupt: false' ]
+json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
+ 'refcount-width': 16, 'corrupt': False }
+human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
+  'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
 '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
with lazy refcounts enabled'''
 img_options = 'compat=1.1,lazy_refcounts=off'
 qemu_options = 'lazy-refcounts=on'
-compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
+compare = { 'compat': '1.1', 'lazy-refcounts': False,
+'refcount-width': 16, 'corrupt': False }
+
 
 class TestQCow3LazyQMP(TestQMP):
 '''Testing a qcow2 version 3 image with lazy refcounts 

[Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes()

2014-11-14 Thread Max Reitz
qcow2_update_cluster_refcount() may fail, and qcow2_alloc_bytes() should
mind that case.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-refcount.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6e06531..be4e5fe 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,7 +761,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t 
offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
 BDRVQcowState *s = bs-opaque;
-int64_t offset, cluster_offset;
+int64_t offset, cluster_offset, new_cluster;
+int64_t ret;
 int free_in_cluster;
 
 BLKDBG_EVENT(bs-file, BLKDBG_CLUSTER_ALLOC_BYTES);
@@ -783,23 +784,32 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 free_in_cluster -= size;
 if (free_in_cluster == 0)
 s-free_byte_offset = 0;
-if (offset_into_cluster(s, offset) != 0)
-qcow2_update_cluster_refcount(bs, offset  s-cluster_bits, 1,
-  QCOW2_DISCARD_NEVER);
+if (offset_into_cluster(s, offset) != 0) {
+ret = qcow2_update_cluster_refcount(bs, offset  s-cluster_bits,
+1, QCOW2_DISCARD_NEVER);
+if (ret  0) {
+return ret;
+}
+}
 } else {
-offset = qcow2_alloc_clusters(bs, s-cluster_size);
-if (offset  0) {
-return offset;
+new_cluster = qcow2_alloc_clusters(bs, s-cluster_size);
+if (new_cluster  0) {
+return new_cluster;
 }
 cluster_offset = start_of_cluster(s, s-free_byte_offset);
-if ((cluster_offset + s-cluster_size) == offset) {
+if ((cluster_offset + s-cluster_size) == new_cluster) {
 /* we are lucky: contiguous data */
 offset = s-free_byte_offset;
-qcow2_update_cluster_refcount(bs, offset  s-cluster_bits, 1,
-  QCOW2_DISCARD_NEVER);
+ret = qcow2_update_cluster_refcount(bs, offset  s-cluster_bits,
+1, QCOW2_DISCARD_NEVER);
+if (ret  0) {
+qcow2_free_clusters(bs, new_cluster, s-cluster_size,
+QCOW2_DISCARD_NEVER);
+return ret;
+}
 s-free_byte_offset += size;
 } else {
-s-free_byte_offset = offset;
+s-free_byte_offset = new_cluster;
 goto redo;
 }
 }
-- 
1.9.3




[Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4

2014-11-14 Thread Max Reitz
Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs, fix them.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c  |  20 
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/049.out | 112 ++---
 tests/qemu-iotests/082.out |  41 ++---
 tests/qemu-iotests/085.out |  38 +++
 5 files changed, 130 insertions(+), 82 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6dc1984..657d558 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2065,6 +2065,17 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto finish;
 }
 
+refcount_width = qemu_opt_get_number_del(opts, BLOCK_OPT_REFCOUNT_WIDTH,
+ refcount_width);
+if (refcount_width = 0 || refcount_width  64 ||
+!is_power_of_2(refcount_width))
+{
+error_setg(errp, Refcount width must be a power of two and may not 
+   exceed 64 bits);
+ret = -EINVAL;
+goto finish;
+}
+
 if (version  3  refcount_width != 16) {
 error_setg(errp, Different refcount widths than 16 bits require 
compatibility level 1.1 or above (use compat=1.1 or 
@@ -2704,6 +2715,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(desc-name, lazy_refcounts)) {
 lazy_refcounts = qemu_opt_get_bool(opts, lazy_refcounts,
lazy_refcounts);
+} else if (!strcmp(desc-name, refcount_width)) {
+error_report(Cannot change refcount entry width);
+return -ENOTSUP;
 } else {
 /* if this assertion fails, this probably means a new option was
  * added without having it covered here */
@@ -2873,6 +2887,12 @@ static QemuOptsList qcow2_create_opts = {
 .help = Postpone refcount updates,
 .def_value_str = off
 },
+{
+.name = BLOCK_OPT_REFCOUNT_WIDTH,
+.type = QEMU_OPT_NUMBER,
+.help = Width of a reference count entry in bits,
+.def_value_str = 16
+},
 { /* end of list */ }
 }
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..c34d610 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -56,6 +56,7 @@
 #define BLOCK_OPT_ADAPTER_TYPE  adapter_type
 #define BLOCK_OPT_REDUNDANCYredundancy
 #define BLOCK_OPT_NOCOW nocow
+#define BLOCK_OPT_REFCOUNT_WIDTHrefcount_width
 
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 09ca0ae..9369c12 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off 
cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 

[Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation

2014-11-14 Thread Max Reitz
Add a helper function for reallocating a refcount array, independently
of the refcount order. The newly allocated space is zeroed and the
function handles failed reallocations gracefully.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-refcount.c | 121 +
 1 file changed, 72 insertions(+), 49 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 66c78c0..18fcd0d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1108,6 +1108,54 @@ fail:
 /* refcount checking functions */
 
 
+static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
+{
+if (s-refcount_order  3) {
+/* sub-byte width */
+int shift = 3 - s-refcount_order;
+return (entries + (1  shift) - 1)  shift;
+} else if (s-refcount_order == 3) {
+/* byte width */
+return entries;
+} else {
+/* multiple bytes wide */
+
+/* This assertion holds because there is no way we can address more 
than
+ * 2^(64 - 9) clusters at once (with cluster size 512 = 2^9, and 
because
+ * offsets have to be representable in bytes); due to every cluster
+ * corresponding to one refcount entry and because refcount_order has 
to
+ * be below 7, we are far below that limit */
+assert(!(entries  (64 - (s-refcount_order - 3;
+
+return entries  (s-refcount_order - 3);
+}
+}
+
+static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
+  int64_t *size, int64_t new_size)
+{
+/* Round to clusters so the array can be directly written to disk */
+size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
+s-cluster_size);
+size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
+s-cluster_size);
+uint16_t *new_ptr;
+
+new_ptr = g_try_realloc(*array, new_byte_size);
+if (new_byte_size  !new_ptr) {
+return -ENOMEM;
+}
+
+if (new_ptr) {
+memset((void *)((uintptr_t)new_ptr + old_byte_size), 0,
+   new_byte_size - old_byte_size);
+}
+
+*array = new_ptr;
+*size  = new_size;
+
+return 0;
+}
 
 /*
  * Increases the refcount for a range of clusters in a given refcount table.
@@ -1124,6 +1172,7 @@ static int inc_refcounts(BlockDriverState *bs,
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t start, last, cluster_offset, k;
+int ret;
 
 if (size = 0) {
 return 0;
@@ -1135,23 +1184,12 @@ static int inc_refcounts(BlockDriverState *bs,
 cluster_offset += s-cluster_size) {
 k = cluster_offset  s-cluster_bits;
 if (k = *refcount_table_size) {
-int64_t old_refcount_table_size = *refcount_table_size;
-uint16_t *new_refcount_table;
-
-*refcount_table_size = k + 1;
-new_refcount_table = g_try_realloc(*refcount_table,
-   *refcount_table_size *
-   sizeof(**refcount_table));
-if (!new_refcount_table) {
-*refcount_table_size = old_refcount_table_size;
+ret = realloc_refcount_array(s, refcount_table,
+ refcount_table_size, k + 1);
+if (ret  0) {
 res-check_errors++;
-return -ENOMEM;
+return ret;
 }
-*refcount_table = new_refcount_table;
-
-memset(*refcount_table + old_refcount_table_size, 0,
-   (*refcount_table_size - old_refcount_table_size) *
-   sizeof(**refcount_table));
 }
 
 if (++(*refcount_table)[k] == 0) {
@@ -1518,8 +1556,7 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix  BDRV_FIX_ERRORS ? Repairing : ERROR, i);
 
 if (fix  BDRV_FIX_ERRORS) {
-int64_t old_nb_clusters = *nb_clusters;
-uint16_t *new_refcount_table;
+int64_t new_nb_clusters;
 
 if (offset  INT64_MAX - s-cluster_size) {
 ret = -EINVAL;
@@ -1536,22 +1573,15 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
 goto resize_fail;
 }
 
-*nb_clusters = size_to_clusters(s, size);
-assert(*nb_clusters = old_nb_clusters);
+new_nb_clusters = size_to_clusters(s, size);
+assert(new_nb_clusters = *nb_clusters);
 
-new_refcount_table = g_try_realloc(*refcount_table,
-   *nb_clusters *
-   sizeof(**refcount_table));
-if (!new_refcount_table) {
-*nb_clusters = old_nb_clusters;
+ 

[Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values

2014-11-14 Thread Max Reitz
Refcounts may have a width of up to 64 bits, so qemu should use the same
width to represent refcount values internally.

Since for instance qcow2_get_refcount() signals an error by returning a
negative value, refcount values are generally signed to be able to
represent those error values correctly. This limits the maximum refcount
value supported by qemu to INT64_MAX (= 63 bits), as established in
qcow2: Add two new fields to BDRVQcowState.

This limitation should have no implications in practice for normal valid
images. If the MSb in a 64 bit refcount value is set, we can safely
assume the value to be invalid (because reaching such high refcounts is
impossible due to other limitations of the qcow2 format).

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-cluster.c  |  9 ++---
 block/qcow2-refcount.c | 37 -
 block/qcow2.h  |  7 ---
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index df0b2c9..ab43902 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 for (i = 0; i  l1_size; i++) {
 uint64_t l2_offset = l1_table[i]  L1E_OFFSET_MASK;
 bool l2_dirty = false;
-int l2_refcount;
+int64_t l2_refcount;
 
 if (!l2_offset) {
 /* unallocated */
@@ -1696,14 +1696,17 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 }
 
 if (l2_refcount  1) {
+int64_t ret64;
+
 /* For shared L2 tables, set the refcount accordingly (it 
is
  * already 1 and needs to be l2_refcount) */
-ret = qcow2_update_cluster_refcount(bs,
+ret64 = qcow2_update_cluster_refcount(bs,
 offset  s-cluster_bits, l2_refcount - 1,
 QCOW2_DISCARD_OTHER);
-if (ret  0) {
+if (ret64  0) {
 qcow2_free_clusters(bs, offset, s-cluster_size,
 QCOW2_DISCARD_OTHER);
+ret = ret64;
 goto fail;
 }
 }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6016211..6e06531 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -91,14 +91,14 @@ static int load_refcount_block(BlockDriverState *bs,
  * return value is the refcount of the cluster, negative values are -errno
  * and indicate an error.
  */
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t refcount_table_index, block_index;
 int64_t refcount_block_offset;
 int ret;
 uint16_t *refcount_block;
-uint16_t refcount;
+int64_t refcount;
 
 refcount_table_index = cluster_index  s-refcount_block_bits;
 if (refcount_table_index = s-refcount_table_size)
@@ -556,9 +556,10 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 for(cluster_offset = start; cluster_offset = last;
 cluster_offset += s-cluster_size)
 {
-int block_index, refcount;
+int block_index;
 int64_t cluster_index = cluster_offset  s-cluster_bits;
 int64_t table_index = cluster_index  s-refcount_block_bits;
+int64_t refcount;
 
 /* Load the refcount block and allocate it if needed */
 if (table_index != old_table_index) {
@@ -634,10 +635,10 @@ fail:
  * If the return value is non-negative, it is the new refcount of the cluster.
  * If it is negative, it is -errno and indicates an error.
  */
-int qcow2_update_cluster_refcount(BlockDriverState *bs,
-  int64_t cluster_index,
-  int addend,
-  enum qcow2_discard_type type)
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+  int64_t cluster_index,
+  int addend,
+  enum qcow2_discard_type type)
 {
 BDRVQcowState *s = bs-opaque;
 int ret;
@@ -663,7 +664,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, 
uint64_t size)
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t i, nb_clusters;
-int refcount;
+int64_t refcount;
 
 nb_clusters = size_to_clusters(s, size);
 retry:
@@ -722,7 +723,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t 
offset,
 BDRVQcowState *s = bs-opaque;
 uint64_t cluster_index;
 uint64_t i;
-int refcount, ret;
+int64_t refcount;
+int ret;
 
 assert(nb_clusters = 0);
 if 

[Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2

2014-11-14 Thread Max Reitz
Add a refcount_order parameter to qcow2_create2(), use that value for
the image header and for calculating the size required for
preallocation.

For now, always pass 4.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2.c | 41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 528d696..6dc1984 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1775,7 +1775,7 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version,
+ QemuOpts *opts, int version, int refcount_order,
  Error **errp)
 {
 /* Calculate cluster_bits */
@@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 int64_t meta_size = 0;
 uint64_t nreftablee, nrefblocke, nl1e, nl2e;
 int64_t aligned_total_size = align_offset(total_size, cluster_size);
+int refblock_bits, refblock_size;
+/* refcount entry size in bytes */
+double rces = (1  refcount_order) / 8.;
+
+/* see qcow2_open() */
+refblock_bits = cluster_bits - (refcount_order - 3);
+refblock_size = 1  refblock_bits;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
@@ -1835,20 +1842,20 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
  *   c = cluster size
  *   y1 = number of refcount blocks entries
  *   y2 = meta size including everything
+ *   rces = refcount entry size in bytes
  * then,
  *   y1 = (y2 + a)/c
- *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
+ *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
  * we can get y1:
- *   y1 = (a + m) / (c - sizeof(u16) - sizeof(u16) * sizeof(u64) / c)
+ *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
  */
-nrefblocke = (aligned_total_size + meta_size + cluster_size) /
-(cluster_size - sizeof(uint16_t) -
- 1.0 * sizeof(uint16_t) * sizeof(uint64_t) / cluster_size);
-nrefblocke = align_offset(nrefblocke, cluster_size / sizeof(uint16_t));
-meta_size += nrefblocke * sizeof(uint16_t);
+nrefblocke = (aligned_total_size + meta_size + cluster_size)
+   / (cluster_size - rces - rces * sizeof(uint64_t)
+ / cluster_size);
+meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
 /* total size of refcount tables */
-nreftablee = nrefblocke * sizeof(uint16_t) / cluster_size;
+nreftablee = nrefblocke / refblock_size;
 nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
 meta_size += nreftablee * sizeof(uint64_t);
 
@@ -1883,7 +1890,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 .l1_size= cpu_to_be32(0),
 .refcount_table_offset  = cpu_to_be64(cluster_size),
 .refcount_table_clusters= cpu_to_be32(1),
-.refcount_order = cpu_to_be32(4),
+.refcount_order = cpu_to_be32(refcount_order),
 .header_length  = cpu_to_be32(sizeof(*header)),
 };
 
@@ -2003,6 +2010,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
 PreallocMode prealloc;
 int version = 3;
+int refcount_width = 16, refcount_order;
 Error *local_err = NULL;
 int ret;
 
@@ -2057,8 +2065,19 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto finish;
 }
 
+if (version  3  refcount_width != 16) {
+error_setg(errp, Different refcount widths than 16 bits require 
+   compatibility level 1.1 or above (use compat=1.1 or 
+   greater));
+ret = -EINVAL;
+goto finish;
+}
+
+refcount_order = ffs(refcount_width) - 1;
+
 ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
-cluster_size, prealloc, opts, version, local_err);
+cluster_size, prealloc, opts, version, refcount_order,
+local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 }
-- 
1.9.3




[Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options()

2014-11-14 Thread Max Reitz
Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c  | 14 ++
 tests/qemu-iotests/061.out | 14 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d084485..e82120c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(compat, 1.1)) {
 new_version = 3;
 } else {
-fprintf(stderr, Unknown compatibility level %s.\n, compat);
+error_report(Unknown compatibility level %s, compat);
 return -EINVAL;
 }
 } else if (!strcmp(desc-name, preallocation)) {
-fprintf(stderr, Cannot change preallocation mode.\n);
+error_report(Cannot change preallocation mode);
 return -ENOTSUP;
 } else if (!strcmp(desc-name, size)) {
 new_size = qemu_opt_get_size(opts, size, 0);
@@ -2701,16 +2701,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 } else if (!strcmp(desc-name, encryption)) {
 encrypt = qemu_opt_get_bool(opts, encryption, s-crypt_method);
 if (encrypt != !!s-crypt_method) {
-fprintf(stderr, Changing the encryption flag is not 
-supported.\n);
+error_report(Changing the encryption flag is not supported);
 return -ENOTSUP;
 }
 } else if (!strcmp(desc-name, cluster_size)) {
 cluster_size = qemu_opt_get_size(opts, cluster_size,
  cluster_size);
 if (cluster_size != s-cluster_size) {
-fprintf(stderr, Changing the cluster size is not 
-supported.\n);
+error_report(Changing the cluster size is not supported);
 return -ENOTSUP;
 }
 } else if (!strcmp(desc-name, lazy_refcounts)) {
@@ -2756,8 +2754,8 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 if (s-use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
 if (s-qcow_version  3) {
-fprintf(stderr, Lazy refcounts only supported with 
compatibility 
-level 1.1 and above (use compat=1.1 or greater)\n);
+error_report(Lazy refcounts only supported with compatibility 

+ level 1.1 and above (use compat=1.1 or 
greater));
 return -EINVAL;
 }
 s-compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 9045544..2fd92ca 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -281,19 +281,19 @@ No errors were found on the image.
 === Testing invalid configurations ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42
 qemu-img: Error while amending options: Invalid argument
 qemu-img: Invalid parameter 'foo'
 qemu-img: Invalid options for file format 'qcow2'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported
 qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode
 qemu-img: Error while amending options: Operation not supported
 
 === Testing correct handling of unset value ===
@@ -301,7 +301,7 @@ qemu-img: Error while amending options: Operation not 
supported
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
 Should work:
 Should not work:
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
 
 === Testing zero expansion on inactive clusters ===
-- 
1.9.3




[Qemu-devel] [PATCH v2 08/21] qcow2: More helpers for refcount modification

2014-11-14 Thread Max Reitz
Add helper functions for getting and setting refcounts in a refcount
array for any possible refcount order, and choose the correct one during
refcount initialization.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-refcount.c | 146 -
 1 file changed, 144 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b3ca7d2..2e13a9c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,10 +32,73 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
 int64_t offset, int64_t length,
 int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index);
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index);
 
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+ uint64_t value);
 static void set_refcount_ro4(void *refcount_array, uint64_t index,
  uint64_t value);
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+ uint64_t value);
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+ uint64_t value);
+
+static void get_refcount_functions(int refcount_order,
+   Qcow2GetRefcountFunc **get,
+   Qcow2SetRefcountFunc **set)
+{
+switch (refcount_order) {
+case 0:
+*get = get_refcount_ro0;
+*set = set_refcount_ro0;
+break;
+
+case 1:
+*get = get_refcount_ro1;
+*set = set_refcount_ro1;
+break;
+
+case 2:
+*get = get_refcount_ro2;
+*set = set_refcount_ro2;
+break;
+
+case 3:
+*get = get_refcount_ro3;
+*set = set_refcount_ro3;
+break;
+
+case 4:
+*get = get_refcount_ro4;
+*set = set_refcount_ro4;
+break;
+
+case 5:
+*get = get_refcount_ro5;
+*set = set_refcount_ro5;
+break;
+
+case 6:
+*get = get_refcount_ro6;
+*set = set_refcount_ro6;
+break;
+
+default:
+abort();
+}
+}
 
 
 /*/
@@ -47,8 +110,8 @@ int qcow2_refcount_init(BlockDriverState *bs)
 unsigned int refcount_table_size2, i;
 int ret;
 
-s-get_refcount = get_refcount_ro4;
-s-set_refcount = set_refcount_ro4;
+get_refcount_functions(s-refcount_order,
+   s-get_refcount, s-set_refcount);
 
 assert(s-refcount_table_size = INT_MAX / sizeof(uint64_t));
 refcount_table_size2 = s-refcount_table_size * sizeof(uint64_t);
@@ -80,6 +143,59 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 8]  (index % 8))  0x1;
+}
+
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value  1));
+((uint8_t *)refcount_array)[index / 8] = ~(0x1  (index % 8));
+((uint8_t *)refcount_array)[index / 8] |= value  (index % 8);
+}
+
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 4]  (2 * (index % 4)))
+0x3;
+}
+
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value  2));
+((uint8_t *)refcount_array)[index / 4] = ~(0x3  (2 * (index % 4)));
+((uint8_t *)refcount_array)[index / 4] |= value  (2 * (index % 4));
+}
+
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index)
+{
+return (((const uint8_t *)refcount_array)[index / 2]  (4 * (index % 2)))
+0xf;
+}
+
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+ uint64_t value)
+{
+assert(!(value  4));
+   

[Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false)

2014-11-14 Thread Max Reitz
Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e82120c..423af48 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 error_report(Cannot change refcount entry width);
 return -ENOTSUP;
 } else {
-/* if this assertion fails, this probably means a new option was
+/* if this point is reached, this probably means a new option was
  * added without having it covered here */
-assert(false);
+abort();
 }
 
 desc++;
-- 
1.9.3




[Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB for amend

2014-11-14 Thread Max Reitz
If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 80 ++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d2553b9..0263019 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2655,6 +2655,75 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 return 0;
 }
 
+typedef enum Qcow2AmendOperation {
+/* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+ * statically initialized to so that the helper CB can discern the first
+ * invocation from an operation change */
+QCOW2_NO_OPERATION = 0,
+
+QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;
+
+typedef struct Qcow2AmendHelperCBInfo {
+/* The code coordinating the amend operations should only modify
+ * these four fields; the rest will be managed by the CB */
+BlockDriverAmendStatusCB *original_status_cb;
+void *original_cb_opaque;
+
+Qcow2AmendOperation current_operation;
+
+/* Total number of operations to perform (only set once) */
+int total_operations;
+
+/* The following fields are managed by the CB */
+
+/* Number of operations completed */
+int operations_completed;
+
+/* Cumulative offset of all completed operations */
+int64_t offset_completed;
+
+Qcow2AmendOperation last_operation;
+int64_t last_work_size;
+} Qcow2AmendHelperCBInfo;
+
+static void qcow2_amend_helper_cb(BlockDriverState *bs,
+  int64_t operation_offset,
+  int64_t operation_work_size, void *opaque)
+{
+Qcow2AmendHelperCBInfo *info = opaque;
+int64_t current_work_size;
+int64_t projected_work_size;
+
+if (info-current_operation != info-last_operation) {
+if (info-last_operation != QCOW2_NO_OPERATION) {
+info-offset_completed += info-last_work_size;
+info-operations_completed++;
+}
+
+info-last_operation = info-current_operation;
+}
+
+assert(info-total_operations  0);
+assert(info-operations_completed  info-total_operations);
+
+info-last_work_size = operation_work_size;
+
+current_work_size = info-offset_completed + operation_work_size;
+
+/* current_work_size is the total work size for (operations_completed + 1)
+ * operations (which includes this one), so multiply it by the number of
+ * operations not covered and divide it by the number of operations
+ * covered to get a projection for the operations not covered */
+projected_work_size = current_work_size * (info-total_operations -
+   info-operations_completed - 1)
+/ (info-operations_completed + 1);
+
+info-original_status_cb(bs, info-offset_completed + operation_offset,
+ current_work_size + projected_work_size,
+ info-original_cb_opaque);
+}
+
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb,
void *cb_opaque)
@@ -2669,6 +2738,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 bool encrypt;
 int ret;
 QemuOptDesc *desc = opts-list-desc;
+Qcow2AmendHelperCBInfo helper_cb_info;
 
 while (desc  desc-name) {
 if (!qemu_opt_find(opts, desc-name)) {
@@ -2726,6 +2796,12 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
+helper_cb_info = (Qcow2AmendHelperCBInfo){
+.original_status_cb = status_cb,
+.original_cb_opaque = cb_opaque,
+.total_operations = (new_version  old_version)
+};
+
 /* Upgrade first (some features may require compat=1.1) */
 if (new_version  old_version) {
 s-qcow_version = new_version;
@@ -2784,7 +2860,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 /* Downgrade last (so unsupported features can be removed before) */
 if (new_version  old_version) {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+helper_cb_info.current_operation = QCOW2_DOWNGRADING;
+ret = qcow2_downgrade(bs, new_version, qcow2_amend_helper_cb,
+  helper_cb_info);
 if (ret  0) {
 return ret;
 }
-- 
1.9.3




[Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()

2014-11-14 Thread Max Reitz
qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
the refcount is increased accordingly. However, if this would lead to an
overflow the function should instead just not reuse this cluster and
allocate a new one.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-refcount.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index be4e5fe..66c78c0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,12 +761,13 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, 
uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
 BDRVQcowState *s = bs-opaque;
-int64_t offset, cluster_offset, new_cluster;
+int64_t offset, cluster_offset, new_cluster, refcount;
 int64_t ret;
 int free_in_cluster;
 
 BLKDBG_EVENT(bs-file, BLKDBG_CLUSTER_ALLOC_BYTES);
 assert(size  0  size = s-cluster_size);
+ redo:
 if (s-free_byte_offset == 0) {
 offset = qcow2_alloc_clusters(bs, s-cluster_size);
 if (offset  0) {
@@ -774,12 +775,25 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 }
 s-free_byte_offset = offset;
 }
- redo:
+
 free_in_cluster = s-cluster_size -
 offset_into_cluster(s, s-free_byte_offset);
 if (size = free_in_cluster) {
 /* enough space in current cluster */
 offset = s-free_byte_offset;
+
+if (offset_into_cluster(s, offset) != 0) {
+/* We will have to increase the refcount of this cluster; if the
+ * maximum has been reached already, this cluster cannot be used */
+refcount = qcow2_get_refcount(bs, offset  s-cluster_bits);
+if (refcount  0) {
+return refcount;
+} else if (refcount == s-refcount_max) {
+s-free_byte_offset = 0;
+goto redo;
+}
+}
+
 s-free_byte_offset += size;
 free_in_cluster -= size;
 if (free_in_cluster == 0)
@@ -800,6 +814,20 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 if ((cluster_offset + s-cluster_size) == new_cluster) {
 /* we are lucky: contiguous data */
 offset = s-free_byte_offset;
+
+/* Same as above: In order to reuse the cluster, the refcount has 
to
+ * be increased; if that will not work, we are not so lucky after
+ * all */
+refcount = qcow2_get_refcount(bs, offset  s-cluster_bits);
+if (refcount  0) {
+qcow2_free_clusters(bs, new_cluster, s-cluster_size,
+QCOW2_DISCARD_NEVER);
+return refcount;
+} else if (refcount == s-refcount_max) {
+s-free_byte_offset = offset;
+goto redo;
+}
+
 ret = qcow2_update_cluster_refcount(bs, offset  s-cluster_bits,
 1, QCOW2_DISCARD_NEVER);
 if (ret  0) {
-- 
1.9.3




[Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-14 Thread Max Reitz
Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2-refcount.c | 457 +
 block/qcow2.h  |   4 +
 2 files changed, 461 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2e13a9c..b59a028 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2484,3 +2484,460 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
int ign, int64_t offset,
 
 return 0;
 }
+
+/* A pointer to a function of this type is given to walk_over_reftable(). That
+ * function will create refblocks and pass them to a RefblockFinishOp once they
+ * are completed (@refblock). @refblock_empty is set if the refblock is
+ * completely empty.
+ *
+ * Along with the refblock, a corresponding reftable entry is passed, in the
+ * reftable @reftable (which may be reallocated) at @reftable_index.
+ *
+ * @allocated should be set to true if a new cluster has been allocated.
+ */
+typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable,
+   uint64_t reftable_index, uint64_t 
*reftable_size,
+   void *refblock, bool refblock_empty,
+   bool *allocated, Error **errp);
+
+/**
+ * This operation for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs-opaque;
+int64_t offset;
+
+if (!refblock_empty  reftable_index = *reftable_size) {
+uint64_t *new_reftable;
+uint64_t new_reftable_size;
+
+new_reftable_size = ROUND_UP(reftable_index + 1,
+ s-cluster_size / sizeof(uint64_t));
+if (new_reftable_size  QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+error_setg(errp,
+   This operation would make the refcount table grow 
+   beyond the maximum size supported by QEMU, aborting);
+return -ENOTSUP;
+}
+
+new_reftable = g_try_realloc(*reftable, new_reftable_size *
+sizeof(uint64_t));
+if (!new_reftable) {
+error_setg(errp, Failed to increase reftable buffer size);
+return -ENOMEM;
+}
+
+memset(new_reftable + *reftable_size, 0,
+   (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+*reftable  = new_reftable;
+*reftable_size = new_reftable_size;
+}
+
+if (!refblock_empty  !(*reftable)[reftable_index]) {
+offset = qcow2_alloc_clusters(bs, s-cluster_size);
+if (offset  0) {
+error_setg_errno(errp, -offset, Failed to allocate refblock);
+return offset;
+}
+(*reftable)[reftable_index] = offset;
+*allocated = true;
+}
+
+return 0;
+}
+
+/**
+ * This operation for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs-opaque;
+int64_t offset;
+int ret;
+
+if (reftable_index  *reftable_size  (*reftable)[reftable_index]) {
+offset = (*reftable)[reftable_index];
+
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, s-cluster_size);
+if (ret  0) {
+error_setg_errno(errp, -ret, Overlap check failed);
+return ret;
+}
+
+ret = bdrv_pwrite(bs-file, offset, refblock, s-cluster_size);
+if (ret  0) {
+error_setg_errno(errp, -ret, Failed to write refblock);
+return ret;
+}
+} else {
+assert(refblock_empty);
+}
+
+return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced 
refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status 
callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the 
total
+ * number of walk_over_reftable() 

[Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4

2014-11-14 Thread Max Reitz
No longer refuse to open images with a different refcount entry width
than 16 bits; only reject images with a refcount width larger than 64
bits (which is prohibited by the specification).

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/qcow2.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d70e927..528d696 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* Check support for various header values */
-if (header.refcount_order != 4) {
-report_unsupported(bs, errp, %d bit reference counts,
-   1  header.refcount_order);
-ret = -ENOTSUP;
+if (header.refcount_order  6) {
+error_setg(errp, Reference count entry width too large; may not 
+   exceed 64 bit);
+ret = -EINVAL;
 goto fail;
 }
 s-refcount_order = header.refcount_order;
-- 
1.9.3




[Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function

2014-11-14 Thread Max Reitz
Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0263019..469650b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2607,13 +2607,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 }
 
 if (s-refcount_order != 4) {
-/* we would have to convert the image to a refcount_order == 4 image
- * here; however, since qemu (at the time of writing this) does not
- * support anything different than 4 anyway, there is no point in doing
- * so right now; however, we should error out (if qemu supports this in
- * the future and this code has not been adapted) */
-error_report(qcow2_downgrade: Image refcount orders other than 4 are 
- currently not supported.);
+error_report(compat=0.10 requires refcount_width=16);
 return -ENOTSUP;
 }
 
@@ -2661,6 +2655,7 @@ typedef enum Qcow2AmendOperation {
  * invocation from an operation change */
 QCOW2_NO_OPERATION = 0,
 
+QCOW2_CHANGING_REFCOUNT_ORDER,
 QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
 
@@ -2736,6 +2731,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 const char *compat = NULL;
 uint64_t cluster_size = s-cluster_size;
 bool encrypt;
+int refcount_width = s-refcount_bits;
 int ret;
 QemuOptDesc *desc = opts-list-desc;
 Qcow2AmendHelperCBInfo helper_cb_info;
@@ -2785,8 +2781,16 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 lazy_refcounts = qemu_opt_get_bool(opts, lazy_refcounts,
lazy_refcounts);
 } else if (!strcmp(desc-name, refcount_width)) {
-error_report(Cannot change refcount entry width);
-return -ENOTSUP;
+refcount_width = qemu_opt_get_number(opts, refcount_width,
+ refcount_width);
+
+if (refcount_width = 0 || refcount_width  64 ||
+!is_power_of_2(refcount_width))
+{
+error_report(Refcount width must be a power of two and may 
+ not exceed 64 bits);
+return -EINVAL;
+}
 } else {
 /* if this point is reached, this probably means a new option was
  * added without having it covered here */
@@ -2800,6 +2804,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 .original_status_cb = status_cb,
 .original_cb_opaque = cb_opaque,
 .total_operations = (new_version  old_version)
+  + (s-refcount_bits != refcount_width)
 };
 
 /* Upgrade first (some features may require compat=1.1) */
@@ -2812,6 +2817,27 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+if (s-refcount_bits != refcount_width) {
+int refcount_order = ffs(refcount_width) - 1;
+Error *local_error = NULL;
+
+if (new_version  3  refcount_width != 16) {
+error_report(Different refcount widths than 16 bits require 
+ compatibility level 1.1 or above (use compat=1.1 or 
+ greater));
+return -EINVAL;
+}
+
+helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
+ret = qcow2_change_refcount_order(bs, refcount_order,
+  qcow2_amend_helper_cb,
+  helper_cb_info, local_error);
+if (ret  0) {
+qerror_report_err(local_error);
+return ret;
+}
+}
+
 if (backing_file || backing_format) {
 ret = qcow2_change_backing_file(bs, backing_file ?: bs-backing_file,
 backing_format ?: bs-backing_format);
-- 
1.9.3




[Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend

2014-11-14 Thread Max Reitz
If the image version should be upgraded, that is the first we should do;
if it should be downgraded, that is the last we should do. So split the
version change block into an upgrade part at the start and a downgrade
part at the end.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 423af48..d2553b9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2726,20 +2726,13 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 desc++;
 }
 
-if (new_version != old_version) {
-if (new_version  old_version) {
-/* Upgrade */
-s-qcow_version = new_version;
-ret = qcow2_update_header(bs);
-if (ret  0) {
-s-qcow_version = old_version;
-return ret;
-}
-} else {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
-if (ret  0) {
-return ret;
-}
+/* Upgrade first (some features may require compat=1.1) */
+if (new_version  old_version) {
+s-qcow_version = new_version;
+ret = qcow2_update_header(bs);
+if (ret  0) {
+s-qcow_version = old_version;
+return ret;
 }
 }
 
@@ -2753,7 +2746,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 
 if (s-use_lazy_refcounts != lazy_refcounts) {
 if (lazy_refcounts) {
-if (s-qcow_version  3) {
+if (new_version  3) {
 error_report(Lazy refcounts only supported with compatibility 

  level 1.1 and above (use compat=1.1 or 
greater));
 return -EINVAL;
@@ -2789,6 +2782,14 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
 }
 }
 
+/* Downgrade last (so unsupported features can be removed before) */
+if (new_version  old_version) {
+ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+if (ret  0) {
+return ret;
+}
+}
+
 return 0;
 }
 
-- 
1.9.3




[Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB

2014-11-14 Thread Max Reitz
Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block.c   |  4 ++--
 block/qcow2-cluster.c | 14 --
 block/qcow2.c |  9 +
 block/qcow2.h |  3 ++-
 include/block/block.h |  4 ++--
 include/block/block_int.h |  3 ++-
 qemu-img.c|  5 +++--
 7 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index c979d51..c34b188 100644
--- a/block.c
+++ b/block.c
@@ -5790,12 +5790,12 @@ void bdrv_add_before_write_notifier(BlockDriverState 
*bs,
 }
 
 int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
 if (!bs-drv-bdrv_amend_options) {
 return -ENOTSUP;
 }
-return bs-drv-bdrv_amend_options(bs, opts, status_cb);
+return bs-drv-bdrv_amend_options(bs, opts, status_cb, cb_opaque);
 }
 
 /* This function will be called by the bdrv_recurse_is_first_non_filter method
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ab43902..2daf334 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1620,7 +1620,8 @@ fail:
 static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
   int l1_size, int64_t *visited_l1_entries,
   int64_t l1_entries,
-  BlockDriverAmendStatusCB *status_cb)
+  BlockDriverAmendStatusCB *status_cb,
+  void *cb_opaque)
 {
 BDRVQcowState *s = bs-opaque;
 bool is_active_l1 = (l1_table == s-l1_table);
@@ -1646,7 +1647,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 /* unallocated */
 (*visited_l1_entries)++;
 if (status_cb) {
-status_cb(bs, *visited_l1_entries, l1_entries);
+status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
 }
 continue;
 }
@@ -1768,7 +1769,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 
 (*visited_l1_entries)++;
 if (status_cb) {
-status_cb(bs, *visited_l1_entries, l1_entries);
+status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
 }
 }
 
@@ -1797,7 +1798,8 @@ fail:
  * qcow2 version which doesn't yet support metadata zero clusters.
  */
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb,
+   void *cb_opaque)
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t *l1_table = NULL;
@@ -1814,7 +1816,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
 ret = expand_zero_clusters_in_l1(bs, s-l1_table, s-l1_size,
  visited_l1_entries, l1_entries,
- status_cb);
+ status_cb, cb_opaque);
 if (ret  0) {
 goto fail;
 }
@@ -1849,7 +1851,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
 ret = expand_zero_clusters_in_l1(bs, l1_table, s-snapshots[i].l1_size,
  visited_l1_entries, l1_entries,
- status_cb);
+ status_cb, cb_opaque);
 if (ret  0) {
 goto fail;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 657d558..d084485 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2592,7 +2592,7 @@ static int qcow2_load_vmstate(BlockDriverState *bs, 
uint8_t *buf,
  * have to be removed.
  */
 static int qcow2_downgrade(BlockDriverState *bs, int target_version,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb, void 
*cb_opaque)
 {
 BDRVQcowState *s = bs-opaque;
 int current_version = s-qcow_version;
@@ -2641,7 +2641,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 /* clearing autoclear features is trivial */
 s-autoclear_features = 0;
 
-ret = qcow2_expand_zero_clusters(bs, status_cb);
+ret = qcow2_expand_zero_clusters(bs, status_cb, cb_opaque);
 if (ret  0) {
 return ret;
 }
@@ -2656,7 +2656,8 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 }
 
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
-   BlockDriverAmendStatusCB *status_cb)
+   BlockDriverAmendStatusCB *status_cb,
+   void *cb_opaque)
 {
 

Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure

2014-11-14 Thread Markus Armbruster
Eric Blake ebl...@redhat.com writes:

 On 11/13/2014 08:29 AM, Eric Blake wrote:

 lseek() with SEEK_DATA starting in a hole when there is no data until
 EOF is actually the part that isn't documented in the man page, but
 ENXIO is what I'm seeing here on RHEL 7.

 Here's the (proposed) POSIX wording:

 http://austingroupbugs.net/view.php?id=415

 And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
 so maybe we should special case it.

 
 Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
 says that SEEK_HOLE in a trailing hole is allowed (but not required) to
 seek to EOF instead of reporting the offset requested.  I have no clue
 why this was done, but it is VERY annoying - it means that if you
 provide an offset within a tail hole of a file, you cannot reliably tell
 if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
  For applications that are reading a file sequentially but skipping over
 holes, this behavior is fine (it short-circuits the hole/data search
 points and might shave an iteration off a lop).  But for OUR purposes,
 where we are merely trying to ascertain whether we are in a hole, we
 have an inaccurate response - since SEEK_HOLE does NOT return the offset
 we passed in, we are prone to treat the offset as belonging to data,
 which is a pessimization (you never get wrong results by treating a hole
 as data and reading it, but it is definitely slower).
 
 I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
 SEEK_DATA, if you want to accurately determine whether an offset happens
 to live within a trailing hole.

 Here's a table of possible situations, based solely on POSIX wording
 (and not on actual tests on Solaris or Linux, although it shouldn't be
 too hard to confirm behavior):

 0-length file:
 lseek(fd, 0, SEEK_HOLE) = -1 ENXIO
 lseek(fd, 0, SEEK_DATA) = -1 ENXIO
 conclusion: 0 is at EOF

Isn't this a special case of the next one?

 file of any size:
 lseek(fd, size_or_larger, SEEK_HOLE) = -1 ENXIO
 lseek(fd, size_or_larger, SEEK_DATA) = -1 ENXIO
 conclusion: size_or_larger is at or beyond EOF

 file where offset is in a hole, but data appears later:
 lseek(fd, offset, SEEK_HOLE) = offset
 lseek(fd, offset, SEEK_DATA) = end_of_hole
 conclusion: offset through end_of_hole is in a hole

 file where offset is data, whether or not a hole appears later:
 lseek(fd, offset, SEEK_HOLE) = end_of_data
 lseek(fd, offset, SEEK_DATA) = offset
 conclusion: offset through end_of_data is in data

 file where offset is in a tail hole, option 1:
 lseek(fd, offset, SEEK_HOLE) = offset
 lseek(fd, offset, SEEK_DATA) = -1 ENXIO
 conclusion: offset through EOF is in hole, but another seek needed to
 learn EOF

 file where offset is in a tail hole, option 2:
 lseek(fd, offset, SEEK_HOLE) = EOF
 lseek(fd, offset, SEEK_DATA) = -1 ENXIO
 conclusion: offset through EOF is in hole, no additional seek needed

 The two calls are both necessary, in order to learn which extant type
 offset belongs to, and to tell where that extant ends; and the behaviors
 are distinguishable (if both lseek() succeed, we have both numbers we
 want; if both fail with ENXIO, we know the offset is at or beyond EOF;
 and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
 hole); and we can tell at runtime what to do about a trailing hole (if
 the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
 find EOF; if the return value is larger than offset, we have EOF for
 free).  You can optimize by calling SEEK_HOLE first (if it fails with
 ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
 insufficient to give you all the information you need.

Not discussed: how to handle failures other than ENXIO.

The appended code still avoids a second seek in one case.  Useful mostly
because it saves us from handling a second seek's contradictory
information.


/*
 * Find allocation range in @bs around offset @start.
 * May change underlying file descriptor's file offset.
 * If @start is not in a hole, store @start in @data, and the
 * beginning of the next hole in @hole, and return 0.
 * If @start is in a non-trailing hole, store @start in @hole and the
 * beginning of the next non-hole in @data, and return 0.
 * If @start is in a trailing hole or beyond EOF, return -ENXIO.
 * If we can't find out, return a negative errno other than -ENXIO.
 */
static int find_allocation(BlockDriverState *bs, off_t start,
   off_t *data, off_t *hole)
{
#if defined SEEK_HOLE  defined SEEK_DATA
BDRVRawState *s = bs-opaque;
off_t offs;

/*
 * SEEK_DATA cases:
 * D1. offs == start: start is in data
 * D2. offs  start: start is in a hole, next data at offs
 * D3. offs  0, errno = ENXIO: either start is in a trailing hole
 *  or start is beyond EOF
 * If the latter happens, the file has been truncated behind
 * our back since we opened it.  Best 

[Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check

2014-11-14 Thread Max Reitz
If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block/qcow2-refcount.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b59a028..e9647ce 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1368,6 +1368,9 @@ static int inc_refcounts(BlockDriverState *bs,
 if (refcount == s-refcount_max) {
 fprintf(stderr, ERROR: overflow cluster offset=0x% PRIx64
 \n, cluster_offset);
+fprintf(stderr, Use qemu-img amend to increase the refcount entry 

+width or qemu-img convert to create a clean copy if the 
+image cannot be opened for writing\n);
 res-corruptions++;
 continue;
 }
-- 
1.9.3




[Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths

2014-11-14 Thread Max Reitz
Add a test for conversion between different refcount widths and errors
specific to certain widths (i.e. snapshots with refcount_width=1).

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/112 | 252 +
 tests/qemu-iotests/112.out | 131 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 384 insertions(+)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

diff --git a/tests/qemu-iotests/112 b/tests/qemu-iotests/112
new file mode 100755
index 000..e824d8a
--- /dev/null
+++ b/tests/qemu-iotests/112
@@ -0,0 +1,252 @@
+#!/bin/bash
+#
+# Test cases for different refcount_widths
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo QA output created by $seq
+
+here=$PWD
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap _cleanup; exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+# This test will set refcount_width on its own which would conflict with the
+# manual setting; compat will be overridden as well
+_unsupported_imgopts refcount_width 'compat=0.10'
+
+function print_refcount_width()
+{
+$QEMU_IMG info $TEST_IMG | sed -n '/refcount width:/ s/^ *//p'
+}
+
+echo
+echo '=== refcount_width limits ==='
+echo
+
+# Must be positive (non-zero)
+IMGOPTS=$IMGOPTS,refcount_width=0 _make_test_img 64M
+# Must be positive (non-negative)
+IMGOPTS=$IMGOPTS,refcount_width=-1 _make_test_img 64M
+# May not exceed 64
+IMGOPTS=$IMGOPTS,refcount_width=128 _make_test_img 64M
+# Must be a power of two
+IMGOPTS=$IMGOPTS,refcount_width=42 _make_test_img 64M
+
+# 1 is the minimum
+IMGOPTS=$IMGOPTS,refcount_width=1 _make_test_img 64M
+print_refcount_width
+
+# 64 is the maximum
+IMGOPTS=$IMGOPTS,refcount_width=64 _make_test_img 64M
+print_refcount_width
+
+# 16 is the default
+_make_test_img 64M
+print_refcount_width
+
+echo
+echo '=== refcount_width and compat=0.10 ==='
+echo
+
+# Should work
+IMGOPTS=$IMGOPTS,compat=0.10,refcount_width=16 _make_test_img 64M
+print_refcount_width
+
+# Should not work
+IMGOPTS=$IMGOPTS,compat=0.10,refcount_width=1 _make_test_img 64M
+IMGOPTS=$IMGOPTS,compat=0.10,refcount_width=64 _make_test_img 64M
+
+
+echo
+echo '=== Snapshot limit on refcount_width=1 ==='
+echo
+
+IMGOPTS=$IMGOPTS,refcount_width=1 _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' $TEST_IMG | _filter_qemu_io
+
+# Should fail for now; in the future, this might be supported by automatically
+# copying all clusters with overflowing refcount
+$QEMU_IMG snapshot -c foo $TEST_IMG
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Snapshot limit on refcount_width=2 ==='
+echo
+
+IMGOPTS=$IMGOPTS,refcount_width=2 _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' $TEST_IMG | _filter_qemu_io
+
+# Should succeed
+$QEMU_IMG snapshot -c foo $TEST_IMG
+$QEMU_IMG snapshot -c bar $TEST_IMG
+# Should fail (4th reference)
+$QEMU_IMG snapshot -c baz $TEST_IMG
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Compressed clusters with refcount_width=1 ==='
+echo
+
+IMGOPTS=$IMGOPTS,refcount_width=1 _make_test_img 64M
+print_refcount_width
+
+# Both should fit into a single host cluster; instead of failing to increase 
the
+# refcount of that cluster, qemu should just allocate a new cluster and make
+# this operation succeed
+$QEMU_IO -c 'write -P 0 -c  0  64k' \
+ -c 'write -P 1 -c 64k 64k' \
+ $TEST_IMG | _filter_qemu_io
+
+_check_test_img
+
+echo
+echo '=== Amend from refcount_width=16 to refcount_width=1 ==='
+echo
+
+_make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 16M 32M' $TEST_IMG | _filter_qemu_io
+$QEMU_IMG amend -o refcount_width=1 $TEST_IMG
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend from refcount_width=1 to refcount_width=64 ==='
+echo
+
+$QEMU_IMG amend -o refcount_width=64 $TEST_IMG
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend to compat=0.10 ==='
+echo
+
+# Should not work because refcount_width needs to be 16 for 

Re: [Qemu-devel] [PULL for-2.2 0/2] Xen tree 2014-11-14

2014-11-14 Thread Peter Maydell
On 14 November 2014 11:29, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 The following changes since commit c52e67924fbdadfa00668248f5c075542943c54c:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-11-13 15:44:16 +)

 are available in the git repository at:


   git://xenbits.xen.org/people/sstabellini/qemu-dm.git xen-2014-11-14

 for you to fetch changes up to 2f01dfacb56bc7a0d4639adc9dff9aae131e6216:

   xen_disk: fix unmapping of persistent grants (2014-11-14 11:12:38 +)

 
 Igor Mammedov (1):
   pc: piix4_pm: init legacy PCI hotplug when running on Xen

 Roger Pau Monne (1):
   xen_disk: fix unmapping of persistent grants

  hw/acpi/piix4.c |4 +++
  hw/block/xen_disk.c |   72 
 ++-
  hw/i386/pc_piix.c   |   11 
  3 files changed, 70 insertions(+), 17 deletions(-)

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v2 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-14 Thread Eric Blake
On 11/14/2014 02:02 AM, Max Reitz wrote:
 _filter_qmp should be able to correctly filter out the QMP version
 object for pretty JSON output.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  tests/qemu-iotests/common.filter | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/tests/qemu-iotests/common.filter 
 b/tests/qemu-iotests/common.filter
 index 3acdb30..63d15b3 100644
 --- a/tests/qemu-iotests/common.filter
 +++ b/tests/qemu-iotests/common.filter
 @@ -165,9 +165,12 @@ _filter_qemu()
  # replace problematic QMP output like timestamps
  _filter_qmp()
  {
 +discard=0
 +
  _filter_win32 | \
  sed -e 's#\(\(micro\)\?seconds: \)[0-9]\+#\1 TIMESTAMP#g' \
 --e 's#^{QMP:.*}$#QMP_VERSION#'
 +-e 's#^{QMP:.*}$#QMP_VERSION#' \
 +-e '/^QMP: {\s*$/, /^}\s*$/ c\QMP_VERSION'

Technically, portable sed requires a newline between the initial 'c\'
and the first line of replacement.  Use of multiple -e is the easiest
way to provide newlines (sed concatenates separate -e strings with an
implicit newline).  Even though GNU sed doesn't barf on your variant,
I'd still prefer:

-e '/^QMP: {/, /^}/ c\' \
-e 'QMP_VERSION'

written as two lines, to ensure we don't upset BSD sed.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 4/4] target-tricore: Add instructions of RCR opcode format

2014-11-14 Thread Richard Henderson
On 11/13/2014 06:12 PM, Bastian Koppelmann wrote:
 +tcg_gen_ext_i32_i64(t3, r3);
 +tcg_gen_concat_i32_i64(t2, r2_low, r2_high);
 +/* extend the sign for r2 to high 64 bits */
 +tcg_gen_sari_i64(t4, t2, 63);
 +tcg_gen_ext_i32_i64(t1, r1);
 +
 +tcg_gen_muls2_i64(t1, t3, t1, t3);
 +tcg_gen_add2_i64(t1, t3, t2, t4, t1, t3);
 +

I don't believe that you need 128 bit arithemetic for multiply-accumulate,
either here or elsewhere (e.g. msub).

Looking at unsigned, the maximum result of the multiply is 2*(2^n-1), or 2^(2n)
- 2^(n+1).  Which means that the accumulate with a 2^n-1 value cannot overflow
a double-word intermediate result.


r~



Re: [Qemu-devel] [PATCH v2 0/4] Add TriCore RCPW, RCRR, RCRW, RLC and RCR instructions

2014-11-14 Thread Richard Henderson
On 11/13/2014 06:12 PM, Bastian Koppelmann wrote:
 Hi,
 
 this patch depends on the previous TriCore patches 
 (https://patchwork.ozlabs.org/patch/405459/) and will hopefully end up in 2.3 
 QEMU.
 Other than adding the RCPW, RCRR, RCRW, RLC and RCR instructions, it cleans 
 up how ISA versions in the feature bitmask are handled,
 to simplify the checks, when instructions are available.
 
 Thanks,
 Bastian
 
 v1 - v2:
 - Fixed obvious style errors, given by checkpatch.
 
 Bastian Koppelmann (4):
   target-tricore: Make TRICORE_FEATURES implying others.
   target-tricore: Add instructions of RCPW, RCRR and RCRW opcode format
   target-tricore: Add instructions of RLC opcode format
   target-tricore: Add instructions of RCR opcode format

Patches 1-3 get

Reviewed-by: Richard Henderson r...@twiddle.net

Patch 4 looks correct, but appears to work too hard.


r~




Re: [Qemu-devel] [PATCH 2.3 5/8] tcg: Put opcodes in a linked list

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

+static void tcg_gen_op_begin(TCGContext *ctx, TCGOpcode opc, int args)
+{
+int oi = ctx-gen_next_op_idx;
+int ni = oi + 1;
+int pi = oi - 1;
+
+tcg_debug_assert(oi  OPC_BUF_SIZE);
+ctx-gen_last_op_idx = oi;
+ctx-gen_next_op_idx = ni;
+
+ctx-gen_op_buf[oi] = (TCGOp){
+.opc = opc,
+.args = args,
+.prev = pi,
+.next = ni
+};
+}
+
The name of this function says begin while used at the end of each 
tcg_gen_op. How about tcg_gen_op_list_add?

@@ -508,14 +521,10 @@ struct TCGContext {
  int goto_tb_issue_mask;
  #endif
  
-uint16_t gen_opc_buf[OPC_BUF_SIZE];

-TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
-
-uint16_t *gen_opc_ptr;
-TCGArg *gen_opparam_ptr;
You forgot to remove gen_opc_ptr in the dummy function 
tcg_liveness_analysis, in case USE_LIVENESS_ANALYSIS is not defined.

-target_ulong gen_opc_pc[OPC_BUF_SIZE];
-uint16_t gen_opc_icount[OPC_BUF_SIZE];
-uint8_t gen_opc_instr_start[OPC_BUF_SIZE];
+int gen_first_op_idx;
+int gen_last_op_idx;
+int gen_next_op_idx;
+int gen_next_parm_idx;


Other than that it looks good to me.

Cheers,
Bastian



Re: [Qemu-devel] [PATCH 2.3 6/8] tcg: Remove opcodes instead of noping them out

2014-11-14 Thread Bastian Koppelmann

On 11/11/2014 04:24 PM, Richard Henderson wrote:

With the linked list scheme we need not leave nops in the stream
that we need to process later.

Signed-off-by: Richard Henderson r...@twiddle.net
---
  tcg/optimize.c | 14 +++---
  tcg/tcg.c  | 28 
  tcg/tcg.h  |  1 +
  3 files changed, 32 insertions(+), 11 deletions(-)



Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de



Re: [Qemu-devel] [PATCH 0/4] qom: Replace automatic arrayification

2014-11-14 Thread Markus Armbruster
Paolo Bonzini pbonz...@redhat.com writes:

 On 13/11/2014 15:34, Markus Armbruster wrote:
 -... bla ...
 +char *name = object_gen_new_property_name(obj, bla);
 +... name ...
 +g_free(name);
 
 Not quite as mechanical, in particular finding the obj to pass to
 object_gen_new_property_name().
 
 Correct?

 Yes (except that the obj is just qdev_get_machine()).

Ah, of course.

Let me summarize.  Automatic arrayification currently has two kinds of
users: qdev properties and memory regions.

qdev properties can easily use object_gen_new_property_name() instead,
as shown in PATCH 3/4.

So can memory regions [PATCH 2/4], but only because we clumsily arrayify
all of them, by appending [*] in memory_region_init().

Some day, we may want to arrayify only the ones that actually need it,
by appending [*] right to their names instead of appending it behind
the scenes to all memory region names.

This would involve touching the untouchables: non-qdevified devices.
But the changes should be limited to string literals.

With my series, you'd have to graft in object_gen_new_property_name()
and the matching g_free() instead.  You called that just too ugly.

Here's how to avoid it, and confine the ugliness to
memory_region_init():

Change memory_region_init() from

char *escaped_name = memory_region_escape_name(name);
char *propname = object_gen_new_property_name(owner, escaped_name);
object_property_add_child(owner, propname, OBJECT(mr), error_abort);
object_unref(OBJECT(mr));
g_free(propname);
g_free(escaped_name);

to something like

if (name ends with [*]) {
stem = g_strndup(name, strlen(name) -3);
escaped = memory_region_escape_name(stem);
propname = object_gen_new_property_name(owner, escaped_name);
g_free(escaped);
g_free(stem);
else
propname = memory_region_escape_name(name);
object_property_add_child(owner, propname, OBJECT(mr), error_abort);
object_unref(OBJECT(mr));
g_free(propname);

and append [*] to the names of regions that need arrayification.

That way, the bad magic is limited to just memory_region_init() rather
than all of QOM, and it's needed only as long as the problem with
non-qdevified users remains.

 [*] What we want to do is drag them along until the Second Coming, when
 Jesus will set everything right, which naturally includes qdevifying all
 the old crap nobody wants to touch.

 Or maybe not.  Look at the bottom:
 https://upload.wikimedia.org/wikipedia/commons/a/a5/Michelangelo%2C_Giudizio_Universale_02.jpg

Oww!  Well played, sir!



Re: [Qemu-devel] [PATCH 2.3 7/8] tcg: Implement insert_op_before

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

Rather reserving space in the op stream for optimization,
let the optimizer add ops as necessary.

Signed-off-by: Richard Henderson r...@twiddle.net
---
  tcg/optimize.c | 57 +++--
  tcg/tcg-op.c   | 16 
  2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 973fbb4..067917c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -67,6 +67,37 @@ static void reset_temp(TCGArg temp)
  temps[temp].mask = -1;
  }
  
+static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,

+TCGOpcode opc, int nargs)
+{
+int oi = s-gen_next_op_idx;
+int pi = s-gen_next_parm_idx;
+int prev = old_op-prev;
+int next = old_op - s-gen_op_buf;
+TCGOp *new_op;
+
+tcg_debug_assert(oi  OPC_BUF_SIZE);
+tcg_debug_assert(pi + nargs = OPPARAM_BUF_SIZE);

I thinks it is better to assure these assertion always hold, e.g.

if (oi  OPC_BUF_SIZE || pi + nargs = OPPARAM_BUF_SIZE) {
return NULL;
}
...
TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
if (op2) {
*args2 = s-gen_opparam_buf[op2-args];
}

Or how do we know they always hold?


diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index fbd82bd..8de259a 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -571,8 +571,6 @@ void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
  {
  if (TCG_TARGET_HAS_add2_i32) {
  tcg_gen_op6_i32(INDEX_op_add2_i32, rl, rh, al, ah, bl, bh);
-/* Allow the optimizer room to replace add2 with two moves.  */
-tcg_gen_op0(tcg_ctx, INDEX_op_nop);


All references on tcg_gen_op0 are gone, so lets remove it.

Cheers,
Bastian



Re: [Qemu-devel] [PATCH 0/4] qom: Replace automatic arrayification

2014-11-14 Thread Paolo Bonzini


On 14/11/2014 15:21, Markus Armbruster wrote:
 qdev properties can easily use object_gen_new_property_name() instead,
 as shown in PATCH 3/4.
 
 So can memory regions [PATCH 2/4], but only because we clumsily arrayify
 all of them, by appending [*] in memory_region_init().
 
 Some day, we may want to arrayify only the ones that actually need it,
 by appending [*] right to their names instead of appending it behind
 the scenes to all memory region names.
 
 This would involve touching the untouchables: non-qdevified devices.
 But the changes should be limited to string literals.
 
 With my series, you'd have to graft in object_gen_new_property_name()
 and the matching g_free() instead.  You called that just too ugly.
 
 Here's how to avoid it, and confine the ugliness to
 memory_region_init():
 
 Change memory_region_init() from
 
 char *escaped_name = memory_region_escape_name(name);
 char *propname = object_gen_new_property_name(owner, escaped_name);
 object_property_add_child(owner, propname, OBJECT(mr), error_abort);
 object_unref(OBJECT(mr));
 g_free(propname);
 g_free(escaped_name);
 
 to something like
 
 if (name ends with [*]) {
 stem = g_strndup(name, strlen(name) -3);
 escaped = memory_region_escape_name(stem);
 propname = object_gen_new_property_name(owner, escaped_name);
 g_free(escaped);
 g_free(stem);
 else
 propname = memory_region_escape_name(name);
 object_property_add_child(owner, propname, OBJECT(mr), error_abort);
 object_unref(OBJECT(mr));
 g_free(propname);
 
 and append [*] to the names of regions that need arrayification.
 
 That way, the bad magic is limited to just memory_region_init() rather
 than all of QOM, and it's needed only as long as the problem with
 non-qdevified users remains.

Yes, I agree.

Whether it's an improvement to move back the special-casing of [*] to
memory_region_init(), that's of course in the eye of the beholder. ;)
We do know that it's very unlikely to be removed from memory regions.

Paolo



Re: [Qemu-devel] [PATCH 2.3 8/8] tcg: Remove unused opcodes

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

diff --git a/tci.c b/tci.c
index 4711ee4..28292b3 100644
--- a/tci.c
+++ b/tci.c
@@ -506,19 +506,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
  tb_ptr += 2;
  
  switch (opc) {

-case INDEX_op_end:
-case INDEX_op_nop:
-break;
-case INDEX_op_nop1:
-case INDEX_op_nop2:
-case INDEX_op_nop3:
-case INDEX_op_nopn:
-case INDEX_op_discard:
-TODO();
-break;
-case INDEX_op_set_label:
-TODO();
-break;
Why do you remove the TODO notice for INDEX_op_discard/set_label? Is TCI 
no longer maintained?

  case INDEX_op_call:
  t0 = tci_read_ri(tb_ptr);
  #if TCG_TARGET_REG_BITS == 32

Other than that,

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de

Cheers,
Bastian



Re: [Qemu-devel] [PATCH 2.3 7/8] tcg: Implement insert_op_before

2014-11-14 Thread Richard Henderson
On 11/14/2014 04:25 PM, Bastian Koppelmann wrote:
 
 On 11/11/2014 04:24 PM, Richard Henderson wrote:
 Rather reserving space in the op stream for optimization,
 let the optimizer add ops as necessary.

 Signed-off-by: Richard Henderson r...@twiddle.net
 ---
   tcg/optimize.c | 57 
 +++--
   tcg/tcg-op.c   | 16 
   2 files changed, 35 insertions(+), 38 deletions(-)

 diff --git a/tcg/optimize.c b/tcg/optimize.c
 index 973fbb4..067917c 100644
 --- a/tcg/optimize.c
 +++ b/tcg/optimize.c
 @@ -67,6 +67,37 @@ static void reset_temp(TCGArg temp)
   temps[temp].mask = -1;
   }
   +static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
 +TCGOpcode opc, int nargs)
 +{
 +int oi = s-gen_next_op_idx;
 +int pi = s-gen_next_parm_idx;
 +int prev = old_op-prev;
 +int next = old_op - s-gen_op_buf;
 +TCGOp *new_op;
 +
 +tcg_debug_assert(oi  OPC_BUF_SIZE);
 +tcg_debug_assert(pi + nargs = OPPARAM_BUF_SIZE);
 I thinks it is better to assure these assertion always hold, e.g.
 
 if (oi  OPC_BUF_SIZE || pi + nargs = OPPARAM_BUF_SIZE) {
 return NULL;
 }
 ...
 TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
 if (op2) {
 *args2 = s-gen_opparam_buf[op2-args];
 }
 
 Or how do we know they always hold?

For the same reason we don't bother checking during initial generation of the
opcodes.  We simply assume there's enough space.  Not a good answer but...

 All references on tcg_gen_op0 are gone, so lets remove it.

Fair enough.


r~




Re: [Qemu-devel] [PATCH 2.3 8/8] tcg: Remove unused opcodes

2014-11-14 Thread Richard Henderson
On 11/14/2014 04:31 PM, Bastian Koppelmann wrote:
 
 On 11/11/2014 04:24 PM, Richard Henderson wrote:
 diff --git a/tci.c b/tci.c
 index 4711ee4..28292b3 100644
 --- a/tci.c
 +++ b/tci.c
 @@ -506,19 +506,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t
 *tb_ptr)
   tb_ptr += 2;
 switch (opc) {
 -case INDEX_op_end:
 -case INDEX_op_nop:
 -break;
 -case INDEX_op_nop1:
 -case INDEX_op_nop2:
 -case INDEX_op_nop3:
 -case INDEX_op_nopn:
 -case INDEX_op_discard:
 -TODO();
 -break;
 -case INDEX_op_set_label:
 -TODO();
 -break;
 Why do you remove the TODO notice for INDEX_op_discard/set_label? Is TCI no
 longer maintained?

Barely.  But these opcodes never reach this far anyway, so the todo is bogus.


r~



Re: [Qemu-devel] [PATCH 00/12] qcow2: Add new overlap check functions

2014-11-14 Thread Max Reitz

On 2014-11-03 at 18:04, Max Reitz wrote:

As has been requested, this series adds new overlap check functions to
the qcow2 code. My local branch is called qcow2-improved-overlap-v1,
but I am not so sure whether it is actually an improvement; that is left
for you to decide, dear reviewers.

See patch 1 for an explanation of why this series exists and what it
does. Patch 1 is basically the core of this series, the rest just
employs the functions introduced there.

I have yet to do benchmarks to test whether this series actually
improves things, but judging from the iotests it at least does not slow
things down (which it did at one time during development, particularily
test 044 is good for testing this, so this actually has some
significance to it).

In a later patch, we may want to change the meaning of the constant
overlap checking option to mean the same as cached, which is
everything except for inactive L2 tables. This series does make
checking for overlaps with inactive L2 tables at runtime just as cheap
as everything else (constant time plus caching), but using these checks
means qemu has to read all the snapshot L1 tables when opening a qcow2
file. This does not take long, of course, but it does result in a bit of
overhead so I did not want to enable it by default.

I think just enabling all overlap checks by default after this series
should be fine and useful, though.


Rejoice, for I return with benchmarks; the kind of benchmarks which 
always show the result I want them to show, which I should be known for 
by now.


First, I basically tried the setup from last time only this time I 
didn't care about the in-VM I/O performance but just use perf record -g 
to record the amount of cycles used by the overlap check. This worked 
somehow, but bonnie++ (which I used as the in-VM benchmark tool) does 
some different tests, both reading and writing and writing with 
different sizes, so the result is not that bad there.


So I did what's absolutely worst for the overlap checks: dd if=/dev/zero 
of=/dev/vda bs=65536 oflag=direct (even worse would be cluster_size=512 
and bs=512, but I wanted to get the test over with today, so I just went 
for 64k). The image was a 1G qcow2 image in tmpfs with ten snapshots 
(each having 128 MB of data, all pointing to the same data clusters 
which are different from the active clusters) just because having 
internal snapshots makes the overlap checks even more CPU intensive, of 
course.


I ran dd 42 times in a row (for i in $(seq 1 42); do ...; done) and 
started up perf record just after I hit enter and canceled it just 
before the last dd exited.


I don't remember the exact numbers, but for the currently existing 
overlap check function (using the default of overlap-check=cached), it 
used about 13.5 % in the first run, 10.5 % in the second and (this I do 
know) 12.41 % in the third run.


With these patches applied, I had 0.08 % in the first run with 
overlap-check=cached and 0.09 % in the second run with overlap-check=all.


(all percentages are referring to the fraction of cycles used by 
qcow2_check_metadata_overlap())


So this series apparently is actually worth it. I could yet do another 
benchmark where I test what happens if the cache is too small, which 
means that the range list representation has to be converted to the 
bitmap all the time and vice versa. The biggest problem with that would 
be to somehow fit the image into tmpfs (if it's not in tmpfs, I don't 
want to do CPU benchmarks)...


Max



Re: [Qemu-devel] [PATCH 0/4] qom: Replace automatic arrayification

2014-11-14 Thread Markus Armbruster
Paolo Bonzini pbonz...@redhat.com writes:

 On 14/11/2014 15:21, Markus Armbruster wrote:
 qdev properties can easily use object_gen_new_property_name() instead,
 as shown in PATCH 3/4.
 
 So can memory regions [PATCH 2/4], but only because we clumsily arrayify
 all of them, by appending [*] in memory_region_init().
 
 Some day, we may want to arrayify only the ones that actually need it,
 by appending [*] right to their names instead of appending it behind
 the scenes to all memory region names.
 
 This would involve touching the untouchables: non-qdevified devices.
 But the changes should be limited to string literals.
 
 With my series, you'd have to graft in object_gen_new_property_name()
 and the matching g_free() instead.  You called that just too ugly.
 
 Here's how to avoid it, and confine the ugliness to
 memory_region_init():
 
 Change memory_region_init() from
 
 char *escaped_name = memory_region_escape_name(name);
 char *propname = object_gen_new_property_name(owner, escaped_name);
 object_property_add_child(owner, propname, OBJECT(mr), error_abort);
 object_unref(OBJECT(mr));
 g_free(propname);
 g_free(escaped_name);
 
 to something like
 
 if (name ends with [*]) {
 stem = g_strndup(name, strlen(name) -3);
 escaped = memory_region_escape_name(stem);
 propname = object_gen_new_property_name(owner, escaped_name);
 g_free(escaped);
 g_free(stem);
 else
 propname = memory_region_escape_name(name);
 object_property_add_child(owner, propname, OBJECT(mr), error_abort);
 object_unref(OBJECT(mr));
 g_free(propname);
 
 and append [*] to the names of regions that need arrayification.
 
 That way, the bad magic is limited to just memory_region_init() rather
 than all of QOM, and it's needed only as long as the problem with
 non-qdevified users remains.

 Yes, I agree.

 Whether it's an improvement to move back the special-casing of [*] to
 memory_region_init(), that's of course in the eye of the beholder. ;)
 We do know that it's very unlikely to be removed from memory regions.

Like Paris, we get to choose from three.  Unlike Paris, we get to choose
from three uglies:

1. Ugly magic in QOM properties (status quo)

2. No ugly magic now, maybe ugly magic in memory region names later

3. No ugly magic, but may have to touch the untouchables later, which is
going to be ugly

I'd go for 2.  

For 1, we do nothing, for 2 and 3, we merge my patches now, and worry
about 2 vs. 3 later.



[Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-14 Thread Paolo Bonzini
Yes, it is valid. libvirt is disabling the automagic UHCI controller,
which makes sense since he is using XHCI.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1392504

Title:
  USB Passthrough is not working anymore

Status in QEMU:
  New

Bug description:
  After upgrading from Ubuntu 14.04 to Ubuntu 14.10 USB passthrough in
  QEMU (version is now 2.1.0 - Debian2.1+dfsg-4ubuntu6.1) is not working
  any more. Already tried to remove and attach the USB devices. I use 1
  USB2 HDD  +  1 USB3 HDD to a virtual linux machine; 1 USB2 HDD to a
  virual FNAS machine and a USB 2camera to virtual Win7 machine. All
  these devices are not working any more.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1392504/+subscriptions



Re: [Qemu-devel] State of ARM FIQ in Qemu

2014-11-14 Thread Greg Bellows
On 14 November 2014 09:34, Tim Sander t...@krieglstein.org wrote:

0xbfffe000? You where talking about the fact that the security
extensions
where not implemented. I was not aware that the different vbar's
 where
already part of the security stuff?
  
   MVBAR is part of the Security extensions. HVBAR is part of the
   Virtualization extensions. In mainline QEMU we implement neither
   of those extensions, and so don't implement the associated
   registers. (Strictly speaking, VBAR is also only in the
   Security extensions, but we provide it as a workaround for
   guests that assume our CPUs should implement it.)
 
  Peter beat me to it.  None of the VBAR registers should matter in your
 case
  which coincides with the use of hivecs.
 While writing this mail i found out that the integrated debugger is causing
 harm in combination with the fiq. So everything below the braces seems to
 be related to the this problem. But i still wanted to keep the data points
 for
 reference:

 {
 Ok, so qemu only implements the SCTLR.V bit to control the memory address
 of
 the interrupt vector. So its either 0 or 0x. That is fine with me.
 Currently i have the problem that a call to set_fiq_handler does not place
 the
 binary stuff loaded at the address where qemu is jumping to which is
 presumably
 0x1240. I have checked that SCTLR.V =1 under linux which is fine.

 The background info to set_fiq_handler from my understanding is that it
 copies
 the given stuff directly at the address where the FIQ vector is located.
 This
 works as the FIQ is the last entry and thus there is some memory space for
 a
 short interrupt handler. I checked the memory when entering the FIQ with
 the
 integrated gdb:
 (gdb) info reg
 r0 0x0  0
 r1 0x0  0
 r2 0x1  1
 r3 0x76eb34c8   1995125960
 r4 0x76eb34c8   1995125960
 r5 0x76f633b8   1995846584
 r6 0x2a 42
 r7 0x76f4c28c   1995752076
 r8 0xf8200100   -132120320
 r9 0xe004   -536608768
 r100x60004059   1610629209
 r110x0  0
 r120x0  0
 sp 0x908be000   0x908be000
 lr 0x76dfc108   1994375432
 pc 0x1240   0x1240 firq_fiq_handler
 cpsr   0x600f01d1   1611596241
 (gdb) x 0x1240
 0xe599b00c

 But my firq_fiq_handler starts with 0xee12af10? I know that this works on
 real
 hardware so i suspect that this an error within qemu? Or at least that
 there
 is something amiss in the way the memory is initialized or handled.

 Is there a way to instrument the memory below the vector table to get debug
 logs if the memory is modified?
 }

  It may be worthwhile to put a kernel breakpoint in handle_fiq_as_nmi()
 just
  to see where it goes.  If CONFIG_ARM_GIC is enabled it should take you to
  your handler I suspect.  Plus, if you get there then we have likely
 proven
  that QEMU is getting the kernel to the right place.  I set a BP in this
  routine on my A9 run and appear to be hitting it correctly.
 So you are talking about the linux kernel, right? CONFIG_ARM_GIC=y check
 but
 i can't find handle_fiq_as_nmi? Even a fuzzier rgrep nmi * |grep fiq
 does not
 find anything.


Maybe we are working off different versions of the kernel sources.  I'm
using a kernel variant of v3.18-rc1.  I took a look at my 3.15 kernel and
it does not have the routine, so perhaps yours is an earlier version as
well.

I don't spend much time working in or tracking the Linux kernel, so I am
not sure when the difference was introduced.  I just found it to be a
convenient function to set a BP for early FIQ debugging, you may have
something different.

Interestingly, as I researched the Linux FIQ support I found this mail
thread...

http://www.spinics.net/lists/arm-kernel/msg14960.html

As I don't have access to your code, I could not verify that the SVC SPSR
was being preserved, but it may be worth you looking into it as I can see
this potentially resulting in all kinds of random behavior.  More
interestingly, this comment and code appears to have been changed in later
versions of the FIQ code, so perhaps this has been fixed or improved (My
3.18 kernel does not have the comment).


 Concerning the fact that qemu is jumping to the right address:
 To i have put a breakpoint to 0x001c which is the fiq base vector
 address.
 There is an instruction 0xea000480 which seems to be a pc relative branch
 to
 0x1224 which then lands at 0x1240.

 But the internal debugger gives me some concerns. If i do at the gdb
 command
 line:
 hb *0x001c
 hb *0x1240
 The debugger only stops at the first breakpoint. If i leave the first
 breakpoint
 away the debugger stops at 0x1240. As i know that at 0x01c it
 should jump
 right to 0x1240 i would expect that both breakpoints are triggered.

 Then if i reach the breakpoint at 

Re: [Qemu-devel] [PATCH 2.3 1/8] tcg: Move some opcode generation functions out of line

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

Some of these functions are really quite large.  We have a number of
things that ought to be circularly dependent, but we duplicated code
to break that chain for the inlines.

This saved 25% of the code size of one of the translators I examined.

Signed-off-by: Richard Henderson r...@twiddle.net
---
  Makefile.target |2 +-
  tcg/tcg-op.c| 1978 +++
  tcg/tcg-op.h| 2488 ---
  tcg/tcg.c   |  137 ---
  tcg/tcg.h   |3 -
  5 files changed, 2339 insertions(+), 2269 deletions(-)
  create mode 100644 tcg/tcg-op.c



Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de

Cheers,
Bastian



Re: [Qemu-devel] [PATCH 2.3 2/8] tcg: Reduce ifdefs in tcg-op.c

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

Almost completely eliminates the ifdefs in this file, improving
confidence in the lesser used 32-bit builds.

Signed-off-by: Richard Henderson r...@twiddle.net
---
  tcg/tcg-op.c | 449 +++
  1 file changed, 207 insertions(+), 242 deletions(-)

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de

Cheers,
Bastian



Re: [Qemu-devel] [PATCH 2.3 0/8] Linked list for tcg ops

2014-11-14 Thread Bastian Koppelmann


On 11/11/2014 04:24 PM, Richard Henderson wrote:

Richard Henderson (8):
   tcg: Move some opcode generation functions out of line
   tcg: Reduce ifdefs in tcg-op.c
   tcg: Move emit of INDEX_op_end into gen_tb_end
   tcg: Introduce tcg_op_buf_count and tcg_op_buf_full
   tcg: Put opcodes in a linked list
   tcg: Remove opcodes instead of noping them out
   tcg: Implement insert_op_before
   tcg: Remove unused opcodes

  Makefile.target   |2 +-
  include/exec/gen-icount.h |   22 +-
  target-alpha/translate.c  |   16 +-
  target-arm/translate-a64.c|   10 +-
  target-arm/translate.c|   10 +-
  target-cris/translate.c   |   15 +-
  target-i386/translate.c   |   11 +-
  target-lm32/translate.c   |   16 +-
  target-m68k/translate.c   |   10 +-
  target-microblaze/translate.c |   22 +-
  target-mips/translate.c   |   10 +-
  target-moxie/translate.c  |   10 +-
  target-openrisc/translate.c   |   15 +-
  target-ppc/translate.c|   11 +-
  target-s390x/translate.c  |   11 +-
  target-sh4/translate.c|   10 +-
  target-sparc/translate.c  |   10 +-
  target-tricore/translate.c|5 +-
  target-unicore32/translate.c  |   10 +-
  target-xtensa/translate.c |8 +-
  tcg/optimize.c|  307 +++--
  tcg/tcg-op.c  | 1941 
  tcg/tcg-op.h  | 2488 ++---
  tcg/tcg-opc.h |9 -
  tcg/tcg.c |  535 +++--
  tcg/tcg.h |   72 +-
  tci.c |   13 -
  27 files changed, 2761 insertions(+), 2838 deletions(-)
  create mode 100644 tcg/tcg-op.c

Richard, doing the review for the tcg changes helped me in my 
understanding on how tcg works. So whenever you have more changes for 
tcg, feel free to CC me.


Cheers,
Bastian



Re: [Qemu-devel] [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-14 Thread Pawel Moll
On Thu, 2014-11-13 at 09:39 +, Shannon Zhao wrote:
 When we use only virtio-mmio or vhost-net without irqfd, the device uses 
 qemu_set_irq(within qemu)
 to inject interrupt and at the same time qemu update 
 VIRTIO_MMIO_INTERRUPT_STATUS to tell guest
 driver whom this interrupt to. All these things are done by qemu. Though 
 qemu_set_irq will call
 kvm_set_irq to do the real interrupt inject, but the trigger is qemu and it 
 can update the
 VIRTIO_MMIO_INTERRUPT_STATUS register before injecting the interrupt.
 
 But if we use vhost-net with irqfd, the device uses ioeventfd mechanism to 
 inject interrupt.
 When an interrupt happened, it doesn't transfer to qemu, while the irqfd 
 finally call kvm_set_irq
 to inject the interrupt directly. All these things are done in kvm and it 
 can't update the
 VIRTIO_MMIO_INTERRUPT_STATUS register.
 So if the guest driver still uses the old interrupt handler, it has to read 
 the
 VIRTIO_MMIO_INTERRUPT_STATUS register to get the interrupt reason and run 
 different handlers
 for different reasons. But the register has nothing and none of handlers will 
 be called.
 
 I make myself clear? :-)

I see... (well, at least I believe I see ;-)

Of course this means that with irqfd, the device is simply non-compliant
with the spec. And that extending the status register doesn't help you
at all, so we can drop this idea.

Paradoxically, it's a good news (of a sort :-) because I don't think we
should be doing such a fundamental change in the spec at this date. I'll
discuss it with others in the TC and I'm open to be convinced otherwise,
but I believe the spec should stay as it is.

Having said that, I see no problem with experimenting with the device
model so the next version of the standard is extended. Two suggestions I
have would be:

1. Drop the virtio-pci like case with two interrupts (one for config
changes, one for all queues), as it doesn't bring anything new. Just
make it all interrupts individual.

2. Change the way the interrupts are acknowledge - instead of a bitmask,
have a register which takes a queue number to ack the queue's interrupt
and ~0 to ack the config interrupt.

3. Change the version of the device to (intially) 0x8003 - I've just
made an executive decision :-) that non-standard compliant devices
should have the MSB of the version number set (I'll propose to reserve
this range of versions in future version of the spec). We'll consider it
a prototype of the version 3. Then make the driver behave in the new
way when (and only when) such device version is observed.

Also, I remembered I haven't addressed one of your previous comments:

On Wed, 2014-11-12 at 08:32 +, Shannon Zhao wrote:
  One point I'd like to make is that the device was intentionally designed
  with simplicity in mind first, performance later (something about
  embedded etc :-). Changing this assumption is of course possible, but
 Ah, I think ARM is not only about embedded things. Maybe it could has
 a wider application  such as micro server. Just my personal opinion.

By all means, I couldn't agree more. But there's one thing you have to
keep in mind - it doesn't matter whether the real hardware has got PCI
or not, but what is emulated by qemu/KVM. Therefore, if only you can get
the PCI host controller working in the qemu virt machine (which should
be much simpler now, as we have generic support for PCI
controllers/bridges in the kernel now), you'll be able to forget the
issue of virtio-mmio and multiple interrupts and still enjoy your
performance gains :-)

Does it all make sense?

Pawel





[Qemu-devel] [PATCH] target-ppc: Altivec's mtvscr Decodes Wrong Register

2014-11-14 Thread Tom Musta
The Move to Vector Status and Control Register (mtvscr) instruction
uses VRB as the source register.  Fix the code generator to correctly
decode the VRB field.  That is, use rB(ctx-opcode) instead of
rD(ctx-opcode).

Signed-off-by: Tom Musta tommu...@gmail.com
---
 target-ppc/translate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 910ce56..d381632 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -6848,7 +6848,7 @@ static void gen_mtvscr(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VPU);
 return;
 }
-p = gen_avr_ptr(rD(ctx-opcode));
+p = gen_avr_ptr(rB(ctx-opcode));
 gen_helper_mtvscr(cpu_env, p);
 tcg_temp_free_ptr(p);
 }
-- 
1.7.1




Re: [Qemu-devel] [PATCH 0/3] apic: ExtINT fixes for QNX

2014-11-14 Thread Richard Bilson
I've been testing these against our entire kernel regression suite. While they 
definitely fix the original problems, there is one test that now hangs. I'm 
going to have to get to the bottom of that failure before I can sign off.


From: Paolo Bonzini pbonz...@redhat.commailto:pbonz...@redhat.com
Date: Tue Nov 11 2014 09:23:38 GMT-0500 (EST)
To: qemu-devel@nongnu.org qemu-devel@nongnu.orgmailto:qemu-devel@nongnu.org
Cc: Richard Bilson rbil...@qnx.commailto:rbil...@qnx.com
Subject: [PATCH 0/3] apic: ExtINT fixes for QNX

Richard Bilson reported some QNX emulation bugs. I sent him offlist a prototype 
patch, that he commented on.

The first patch here works around what is arguably a QNX bug, that is
triggered by the last two patches; but it is better to avoid spurious
HLT wakeups, so here it is. The last two patches are what Richard
had tested, amended with a fix for Windows XP + KVM + QEMU apic.

Richard, can you confirm that these patches work?

Paolo

Paolo Bonzini (3):
apic: avoid getting out of halted state on masked PIC interrupts
apic: fix loss of IPI due to masked ExtINT
apic: fix incorrect handling of ExtINT interrupts wrt processor
priority

hw/intc/apic.c | 20 
target-i386/cpu.c | 8 ++--
2 files changed, 18 insertions(+), 10 deletions(-)

--
2.1.0




Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure

2014-11-14 Thread Eric Blake
On 11/14/2014 06:12 AM, Markus Armbruster wrote:
 0-length file:
 lseek(fd, 0, SEEK_HOLE) = -1 ENXIO
 lseek(fd, 0, SEEK_DATA) = -1 ENXIO
 conclusion: 0 is at EOF
 
 Isn't this a special case of the next one?
 
 file of any size:
 lseek(fd, size_or_larger, SEEK_HOLE) = -1 ENXIO
 lseek(fd, size_or_larger, SEEK_DATA) = -1 ENXIO
 conclusion: size_or_larger is at or beyond EOF

Yes.


 The two calls are both necessary, in order to learn which extant type
 offset belongs to, and to tell where that extant ends; and the behaviors
 are distinguishable (if both lseek() succeed, we have both numbers we
 want; if both fail with ENXIO, we know the offset is at or beyond EOF;
 and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
 hole); and we can tell at runtime what to do about a trailing hole (if
 the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
 find EOF; if the return value is larger than offset, we have EOF for
 free).  You can optimize by calling SEEK_HOLE first (if it fails with
 ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
 insufficient to give you all the information you need.
 
 Not discussed: how to handle failures other than ENXIO.
 
 The appended code still avoids a second seek in one case.  Useful mostly
 because it saves us from handling a second seek's contradictory
 information.

Slick - I focused on SEEK_HOLE first, but you focused on SEEK_DATA
first.  Your comments make all the difference.

 
 
 /*
  * Find allocation range in @bs around offset @start.
  * May change underlying file descriptor's file offset.
  * If @start is not in a hole, store @start in @data, and the
  * beginning of the next hole in @hole, and return 0.
  * If @start is in a non-trailing hole, store @start in @hole and the
  * beginning of the next non-hole in @data, and return 0.
  * If @start is in a trailing hole or beyond EOF, return -ENXIO.

And caller can blindly and safely treat that as a trailing hole, as needed.

  * If we can't find out, return a negative errno other than -ENXIO.
  */
 static int find_allocation(BlockDriverState *bs, off_t start,
off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE  defined SEEK_DATA

I seriously doubt you'd find a system with one but not both of these
constants defined.  But it doesn't hurt to check both.

 BDRVRawState *s = bs-opaque;
 off_t offs;
 
 /*
  * SEEK_DATA cases:
  * D1. offs == start: start is in data
  * D2. offs  start: start is in a hole, next data at offs
  * D3. offs  0, errno = ENXIO: either start is in a trailing hole
  *  or start is beyond EOF
  * If the latter happens, the file has been truncated behind
  * our back since we opened it.  Best we can do is treat like
  * a trailing hole.
  * D4. offs  0, errno != ENXIO: we learned nothing
  */

Correct.

 offs = lseek(s-fd, start, SEEK_DATA);
 if (offs  0) {
 return -errno;  /* D3 or D4 */
 }
 assert(offs = start);
 
 if (offs  start) {
 /* D2: in hole, next data at offs */
 *hole = start;
 *data = offs;
 return 0;
 }
 
 /* D1: in data, end not yet known */
 
 /*
  * SEEK_HOLE cases:
  * H1. offs == start: start is in a hole
  * If this happens here, a hole has been dug behind our back
  * since the previous lseek().
  * H2. offs  start: either start is in data, next hole at offs,
  *   or start is in trailing hole, EOF at offs
  * Linux treats trailing holes like any other hole: offs ==
  * start.  Solaris seeks to EOF instead: offs  start (blech).

Correct in isolation.  Coupled with the additional knowledge that we are
in state D1 (and already treated D3 as a trailing hole with early exit),...

  * If that happens here, a hole has been dug behind our back
  * since the previous lseek().

...this is further true for this function.

  * H3. offs  0, errno = ENXIO: start is beyond EOF
  * If this happens, the file has been truncated behind our
  * back since we opened it.  Treat it like a trailing hole.
  * H4. offs  0, errno != ENXIO: we learned nothing
  * Pretend we know nothing at all, i.e. forget about D1.
  */
 offs = lseek(s-fd, start, SEEK_HOLE);
 if (offs  0) {
 return -errno;  /* D1 and (H3 or H4) */
 }
 assert(offs = start);
 
 if (offs  start) {
 /*
  * D1 and H2: either in data, next hole at offs, or it was in
  * data but is now in a trailing hole.  Treating the latter as
  * if it there was data extending to EOF is safe, so simply do
  * that.
  */
 *data = start;
 *hole = offs;
 return 0;
 }

Reasonable.

 
 /* D1 and H1 */
 return -EBUSY;
 #else
 return -ENOTSUP;
 #endif
 }

I like it.  Maybe we could do 

Re: [Qemu-devel] [Bug 1392504] Re: USB Passthrough is not working anymore

2014-11-14 Thread Gonglei
On 2014/11/14 19:57, Leen Keus wrote:

 Command lines generated by virsh from the libvirt xml files (let me know if 
 you also need the xml configuration files):
  
Does this below patch fix your problem?

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=79ae25af1569a50a0ec799901a1bb280c088f121

Regards,
-Gonglei