date:20171012

[Qemu-devel] [PATCH v2] oslib-posix: Fix compiler warning and some data types

2017-10-12 Thread Stefan Weil

gcc warning:

/qemu/util/oslib-posix.c:304:11: error:
 variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’
 [-Werror=clobbered]

Fix also some related data types:

numpages, hpagesize are used as pointer offset.
Always use size_t for them and for the derived numpages_per_thread.

Avoid a type cast by declaring addr volatile.

Signed-off-by: Stefan Weil 
---

v2: Fix more data types (partially as discussed with Richard)

Please note that checkpatch.pl raises an error:

ERROR: Use of volatile is usually wrong: see 
Documentation/volatile-considered-harmful.txt

This error is wrong in the current context.
It also refers to a file which exists in the Linux sources
but not in the QEMU source.

Regards
Stefan

 util/oslib-posix.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 80086c549f..beef148c96 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -59,8 +59,8 @@
 
 struct MemsetThread {
 char *addr;
-uint64_t numpages;
-uint64_t hpagesize;
+size_t numpages;
+size_t hpagesize;
 QemuThread pgthread;
 sigjmp_buf env;
 };
@@ -301,11 +301,7 @@ static void sigbus_handler(int signal)
 static void *do_touch_pages(void *arg)
 {
 MemsetThread *memset_args = (MemsetThread *)arg;
-char *addr = memset_args->addr;
-uint64_t numpages = memset_args->numpages;
-uint64_t hpagesize = memset_args->hpagesize;
 sigset_t set, oldset;
-int i = 0;
 
 /* unblock SIGBUS */
 sigemptyset();
@@ -315,6 +311,10 @@ static void *do_touch_pages(void *arg)
 if (sigsetjmp(memset_args->env, 1)) {
 memset_thread_failed = true;
 } else {
+volatile char *addr = memset_args->addr;
+size_t numpages = memset_args->numpages;
+size_t hpagesize = memset_args->hpagesize;
+size_t i;
 for (i = 0; i < numpages; i++) {
 /*
  * Read & write back the same value, so we don't
@@ -328,7 +328,7 @@ static void *do_touch_pages(void *arg)
  * don't need to write at all so we don't cause
  * wear on the storage backing the region...
  */
-*(volatile char *)addr = *addr;
+*addr = *addr;
 addr += hpagesize;
 }
 }
@@ -351,7 +351,8 @@ static inline int get_memset_num_threads(int smp_cpus)
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
 int smp_cpus)
 {
-uint64_t numpages_per_thread, size_per_thread;
+size_t numpages_per_thread;
+size_t size_per_thread;
 char *addr = area;
 int i = 0;
 
-- 
2.11.0

Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy

2017-10-12 Thread Peter Xu

On Thu, Oct 12, 2017 at 01:19:52PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (pet...@redhat.com) wrote:
> > > > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > > We have to be careful about this; a network can fail in a way it
> > > > > gets stuck rather than fails - this can get stuck until a full TCP
> > > > > disconnection; and that takes about 30mins (from memory).
> > > > > The nice thing about using 'shutdown' is that you can kill the 
> > > > > existing
> > > > > connection if it's hung. (Which then makes an interesting question;
> > > > > the rules in your migrate-incoming command become different if you
> > > > > want to declare it's failed!).  Having said that, you're right that at
> > > > > this point stuff has already failed - so do we need the shutdown?
> > > > > (You might want to do the shutdown as part of the recovery earlier
> > > > > or as a separate command to force the failure)
> > > > 
> > > > I assume if I call shutdown before the lock then we'll be good then.
> > > 
> > > The question is what happens if you only allow recovery if we're already
> > > in postcopy-paused state; in the case of a hung socket, since no IO has
> > > actually failed yet, you will still be in postcopy-active.
> > 
> > Hmm, but isn't that a problem of kernel rather than QEMU?  Since
> > sockets are after all managed by kernel.
> 
> Kind of, but it comes down to what the right behaviour of a TCP socket
> is, and the kernel is probably doing the right thing.
> 
> > I don't really know what is the best thing to do to detect whether a
> > socket is stuck.  Assume we can observed that (say, we see migration
> > transferred bytes keep static for 30 seconds), IIRC you mentioned
> > about iptable tricks to break an existing e.g. TCP connection, then we
> > can trigger the -EIO path.
> 
> From the qemu level I'd prefer to make it a command;  if we start
> adding heuristics and timeouts etc then it's very difficult to actually
> get them right.
> 
> > Or do you think we should provide a way to manually trigger the paused
> > state?  Then it goes back to something we discussed with Dan in the
> > earlier post - I'd appreciate if we can postpone the manual trigger
> > support a bit (to make this series small, which is already not...).
> 
> I think that manual trigger is probably necessary; it would just call a
> shutdown() on the sockets and let the things fail into the paused state.
> It'd be pretty simple.  It would be another OOB command; the tricky
> part is just making sure it's thread safe against hte migration
> finishing when you issue it.
> 
> I think it can wait until after this series if you want, but it would
> be good if we can figure it out.

OK.  Let me try it in my next post.  I hope it won't grow into
something bigger (which does happens sometimes... :).

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v1 3/5] xlnx-zcu102: Specify the valid CPUs

2017-10-12 Thread Eduardo Habkost

On Thu, Oct 12, 2017 at 04:59:48PM -0700, Alistair Francis wrote:
> On Tue, Oct 10, 2017 at 8:25 AM, Eduardo Habkost  wrote:
> > On Mon, Oct 09, 2017 at 09:12:29AM +0200, Igor Mammedov wrote:
> >> On Fri, 6 Oct 2017 15:06:57 -0700
> >> Alistair Francis  wrote:
> >>
> >> > On Fri, Oct 6, 2017 at 4:45 AM, Eduardo Habkost  
> >> > wrote:
> >> > > On Fri, Oct 06, 2017 at 10:23:12AM +0200, Igor Mammedov wrote:
> >> > >> On Thu, 5 Oct 2017 14:09:06 -0300
> >> > >> Eduardo Habkost  wrote:
> >> > >>
> >> > >> > On Thu, Oct 05, 2017 at 11:04:27AM +0200, Igor Mammedov wrote:
> >> > >> > > On Wed, 4 Oct 2017 14:39:20 -0700
> >> > >> > > Alistair Francis  wrote:
> >> > >> > >
> >> > >> > > > On Wed, Oct 4, 2017 at 9:34 AM, Eduardo Habkost 
> >> > >> > > >  wrote:
> >> > >> > > > > On Wed, Oct 04, 2017 at 03:08:16PM +0200, Igor Mammedov wrote:
> >> > >> > > > >> On Wed, 4 Oct 2017 09:28:51 -0300
> >> > >> > > > >> Eduardo Habkost  wrote:
> >> > >> > > > >>
> >> > >> > > > >> > On Wed, Oct 04, 2017 at 01:12:32PM +0200, Igor Mammedov 
> >> > >> > > > >> > wrote:
> >> > >> > > > >> > > On Tue, 3 Oct 2017 14:41:17 -0700
> >> > >> > > > >> > > Alistair Francis  wrote:
> >> > >> > > > >> > >
> >> > >> > > > >> > > > On Tue, Oct 3, 2017 at 1:36 PM, Eduardo Habkost 
> >> > >> > > > >> > > >  wrote:
> >> > >> > > > >> > > > > On Tue, Oct 03, 2017 at 01:05:13PM -0700, Alistair 
> >> > >> > > > >> > > > > Francis wrote:
> >> > >> > > > >> > > > >> List all possible valid CPU options.
> >> > >> > > > >> > > > >>
> >> > >> > > > >> > > > >> Signed-off-by: Alistair Francis 
> >> > >> > > > >> > > > >> 
> >> > >> > > > >> > > > >> ---
> >> > >> > > > >> > > > >>
> >> > >> > > > >> > > > >>  hw/arm/xlnx-zcu102.c | 10 ++
> >> > >> > > > >> > > > >>  hw/arm/xlnx-zynqmp.c | 16 +---
> >> > >> > > > >> > > > >>  include/hw/arm/xlnx-zynqmp.h |  1 +
> >> > >> > > > >> > > > >>  3 files changed, 20 insertions(+), 7 deletions(-)
> >> > >> > > > >> > > > >>
> >> > >> > > > >> > > > >> diff --git a/hw/arm/xlnx-zcu102.c 
> >> > >> > > > >> > > > >> b/hw/arm/xlnx-zcu102.c
> >> > >> > > > >> > > > >> index 519a16ed98..039649e522 100644
> >> > >> > > > >> > > > >> --- a/hw/arm/xlnx-zcu102.c
> >> > >> > > > >> > > > >> +++ b/hw/arm/xlnx-zcu102.c
> >> > >> > > > >> > > > >> @@ -98,6 +98,8 @@ static void 
> >> > >> > > > >> > > > >> xlnx_zynqmp_init(XlnxZCU102 *s, MachineState 
> >> > >> > > > >> > > > >> *machine)
> >> > >> > > > >> > > > >>  object_property_add_child(OBJECT(machine), 
> >> > >> > > > >> > > > >> "soc", OBJECT(>soc),
> >> > >> > > > >> > > > >>_abort);
> >> > >> > > > >> > > > >>
> >> > >> > > > >> > > > >> +object_property_set_str(OBJECT(>soc), 
> >> > >> > > > >> > > > >> machine->cpu_type, "cpu-type",
> >> > >> > > > >> > > > >> +_fatal);
> >> > >> > > > >> > > > >
> >> > >> > > > >> > > > > Do you have plans to support other CPU types to 
> >> > >> > > > >> > > > > xlnx_zynqmp in
> >> > >> > > > >> > > > > the future?  If not, I wouldn't bother adding the 
> >> > >> > > > >> > > > > cpu-type
> >> > >> > > > >> > > > > property and the extra boilerplate code if it's 
> >> > >> > > > >> > > > > always going to
> >> > >> > > > >> > > > > be set to cortex-a53.
> >> > >> > > > >> > > >
> >> > >> > > > >> > > > No, it'll always be A53.
> >> > >> > > > >> > > >
> >> > >> > > > >> > > > I did think of that, but I also wanted to use the new 
> >> > >> > > > >> > > > option! I also
> >> > >> > > > >> > > > think there is an advantage in sanely handling users 
> >> > >> > > > >> > > > '-cpu' option,
> >> > >> > > > >> > > > before now we just ignored it, so I think it still 
> >> > >> > > > >> > > > does give a
> >> > >> > > > >> > > > benefit. That'll be especially important on the Xilinx 
> >> > >> > > > >> > > > tree (sometimes
> >> > >> > > > >> > > > people use our machines with a different CPU to 
> >> > >> > > > >> > > > 'benchmark' or test
> >> > >> > > > >> > > > other CPUs with our CoSimulation setup). So I think it 
> >> > >> > > > >> > > > does make sense
> >> > >> > > > >> > > > to keep in.
> >> > >> > > > >> > > if cpu isn't user settable, one could just outright die 
> >> > >> > > > >> > > if cpu_type
> >> > >> > > > >> > > is not NULL and say that user's CLI is wrong.
> >> > >> > > > >> > > (i.e. don't give users illusion that they allowed to use 
> >> > >> > > > >> > > '-cpu')
> >> > >> > > > >> >
> >> > >> > > > >> > Isn't it exactly what this patch does, by setting:
> >> > >> > > > >> > mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
> >> > >> > > > >> > mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
> >> > >> > > > >> > ?
> >> > >> > > > >> >
> >> > >> > > > >> >

[Qemu-devel] [PATCH 2/2] MAINTAINERS: Drop Sun4v nonexistent file

2017-10-12 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index da3c78df47..731c5c7ec2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -778,7 +778,6 @@ F: pc-bios/openbios-sparc64
 Sun4v
 M: Artyom Tarasenko 
 S: Maintained
-F: hw/sparc64/sun4v.c
 F: hw/timer/sun4v-rtc.c
 F: include/hw/timer/sun4v-rtc.h
 
-- 
2.13.5

[Qemu-devel] [PATCH 1/2] MAINTAINERS: Fix scsi path

2017-10-12 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 772ac209e1..da3c78df47 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -982,7 +982,7 @@ S: Supported
 F: include/hw/scsi/*
 F: include/scsi/*
 F: hw/scsi/*
-F: util/scsi*
+F: scsi/*
 F: tests/virtio-scsi-test.c
 T: git git://github.com/bonzini/qemu.git scsi-next
 
-- 
2.13.5

[Qemu-devel] [PATCH 0/2] MAINTAINERS: Two small fixes

2017-10-12 Thread Fam Zheng



Fam Zheng (2):
  MAINTAINERS: Fix scsi path
  MAINTAINERS: Drop Sun4v nonexistent file

 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.13.5

[Qemu-devel] [PATCH] docker: Don't allocate tty unless DEBUG=1

2017-10-12 Thread Fam Zheng

The existence of tty in the container seems to urge gcc into colorize
the output, but the escape chars will clutter the report once turned
into email replies on patchew. Move -t to debug mode.

Reported-by: Eric Blake 
Signed-off-by: Fam Zheng 
---
 tests/docker/Makefile.include | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 6f9ea196a7..ab939f2bec 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -134,10 +134,10 @@ docker-run: docker-qemu-src
"  COPYING $(EXECUTABLE) to $(IMAGE)"))
$(call quiet-command,   \
$(SRC_PATH)/tests/docker/docker.py run  \
-   $(if $(NOUSER),,-u $(shell id -u)) -t   \
+   $(if $(NOUSER),,-u $(shell id -u))  \
--security-opt seccomp=unconfined   \
$(if $V,,--rm)  \
-   $(if $(DEBUG),-i,)  \
+   $(if $(DEBUG),-ti,) \
$(if $(NETWORK),$(if $(subst 
$(NETWORK),,1),--net=$(NETWORK)),--net=none) \
-e TARGET_LIST=$(TARGET_LIST)   \
-e EXTRA_CONFIGURE_OPTS="$(EXTRA_CONFIGURE_OPTS)" \
-- 
2.13.5

Re: [Qemu-devel] [PATCH] fixup! ppc: spapr: use cpu type name directly

2017-10-12 Thread David Gibson

On Thu, Oct 12, 2017 at 05:45:14PM +0200, Igor Mammedov wrote:
> follow up commit that registers host-spapr-cpu-core type unconditionally
>  "ppc: spapr: register 'host' core type  along with the rest of core types"
> 
> makes 'non' machine crash
>  ppc64-softmmu/qemu-system-ppc64 -M none -device host-spapr-cpu-core
>  ERROR:qom/object.c:217:object_type_get_instance_size: assertion failed: 
> (type != NULL)
>  Aborted
> 
> before it qemu fails cleanly with
>  ppc64-softmmu/qemu-system-ppc64 -M none -device host-spapr-cpu-core
>  qemu-system-ppc64: -device host-spapr-cpu-core: 'host-spapr-cpu-core' is not 
> a valid device model name
> 
> spapr_cpu_core_realize() already has explicit check for pseries machine,
> so move access to host cpu type after it so 'none' machine would fail
> cleanly as expected.
> 
> Reported-by: Greg Kurz 
> Signed-off-by: Igor Mammedov 

Applied (folded into the original patch).

> ---
>  hw/ppc/spapr_cpu_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index b5bbb6a..7dbf9c3 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -151,7 +151,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, 
> Error **errp)
>  sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev));
>  sPAPRCPUCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(OBJECT(dev));
>  CPUCore *cc = CPU_CORE(OBJECT(dev));
> -size_t size = object_type_get_instance_size(scc->cpu_type);
> +size_t size;
>  Error *local_err = NULL;
>  void *obj;
>  int i, j;
> @@ -162,6 +162,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>  
> +size = object_type_get_instance_size(scc->cpu_type);
>  sc->threads = g_malloc0(size * cc->nr_threads);
>  for (i = 0; i < cc->nr_threads; i++) {
>  char id[32];

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets

2017-10-12 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1507852743-51639-1-git-send-email-eswi...@skyportsystems.com
Subject: [Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
384d619695 e1000: Faulty tx checksum offload corrupts packets

=== OUTPUT BEGIN ===
Checking PATCH 1/1: e1000: Faulty tx checksum offload corrupts packets...
ERROR: line over 90 characters
#127: FILE: hw/net/e1000.c:549:
+DBGOUT(TXERR, "txsum bug! ver %d src %08x dst %08x len %d proto %d 
cptse %d sum_needed %x oldsum %x newsum %x realsum %x\n",

ERROR: braces {} are necessary for all arms of this statement
#169: FILE: hw/net/e1000.c:584:
+if (tp->tso_props.tcp) {
[...]
 }
[...]

ERROR: line over 90 characters
#230: FILE: hw/net/e1000.c:644:
+DBGOUT(TXERR, "context descriptor: tse %d ip %d tcp %d ipcss %d ipcso %d 
ipcse %d tucss %d tucso %d tucse %d\n",

WARNING: line over 80 characters
#322: FILE: hw/net/e1000.c:731:
+if (sz >= tp->tso_props.hdr_len && tp->size < 
tp->tso_props.hdr_len) {

ERROR: do not use C99 // comments
#402: FILE: hw/net/e1000x_common.h:196:
+unsigned char sum_needed; // doesn't belong here

ERROR: do not use C99 // comments
#411: FILE: hw/net/e1000x_common.h:209:
+bool cptse; // doesn't belong here

ERROR: Missing Signed-off-by: line(s)

total: 6 errors, 1 warnings, 299 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH v2 2/5] bcm2836: Use the Cortex-A7 instead of Cortex-A15

2017-10-12 Thread Alistair Francis

The BCM2836 uses a Cortex-A7 not a Cortex-A15. Update the device to use
the correct CPU.
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2836/QA7_rev3.4.pdf

Signed-off-by: Alistair Francis 
---
V2:
 - Fix the BCM2836 CPU

 hw/arm/bcm2836.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 8c43291112..931795a878 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -30,7 +30,7 @@ static void bcm2836_init(Object *obj)
 
 for (n = 0; n < BCM2836_NCPUS; n++) {
 object_initialize(>cpus[n], sizeof(s->cpus[n]),
-  "cortex-a15-" TYPE_ARM_CPU);
+  "cortex-a7-" TYPE_ARM_CPU);
 object_property_add_child(obj, "cpu[*]", OBJECT(>cpus[n]),
   _abort);
 }
-- 
2.11.0

[Qemu-devel] [PATCH v2 5/5] xilinx_zynq: : Specify the valid CPUs

2017-10-12 Thread Alistair Francis

List all possible valid CPU options.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
V2:
 - Fixup alignment

 hw/arm/xilinx_zynq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 1836a4ed45..e169bf0a86 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -313,6 +313,11 @@ static void zynq_init(MachineState *machine)
 arm_load_kernel(ARM_CPU(first_cpu), _binfo);
 }
 
+const char *xlnx_zynq_7000_valid_cpus[] = {
+ARM_CPU_TYPE_NAME("cortex-a9"),
+NULL
+  };
+
 static void zynq_machine_init(MachineClass *mc)
 {
 mc->desc = "Xilinx Zynq Platform Baseboard for Cortex-A9";
@@ -321,6 +326,7 @@ static void zynq_machine_init(MachineClass *mc)
 mc->no_sdcard = 1;
 mc->ignore_memory_transaction_failures = true;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
+mc->valid_cpu_types = xlnx_zynq_7000_valid_cpus;
 }
 
 DEFINE_MACHINE("xilinx-zynq-a9", zynq_machine_init)
-- 
2.11.0

[Qemu-devel] [PATCH v2 1/5] netduino2: Specify the valid CPUs

2017-10-12 Thread Alistair Francis

List all possible valid CPU options.

Although the board only ever has a Cortex-M3 we mark the Cortex-M4 as
supported because the Netduino2 Plus supports the Cortex-M4 and the
Netduino2 Plus is similar to the Netduino2.

Signed-off-by: Alistair Francis 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Philippe Mathieu-Daudé 
---

V2:
 - Fixup allignment
RFC v2:
 - Use a NULL terminated list
 - Add the Cortex-M4 for testing


 hw/arm/netduino2.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/arm/netduino2.c b/hw/arm/netduino2.c
index f936017d4a..6427552a72 100644
--- a/hw/arm/netduino2.c
+++ b/hw/arm/netduino2.c
@@ -34,18 +34,26 @@ static void netduino2_init(MachineState *machine)
 DeviceState *dev;
 
 dev = qdev_create(NULL, TYPE_STM32F205_SOC);
-qdev_prop_set_string(dev, "cpu-type", ARM_CPU_TYPE_NAME("cortex-m3"));
+qdev_prop_set_string(dev, "cpu-type", machine->cpu_type);
 object_property_set_bool(OBJECT(dev), true, "realized", _fatal);
 
 armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
FLASH_SIZE);
 }
 
+const char *netduino_valid_cpus[] = {
+ARM_CPU_TYPE_NAME("cortex-m3"),
+ARM_CPU_TYPE_NAME("cortex-m4"),
+NULL
+};
+
 static void netduino2_machine_init(MachineClass *mc)
 {
 mc->desc = "Netduino 2 Machine";
 mc->init = netduino2_init;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m3");
+mc->valid_cpu_types = netduino_valid_cpus;
 }
 
 DEFINE_MACHINE("netduino2", netduino2_machine_init)
-- 
2.11.0

[Qemu-devel] [PATCH v2 0/5] Add a valid_cpu_types property

2017-10-12 Thread Alistair Francis

There are numorous QEMU machines that only have a single or a handful of
valid CPU options. To simplyfy the management of specificying which CPU
is/isn't valid let's create a property that can be set in the machine
init. We can then check to see if the user supplied CPU is in that list
or not.

I have added the valid_cpu_types for some ARM machines only at the
moment.

Here is what specifying the CPUs looks like now:

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m3" -S
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) info cpus
* CPU #0: thread_id=24175
(qemu) q

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m4" -S
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) q

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-m5" -S
qemu-system-aarch64: unable to find CPU model 'cortex-m5'

$ aarch64-softmmu/qemu-system-aarch64 -M netduino2 -kernel ./u-boot.elf 
-nographic -cpu "cortex-a9" -S
qemu-system-aarch64: Invalid CPU type: cortex-a9-arm-cpu
The valid types are: cortex-m3-arm-cpu, cortex-m4-arm-cpu

V2:
 - Rebase
 - Reorder patches
 - Add a Raspberry Pi 2 CPU fix
V1:
 - Small fixes to prepare a series instead of RFC
 - Add commit messages for the commits
 - Expand the machine support to ARM machines
RFC v2:
 - Rebase on Igor's work
 - Use more QEMUisms inside the code
 - List the supported machines in a NULL terminated array

Alistair Francis (5):
  netduino2: Specify the valid CPUs
  bcm2836: Use the Cortex-A7 instead of Cortex-A15
  raspi: Specify the valid CPUs
  xlnx-zcu102: Specify the valid CPUs
  xilinx_zynq: : Specify the valid CPUs

 hw/arm/bcm2836.c |  2 +-
 hw/arm/netduino2.c   | 10 +-
 hw/arm/raspi.c   |  7 +++
 hw/arm/xilinx_zynq.c |  6 ++
 hw/arm/xlnx-zcu102.c | 17 +
 5 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.11.0

[Qemu-devel] [PATCH v2 4/5] xlnx-zcu102: Specify the valid CPUs

2017-10-12 Thread Alistair Francis

List all possible valid CPU options.

Signed-off-by: Alistair Francis 
---

An implementation for single CPU machines is still being discussed. A
solution proposed by Eduardo is this:

1) Change the default on TYPE_MACHINE to:
 mc->valid_cpu_types = { TYPE_CPU, NULL };

   This will keep the existing behavior for all boards.

2) mc->valid_cpu_types=NULL be interpreted as "no CPU model
   except the default is accepted" or "-cpu is not accepted" in
   machine_run_board_init() (I prefer the former, but both
   options would be correct)

3) Boards like xlnx_zynqmp could then just do this:

   static void xxx_class_init(...) {
   mc->default_cpu_type = MY_CPU_TYPE;
   /* Reason: XXX_init() is hardcoded to MY_CPU_TYPE */
   mc->valid_cpu_types = NULL;
   }

V2:
 - Don't use the users -cpu
 - Fixup allignment

 hw/arm/xlnx-zcu102.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
index 519a16ed98..c6df776d89 100644
--- a/hw/arm/xlnx-zcu102.c
+++ b/hw/arm/xlnx-zcu102.c
@@ -160,6 +160,11 @@ static void xlnx_zynqmp_init(XlnxZCU102 *s, MachineState 
*machine)
 arm_load_kernel(s->soc.boot_cpu_ptr, _zcu102_binfo);
 }
 
+const char *xlnx_zynqmp_valid_cpus[] = {
+ARM_CPU_TYPE_NAME("cortex-a53"),
+NULL
+   };
+
 static void xlnx_ep108_init(MachineState *machine)
 {
 XlnxZCU102 *s = EP108_MACHINE(machine);
@@ -185,6 +190,12 @@ static void xlnx_ep108_machine_class_init(ObjectClass *oc, 
void *data)
 mc->block_default_type = IF_IDE;
 mc->units_per_default_bus = 1;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
+/* The ZynqMP SoC is always a Cortex-A53. We add this here to give
+ * users a sane error if they specify a different CPU, but we never
+ * use their CPU choice.
+ */
+mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
 }
 
 static const TypeInfo xlnx_ep108_machine_init_typeinfo = {
@@ -240,6 +251,12 @@ static void xlnx_zcu102_machine_class_init(ObjectClass 
*oc, void *data)
 mc->block_default_type = IF_IDE;
 mc->units_per_default_bus = 1;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
+/* The ZynqMP SoC is always a Cortex-A53. We add this here to give
+ * users a sane error if they specify a different CPU, but we never
+ * use their CPU choice.
+ */
+mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
 }
 
 static const TypeInfo xlnx_zcu102_machine_init_typeinfo = {
-- 
2.11.0

[Qemu-devel] [PATCH v2 3/5] raspi: Specify the valid CPUs

2017-10-12 Thread Alistair Francis

List all possible valid CPU options.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
V2:
 - Fix the indentation

 hw/arm/raspi.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 5941c9f751..2a7e6a40bb 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -158,6 +158,11 @@ static void raspi2_init(MachineState *machine)
 setup_boot(machine, 2, machine->ram_size - vcram_size);
 }
 
+const char *raspi2_valid_cpus[] = {
+ARM_CPU_TYPE_NAME("cortex-a7"),
+NULL
+  };
+
 static void raspi2_machine_init(MachineClass *mc)
 {
 mc->desc = "Raspberry Pi 2";
@@ -169,5 +174,7 @@ static void raspi2_machine_init(MachineClass *mc)
 mc->max_cpus = BCM2836_NCPUS;
 mc->default_ram_size = 1024 * 1024 * 1024;
 mc->ignore_memory_transaction_failures = true;
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
+mc->valid_cpu_types = raspi2_valid_cpus;
 };
 DEFINE_MACHINE("raspi2", raspi2_machine_init)
-- 
2.11.0

Re: [Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets

2017-10-12 Thread no-reply

Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 1507852743-51639-1-git-send-email-eswi...@skyportsystems.com
Subject: [Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1507852743-51639-1-git-send-email-eswi...@skyportsystems.com -> 
patchew/1507852743-51639-1-git-send-email-eswi...@skyportsystems.com
 - [tag update]  patchew/20171012095319.136610-1-vsement...@virtuozzo.com 
-> patchew/20171012095319.136610-1-vsement...@virtuozzo.com
Switched to a new branch 'test'
384d619 e1000: Faulty tx checksum offload corrupts packets

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=44483
SHELL=/bin/sh
USER=fam
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin
PWD=/var/tmp/patchew-tester-tmp-5drd4523/src
LANG=en_US.UTF-8
HOME=/home/fam
SHLVL=2
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
xz-libs-5.2.2-2.fc24.s390x
libxshmfence-1.2-3.fc24.s390x
giflib-4.1.6-15.fc24.s390x
trousers-lib-0.3.13-6.fc24.s390x
ncurses-base-6.0-6.20160709.fc25.noarch
gmp-6.1.1-1.fc25.s390x
libidn-1.33-1.fc25.s390x
slang-2.3.0-7.fc25.s390x
pkgconfig-0.29.1-1.fc25.s390x
alsa-lib-1.1.1-2.fc25.s390x
yum-metadata-parser-1.1.4-17.fc25.s390x
python3-slip-dbus-0.6.4-4.fc25.noarch
python2-cssselect-0.9.2-1.fc25.noarch
createrepo_c-libs-0.10.0-6.fc25.s390x
initscripts-9.69-1.fc25.s390x
parted-3.2-21.fc25.s390x
flex-2.6.0-3.fc25.s390x
colord-libs-1.3.4-1.fc25.s390x
python-osbs-client-0.33-3.fc25.noarch
perl-Pod-Simple-3.35-1.fc25.noarch
python2-simplejson-3.10.0-1.fc25.s390x
brltty-5.4-2.fc25.s390x
librados2-10.2.4-2.fc25.s390x
tcp_wrappers-7.6-83.fc25.s390x
libcephfs_jni1-10.2.4-2.fc25.s390x
nettle-devel-3.3-1.fc25.s390x
bzip2-devel-1.0.6-21.fc25.s390x
libuuid-2.28.2-2.fc25.s390x
python3-dnf-1.1.10-6.fc25.noarch
texlive-kpathsea-doc-svn41139-33.fc25.1.noarch
openssh-7.4p1-4.fc25.s390x
texlive-kpathsea-bin-svn40473-33.20160520.fc25.1.s390x
texlive-graphics-svn41015-33.fc25.1.noarch
texlive-dvipdfmx-def-svn40328-33.fc25.1.noarch
texlive-mfware-svn40768-33.fc25.1.noarch
texlive-texlive-scripts-svn41433-33.fc25.1.noarch
texlive-euro-svn22191.1.1-33.fc25.1.noarch
texlive-etex-svn37057.0-33.fc25.1.noarch
texlive-iftex-svn29654.0.2-33.fc25.1.noarch
texlive-palatino-svn31835.0-33.fc25.1.noarch
texlive-texlive-docindex-svn41430-33.fc25.1.noarch
texlive-xunicode-svn30466.0.981-33.fc25.1.noarch
texlive-koma-script-svn41508-33.fc25.1.noarch
texlive-pst-grad-svn15878.1.06-33.fc25.1.noarch
texlive-pst-blur-svn15878.2.0-33.fc25.1.noarch
texlive-jknapltx-svn19440.0-33.fc25.1.noarch
texinfo-6.1-4.fc25.s390x
openssl-devel-1.0.2k-1.fc25.s390x
jansson-2.10-2.fc25.s390x
fedora-repos-25-4.noarch
perl-Errno-1.25-387.fc25.s390x
acl-2.2.52-13.fc25.s390x
systemd-pam-231-17.fc25.s390x
NetworkManager-libnm-1.4.4-5.fc25.s390x
poppler-0.45.0-5.fc25.s390x
ccache-3.3.4-1.fc25.s390x
valgrind-3.12.0-9.fc25.s390x
perl-open-1.10-387.fc25.noarch
libgcc-6.4.1-1.fc25.s390x
libsoup-2.56.1-1.fc25.s390x
libstdc++-devel-6.4.1-1.fc25.s390x
libobjc-6.4.1-1.fc25.s390x
python2-rpm-4.13.0.1-2.fc25.s390x
python2-gluster-3.10.5-1.fc25.s390x
rpm-build-4.13.0.1-2.fc25.s390x
glibc-static-2.24-10.fc25.s390x
lz4-1.8.0-1.fc25.s390x
xapian-core-libs-1.2.24-1.fc25.s390x
elfutils-libelf-devel-0.169-1.fc25.s390x
nss-softokn-3.32.0-1.2.fc25.s390x
pango-1.40.9-1.fc25.s390x
glibc-debuginfo-common-2.24-10.fc25.s390x
libaio-0.3.110-6.fc24.s390x
libfontenc-1.1.3-3.fc24.s390x
lzo-2.08-8.fc24.s390x
isl-0.14-5.fc24.s390x
libXau-1.0.8-6.fc24.s390x
linux-atm-libs-2.5.1-14.fc24.s390x
libXext-1.3.3-4.fc24.s390x
libXxf86vm-1.1.4-3.fc24.s390x
bison-3.0.4-4.fc24.s390x
perl-srpm-macros-1-20.fc25.noarch
gawk-4.1.3-8.fc25.s390x
libwayland-client-1.12.0-1.fc25.s390x
perl-Exporter-5.72-366.fc25.noarch
perl-version-0.99.17-1.fc25.s390x
fftw-libs-double-3.3.5-3.fc25.s390x
libssh2-1.8.0-1.fc25.s390x
ModemManager-glib-1.6.4-1.fc25.s390x
newt-python3-0.52.19-2.fc25.s390x
python-munch-2.0.4-3.fc25.noarch
python-bugzilla-1.2.2-4.fc25.noarch
libedit-3.1-16.20160618cvs.fc25.s390x
createrepo_c-0.10.0-6.fc25.s390x
device-mapper-multipath-libs-0.4.9-83.fc25.s390x
yum-3.4.3-510.fc25.noarch
mozjs17-17.0.0-16.fc25.s390x
libselinux-2.5-13.fc25.s390x

Re: [Qemu-devel] [PATCH v1 3/5] xlnx-zcu102: Specify the valid CPUs

2017-10-12 Thread Alistair Francis

On Tue, Oct 10, 2017 at 8:25 AM, Eduardo Habkost  wrote:
> On Mon, Oct 09, 2017 at 09:12:29AM +0200, Igor Mammedov wrote:
>> On Fri, 6 Oct 2017 15:06:57 -0700
>> Alistair Francis  wrote:
>>
>> > On Fri, Oct 6, 2017 at 4:45 AM, Eduardo Habkost  
>> > wrote:
>> > > On Fri, Oct 06, 2017 at 10:23:12AM +0200, Igor Mammedov wrote:
>> > >> On Thu, 5 Oct 2017 14:09:06 -0300
>> > >> Eduardo Habkost  wrote:
>> > >>
>> > >> > On Thu, Oct 05, 2017 at 11:04:27AM +0200, Igor Mammedov wrote:
>> > >> > > On Wed, 4 Oct 2017 14:39:20 -0700
>> > >> > > Alistair Francis  wrote:
>> > >> > >
>> > >> > > > On Wed, Oct 4, 2017 at 9:34 AM, Eduardo Habkost 
>> > >> > > >  wrote:
>> > >> > > > > On Wed, Oct 04, 2017 at 03:08:16PM +0200, Igor Mammedov wrote:
>> > >> > > > >> On Wed, 4 Oct 2017 09:28:51 -0300
>> > >> > > > >> Eduardo Habkost  wrote:
>> > >> > > > >>
>> > >> > > > >> > On Wed, Oct 04, 2017 at 01:12:32PM +0200, Igor Mammedov 
>> > >> > > > >> > wrote:
>> > >> > > > >> > > On Tue, 3 Oct 2017 14:41:17 -0700
>> > >> > > > >> > > Alistair Francis  wrote:
>> > >> > > > >> > >
>> > >> > > > >> > > > On Tue, Oct 3, 2017 at 1:36 PM, Eduardo Habkost 
>> > >> > > > >> > > >  wrote:
>> > >> > > > >> > > > > On Tue, Oct 03, 2017 at 01:05:13PM -0700, Alistair 
>> > >> > > > >> > > > > Francis wrote:
>> > >> > > > >> > > > >> List all possible valid CPU options.
>> > >> > > > >> > > > >>
>> > >> > > > >> > > > >> Signed-off-by: Alistair Francis 
>> > >> > > > >> > > > >> 
>> > >> > > > >> > > > >> ---
>> > >> > > > >> > > > >>
>> > >> > > > >> > > > >>  hw/arm/xlnx-zcu102.c | 10 ++
>> > >> > > > >> > > > >>  hw/arm/xlnx-zynqmp.c | 16 +---
>> > >> > > > >> > > > >>  include/hw/arm/xlnx-zynqmp.h |  1 +
>> > >> > > > >> > > > >>  3 files changed, 20 insertions(+), 7 deletions(-)
>> > >> > > > >> > > > >>
>> > >> > > > >> > > > >> diff --git a/hw/arm/xlnx-zcu102.c 
>> > >> > > > >> > > > >> b/hw/arm/xlnx-zcu102.c
>> > >> > > > >> > > > >> index 519a16ed98..039649e522 100644
>> > >> > > > >> > > > >> --- a/hw/arm/xlnx-zcu102.c
>> > >> > > > >> > > > >> +++ b/hw/arm/xlnx-zcu102.c
>> > >> > > > >> > > > >> @@ -98,6 +98,8 @@ static void 
>> > >> > > > >> > > > >> xlnx_zynqmp_init(XlnxZCU102 *s, MachineState *machine)
>> > >> > > > >> > > > >>  object_property_add_child(OBJECT(machine), 
>> > >> > > > >> > > > >> "soc", OBJECT(>soc),
>> > >> > > > >> > > > >>_abort);
>> > >> > > > >> > > > >>
>> > >> > > > >> > > > >> +object_property_set_str(OBJECT(>soc), 
>> > >> > > > >> > > > >> machine->cpu_type, "cpu-type",
>> > >> > > > >> > > > >> +_fatal);
>> > >> > > > >> > > > >
>> > >> > > > >> > > > > Do you have plans to support other CPU types to 
>> > >> > > > >> > > > > xlnx_zynqmp in
>> > >> > > > >> > > > > the future?  If not, I wouldn't bother adding the 
>> > >> > > > >> > > > > cpu-type
>> > >> > > > >> > > > > property and the extra boilerplate code if it's always 
>> > >> > > > >> > > > > going to
>> > >> > > > >> > > > > be set to cortex-a53.
>> > >> > > > >> > > >
>> > >> > > > >> > > > No, it'll always be A53.
>> > >> > > > >> > > >
>> > >> > > > >> > > > I did think of that, but I also wanted to use the new 
>> > >> > > > >> > > > option! I also
>> > >> > > > >> > > > think there is an advantage in sanely handling users 
>> > >> > > > >> > > > '-cpu' option,
>> > >> > > > >> > > > before now we just ignored it, so I think it still does 
>> > >> > > > >> > > > give a
>> > >> > > > >> > > > benefit. That'll be especially important on the Xilinx 
>> > >> > > > >> > > > tree (sometimes
>> > >> > > > >> > > > people use our machines with a different CPU to 
>> > >> > > > >> > > > 'benchmark' or test
>> > >> > > > >> > > > other CPUs with our CoSimulation setup). So I think it 
>> > >> > > > >> > > > does make sense
>> > >> > > > >> > > > to keep in.
>> > >> > > > >> > > if cpu isn't user settable, one could just outright die if 
>> > >> > > > >> > > cpu_type
>> > >> > > > >> > > is not NULL and say that user's CLI is wrong.
>> > >> > > > >> > > (i.e. don't give users illusion that they allowed to use 
>> > >> > > > >> > > '-cpu')
>> > >> > > > >> >
>> > >> > > > >> > Isn't it exactly what this patch does, by setting:
>> > >> > > > >> > mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
>> > >> > > > >> > mc->valid_cpu_types = xlnx_zynqmp_valid_cpus;
>> > >> > > > >> > ?
>> > >> > > > >> >
>> > >> > > > >> > Except that "-cpu cortex-a53" won't die, which is a good 
>> > >> > > > >> > thing.
>> > >> > > > >> allowing "-cpu cortex-a53" here, would allow to use feature 
>> > >> > > > >> parsing
>> > >> > > > >> which weren't allowed or were ignored before if user supplied 
>> > >> > > >

[Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets

2017-10-12 Thread Ed Swierk via Qemu-devel

The transmit checksum offload implementation in QEMU's e1000 device is
deficient and causes packet data corruption in some situations.

According to the Intel 8254x software developer's manual[1], the
device maintains two separate contexts: the TCP segmentation offload
context includes parameters for both segmentation offload and checksum
offload, and the normal (checksum-offload-only) context includes only
checksum offload parameters. These parameters specify over which
packet data to compute the checksum, and where in the packet to store
the computed checksum(s).

[1] 
https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf

The e1000 driver can update either of these contexts by sending a
transmit context descriptor. The TSE bit in the TUCMD field controls
which context is modified by the descriptor. Crucially, a transmit
context descriptor with TSE=1 changes only the TSO context, leaving
the SUM context unchanged; with TSE=0 the opposite is true.

Fields in the transmit data descriptor determine which (if either) of
these two contexts the device uses when actually transmitting some
data:

- If the TSE bit in the DCMD field is set, then the device performs
  TCP segmentation offload using the parameters previously set in the
  TSO context. In addition, if TXSM and/or IXSM is set in the POPTS
  field, the device performs the appropriate checksum offloads using
  the parameters in the same (TSO) context.

- Otherwise, if the TSE bit in the DCMD field is clear, then there is
  no TCP segmentation offload. If TXSM and/or IXSM is set in the POPTS
  field, the device performs the appropriate checksum offloads using
  the parameters in the SUM context.

The e1000 driver is free to set up the TSO and SUM contexts and then
transmit a mixture of data, with each data descriptor using a
different (or neither) context. This is what the e1000 driver for
Windows (Intel(R) PRO/1000 MT Network Connection, aka E1G6023E.sys)
does in certain cases. Sometimes with quite undesirable results, since
the QEMU e1000 device doesn't work as described above.

Instead, the QEMU e1000 device maintains only one context in its state
structure. When it receives a transmit context descriptor from the
driver, it overwrites the context parameters regardless of the TSE bit
in the TUCMD field.

To see why this is wrong, suppose the driver first sets up a SUM
context with UDP checksum offload parameters (say, TUCSO pointing to
the appropriate offset for a UDP checksum, 6 bytes into the header),
and then sets up a TSO context with TCP checksum offload parameters
(TUCSO pointing to the appropriate offset for a TCP checksum, 16 bytes
into the header). The driver then sends a transmit data descriptor
with TSO=0 and TXSM=1 along with a UDP datagram. The QEMU e1000 device
computes the checksum using the last set of checksum offload
parameters, and writes the checksum to offset 16, stomping on two
bytes of UDP data, and leaving the wrong checksum in the UDP checksum
field.

To make matters worse, if the network stack on the host running QEMU
treats data transmitted from a VM as locally originated, it may do its
own UDP checksum computation, "correcting" it to match the corrupt
data before sending it on the wire. Now the corrupt UDP packet makes
its way all the way to the destination.

(A separate layer of icing on the cake is that QEMU ignores the
requirement that a UDP checksum computed as zero be sent as 0x,
since zero is a special value meaning no checksum. So even when QEMU
doesn't corrupt the packet data, the packet sometimes leaves the box
with no checksum at all.)

I have instrumented QEMU and reproduced this behavior with a Windows
10 guest (rather easily with a TCP iperf and a UDPi iperf running in
parallel). I have also attempted a fix, which is below in very rough
form.

Before I spend too much time refining a patch, I'd like to get
feedback on my approach.

One puzzle is what to do about e1000e: it shares shares some data
structures and a bit of code with e1000, but little else, which is
surprising given how similar they are (or should be). The e1000e's
handling of TCP segmentation offload and checksum offload is totally
different, and problematic for other reasons (it totally ignores most
of the context parameters provided by the driver and basically does
what it thinks is best by digging into the packet data). Is this
divergence intentional? Is there a reason not to change e1000e as long
as I'm trying to make e1000 more datasheet-conformant?

--Ed

---
 hw/net/e1000.c | 176 ++---
 hw/net/e1000x_common.h |   4 +-
 2 files changed, 126 insertions(+), 54 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 9324949..e45746f 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -98,7 +98,10 @@ typedef struct E1000State_st {
 unsigned char data[0x1];
 uint16_t size;
 unsigned char vlan_needed;
-

[Qemu-devel] [PATCH 3/4] multiboot: load elf sections and section headers

2017-10-12 Thread Anatol Pomozov

Multiboot may load section headers and all sections (even those that are
not part of any segment) to target memory.

Tested with an ELF application that uses data from strings table
section.

Signed-off-by: Anatol Pomozov 
---
 hw/core/loader.c |   8 +--
 hw/i386/multiboot.c  |  83 +---
 hw/s390x/ipl.c   |   2 +-
 include/hw/elf_ops.h | 107 ++-
 include/hw/loader.h  |  11 +++-
 tests/multiboot/Makefile |   8 ++-
 tests/multiboot/generate_sections_out.py |  33 ++
 tests/multiboot/modules.out  |  22 +++
 tests/multiboot/run_test.sh  |   6 +-
 tests/multiboot/sections.c   |  55 
 tests/multiboot/start.S  |   2 +-
 11 files changed, 276 insertions(+), 61 deletions(-)
 create mode 100755 tests/multiboot/generate_sections_out.py
 create mode 100644 tests/multiboot/sections.c

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 4593061445..a8787f7685 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -439,7 +439,7 @@ int load_elf_as(const char *filename,
 {
 return load_elf_ram(filename, translate_fn, translate_opaque,
 pentry, lowaddr, highaddr, big_endian, elf_machine,
-clear_lsb, data_swab, as, true);
+clear_lsb, data_swab, as, true, NULL);
 }
 
 /* return < 0 if error, otherwise the number of bytes loaded in memory */
@@ -448,7 +448,7 @@ int load_elf_ram(const char *filename,
  void *translate_opaque, uint64_t *pentry, uint64_t *lowaddr,
  uint64_t *highaddr, int big_endian, int elf_machine,
  int clear_lsb, int data_swab, AddressSpace *as,
- bool load_rom)
+ bool load_rom, SectionsData *sections)
 {
 int fd, data_order, target_data_order, must_swab, ret = ELF_LOAD_FAILED;
 uint8_t e_ident[EI_NIDENT];
@@ -488,11 +488,11 @@ int load_elf_ram(const char *filename,
 if (e_ident[EI_CLASS] == ELFCLASS64) {
 ret = load_elf64(filename, fd, translate_fn, translate_opaque, 
must_swab,
  pentry, lowaddr, highaddr, elf_machine, clear_lsb,
- data_swab, as, load_rom);
+ data_swab, as, load_rom, sections);
 } else {
 ret = load_elf32(filename, fd, translate_fn, translate_opaque, 
must_swab,
  pentry, lowaddr, highaddr, elf_machine, clear_lsb,
- data_swab, as, load_rom);
+ data_swab, as, load_rom, sections);
 }
 
  fail:
diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 7dacd6d827..841d05160a 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -125,7 +125,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 {
 int i;
 bool is_multiboot = false;
-uint32_t flags = 0;
+uint32_t flags = 0, bootinfo_flags = 0;
 uint32_t mh_entry_addr;
 uint32_t mh_load_addr;
 uint32_t mb_kernel_size;
@@ -134,6 +134,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 uint8_t *mb_bootinfo_data;
 uint32_t cmdline_len;
 struct multiboot_header *multiboot_header;
+SectionsData sections;
 
 /* Ok, let's see if it is a multiboot image.
The header is 12x32bit long, so the latest entry may be 8192 - 48. */
@@ -161,37 +162,8 @@ int load_multiboot(FWCfgState *fw_cfg,
 if (flags & MULTIBOOT_VIDEO_MODE) {
 fprintf(stderr, "qemu: multiboot knows VBE. we don't.\n");
 }
-if (!(flags & MULTIBOOT_AOUT_KLUDGE)) {
-uint64_t elf_entry;
-uint64_t elf_low, elf_high;
-int kernel_size;
-fclose(f);
 
-if (((struct elf64_hdr*)header)->e_machine == EM_X86_64) {
-fprintf(stderr, "Cannot load x86-64 image, give a 32bit one.\n");
-exit(1);
-}
-
-kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
-   _low, _high, 0, EM_NONE,
-   0, 0);
-if (kernel_size < 0) {
-fprintf(stderr, "Error while loading elf kernel\n");
-exit(1);
-}
-mh_load_addr = elf_low;
-mb_kernel_size = elf_high - elf_low;
-mh_entry_addr = elf_entry;
-
-mbs.mb_buf = g_malloc(mb_kernel_size);
-if (rom_copy(mbs.mb_buf, mh_load_addr, mb_kernel_size) != 
mb_kernel_size) {
-fprintf(stderr, "Error while fetching elf kernel from rom\n");
-exit(1);
-}
-
-mb_debug("qemu: loading multiboot-elf kernel (%#x bytes) with entry 
%#zx\n",
-  mb_kernel_size, (size_t)mh_entry_addr);
-} else {
+if (flags & MULTIBOOT_AOUT_KLUDGE) {
 /* Valid if mh_flags sets MULTIBOOT_AOUT_KLUDGE. */
 uint32_t mh_header_addr = ldl_p(_header->header_addr);

[Qemu-devel] [PATCH 2/4] multiboot: load any machine type of ELF

2017-10-12 Thread Anatol Pomozov

x86 is not the only architecture supported by multiboot.
For example GRUB supports MIPS architecture as well.

Signed-off-by: Anatol Pomozov 
---
 hw/i386/multiboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index c9254f313e..7dacd6d827 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -173,7 +173,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 }
 
 kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
-   _low, _high, 0, I386_ELF_MACHINE,
+   _low, _high, 0, EM_NONE,
0, 0);
 if (kernel_size < 0) {
 fprintf(stderr, "Error while loading elf kernel\n");
-- 
2.15.0.rc0.271.g36b669edcc-goog

[Qemu-devel] [PATCH 4/4] multiboot: make tests work with clang

2017-10-12 Thread Anatol Pomozov

 * clang 3.8 enables SSE even for 32bit code. Generate code for pentium
   CPU to make sure no new instructions are used.
 * add memset() implementation. Clang implements array zeroing in
   print_num() via memset() function call.
---
 tests/multiboot/Makefile| 2 +-
 tests/multiboot/libc.c  | 9 +
 tests/multiboot/libc.h  | 2 ++
 tests/multiboot/run_test.sh | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/tests/multiboot/Makefile b/tests/multiboot/Makefile
index b6a5056347..79dfe85afc 100644
--- a/tests/multiboot/Makefile
+++ b/tests/multiboot/Makefile
@@ -1,5 +1,5 @@
 CC=gcc
-CCFLAGS=-m32 -Wall -Wextra -Werror -fno-stack-protector -nostdinc -fno-builtin
+CCFLAGS=-m32 -Wall -Wextra -Werror -fno-stack-protector -nostdinc -fno-builtin 
-march=pentium
 ASFLAGS=-m32
 
 LD=ld
diff --git a/tests/multiboot/libc.c b/tests/multiboot/libc.c
index 6df9bda96d..512fccd7fa 100644
--- a/tests/multiboot/libc.c
+++ b/tests/multiboot/libc.c
@@ -33,6 +33,15 @@ void* memcpy(void *dest, const void *src, int n)
 
 return dest;
 }
+void *memset(void *s, int c, size_t n)
+{
+size_t i;
+char *d = s;
+for (i = 0; i < n; i++) {
+*d++ = c;
+}
+return s;
+}
 
 static void print_char(char c)
 {
diff --git a/tests/multiboot/libc.h b/tests/multiboot/libc.h
index 04c9922c27..bdf7c34287 100644
--- a/tests/multiboot/libc.h
+++ b/tests/multiboot/libc.h
@@ -36,6 +36,7 @@ typedef signed short int16_t;
 typedef signed char int8_t;
 
 typedef uint32_t uintptr_t;
+typedef uint32_t size_t;
 
 
 /* stdarg.h */
@@ -58,5 +59,6 @@ static inline void outb(uint16_t port, uint8_t data)
 
 void printf(const char *fmt, ...);
 void* memcpy(void *dest, const void *src, int n);
+void* memset(void *s, int c, size_t n);
 
 #endif
diff --git a/tests/multiboot/run_test.sh b/tests/multiboot/run_test.sh
index f04e35cbf0..38dcfef42c 100755
--- a/tests/multiboot/run_test.sh
+++ b/tests/multiboot/run_test.sh
@@ -29,6 +29,7 @@ run_qemu() {
 printf %b "\n\n=== Running test case: $kernel $* ===\n\n" >> test.log
 
 $QEMU \
+-cpu pentium \
 -kernel $kernel \
 -display none \
 -device isa-debugcon,chardev=stdio \
-- 
2.15.0.rc0.271.g36b669edcc-goog

[Qemu-devel] (no subject)

2017-10-12 Thread Anatol Pomozov




It is V3 of multiboot improvements to Qemu

Changes made sinse V2:
 - rebase on top of qemu master changes
 - make multiboot/sections test more reliable
 Add generate_sections_out.py script that generates ELF sections information
 - rename 'struct section_data' to 'struct SectionData' to match naming
 convention in include/hw/loader.h

[Qemu-devel] [PATCH 1/4] multiboot: Change multiboot_info from array of bytes to a C struct

2017-10-12 Thread Anatol Pomozov

Using C structs makes the code more readable and prevents type conversion
errors.

Borrow multiboot1 header from GRUB project.

Signed-off-by: Anatol Pomozov 
---
 hw/i386/multiboot.c | 124 +
 hw/i386/multiboot_header.h  | 254 
 tests/multiboot/mmap.c  |  14 +--
 tests/multiboot/mmap.out|  10 ++
 tests/multiboot/modules.c   |  12 ++-
 tests/multiboot/modules.out |  10 ++
 tests/multiboot/multiboot.h |  66 
 7 files changed, 339 insertions(+), 151 deletions(-)
 create mode 100644 hw/i386/multiboot_header.h
 delete mode 100644 tests/multiboot/multiboot.h

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index c7b70c91d5..c9254f313e 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -28,6 +28,7 @@
 #include "hw/hw.h"
 #include "hw/nvram/fw_cfg.h"
 #include "multiboot.h"
+#include "multiboot_header.h"
 #include "hw/loader.h"
 #include "elf.h"
 #include "sysemu/sysemu.h"
@@ -47,39 +48,9 @@
 #error multiboot struct needs to fit in 16 bit real mode
 #endif
 
-enum {
-/* Multiboot info */
-MBI_FLAGS   = 0,
-MBI_MEM_LOWER   = 4,
-MBI_MEM_UPPER   = 8,
-MBI_BOOT_DEVICE = 12,
-MBI_CMDLINE = 16,
-MBI_MODS_COUNT  = 20,
-MBI_MODS_ADDR   = 24,
-MBI_MMAP_ADDR   = 48,
-MBI_BOOTLOADER  = 64,
-
-MBI_SIZE= 88,
-
-/* Multiboot modules */
-MB_MOD_START= 0,
-MB_MOD_END  = 4,
-MB_MOD_CMDLINE  = 8,
-
-MB_MOD_SIZE = 16,
-
-/* Region offsets */
-ADDR_E820_MAP = MULTIBOOT_STRUCT_ADDR + 0,
-ADDR_MBI  = ADDR_E820_MAP + 0x500,
-
-/* Multiboot flags */
-MULTIBOOT_FLAGS_MEMORY  = 1 << 0,
-MULTIBOOT_FLAGS_BOOT_DEVICE = 1 << 1,
-MULTIBOOT_FLAGS_CMDLINE = 1 << 2,
-MULTIBOOT_FLAGS_MODULES = 1 << 3,
-MULTIBOOT_FLAGS_MMAP= 1 << 6,
-MULTIBOOT_FLAGS_BOOTLOADER  = 1 << 9,
-};
+/* Region offsets */
+#define ADDR_E820_MAP MULTIBOOT_STRUCT_ADDR
+#define ADDR_MBI  (ADDR_E820_MAP + 0x500)
 
 typedef struct {
 /* buffer holding kernel, cmdlines and mb_infos */
@@ -128,14 +99,15 @@ static void mb_add_mod(MultibootState *s,
hwaddr start, hwaddr end,
hwaddr cmdline_phys)
 {
-char *p;
+multiboot_module_t *mod;
 assert(s->mb_mods_count < s->mb_mods_avail);
 
-p = (char *)s->mb_buf + s->offset_mbinfo + MB_MOD_SIZE * s->mb_mods_count;
+mod = s->mb_buf + s->offset_mbinfo +
+  sizeof(multiboot_module_t) * s->mb_mods_count;
 
-stl_p(p + MB_MOD_START,   start);
-stl_p(p + MB_MOD_END, end);
-stl_p(p + MB_MOD_CMDLINE, cmdline_phys);
+stl_p(>mod_start, start);
+stl_p(>mod_end,   end);
+stl_p(>cmdline,   cmdline_phys);
 
 mb_debug("mod%02d: "TARGET_FMT_plx" - "TARGET_FMT_plx"\n",
  s->mb_mods_count, start, end);
@@ -151,26 +123,29 @@ int load_multiboot(FWCfgState *fw_cfg,
int kernel_file_size,
uint8_t *header)
 {
-int i, is_multiboot = 0;
+int i;
+bool is_multiboot = false;
 uint32_t flags = 0;
 uint32_t mh_entry_addr;
 uint32_t mh_load_addr;
 uint32_t mb_kernel_size;
 MultibootState mbs;
-uint8_t bootinfo[MBI_SIZE];
+multiboot_info_t bootinfo;
 uint8_t *mb_bootinfo_data;
 uint32_t cmdline_len;
+struct multiboot_header *multiboot_header;
 
 /* Ok, let's see if it is a multiboot image.
The header is 12x32bit long, so the latest entry may be 8192 - 48. */
 for (i = 0; i < (8192 - 48); i += 4) {
-if (ldl_p(header+i) == 0x1BADB002) {
-uint32_t checksum = ldl_p(header+i+8);
-flags = ldl_p(header+i+4);
+multiboot_header = (struct multiboot_header *)(header + i);
+if (ldl_p(_header->magic) == MULTIBOOT_HEADER_MAGIC) {
+uint32_t checksum = ldl_p(_header->checksum);
+flags = ldl_p(_header->flags);
 checksum += flags;
-checksum += (uint32_t)0x1BADB002;
+checksum += (uint32_t)MULTIBOOT_HEADER_MAGIC;
 if (!checksum) {
-is_multiboot = 1;
+is_multiboot = true;
 break;
 }
 }
@@ -180,13 +155,13 @@ int load_multiboot(FWCfgState *fw_cfg,
 return 0; /* no multiboot */
 
 mb_debug("qemu: I believe we found a multiboot image!\n");
-memset(bootinfo, 0, sizeof(bootinfo));
+memset(, 0, sizeof(bootinfo));
 memset(, 0, sizeof(mbs));
 
-if (flags & 0x0004) { /* MULTIBOOT_HEADER_HAS_VBE */
+if (flags & MULTIBOOT_VIDEO_MODE) {
 fprintf(stderr, "qemu: multiboot knows VBE. we don't.\n");
 }
-if (!(flags & 0x0001)) { /* MULTIBOOT_HEADER_HAS_ADDR */
+if (!(flags & MULTIBOOT_AOUT_KLUDGE)) {
 uint64_t elf_entry;
 uint64_t elf_low, elf_high;
 int kernel_size;
@@ -217,12 +192,12 @@ int load_multiboot(FWCfgState *fw_cfg,

Re: [Qemu-devel] [PATCH v2 18/24] fixup! ppc: spapr: use generic cpu_model parsing

2017-10-12 Thread David Gibson

On Thu, Oct 12, 2017 at 05:50:08PM +0200, Igor Mammedov wrote:
> inot sure how it managed to compile locally and on travis

Because target/ppc/kvm.c is only compiled on a ppc _host_.

> but build fails with type mismatch on PPC host, fixup it
> by casting to expected type

I already hit this one and fixed it up in place.

> 
> Signed-off-by: Igor Mammedov 
> ---
>  target/ppc/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 48dc3f7..9d57deb 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2505,7 +2505,7 @@ static int kvm_ppc_register_host_cpu_type(MachineState 
> *ms)
>  }
>  type_info.parent = object_class_get_name(OBJECT_CLASS(pvr_pcc));
>  type_register(_info);
> -if (object_dynamic_cast(ms, TYPE_SPAPR_MACHINE)) {
> +if (object_dynamic_cast(OBJECT(ms), TYPE_SPAPR_MACHINE)) {
>  /* override TCG default cpu type with 'host' cpu model */
>  mc->default_cpu_type = TYPE_HOST_POWERPC_CPU;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Pankaj Gupta


> > Dan,
> >
> > I have a query regarding below patch [*]. My assumption is its halted
> > because of memory hotplug restructuring work? Anything I am missing
> > here?
> >
> > [*] https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg02978.html
> 
> It's fallen to the back of my queue since the original driving need of
> platform firmware changing offsets from boot-to-boot is no longer an
> issue. However, it does mean that you need to arrange for 128MB
> aligned devm_memremap_pages() ranges for the foreseeable future.

o.k I will provide 128MB aligned pages to devm_memremap_pages() function.

Thanks,
Pankaj

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Dan Williams

On Thu, Oct 12, 2017 at 3:52 PM, Pankaj Gupta  wrote:
> Dan,
>
> I have a query regarding below patch [*]. My assumption is its halted
> because of memory hotplug restructuring work? Anything I am missing
> here?
>
> [*] https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg02978.html

It's fallen to the back of my queue since the original driving need of
platform firmware changing offsets from boot-to-boot is no longer an
issue. However, it does mean that you need to arrange for 128MB
aligned devm_memremap_pages() ranges for the foreseeable future.

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Pankaj Gupta


> > > wrote:
> > > > 
> > > > > >   This patch adds virtio-pmem driver for KVM guest.
> > > > > >   Guest reads the persistent memory range information
> > > > > >   over virtio bus from Qemu and reserves the range
> > > > > >   as persistent memory. Guest also allocates a block
> > > > > >   device corresponding to the pmem range which later
> > > > > >   can be accessed with DAX compatible file systems.
> > > > > >   Idea is to use the virtio channel between guest and
> > > > > >   host to perform the block device flush for guest pmem
> > > > > >   DAX device.
> > > > > > 
> > > > > >   There is work to do including DAX file system support
> > > > > >   and other advanced features.
> > > > > > 
> > > > > > Signed-off-by: Pankaj Gupta 
> > > > > > ---
> > > > > >  drivers/virtio/Kconfig   |  10 ++
> > > > > >  drivers/virtio/Makefile  |   1 +
> > > > > >  drivers/virtio/virtio_pmem.c | 322
> > > > > >  +++
> > > > > >  include/uapi/linux/virtio_pmem.h |  55 +++
> > > > > >  4 files changed, 388 insertions(+)
> > > > > >  create mode 100644 drivers/virtio/virtio_pmem.c
> > > > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > > > > 
> > > > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > > > index cff773f15b7e..0192c4bda54b 100644
> > > > > > --- a/drivers/virtio/Kconfig
> > > > > > +++ b/drivers/virtio/Kconfig
> > > > > > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
> > > > > > 
> > > > > >   If unsure, say Y.
> > > > > > 
> > > > > > +config VIRTIO_PMEM
> > > > > > +   tristate "Virtio pmem driver"
> > > > > > +   depends on VIRTIO
> > > > > > +   ---help---
> > > > > > +This driver adds persistent memory range within a
> > > > > > KVM guest.
> 
> With "Virtio Block Backed Pmem" we could name the config
> option VIRTIO_BLOCK_PMEM
> 
> The documentation text could make it clear to people that the
> image shows up as a disk image on the host, but as a pmem
> memory range in the guest.
> 
> > > > > I think we need to call this something other than persistent
> > > > > memory to
> > > > > make it clear that this not memory where the persistence can be
> > > > > managed from userspace. The persistence point always requires
> > > > > 
> > > So currently /proc/iomem in a guest with a pmem device attached to
> > > a
> > > namespace looks like this:
> > > 
> > > c-13bfff : Persistent Memory
> > >    c-13bfff : namespace2.0
> > > 
> > > Can we call it "Virtio Shared Memory" to make it clear it is a
> > > different beast than typical "Persistent Memory"?  You can likely
> > 
> > I think somewhere we need persistent keyword 'Virtio Persistent
> > Memory' or
> > so.
> 
> Still hoping for better ideas than "Virtio Block Backed Pmem" :)
> 

Dan,

I have a query regarding below patch [*]. My assumption is its halted
because of memory hotplug restructuring work? Anything I am missing 
here?

[*] https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg02978.html

Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimally implement nbd structured read extension.
> 
> v3:
> 
> clone: tag up-nbd-minimal-structured-read-v3 from 
> https://src.openvz.org/scm/~vsementsov/qemu.git
> online: 
> https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v3
> 
> 03: rename  nbd_co_send_reply here too, remove r-b
> 04-07: splitted nbd server refactoring, with some changes

English does not have 'splitted'; 'split' is one of those irregular
verbs that is the same form for present, past, and past participle tenses.

I've pushed 1-7 with tweaks to my NBD staging queue:
http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/nbd

Let me know if I need to redo any of the tweaks I made to your work.

> 08: add Eric's r-b
> 13: rewrite receiving loops by Paolo's suggestion
> 

and I will resume review of the rest tomorrow.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Pankaj Gupta

> > > > 
> > > > > >   This patch adds virtio-pmem driver for KVM guest.
> > > > > >   Guest reads the persistent memory range information
> > > > > >   over virtio bus from Qemu and reserves the range
> > > > > >   as persistent memory. Guest also allocates a block
> > > > > >   device corresponding to the pmem range which later
> > > > > >   can be accessed with DAX compatible file systems.
> > > > > >   Idea is to use the virtio channel between guest and
> > > > > >   host to perform the block device flush for guest pmem
> > > > > >   DAX device.
> > > > > > 
> > > > > >   There is work to do including DAX file system support
> > > > > >   and other advanced features.
> > > > > > 
> > > > > > Signed-off-by: Pankaj Gupta 
> > > > > > ---
> > > > > >  drivers/virtio/Kconfig   |  10 ++
> > > > > >  drivers/virtio/Makefile  |   1 +
> > > > > >  drivers/virtio/virtio_pmem.c | 322
> > > > > >  +++
> > > > > >  include/uapi/linux/virtio_pmem.h |  55 +++
> > > > > >  4 files changed, 388 insertions(+)
> > > > > >  create mode 100644 drivers/virtio/virtio_pmem.c
> > > > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > > > > 
> > > > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > > > index cff773f15b7e..0192c4bda54b 100644
> > > > > > --- a/drivers/virtio/Kconfig
> > > > > > +++ b/drivers/virtio/Kconfig
> > > > > > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
> > > > > > 
> > > > > >   If unsure, say Y.
> > > > > > 
> > > > > > +config VIRTIO_PMEM
> > > > > > +   tristate "Virtio pmem driver"
> > > > > > +   depends on VIRTIO
> > > > > > +   ---help---
> > > > > > +This driver adds persistent memory range within a
> > > > > > KVM guest.
> 
> With "Virtio Block Backed Pmem" we could name the config
> option VIRTIO_BLOCK_PMEM
> 
> The documentation text could make it clear to people that the
> image shows up as a disk image on the host, but as a pmem
> memory range in the guest.

yes, this looks better. 
thank you.

> 
> > > > > I think we need to call this something other than persistent
> > > > > memory to
> > > > > make it clear that this not memory where the persistence can be
> > > > > managed from userspace. The persistence point always requires
> > > > > 
> > > So currently /proc/iomem in a guest with a pmem device attached to
> > > a
> > > namespace looks like this:
> > > 
> > > c-13bfff : Persistent Memory
> > >    c-13bfff : namespace2.0
> > > 
> > > Can we call it "Virtio Shared Memory" to make it clear it is a
> > > different beast than typical "Persistent Memory"?  You can likely
> > 
> > I think somewhere we need persistent keyword 'Virtio Persistent
> > Memory' or
> > so.
> 
> Still hoping for better ideas than "Virtio Block Backed Pmem" :)

:-)
>

Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission

2017-10-12 Thread Eric Blake

On 10/12/2017 05:27 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
>> with a cork.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>  nbd/server.c | 50 --
>>  1 file changed, 24 insertions(+), 26 deletions(-)
>>
> 
>> @@ -1203,36 +1220,17 @@ static int nbd_co_send_simple_reply(NBDClient 
>> *client,
>>  size_t len,
>>  Error **errp)
>>  {
>> -NBDSimpleReply simple_reply;
>> -int ret;
>> -
>> -g_assert(qemu_in_coroutine());
>> +NBDSimpleReply reply;
> 
> Why the rename from simple_reply to reply?
> 
>> +struct iovec iov[] = {
>> +{.iov_base = , .iov_len = sizeof(reply)},
> 
> I guess it made this line shorter.  But we could reduce some churn by
> naming it 'reply' in the first place, back in earlier patches.  At the
> same time, I'm not going to bother if there's not a reason to respin the
> series (or at least the first half).

Answering myself - you couldn't use the name 'reply' until 5/13 removed
it as a parameter name, even though you introduced the name
'simple_reply' in 4/13.  Okay, the rename is fine here.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission

2017-10-12 Thread Eric Blake

On 10/12/2017 05:27 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
>> with a cork.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>  nbd/server.c | 50 --
>>  1 file changed, 24 insertions(+), 26 deletions(-)
>>

Subject-line consistency: elsewhere you used nbd/server instead of
nbd-server. I'll make the obvious tweak.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Rik van Riel

On Thu, 2017-10-12 at 18:18 -0400, Pankaj Gupta wrote:
> > 
> > On Thu, Oct 12, 2017 at 2:25 PM, Pankaj Gupta 
> > wrote:
> > > 
> > > > >   This patch adds virtio-pmem driver for KVM guest.
> > > > >   Guest reads the persistent memory range information
> > > > >   over virtio bus from Qemu and reserves the range
> > > > >   as persistent memory. Guest also allocates a block
> > > > >   device corresponding to the pmem range which later
> > > > >   can be accessed with DAX compatible file systems.
> > > > >   Idea is to use the virtio channel between guest and
> > > > >   host to perform the block device flush for guest pmem
> > > > >   DAX device.
> > > > > 
> > > > >   There is work to do including DAX file system support
> > > > >   and other advanced features.
> > > > > 
> > > > > Signed-off-by: Pankaj Gupta 
> > > > > ---
> > > > >  drivers/virtio/Kconfig   |  10 ++
> > > > >  drivers/virtio/Makefile  |   1 +
> > > > >  drivers/virtio/virtio_pmem.c | 322
> > > > >  +++
> > > > >  include/uapi/linux/virtio_pmem.h |  55 +++
> > > > >  4 files changed, 388 insertions(+)
> > > > >  create mode 100644 drivers/virtio/virtio_pmem.c
> > > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > > > 
> > > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > > index cff773f15b7e..0192c4bda54b 100644
> > > > > --- a/drivers/virtio/Kconfig
> > > > > +++ b/drivers/virtio/Kconfig
> > > > > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
> > > > > 
> > > > >   If unsure, say Y.
> > > > > 
> > > > > +config VIRTIO_PMEM
> > > > > +   tristate "Virtio pmem driver"
> > > > > +   depends on VIRTIO
> > > > > +   ---help---
> > > > > +This driver adds persistent memory range within a
> > > > > KVM guest.

With "Virtio Block Backed Pmem" we could name the config
option VIRTIO_BLOCK_PMEM

The documentation text could make it clear to people that the
image shows up as a disk image on the host, but as a pmem
memory range in the guest.

> > > > I think we need to call this something other than persistent
> > > > memory to
> > > > make it clear that this not memory where the persistence can be
> > > > managed from userspace. The persistence point always requires
> > > > 
> > So currently /proc/iomem in a guest with a pmem device attached to
> > a
> > namespace looks like this:
> > 
> > c-13bfff : Persistent Memory
> >    c-13bfff : namespace2.0
> > 
> > Can we call it "Virtio Shared Memory" to make it clear it is a
> > different beast than typical "Persistent Memory"?  You can likely
> 
> I think somewhere we need persistent keyword 'Virtio Persistent
> Memory' or 
> so.

Still hoping for better ideas than "Virtio Block Backed Pmem" :)

Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
> with a cork.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  nbd/server.c | 50 --
>  1 file changed, 24 insertions(+), 26 deletions(-)
> 

> @@ -1203,36 +1220,17 @@ static int nbd_co_send_simple_reply(NBDClient *client,
>  size_t len,
>  Error **errp)
>  {
> -NBDSimpleReply simple_reply;
> -int ret;
> -
> -g_assert(qemu_in_coroutine());
> +NBDSimpleReply reply;

Why the rename from simple_reply to reply?

> +struct iovec iov[] = {
> +{.iov_base = , .iov_len = sizeof(reply)},

I guess it made this line shorter.  But we could reduce some churn by
naming it 'reply' in the first place, back in earlier patches.  At the
same time, I'm not going to bother if there's not a reason to respin the
series (or at least the first half).

> +{.iov_base = data, .iov_len = len}
> +};
>  
>  trace_nbd_co_send_simple_reply(handle, error, len);
>  
> -set_be_simple_reply(_reply, system_errno_to_nbd_errno(error),
> -handle);
> -
> -qemu_co_mutex_lock(>send_lock);
> -client->send_coroutine = qemu_coroutine_self();
> -
> -if (!len) {
> -ret = nbd_write(client->ioc, _reply, sizeof(simple_reply), 
> NULL);
> -} else {
> -qio_channel_set_cork(client->ioc, true);
> -ret = nbd_write(client->ioc, _reply, sizeof(simple_reply), 
> NULL);
> -if (ret == 0) {
> -ret = nbd_write(client->ioc, data, len, errp);
> -if (ret < 0) {
> -ret = -EIO;
> -}
> -}
> -qio_channel_set_cork(client->ioc, false);
> -}
> +set_be_simple_reply(, system_errno_to_nbd_errno(error), handle);
>  
> -client->send_coroutine = NULL;
> -qemu_co_mutex_unlock(>send_lock);
> -return ret;
> +return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);

This part is definitely nicer!  Thanks for splitting the v2 patch, it
made review a lot more pleasant.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 06/13] nbd/server: refactor nbd_co_send_simple_reply parameters

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Pass client and buffer (*data) parameters directly, to make the function
> consistent with further structured reply sending functions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  nbd/server.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Pankaj Gupta


> 
> On Thu, Oct 12, 2017 at 2:25 PM, Pankaj Gupta  wrote:
> >
> >> >   This patch adds virtio-pmem driver for KVM guest.
> >> >   Guest reads the persistent memory range information
> >> >   over virtio bus from Qemu and reserves the range
> >> >   as persistent memory. Guest also allocates a block
> >> >   device corresponding to the pmem range which later
> >> >   can be accessed with DAX compatible file systems.
> >> >   Idea is to use the virtio channel between guest and
> >> >   host to perform the block device flush for guest pmem
> >> >   DAX device.
> >> >
> >> >   There is work to do including DAX file system support
> >> >   and other advanced features.
> >> >
> >> > Signed-off-by: Pankaj Gupta 
> >> > ---
> >> >  drivers/virtio/Kconfig   |  10 ++
> >> >  drivers/virtio/Makefile  |   1 +
> >> >  drivers/virtio/virtio_pmem.c | 322
> >> >  +++
> >> >  include/uapi/linux/virtio_pmem.h |  55 +++
> >> >  4 files changed, 388 insertions(+)
> >> >  create mode 100644 drivers/virtio/virtio_pmem.c
> >> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >> >
> >> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> >> > index cff773f15b7e..0192c4bda54b 100644
> >> > --- a/drivers/virtio/Kconfig
> >> > +++ b/drivers/virtio/Kconfig
> >> > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
> >> >
> >> >   If unsure, say Y.
> >> >
> >> > +config VIRTIO_PMEM
> >> > +   tristate "Virtio pmem driver"
> >> > +   depends on VIRTIO
> >> > +   ---help---
> >> > +This driver adds persistent memory range within a KVM guest.
> >>
> >> I think we need to call this something other than persistent memory to
> >> make it clear that this not memory where the persistence can be
> >> managed from userspace. The persistence point always requires a driver
> >> call, so this is something distinctly different than "persistent
> >> memory". For example, it's a bug if this memory range ends up backing
> >> a device-dax range in the guest where there is no such thing as a
> >> driver callback to perform the flushing. How does this solution
> >> protect against that scenario?
> >
> > yes, you are right we are not providing device_dax in this case so it
> > should
> > be clear from name. Any suggestion for name?
> 
> So currently /proc/iomem in a guest with a pmem device attached to a
> namespace looks like this:
> 
> c-13bfff : Persistent Memory
>c-13bfff : namespace2.0
> 
> Can we call it "Virtio Shared Memory" to make it clear it is a
> different beast than typical "Persistent Memory"?  You can likely

I think somewhere we need persistent keyword 'Virtio Persistent Memory' or 
so.

> inject your own name into the resource tree the same way we do in the
> NFIT driver. See acpi_nfit_insert_resource().

Sure! thank you.

Re: [Qemu-devel] [PATCH v3 05/13] nbd/server: do not use NBDReply structure

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBDReply structure will be upgraded in future patches to handle both
> simple and structured replies and will be used only in the client
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  nbd/server.c | 36 +++-
>  1 file changed, 15 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 43ade30ba3..3878145f63 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1196,7 +1196,9 @@ static inline void set_be_simple_reply(NBDSimpleReply 
> *reply, uint64_t error,
>  stq_be_p(>handle, handle);
>  }
>  
> -static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
> +static int nbd_co_send_simple_reply(NBDRequestData *req,
> +uint64_t handle,
> +uint32_t error,

Feels a bit like a step backwards for the parameter list, but I'm not
opposed to the patch (it does make for some nice cleanup elsewhere).

>  int len, Error **errp)
>  {
>  NBDClient *client = req->client;
> @@ -1205,10 +1207,10 @@ static int nbd_co_send_simple_reply(NBDRequestData 
> *req, NBDReply *reply,
>  
>  g_assert(qemu_in_coroutine());
>  
> -trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
> +trace_nbd_co_send_simple_reply(handle, error, len);

I had to tweak this to rebase on top of my churn on 4/13.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Dan Williams

On Thu, Oct 12, 2017 at 2:25 PM, Pankaj Gupta  wrote:
>
>> >   This patch adds virtio-pmem driver for KVM guest.
>> >   Guest reads the persistent memory range information
>> >   over virtio bus from Qemu and reserves the range
>> >   as persistent memory. Guest also allocates a block
>> >   device corresponding to the pmem range which later
>> >   can be accessed with DAX compatible file systems.
>> >   Idea is to use the virtio channel between guest and
>> >   host to perform the block device flush for guest pmem
>> >   DAX device.
>> >
>> >   There is work to do including DAX file system support
>> >   and other advanced features.
>> >
>> > Signed-off-by: Pankaj Gupta 
>> > ---
>> >  drivers/virtio/Kconfig   |  10 ++
>> >  drivers/virtio/Makefile  |   1 +
>> >  drivers/virtio/virtio_pmem.c | 322
>> >  +++
>> >  include/uapi/linux/virtio_pmem.h |  55 +++
>> >  4 files changed, 388 insertions(+)
>> >  create mode 100644 drivers/virtio/virtio_pmem.c
>> >  create mode 100644 include/uapi/linux/virtio_pmem.h
>> >
>> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>> > index cff773f15b7e..0192c4bda54b 100644
>> > --- a/drivers/virtio/Kconfig
>> > +++ b/drivers/virtio/Kconfig
>> > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
>> >
>> >   If unsure, say Y.
>> >
>> > +config VIRTIO_PMEM
>> > +   tristate "Virtio pmem driver"
>> > +   depends on VIRTIO
>> > +   ---help---
>> > +This driver adds persistent memory range within a KVM guest.
>>
>> I think we need to call this something other than persistent memory to
>> make it clear that this not memory where the persistence can be
>> managed from userspace. The persistence point always requires a driver
>> call, so this is something distinctly different than "persistent
>> memory". For example, it's a bug if this memory range ends up backing
>> a device-dax range in the guest where there is no such thing as a
>> driver callback to perform the flushing. How does this solution
>> protect against that scenario?
>
> yes, you are right we are not providing device_dax in this case so it should
> be clear from name. Any suggestion for name?

So currently /proc/iomem in a guest with a pmem device attached to a
namespace looks like this:

c-13bfff : Persistent Memory
   c-13bfff : namespace2.0

Can we call it "Virtio Shared Memory" to make it clear it is a
different beast than typical "Persistent Memory"?  You can likely
inject your own name into the resource tree the same way we do in the
NFIT driver. See acpi_nfit_insert_resource().

Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Use packed structure instead of pointer arithmetics.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |  6 ++
>  nbd/server.c| 36 ++--
>  2 files changed, 20 insertions(+), 22 deletions(-)
> 

>  if (!len) {
> -ret = nbd_send_reply(client->ioc, reply, errp);
> +ret = nbd_write(client->ioc, _reply, sizeof(simple_reply), 
> NULL);
>  } else {
>  qio_channel_set_cork(client->ioc, true);
> -ret = nbd_send_reply(client->ioc, reply, errp);
> +ret = nbd_write(client->ioc, _reply, sizeof(simple_reply), 
> NULL);

One more thing: this should be errp, not NULL - we don't want to lose
the error, even if the next patch changes things yet again.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending

2017-10-12 Thread Eric Blake

On 10/12/2017 04:42 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Use packed structure instead of pointer arithmetics.
> 

>> +set_be_simple_reply(_reply, 
>> system_errno_to_nbd_errno(reply->error),
>> +reply->handle);
>> +
> 
> ...but it always occurred immediately after another trace that has
> redundant information (well, the trace you kept shows pre- rather than
> post-translation of errno value to NBD wire value,

On second thought, tracing what gets sent over the wire is probably
nicer than what we had internally (especially if the client traces what
it receives - it's nice to match up values on both sides of the wire).

> 
> With that change,
> Reviewed-by: Eric Blake 
> 

So here's what I'm squashing in:

diff --git i/nbd/server.c w/nbd/server.c
index 69cd2cda76..65c08fa1cc 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1201,14 +1201,13 @@ static int
nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
 {
 NBDClient *client = req->client;
 NBDSimpleReply simple_reply;
+int nbd_err = system_errno_to_nbd_errno(reply->error);
 int ret;

 g_assert(qemu_in_coroutine());

-trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
-
-set_be_simple_reply(_reply,
system_errno_to_nbd_errno(reply->error),
-reply->handle);
+trace_nbd_co_send_simple_reply(reply->handle, nbd_err, len);
+set_be_simple_reply(_reply, nbd_err, reply->handle);

 qemu_co_mutex_lock(>send_lock);
 client->send_coroutine = qemu_coroutine_self();
diff --git i/nbd/trace-events w/nbd/trace-events
index 4d6f86c2d4..e27614f050 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -51,7 +51,6 @@ nbd_negotiate_old_style(uint64_t size, unsigned flags)
"advertising size %" PRIu
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags)
"advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
 nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type,
uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ",
.flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len =
%" PRIu32 " }"
-nbd_send_reply(int32_t error, uint64_t handle) "Sending response to
client: { .error = %" PRId32 ", handle = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching
clients to AIO context %p\n"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching
clients from AIO context %p\n"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len)
"Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/2] target/i386: trap on instructions longer than >15 bytes

2017-10-12 Thread Paolo Bonzini

On 12/10/2017 21:30, Richard Henderson wrote:
> On 10/12/2017 07:35 AM, Paolo Bonzini wrote:
>> Besides being more correct, arbitrarily long instruction allow the
>> generation of a translation block that spans three pages.  This
>> confuses the generator and even allows ring 3 code to poison the
>> translation block cache and inject code into other processes that are
>> in guest ring 3.
>>
>> This is an improved (and more invasive) fix for the bug fixed in commit
>> 30663fd ("tcg/i386: Check the size of instruction being translated",
>> 2017-03-24).  In addition to being more precise (and generating the
>> right exception, which is #GP rather than #UD), it distinguishes better
>> between page faults and too long instructions, as shown by this test case:
>>
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>> char *x = mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
>>MAP_PRIVATE|MAP_ANON, -1, 0);
>> memset(x, 0x66, 4096);
>> x[4096] = 0x90;
>> x[4097] = 0xc3;
>> char *i = x + 4096 - 15;
>> mprotect(x + 4096, 4096, PROT_READ|PROT_WRITE);
>> ((void(*)(void)) i) ();
>> }
>>
>> ... which produces a #GP without the mprotect, and a #PF with it.
>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  target/i386/translate.c | 29 ++---
>>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> Reviewed-by: Richard Henderson 
> 
>> +if (sigsetjmp(s->jmpbuf, 0) != 0) {
> 
> Any particular reason to use sigsetjmp(x, 0) instead of setjmp(x)?
> Certainly there are no signal frames that the longjmp will pass...

sigsetjmp is used to _not_ save the signal mask.  On OS X setjmp saves
the signal mask by default, which is slower.

Paolo

Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Use packed structure instead of pointer arithmetics.

English is fun!  In the subject line, I'm fairly certain that
"structurize" is not likely to be in any dictionary, yet it is a perfect
word describing the patch, so I'm not touching it ;)

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |  6 ++
>  nbd/server.c| 36 ++--
>  2 files changed, 20 insertions(+), 22 deletions(-)
> 

> +++ b/nbd/server.c
> @@ -902,26 +902,6 @@ static int nbd_receive_request(QIOChannel *ioc, 
> NBDRequest *request,
>  return 0;
>  }
>  
> -static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
> -{
> -uint8_t buf[NBD_REPLY_SIZE];
> -
> -reply->error = system_errno_to_nbd_errno(reply->error);
> -
> -trace_nbd_send_reply(reply->error, reply->handle);

We lose a trace here...

>  static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
>  int len, Error **errp)
>  {
>  NBDClient *client = req->client;
> +NBDSimpleReply simple_reply;
>  int ret;
>  
>  g_assert(qemu_in_coroutine());
>  
>  trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
>  
> +set_be_simple_reply(_reply, 
> system_errno_to_nbd_errno(reply->error),
> +reply->handle);
> +

...but it always occurred immediately after another trace that has
redundant information (well, the trace you kept shows pre- rather than
post-translation of errno value to NBD wire value, and the trace you
drop didn't show length).  I'm fine with the reduction, but it needs a
tweak to trace-events to reap the dead trace.

With that change,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 1/2] pmem: Move reusable code to base header files

2017-10-12 Thread Pankaj Gupta


> 
> On Thu, Oct 12, 2017 at 8:50 AM, Pankaj Gupta  wrote:
> >  This patch moves common code to base header files
> >  so that it can be used for both ACPI pmem and VIRTIO pmem
> >  drivers. More common code needs to be moved out in future
> >  based on functionality required for virtio_pmem driver and
> >  coupling of code with existing ACPI pmem driver.
> >
> > Signed-off-by: Pankaj Gupta 
> [..]
> > diff --git a/include/linux/pmem_common.h b/include/linux/pmem_common.h
> > new file mode 100644
> > index ..e2e718c74b3f
> > --- /dev/null
> > +++ b/include/linux/pmem_common.h
> 
> This should be a common C file, not a header.

Sure! will create a common C file to put all the common code there.

> 
>

Re: [Qemu-devel] [PATCH v3 03/13] nbd: rename some simple-request related objects to be _simple_

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> To be consistent when their _structured_analogs will be introduced.

space before analogs

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  nbd/nbd-internal.h   |  2 +-
>  nbd/client.c |  4 ++--
>  nbd/server.c | 12 ++--
>  nbd/trace-events |  2 +-
>  tests/qemu-iotests/nbd-fault-injector.py |  4 ++--
>  5 files changed, 12 insertions(+), 12 deletions(-)
> 

> +++ b/nbd/trace-events
> @@ -54,7 +54,7 @@ nbd_receive_request(uint32_t magic, uint16_t flags, 
> uint16_t type, uint64_t from
>  nbd_send_reply(int32_t error, uint64_t handle) "Sending response to client: 
> { .error = %" PRId32 ", handle = %" PRIu64 " }"
>  nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching 
> clients to AIO context %p\n"
>  nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching 
> clients from AIO context %p\n"
> -nbd_co_send_reply(uint64_t handle, uint32_t error, int len) "Send reply: 
> handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
> +nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len) "Send 
> reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"

I'd also insert 'simple' in the log message.

Interesting that we trace when the server sends a simple message, but
the client does not (yet) trace which type of message it receives; but
that can be for later patches (I haven't checked if you already added it
later in this series)

Changes are small enough that I can make them as part of queueing them
on my NBD list, with:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 5/5] openrisc: Only kick cpu on timeout, not on update

2017-10-12 Thread Richard Henderson

On 08/22/2017 10:57 PM, Stafford Horne wrote:
> Previously we were kicking the cpu on every update.  This caused
> problems noticeable in SMP configurations where one CPU got pinned
> continuously servicing timer exceptions.
> 
> Signed-off-by: Stafford Horne 
> ---
>  hw/openrisc/cputimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH 4/5] openrisc: Initial SMP support

2017-10-12 Thread Richard Henderson

On 08/22/2017 10:57 PM, Stafford Horne wrote:
> Wire in ompic and add basic support for SMP.  The OpenRISC is special in
> that interrupts for devices are routed to each core's PIC.  This is
> achieved using the qemu_irq_split utility, but this currently limits
> OpenRISC to 2 cores.
> 
> This models the reference architecture described in the OpenRISC spec
> 1.2 proposal.
> 
>   
> https://github.com/stffrdhrn/doc/raw/arch-1.2-proposal/openrisc-arch-1.2-rev0.pdf
> 
> The changes to the intialization of the sim include:
> 
> CPU Reset
>  o Reset each cpu to the bootstrap PC rather than only a single cpu as
>done before.
>  o During Kernel loading the bootstrap PC is saved in a static global.
> 
> Network Initialization
>  o Connect the interrupt to each CPU
>  o Use more simple sysbus_mmio_map() rather than memory_region_add_subregion()
> 
> Sim Initialization
>  o Initialize the pic and tick timer per cpu
>  o Wire in the OMPIC if SMP is enabled
>  o Wire the serial irq to each CPU using qemu_irq_split()
> 
> Signed-off-by: Stafford Horne 
> ---
>  hw/openrisc/openrisc_sim.c | 84 
> +-
>  1 file changed, 61 insertions(+), 23 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Pankaj Gupta


> >   This patch adds virtio-pmem driver for KVM guest.
> >   Guest reads the persistent memory range information
> >   over virtio bus from Qemu and reserves the range
> >   as persistent memory. Guest also allocates a block
> >   device corresponding to the pmem range which later
> >   can be accessed with DAX compatible file systems.
> >   Idea is to use the virtio channel between guest and
> >   host to perform the block device flush for guest pmem
> >   DAX device.
> >
> >   There is work to do including DAX file system support
> >   and other advanced features.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/virtio/Kconfig   |  10 ++
> >  drivers/virtio/Makefile  |   1 +
> >  drivers/virtio/virtio_pmem.c | 322
> >  +++
> >  include/uapi/linux/virtio_pmem.h |  55 +++
> >  4 files changed, 388 insertions(+)
> >  create mode 100644 drivers/virtio/virtio_pmem.c
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index cff773f15b7e..0192c4bda54b 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
> >
> >   If unsure, say Y.
> >
> > +config VIRTIO_PMEM
> > +   tristate "Virtio pmem driver"
> > +   depends on VIRTIO
> > +   ---help---
> > +This driver adds persistent memory range within a KVM guest.
> 
> I think we need to call this something other than persistent memory to
> make it clear that this not memory where the persistence can be
> managed from userspace. The persistence point always requires a driver
> call, so this is something distinctly different than "persistent
> memory". For example, it's a bug if this memory range ends up backing
> a device-dax range in the guest where there is no such thing as a
> driver callback to perform the flushing. How does this solution
> protect against that scenario?

yes, you are right we are not providing device_dax in this case so it should
be clear from name. Any suggestion for name?  

> 
> > + It also associates a block device corresponding to the pmem
> > +range.
> > +
> > +If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> > tristate "Virtio balloon driver"
> > depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 41e30e3dc842..032ade725cc2 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> > diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> > new file mode 100644
> > index ..74e47cae0e24
> > --- /dev/null
> > +++ b/drivers/virtio/virtio_pmem.c
> > @@ -0,0 +1,322 @@
> > +/*
> > + * virtio-pmem driver
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +void devm_vpmem_disable(struct device *dev, struct resource *res, void
> > *addr)
> > +{
> > +   devm_memunmap(dev, addr);
> > +   devm_release_mem_region(dev, res->start, resource_size(res));
> > +}
> > +
> > +static void pmem_flush_done(struct virtqueue *vq)
> > +{
> > +   return;
> > +};
> > +
> > +static void virtio_pmem_release_queue(void *q)
> > +{
> > +   blk_cleanup_queue(q);
> > +}
> > +
> > +static void virtio_pmem_freeze_queue(void *q)
> > +{
> > +   blk_freeze_queue_start(q);
> > +}
> > +
> > +static void virtio_pmem_release_disk(void *__pmem)
> > +{
> > +   struct virtio_pmem *pmem = __pmem;
> > +
> > +   del_gendisk(pmem->disk);
> > +   put_disk(pmem->disk);
> > +}
> 
> This code seems identical to the base pmem case, it should move to the
> shared code object.

Sure!
> 
> > +
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +   struct virtqueue *vq;
> > +
> > +   /* single vq */
> > +   vq = virtio_find_single_vq(vpmem->vdev, pmem_flush_done,
> > "flush_queue");
> > +
> > +   if (IS_ERR(vq))
> > +   return PTR_ERR(vq);
> > +
> > +   return 0;
> > +}
> > +
> > +static struct vmem_altmap *setup_pmem_pfn(struct virtio_pmem *vpmem,
> > +   struct resource *res, struct vmem_altmap *altmap)
> > +{
> > +   u32 start_pad = 0, end_trunc = 0;
> > +   resource_size_t start, size;
> > +   unsigned long npfns;
> > +   phys_addr_t offset;
> > +
> > +   size = resource_size(res);
> > +   start = PHYS_SECTION_ALIGN_DOWN(res->start);
> > +
> > +   if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > +   IORES_DESC_NONE) ==

Re: [Qemu-devel] [PATCH v3 01/13] block/nbd-client: assert qiov len once in nbd_co_request

2017-10-12 Thread Eric Blake

On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Also improve the assertion: check that qiov is NULL for other commands
> than CMD_READ and CMD_WRITE.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd-client.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)

Looks identical to v2, where I gave R-b:
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01903.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] tcg: Initialize cpu_env generically

2017-10-12 Thread Richard Henderson

On 10/12/2017 01:32 PM, Lluís Vilanova wrote:
> Richard Henderson writes:
> 
>> On 10/11/2017 11:43 PM, Lluís Vilanova wrote:
 /* Track which vCPU triggers events */
 CPUState *cpu;  /* *_trans */
 -TCGv_env tcg_env;   /* *_exec  */
>>>
>>> I would rather keep it here instead of making a new global variable, since 
>>> that
>>> should make it easier in the future to have multiple translation contexts.
> 
>> Why do you believe this prevents it?  The variable is literally identical for
>> *all* targets.
> 
> If someone decides to make tcg_ctx thread-local or have one per target
> architecture (supporting multiple target archs concurrently), having all that
> info on the tcg_ctx object is easier to track and modify, compared to having
> multiple global variables.

We have patches on-list that do exactly this, from Emilio Cota.  I'm in the
process of reviewing them now.

We've made sure that patch set works with the cpu_* global variables created
once within target/*/translate.c.  This one is no different.


r~

Re: [Qemu-devel] [Bug 1723161] Re: Migration failing in qemu-2.10.1 but working qemu-2.9.1 and earlier with same options

2017-10-12 Thread William Tambe

Hi David,

Please let me know if you have preferred name.

I am an AMD employee.

Including my colleagues to this thread ...

The only error on the source side is the migration failure reported on
the source monitor when doing "info migration" .

It was indeed something related to the pflash, because migration will
work when removing the following entries:
-drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly
-drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd

I will take a look at the migration/block.c block_load code as you
suggested.

Sincerely,
William Tambe


On Thu, Oct 12, 2017 at 2:24 PM, Dr. David Alan Gilbert
 wrote:
> Interesting; As a test could you try removing the cdrom entry (I doubt 
> that'll help but it's worth a go).
> Also, is it possible for you to try a bisection?
>
> If I was to guess, I'd bet it's something relating to the pflash, but I
> don't know; it's unfortunate the block migration doesn't say what it got
> upset by.
>
> Are there no errors on the source side?
>
> Assuming there are no errors on the source side, I'd have to suggest
> taking a printf to migration/block.c block_load to find out which exit
> it's falling out of in there (at least I think that's what would cause
> that error)
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1723161
>
> Title:
>   Migration failing in qemu-2.10.1 but working qemu-2.9.1 and earlier
>   with same options
>
> Status in QEMU:
>   New
>
> Bug description:
>
>   Qemu-2.10.1 migration failing with the following error:
>   Receiving block device images
>   qemu-system-x86_64: error while loading section id 2(block)
>   qemu-system-x86_64: load of migration failed: Input/output error
>
>   Migration is setup on the destination system of the migration using:
>   -incoming tcp:0:
>
>   Migration is initiated from the source using the following commands in its 
> qemu monitor:
>   migrate -b "tcp:localhost:"
>
>   The command-line used in both the source and destination is:
>   qemu-system-x86_64 \
>   -nodefaults \
>   -pidfile vm0.pid \
>   -enable-kvm \
>   -machine q35 \
>   -cpu host -smp 2 \
>   -m 4096M \
>   -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly \
>   -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \
>   -drive file=${HDRIVE},format=qcow2 \
>   -drive media=cdrom \
>   -usb -device usb-tablet \
>   -vga std -vnc :0 \
>   -net nic,macaddr=${TAPMACADDR} -net 
> user,net=192.168.2.0/24,dhcpstart=192.168.2.10 \
>   -serial stdio \
>   -monitor unix:${MONITORSOCKET},server,nowait
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1723161/+subscriptions

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1723161

Title:
  Migration failing in qemu-2.10.1 but working qemu-2.9.1 and earlier
  with same options

Status in QEMU:
  New

Bug description:
  
  Qemu-2.10.1 migration failing with the following error:
  Receiving block device images
  qemu-system-x86_64: error while loading section id 2(block)
  qemu-system-x86_64: load of migration failed: Input/output error

  Migration is setup on the destination system of the migration using:
  -incoming tcp:0:

  Migration is initiated from the source using the following commands in its 
qemu monitor:
  migrate -b "tcp:localhost:"

  The command-line used in both the source and destination is:
  qemu-system-x86_64 \
  -nodefaults \
  -pidfile vm0.pid \
  -enable-kvm \
  -machine q35 \
  -cpu host -smp 2 \
  -m 4096M \
  -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly \
  -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \
  -drive file=${HDRIVE},format=qcow2 \
  -drive media=cdrom \
  -usb -device usb-tablet \
  -vga std -vnc :0 \
  -net nic,macaddr=${TAPMACADDR} -net 
user,net=192.168.2.0/24,dhcpstart=192.168.2.10 \
  -serial stdio \
  -monitor unix:${MONITORSOCKET},server,nowait

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1723161/+subscriptions

[Qemu-devel] QEMU without X11 support

2017-10-12 Thread hanji unit

Hello, is it possible to run (or rebuild modifying build flags) QEMU
without support for X11 window system integration? I keep seeing XSync
in libX11 being called, so i know QEMU is trying to talk to X. I am
running QEMU using Xvfb since I still need the SDL rendering to work.
Here is how i am running QEMU from a script:

export DISPLAY=:1
Xvfb :1 -screen 0 1024x768x16 &


Thanks.

[Qemu-devel] [PULL 4/4] libvhost-user: Support VHOST_USER_SET_SLAVE_REQ_FD

2017-10-12 Thread Marc-André Lureau

From: "Dr. David Alan Gilbert" 

Allow the qemu to pass us a slave fd.  We don't do anything
with it yet.

Signed-off-by: Dr. David Alan Gilbert 
Message-Id: <20171002191521.15748-5-dgilb...@redhat.com>
Reviewed-by: Marc-André Lureau 
Reviewed-by: Maxime Coquelin 
---
 contrib/libvhost-user/libvhost-user.h |  1 +
 contrib/libvhost-user/libvhost-user.c | 27 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index d54683feeb..2f5864b5c4 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -226,6 +226,7 @@ struct VuDev {
 VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
 VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
 int log_call_fd;
+int slave_fd;
 uint64_t log_size;
 uint8_t *log_table;
 uint64_t features;
diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index e36aed72d0..f409bd3d41 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -726,7 +726,8 @@ vu_set_vring_err_exec(VuDev *dev, VhostUserMsg *vmsg)
 static bool
 vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
-uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ;
 
 if (dev->iface->get_protocol_features) {
 features |= dev->iface->get_protocol_features(dev);
@@ -779,6 +780,23 @@ vu_set_vring_enable_exec(VuDev *dev, VhostUserMsg *vmsg)
 return false;
 }
 
+static bool
+vu_set_slave_req_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+if (vmsg->fd_num != 1) {
+vu_panic(dev, "Invalid slave_req_fd message (%d fd's)", vmsg->fd_num);
+return false;
+}
+
+if (dev->slave_fd != -1) {
+close(dev->slave_fd);
+}
+dev->slave_fd = vmsg->fds[0];
+DPRINT("Got slave_fd: %d\n", vmsg->fds[0]);
+
+return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -842,6 +860,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 return vu_get_queue_num_exec(dev, vmsg);
 case VHOST_USER_SET_VRING_ENABLE:
 return vu_set_vring_enable_exec(dev, vmsg);
+case VHOST_USER_SET_SLAVE_REQ_FD:
+return vu_set_slave_req_fd(dev, vmsg);
 case VHOST_USER_NONE:
 break;
 default:
@@ -915,6 +935,10 @@ vu_deinit(VuDev *dev)
 
 
 vu_close_log(dev);
+if (dev->slave_fd != -1) {
+close(dev->slave_fd);
+dev->slave_fd = -1;
+}
 
 if (dev->sock != -1) {
 close(dev->sock);
@@ -945,6 +969,7 @@ vu_init(VuDev *dev,
 dev->remove_watch = remove_watch;
 dev->iface = iface;
 dev->log_call_fd = -1;
+dev->slave_fd = -1;
 for (i = 0; i < VHOST_MAX_NR_VIRTQUEUE; i++) {
 dev->vq[i] = (VuVirtq) {
 .call_fd = -1, .kick_fd = -1, .err_fd = -1,
-- 
2.15.0.rc0.40.gaefcc5f6f

[Qemu-devel] [PULL 3/4] libvhost-user: Update and fix feature and request lists

2017-10-12 Thread Marc-André Lureau

From: "Dr. David Alan Gilbert" 

Update the ProtocolFeature and UserRequest lists to
match hw/virtio/vhost-user.c.
Fix the text labelling in libvhost-user.c to match the list.

Signed-off-by: Dr. David Alan Gilbert 
Message-Id: <20171002191521.15748-4-dgilb...@redhat.com>
Reviewed-by: Marc-André Lureau 
Reviewed-by: Maxime Coquelin 
---
 contrib/libvhost-user/libvhost-user.h |  9 -
 contrib/libvhost-user/libvhost-user.c | 10 +-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index ff8059257e..d54683feeb 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -34,6 +34,10 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_MQ = 0,
 VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
 VHOST_USER_PROTOCOL_F_RARP = 2,
+VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
+VHOST_USER_PROTOCOL_F_NET_MTU = 4,
+VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
+VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
 
 VHOST_USER_PROTOCOL_F_MAX
 };
@@ -61,7 +65,10 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
 VHOST_USER_SEND_RARP = 19,
-VHOST_USER_INPUT_GET_CONFIG = 20,
+VHOST_USER_NET_SET_MTU = 20,
+VHOST_USER_SET_SLAVE_REQ_FD = 21,
+VHOST_USER_IOTLB_MSG = 22,
+VHOST_USER_SET_VRING_ENDIAN = 23,
 VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index 62be958f59..e36aed72d0 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -56,13 +56,10 @@
 } while (0)
 
 static const char *
-vu_request_to_string(int req)
+vu_request_to_string(unsigned int req)
 {
 #define REQ(req) [req] = #req
 static const char *vu_request_str[] = {
-REQ(VHOST_USER_NONE),
-REQ(VHOST_USER_GET_FEATURES),
-REQ(VHOST_USER_SET_FEATURES),
 REQ(VHOST_USER_NONE),
 REQ(VHOST_USER_GET_FEATURES),
 REQ(VHOST_USER_SET_FEATURES),
@@ -83,7 +80,10 @@ vu_request_to_string(int req)
 REQ(VHOST_USER_GET_QUEUE_NUM),
 REQ(VHOST_USER_SET_VRING_ENABLE),
 REQ(VHOST_USER_SEND_RARP),
-REQ(VHOST_USER_INPUT_GET_CONFIG),
+REQ(VHOST_USER_NET_SET_MTU),
+REQ(VHOST_USER_SET_SLAVE_REQ_FD),
+REQ(VHOST_USER_IOTLB_MSG),
+REQ(VHOST_USER_SET_VRING_ENDIAN),
 REQ(VHOST_USER_MAX),
 };
 #undef REQ
-- 
2.15.0.rc0.40.gaefcc5f6f

[Qemu-devel] [PULL 0/4] vhost-user patches

2017-10-12 Thread Marc-André Lureau

The following changes since commit a0b261db8c030813e30a39eae47359ac2a37f7e2:

  Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' 
into staging (2017-10-12 10:02:09 +0100)

are available in the Git repository at:

  https://github.com/elmarco/qemu.git tags/vu-pull-request

for you to fetch changes up to 13384f158ce043744cf7542162a34953c7e05a70:

  libvhost-user: Support VHOST_USER_SET_SLAVE_REQ_FD (2017-10-12 16:57:50 +0200)





Dr. David Alan Gilbert (4):
  libvhost-user: vu_queue_started
  vhost-user-bridge: Only process received packets on started queues
  libvhost-user: Update and fix feature and request lists
  libvhost-user: Support VHOST_USER_SET_SLAVE_REQ_FD

 contrib/libvhost-user/libvhost-user.h | 19 +++-
 contrib/libvhost-user/libvhost-user.c | 43 ++-
 tests/vhost-user-bridge.c |  1 +
 3 files changed, 56 insertions(+), 7 deletions(-)

-- 
2.15.0.rc0.40.gaefcc5f6f

[Qemu-devel] [PULL 1/4] libvhost-user: vu_queue_started

2017-10-12 Thread Marc-André Lureau

From: "Dr. David Alan Gilbert" 

Add a vu_queue_started method to complement vu_queue_enabled.

Signed-off-by: Dr. David Alan Gilbert 
Message-Id: <20171002191521.15748-2-dgilb...@redhat.com>
Reviewed-by: Marc-André Lureau 
Reviewed-by: Maxime Coquelin 
---
 contrib/libvhost-user/libvhost-user.h | 9 +
 contrib/libvhost-user/libvhost-user.c | 6 ++
 2 files changed, 15 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index 5825b66880..ff8059257e 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -334,6 +334,15 @@ void vu_queue_set_notification(VuDev *dev, VuVirtq *vq, 
int enable);
  */
 bool vu_queue_enabled(VuDev *dev, VuVirtq *vq);
 
+/**
+ * vu_queue_started:
+ * @dev: a VuDev context
+ * @vq: a VuVirtq queue
+ *
+ * Returns: whether the queue is started.
+ */
+bool vu_queue_started(const VuDev *dev, const VuVirtq *vq);
+
 /**
  * vu_queue_empty:
  * @dev: a VuDev context
diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index a0e0da4ccb..62be958f59 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -966,6 +966,12 @@ vu_queue_enabled(VuDev *dev, VuVirtq *vq)
 return vq->enable;
 }
 
+bool
+vu_queue_started(const VuDev *dev, const VuVirtq *vq)
+{
+return vq->started;
+}
+
 static inline uint16_t
 vring_avail_flags(VuVirtq *vq)
 {
-- 
2.15.0.rc0.40.gaefcc5f6f

[Qemu-devel] [PULL 2/4] vhost-user-bridge: Only process received packets on started queues

2017-10-12 Thread Marc-André Lureau

From: "Dr. David Alan Gilbert" 

Only process received packets if the queue has been started.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Marc-André Lureau 
Message-Id: <20171002191521.15748-3-dgilb...@redhat.com>
Reviewed-by: Maxime Coquelin 
---
 tests/vhost-user-bridge.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index f922cc75ae..d820033a72 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -277,6 +277,7 @@ vubr_backend_recv_cb(int sock, void *ctx)
 DPRINT("hdrlen = %d\n", hdrlen);
 
 if (!vu_queue_enabled(dev, vq) ||
+!vu_queue_started(dev, vq) ||
 !vu_queue_avail_bytes(dev, vq, hdrlen, 0)) {
 DPRINT("Got UDP packet, but no available descriptors on RX virtq.\n");
 return;
-- 
2.15.0.rc0.40.gaefcc5f6f

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Dan Williams

On Thu, Oct 12, 2017 at 8:50 AM, Pankaj Gupta  wrote:
>   This patch adds virtio-pmem driver for KVM guest.
>   Guest reads the persistent memory range information
>   over virtio bus from Qemu and reserves the range
>   as persistent memory. Guest also allocates a block
>   device corresponding to the pmem range which later
>   can be accessed with DAX compatible file systems.
>   Idea is to use the virtio channel between guest and
>   host to perform the block device flush for guest pmem
>   DAX device.
>
>   There is work to do including DAX file system support
>   and other advanced features.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/virtio/Kconfig   |  10 ++
>  drivers/virtio/Makefile  |   1 +
>  drivers/virtio/virtio_pmem.c | 322 
> +++
>  include/uapi/linux/virtio_pmem.h |  55 +++
>  4 files changed, 388 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pmem.c
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index cff773f15b7e..0192c4bda54b 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -38,6 +38,16 @@ config VIRTIO_PCI_LEGACY
>
>   If unsure, say Y.
>
> +config VIRTIO_PMEM
> +   tristate "Virtio pmem driver"
> +   depends on VIRTIO
> +   ---help---
> +This driver adds persistent memory range within a KVM guest.

I think we need to call this something other than persistent memory to
make it clear that this not memory where the persistence can be
managed from userspace. The persistence point always requires a driver
call, so this is something distinctly different than "persistent
memory". For example, it's a bug if this memory range ends up backing
a device-dax range in the guest where there is no such thing as a
driver callback to perform the flushing. How does this solution
protect against that scenario?

> + It also associates a block device corresponding to the pmem
> +range.
> +
> +If unsure, say M.
> +
>  config VIRTIO_BALLOON
> tristate "Virtio balloon driver"
> depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 41e30e3dc842..032ade725cc2 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> new file mode 100644
> index ..74e47cae0e24
> --- /dev/null
> +++ b/drivers/virtio/virtio_pmem.c
> @@ -0,0 +1,322 @@
> +/*
> + * virtio-pmem driver
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +void devm_vpmem_disable(struct device *dev, struct resource *res, void *addr)
> +{
> +   devm_memunmap(dev, addr);
> +   devm_release_mem_region(dev, res->start, resource_size(res));
> +}
> +
> +static void pmem_flush_done(struct virtqueue *vq)
> +{
> +   return;
> +};
> +
> +static void virtio_pmem_release_queue(void *q)
> +{
> +   blk_cleanup_queue(q);
> +}
> +
> +static void virtio_pmem_freeze_queue(void *q)
> +{
> +   blk_freeze_queue_start(q);
> +}
> +
> +static void virtio_pmem_release_disk(void *__pmem)
> +{
> +   struct virtio_pmem *pmem = __pmem;
> +
> +   del_gendisk(pmem->disk);
> +   put_disk(pmem->disk);
> +}

This code seems identical to the base pmem case, it should move to the
shared code object.

> +
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +   struct virtqueue *vq;
> +
> +   /* single vq */
> +   vq = virtio_find_single_vq(vpmem->vdev, pmem_flush_done, 
> "flush_queue");
> +
> +   if (IS_ERR(vq))
> +   return PTR_ERR(vq);
> +
> +   return 0;
> +}
> +
> +static struct vmem_altmap *setup_pmem_pfn(struct virtio_pmem *vpmem,
> +   struct resource *res, struct vmem_altmap *altmap)
> +{
> +   u32 start_pad = 0, end_trunc = 0;
> +   resource_size_t start, size;
> +   unsigned long npfns;
> +   phys_addr_t offset;
> +
> +   size = resource_size(res);
> +   start = PHYS_SECTION_ALIGN_DOWN(res->start);
> +
> +   if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> +   IORES_DESC_NONE) == REGION_MIXED) {
> +
> +   start = res->start;
> +   start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
> +   }
> +   start = res->start;
> +   size = PHYS_SECTION_ALIGN_UP(start + size) - start;
> +
> +   if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> +   IORES_DESC_NONE) == REGION_MIXED) {

Re: [Qemu-devel] [PATCH 3/5] openrisc/cputimer: Perparation for Multicore

2017-10-12 Thread Richard Henderson

On 08/22/2017 10:57 PM, Stafford Horne wrote:
> In order to support multicore system we move some of the previously
> static state variables into the state of each core.
> 
> On the other hand in order to allow timers to be synced between each
> code the ttcr (tick timer count register) is moved out of the core.
> This is not as per real hardware spec which has a separate timer counter
> per core, but it seems the most simple way to keep each clock in sync.
> 
> Signed-off-by: Stafford Horne 
> ---
>  hw/openrisc/cputimer.c   | 62 
> +---
>  target/openrisc/cpu.c|  1 -
>  target/openrisc/cpu.h|  4 ++-
>  target/openrisc/machine.c|  1 -
>  target/openrisc/sys_helper.c |  4 +--
>  5 files changed, 52 insertions(+), 20 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH 2/5] target/openrisc: Make coreid and numcores configurable in state

2017-10-12 Thread Richard Henderson

On 08/22/2017 10:57 PM, Stafford Horne wrote:
> Previously coreid and numcores were hard coded as 0 and 1 respectively
> as OpenRISC QEMU did not have multicore support.
> 
> Multicore support is now being added so these registers need to have
> configured values.
> 
> Signed-off-by: Stafford Horne 
> ---
>  hw/openrisc/openrisc_sim.c   | 3 +++
>  target/openrisc/cpu.h| 3 +++
>  target/openrisc/machine.c| 7 +--
>  target/openrisc/sys_helper.c | 4 ++--
>  4 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
> index e1eeffc490..44a657753d 100644
> --- a/hw/openrisc/openrisc_sim.c
> +++ b/hw/openrisc/openrisc_sim.c
> @@ -110,6 +110,9 @@ static void openrisc_sim_init(MachineState *machine)
>  
>  for (n = 0; n < smp_cpus; n++) {
>  cpu = cpu_openrisc_init(cpu_model);
> +cpu->env.coreid = n;
> +cpu->env.numcores = smp_cpus;

This duplicates cpu->parent_obj.cpu_index.  Also c.f. max_cpus vs smp_cpus; the
latter can change via hot-plug.

> @@ -104,8 +104,8 @@ static const VMStateInfo vmstate_sr = {
>  
>  static const VMStateDescription vmstate_env = {
>  .name = "env",
> -.version_id = 6,
> -.minimum_version_id = 6,
> +.version_id = 7,
> +.minimum_version_id = 7,
>  .post_load = env_post_load,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINTTL_2DARRAY(shadow_gpr, CPUOpenRISCState, 16, 32),
> @@ -152,6 +152,9 @@ static const VMStateDescription vmstate_env = {
>  VMSTATE_UINT32(picmr, CPUOpenRISCState),
>  VMSTATE_UINT32(picsr, CPUOpenRISCState),
>  
> +VMSTATE_UINT32(coreid, CPUOpenRISCState),
> +VMSTATE_UINT32(numcores, CPUOpenRISCState),

If you use the above directly you don't need to save/restore these yourself.

>  case TO_SPR(0, 128): /* COREID */
> -return 0;
> +return env->coreid;

>  case TO_SPR(0, 129): /* NUMCORES */
> -return 1;
> +return env->numcores;

Just use the global variable directly here, IMO.


r~

Re: [Qemu-devel] [PATCH 1/5] openrisc/ompic: Add OpenRISC Multicore PIC (OMPIC)

2017-10-12 Thread Richard Henderson

On 08/22/2017 10:57 PM, Stafford Horne wrote:
> Add OpenRISC Multicore PIC which handles inter processor interrupts
> (IPI) between cores.  In OpenRISC all device interrupts are routed to
> each core enabling this device to be simple.
> 
> Signed-off-by: Stafford Horne 
> ---
>  default-configs/or1k-softmmu.mak |   1 +
>  hw/intc/Makefile.objs|   1 +
>  hw/intc/ompic.c  | 179 
> +++
>  3 files changed, 181 insertions(+)
>  create mode 100644 hw/intc/ompic.c

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC 1/2] pmem: Move reusable code to base header files

2017-10-12 Thread Dan Williams

On Thu, Oct 12, 2017 at 8:50 AM, Pankaj Gupta  wrote:
>  This patch moves common code to base header files
>  so that it can be used for both ACPI pmem and VIRTIO pmem
>  drivers. More common code needs to be moved out in future
>  based on functionality required for virtio_pmem driver and
>  coupling of code with existing ACPI pmem driver.
>
> Signed-off-by: Pankaj Gupta 
[..]
> diff --git a/include/linux/pmem_common.h b/include/linux/pmem_common.h
> new file mode 100644
> index ..e2e718c74b3f
> --- /dev/null
> +++ b/include/linux/pmem_common.h

This should be a common C file, not a header.

Re: [Qemu-devel] [PATCH] tcg: Initialize cpu_env generically

2017-10-12 Thread Lluís Vilanova

Richard Henderson writes:

> On 10/11/2017 11:43 PM, Lluís Vilanova wrote:
>>> /* Track which vCPU triggers events */
>>> CPUState *cpu;  /* *_trans */
>>> -TCGv_env tcg_env;   /* *_exec  */
>> 
>> I would rather keep it here instead of making a new global variable, since 
>> that
>> should make it easier in the future to have multiple translation contexts.

> Why do you believe this prevents it?  The variable is literally identical for
> *all* targets.

If someone decides to make tcg_ctx thread-local or have one per target
architecture (supporting multiple target archs concurrently), having all that
info on the tcg_ctx object is easier to track and modify, compared to having
multiple global variables.

> r~

> PS: Everyone, please trim context that you don't care about.  If you just add
> two lines in the middle of 1000, I'll not always find it.

So true; I just went with the flow of the list, sorry.

Cheers,
  Lluis

Re: [Qemu-devel] [PATCH 2/2] target/i386: trap on instructions longer than >15 bytes

2017-10-12 Thread Richard Henderson

On 10/12/2017 07:35 AM, Paolo Bonzini wrote:
> Besides being more correct, arbitrarily long instruction allow the
> generation of a translation block that spans three pages.  This
> confuses the generator and even allows ring 3 code to poison the
> translation block cache and inject code into other processes that are
> in guest ring 3.
> 
> This is an improved (and more invasive) fix for the bug fixed in commit
> 30663fd ("tcg/i386: Check the size of instruction being translated",
> 2017-03-24).  In addition to being more precise (and generating the
> right exception, which is #GP rather than #UD), it distinguishes better
> between page faults and too long instructions, as shown by this test case:
> 
> #include 
> #include 
> #include 
> 
> int main()
> {
> char *x = mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
>MAP_PRIVATE|MAP_ANON, -1, 0);
> memset(x, 0x66, 4096);
> x[4096] = 0x90;
> x[4097] = 0xc3;
> char *i = x + 4096 - 15;
> mprotect(x + 4096, 4096, PROT_READ|PROT_WRITE);
> ((void(*)(void)) i) ();
> }
> 
> ... which produces a #GP without the mprotect, and a #PF with it.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  target/i386/translate.c | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)

Reviewed-by: Richard Henderson 

> +if (sigsetjmp(s->jmpbuf, 0) != 0) {

Any particular reason to use sigsetjmp(x, 0) instead of setjmp(x)?
Certainly there are no signal frames that the longjmp will pass...


r~

[Qemu-devel] [Bug 1723161] Re: Migration failing in qemu-2.10.1 but working qemu-2.9.1 and earlier with same options

2017-10-12 Thread Dr. David Alan Gilbert

Interesting; As a test could you try removing the cdrom entry (I doubt that'll 
help but it's worth a go).
Also, is it possible for you to try a bisection?

If I was to guess, I'd bet it's something relating to the pflash, but I
don't know; it's unfortunate the block migration doesn't say what it got
upset by.

Are there no errors on the source side?

Assuming there are no errors on the source side, I'd have to suggest
taking a printf to migration/block.c block_load to find out which exit
it's falling out of in there (at least I think that's what would cause
that error)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1723161

Title:
  Migration failing in qemu-2.10.1 but working qemu-2.9.1 and earlier
  with same options

Status in QEMU:
  New

Bug description:
  
  Qemu-2.10.1 migration failing with the following error:
  Receiving block device images
  qemu-system-x86_64: error while loading section id 2(block)
  qemu-system-x86_64: load of migration failed: Input/output error

  Migration is setup on the destination system of the migration using:
  -incoming tcp:0:

  Migration is initiated from the source using the following commands in its 
qemu monitor:
  migrate -b "tcp:localhost:"

  The command-line used in both the source and destination is:
  qemu-system-x86_64 \
  -nodefaults \
  -pidfile vm0.pid \
  -enable-kvm \
  -machine q35 \
  -cpu host -smp 2 \
  -m 4096M \
  -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly \
  -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \
  -drive file=${HDRIVE},format=qcow2 \
  -drive media=cdrom \
  -usb -device usb-tablet \
  -vga std -vnc :0 \
  -net nic,macaddr=${TAPMACADDR} -net 
user,net=192.168.2.0/24,dhcpstart=192.168.2.10 \
  -serial stdio \
  -monitor unix:${MONITORSOCKET},server,nowait

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1723161/+subscriptions

Re: [Qemu-devel] [PATCH 0/2] target/i386: trap on instructions longer than >15 bytes

2017-10-12 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20171012143548.18581-1-pbonz...@redhat.com
Subject: [Qemu-devel] [PATCH 0/2] target/i386: trap on instructions longer than 
>15 bytes

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4e49551072 target/i386: trap on instructions longer than >15 bytes
74202a7aef target/i386: introduce x86_ld*_code

=== OUTPUT BEGIN ===
Checking PATCH 1/2: target/i386: introduce x86_ld*_code...
Checking PATCH 2/2: target/i386: trap on instructions longer than >15 bytes...
ERROR: Use of volatile is usually wrong: see 
Documentation/volatile-considered-harmful.txt
#69: FILE: target/i386/translate.c:1881:
+volatile uint8_t unused =

total: 1 errors, 0 warnings, 53 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH v4 18/20] vpc: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.  Drop the now-unused
get_sector_offset().

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: rebase to master
v2: drop get_sector_offset() [Kevin], rebase to mapping flag
---
 block/vpc.c | 42 ++
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 1576d7b595..4100ce1ed3 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -705,52 +705,54 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
 BDRVVPCState *s = bs->opaque;
 VHDFooter *footer = (VHDFooter*) s->footer_buf;
-int64_t start, offset;
+int64_t start, image_offset;
 bool allocated;
-int64_t ret;
+int ret;
 int n;

 if (be32_to_cpu(footer->type) == VHD_FIXED) {
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

 qemu_co_mutex_lock(>lock);

-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, NULL);
-start = offset;
-allocated = (offset != -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+start = image_offset & BDRV_BLOCK_OFFSET_MASK;
+allocated = (image_offset != -1);
 *pnum = 0;
 ret = 0;

 do {
 /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);

 *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
 /* *pnum can't be greater than one block for allocated
  * sectors since there is always a bitmap in between. */
 if (allocated) {
 *file = bs->file->bs;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+*map = start;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 break;
 }
-if (nb_sectors == 0) {
+if (bytes == 0) {
 break;
 }
-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false,
-  NULL);
+image_offset = get_image_offset(bs, offset, false, NULL);
 } while (offset == -1);

 qemu_co_mutex_unlock(>lock);
@@ -1097,7 +1099,7 @@ static BlockDriver bdrv_vpc = {

 .bdrv_co_preadv = vpc_co_preadv,
 .bdrv_co_pwritev= vpc_co_pwritev,
-.bdrv_co_get_block_status   = vpc_co_get_block_status,
+.bdrv_co_block_status   = vpc_co_block_status,

 .bdrv_get_info  = vpc_get_info,

-- 
2.13.6

[Qemu-devel] [PATCH v4 20/20] block: Drop unused .bdrv_co_get_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to earlier changes
---
 include/block/block_int.h |  3 ---
 block/io.c| 34 +-
 2 files changed, 1 insertion(+), 36 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7c8503f693..6e7ae5fc8d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -213,9 +213,6 @@ struct BlockDriver {
  * guarantees input aligned to request_alignment, as well as
  * non-NULL pnum, map, and file.
  */
-int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file);
 int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
 bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file);
diff --git a/block/io.c b/block/io.c
index 6a2a2e1484..7ae4cdadf5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1847,7 +1847,7 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 bytes = n;
 }

-if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
+if (!bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (offset + bytes == total_size) {
@@ -1865,40 +1865,9 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,

 /* Round out to request_alignment boundaries */
 align = bs->bl.request_alignment;
-if (bs->drv->bdrv_co_get_block_status && align < BDRV_SECTOR_SIZE) {
-align = BDRV_SECTOR_SIZE;
-}
 aligned_offset = QEMU_ALIGN_DOWN(offset, align);
 aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

-if (bs->drv->bdrv_co_get_block_status) {
-int count; /* sectors */
-int64_t longret;
-
-assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
-   BDRV_SECTOR_SIZE));
-/*
- * The contract allows us to return pnum smaller than bytes, even
- * if the next query would see the same status; we truncate the
- * request to avoid overflowing the driver's 32-bit interface.
- */
-longret = bs->drv->bdrv_co_get_block_status(
-bs, aligned_offset >> BDRV_SECTOR_BITS,
-MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, ,
-_file);
-if (longret < 0) {
-assert(INT_MIN <= longret);
-ret = longret;
-goto out;
-}
-if (longret & BDRV_BLOCK_OFFSET_VALID) {
-local_map = longret & BDRV_BLOCK_OFFSET_MASK;
-}
-ret = longret & ~BDRV_BLOCK_OFFSET_MASK;
-*pnum = count * BDRV_SECTOR_SIZE;
-goto refine;
-}
-
 ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
 aligned_bytes, pnum, _map,
 _file);
@@ -1916,7 +1885,6 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 *pnum = total_size - aligned_offset;
 }

-refine:
 /*
  * The driver's result must be a multiple of request_alignment.
  * Clamp pnum and adjust map to original request.
-- 
2.13.6

[Qemu-devel] [PATCH v4 16/20] vdi: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/vdi.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 6f83221ddc..b30886823d 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -171,8 +171,6 @@ typedef struct {
 uint32_t *bmap;
 /* Size of block (bytes). */
 uint32_t block_size;
-/* Size of block (sectors). */
-uint32_t block_sectors;
 /* First sector of block map. */
 uint32_t bmap_sector;
 /* VDI header (converted to host endianness). */
@@ -462,7 +460,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
 bs->total_sectors = header.disk_size / SECTOR_SIZE;

 s->block_size = header.block_size;
-s->block_sectors = header.block_size / SECTOR_SIZE;
 s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
 s->header = header;

@@ -508,33 +505,29 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 return 0;
 }

-static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
-/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
 BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
-size_t bmap_index = sector_num / s->block_sectors;
-size_t sector_in_block = sector_num % s->block_sectors;
-int n_sectors = s->block_sectors - sector_in_block;
+size_t bmap_index = offset / s->block_size;
+size_t index_in_block = offset % s->block_size;
 uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
-uint64_t offset;
 int result;

-logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
-*pnum = n_sectors;
+logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum);
+*pnum = MIN(s->block_size, bytes);
 result = VDI_IS_ALLOCATED(bmap_entry);
 if (!result) {
 return 0;
 }

-offset = s->header.offset_data +
-  (uint64_t)bmap_entry * s->block_size +
-  sector_in_block * SECTOR_SIZE;
+*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
+index_in_block;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static int coroutine_fn
@@ -902,7 +895,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create = vdi_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = vdi_co_get_block_status,
+.bdrv_co_block_status = vdi_co_block_status,
 .bdrv_make_empty = vdi_make_empty,

 .bdrv_co_preadv = vdi_co_preadv,
-- 
2.13.6

[Qemu-devel] [PATCH v4 17/20] vmdk: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/vmdk.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index c665bcc977..624b5c296a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1303,25 +1303,27 @@ static inline uint64_t 
vmdk_find_index_in_cluster(VmdkExtent *extent,
 return offset / BDRV_SECTOR_SIZE;
 }

-static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vmdk_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVVmdkState *s = bs->opaque;
 int64_t index_in_cluster, n, ret;
-uint64_t offset;
+uint64_t cluster_offset;
 VmdkExtent *extent;

-extent = find_extent(s, sector_num, NULL);
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL);
 if (!extent) {
 return 0;
 }
 qemu_co_mutex_lock(>lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, ,
+ret = get_cluster_offset(bs, extent, NULL, offset, false, _offset,
  0, 0);
 qemu_co_mutex_unlock(>lock);

-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
+index_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 switch (ret) {
 case VMDK_ERROR:
 ret = -EIO;
@@ -1336,18 +1338,14 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
 ret = BDRV_BLOCK_DATA;
 if (!extent->compressed) {
 ret |= BDRV_BLOCK_OFFSET_VALID;
-ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
-& BDRV_BLOCK_OFFSET_MASK;
+*map = cluster_offset + index_in_cluster;
 }
 *file = extent->file->bs;
 break;
 }

-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
-*pnum = n;
+n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster;
+*pnum = MIN(n, bytes);
 return ret;
 }

@@ -2393,7 +2391,7 @@ static BlockDriver bdrv_vmdk = {
 .bdrv_close   = vmdk_close,
 .bdrv_create  = vmdk_create,
 .bdrv_co_flush_to_disk= vmdk_co_flush,
-.bdrv_co_get_block_status = vmdk_co_get_block_status,
+.bdrv_co_block_status = vmdk_co_block_status,
 .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
 .bdrv_has_zero_init   = vmdk_has_zero_init,
 .bdrv_get_specific_info   = vmdk_get_specific_info,
-- 
2.13.6

[Qemu-devel] nvme: Add tracing v3

2017-10-12 Thread Doug Gale

>From c7f12a5949458fdd195b5e0b52f158e8f114f203 Mon Sep 17 00:00:00 2001
From: Doug Gale 
Date: Thu, 12 Oct 2017 14:29:07 -0400
Subject: [PATCH] nvme: Add tracing

Add trace output for commands, errors, and undefined behavior.
Add guest error log output for undefined behavior.
Report and ignore invalid undefined accesses to MMIO.
Annotate unlikely error checks with unlikely.

Signed-off-by: Doug Gale 
---
 hw/block/nvme.c   | 347 ++
 hw/block/trace-events |  93 ++
 2 files changed, 387 insertions(+), 53 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9aa32692a3..adac19f5b0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -34,8 +34,17 @@
 #include "qapi/visitor.h"
 #include "sysemu/block-backend.h"

+#include "qemu/log.h"
+#include "trace.h"
 #include "nvme.h"

+#define NVME_GUEST_ERR(trace, fmt, ...) \
+do { \
+(trace_##trace)(__VA_ARGS__); \
+qemu_log_mask(LOG_GUEST_ERROR, #trace \
+" in %s: " fmt "\n", __func__, ## __VA_ARGS__); \
+} while (0)
+
 static void nvme_process_sq(void *opaque);

 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
@@ -86,10 +95,14 @@ static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
 if (msix_enabled(&(n->parent_obj))) {
+trace_nvme_irq_msix(cq->vector);
 msix_notify(&(n->parent_obj), cq->vector);
 } else {
+trace_nvme_irq_pin();
 pci_irq_pulse(>parent_obj);
 }
+} else {
+trace_nvme_irq_masked();
 }
 }

@@ -100,7 +113,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg,
QEMUIOVector *iov, uint64_t prp1,
 trans_len = MIN(len, trans_len);
 int num_prps = (len >> n->page_bits) + 1;

-if (!prp1) {
+if (unlikely(!prp1)) {
+trace_nvme_err_invalid_prp();
 return NVME_INVALID_FIELD | NVME_DNR;
 } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
@@ -113,7 +127,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg,
QEMUIOVector *iov, uint64_t prp1,
 }
 len -= trans_len;
 if (len) {
-if (!prp2) {
+if (unlikely(!prp2)) {
+trace_nvme_err_invalid_prp2_missing();
 goto unmap;
 }
 if (len > n->page_size) {
@@ -128,7 +143,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg,
QEMUIOVector *iov, uint64_t prp1,
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);

 if (i == n->max_prp_ents - 1 && len > n->page_size) {
-if (!prp_ent || prp_ent & (n->page_size - 1)) {
+if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
+trace_nvme_err_invalid_prplist_ent(prp_ent);
 goto unmap;
 }

@@ -140,7 +156,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg,
QEMUIOVector *iov, uint64_t prp1,
 prp_ent = le64_to_cpu(prp_list[i]);
 }

-if (!prp_ent || prp_ent & (n->page_size - 1)) {
+if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
+trace_nvme_err_invalid_prplist_ent(prp_ent);
 goto unmap;
 }

@@ -154,7 +171,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg,
QEMUIOVector *iov, uint64_t prp1,
 i++;
 }
 } else {
-if (prp2 & (n->page_size - 1)) {
+if (unlikely(prp2 & (n->page_size - 1))) {
+trace_nvme_err_invalid_prp2_align(prp2);
 goto unmap;
 }
 if (qsg->nsg) {
@@ -178,16 +196,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n,
uint8_t *ptr, uint32_t len,
 QEMUIOVector iov;
 uint16_t status = NVME_SUCCESS;

+trace_nvme_dma_read(prp1, prp2);
+
 if (nvme_map_prp(, , prp1, prp2, len, n)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 if (qsg.nsg > 0) {
-if (dma_buf_read(ptr, len, )) {
+if (unlikely(dma_buf_read(ptr, len, ))) {
+trace_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
 qemu_sglist_destroy();
 } else {
-if (qemu_iovec_to_buf(, 0, ptr, len) != len) {
+if (unlikely(qemu_iovec_to_buf(, 0, ptr, len) != len)) {
+trace_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
 qemu_iovec_destroy();
@@ -273,7 +295,8 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n,
NvmeNamespace *ns, NvmeCmd *cmd,
 uint64_t aio_slba = slba << (data_shift - BDRV_SECTOR_BITS);
 uint32_t aio_nlb = nlb << (data_shift - BDRV_SECTOR_BITS);

-if (slba + nlb > ns->id_ns.nsze) {
+if (unlikely(slba + nlb > ns->id_ns.nsze)) {
+trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);

[Qemu-devel] [PATCH v4 13/20] raw: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping
---
 block/raw-format.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c0954..830243a8e4 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -250,17 +250,17 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero, int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
 BlockDriverState **file)
 {
 BDRVRawState *s = bs->opaque;
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->file->bs;
-sector_num += s->offset / BDRV_SECTOR_SIZE;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+*map = offset + s->offset;
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -496,7 +496,7 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = _co_pwritev,
 .bdrv_co_pwrite_zeroes = _co_pwrite_zeroes,
 .bdrv_co_pdiscard = _co_pdiscard,
-.bdrv_co_get_block_status = _co_get_block_status,
+.bdrv_co_block_status = _co_block_status,
 .bdrv_truncate= _truncate,
 .bdrv_getlength   = _getlength,
 .has_variable_length  = true,
-- 
2.13.6

[Qemu-devel] [PATCH v4 15/20] vdi: Avoid bitrot of debugging code

2017-10-12 Thread Eric Blake

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake 
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 

---
v2: no change
---
 block/vdi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 8da5dfc897..6f83221ddc 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -86,12 +86,18 @@
 #define DEFAULT_CLUSTER_SIZE (1 * MiB)

 #if defined(CONFIG_VDI_DEBUG)
-#define logout(fmt, ...) \
-fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#define VDI_DEBUG 1
 #else
-#define logout(fmt, ...) ((void)0)
+#define VDI_DEBUG 0
 #endif

+#define logout(fmt, ...) \
+do {\
+if (VDI_DEBUG) {\
+fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__); \
+}   \
+} while (0)
+
 /* Image signature. */
 #define VDI_SIGNATURE 0xbeda107f

-- 
2.13.6

[Qemu-devel] [PATCH v4 11/20] qcow2: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

For now, we are ignoring the 'want_zero' hint.  However, it should
be relatively straightforward to honor the hint as a way to return
larger *pnum values when we have consecutive clusters with the same
data/zero status but which differ only in having non-consecutive
mappings.

Signed-off-by: Eric Blake 

---
v4: update to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/qcow2.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 92cb9f9bfa..a84e50a6e6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1625,32 +1625,34 @@ static void qcow2_join_options(QDict *options, QDict 
*old_options)
 }
 }

-static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset, int64_t count,
+  int64_t *pnum, int64_t *map,
+  BlockDriverState **file)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
 int index_in_cluster, ret;
 unsigned int bytes;
-int64_t status = 0;
+int status = 0;

-bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
+bytes = MIN(INT_MAX, count);
 qemu_co_mutex_lock(>lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << BDRV_SECTOR_BITS, ,
-   _offset);
+ret = qcow2_get_cluster_offset(bs, offset, , _offset);
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
 }

-*pnum = bytes >> BDRV_SECTOR_BITS;
+*pnum = bytes;

 if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
 !s->crypto) {
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+index_in_cluster = offset & (s->cluster_size - 1);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+status |= BDRV_BLOCK_OFFSET_VALID;
 }
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
@@ -4333,7 +4335,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create= qcow2_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = qcow2_co_get_block_status,
+.bdrv_co_block_status = qcow2_co_block_status,

 .bdrv_co_preadv = qcow2_co_preadv,
 .bdrv_co_pwritev= qcow2_co_pwritev,
-- 
2.13.6

[Qemu-devel] [PATCH v4 14/20] sheepdog: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake 

---
v4: update to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/sheepdog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 696a71442a..0af8b07892 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2988,19 +2988,19 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState 
*bs, int64_t offset,
 return acb.ret;
 }

-static coroutine_fn int64_t
-sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int 
nb_sectors,
-   int *pnum, BlockDriverState **file)
+static coroutine_fn int
+sd_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
+   int64_t bytes, int64_t *pnum, int64_t *map,
+   BlockDriverState **file)
 {
 BDRVSheepdogState *s = bs->opaque;
 SheepdogInode *inode = >inode;
 uint32_t object_size = (UINT32_C(1) << inode->block_size_shift);
-uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
 unsigned long start = offset / object_size,
-  end = DIV_ROUND_UP((sector_num + nb_sectors) *
- BDRV_SECTOR_SIZE, object_size);
+  end = DIV_ROUND_UP(offset + bytes, object_size);
 unsigned long idx;
-int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+*map = offset;
+int ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;

 for (idx = start; idx < end; idx++) {
 if (inode->data_vdi_id[idx] == 0) {
@@ -3017,9 +3017,9 @@ sd_co_get_block_status(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors,
 }
 }

-*pnum = (idx - start) * object_size / BDRV_SECTOR_SIZE;
-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+*pnum = (idx - start) * object_size;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
 *file = bs;
@@ -3097,7 +3097,7 @@ static BlockDriver bdrv_sheepdog = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
@@ -3133,7 +3133,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
@@ -3169,7 +3169,7 @@ static BlockDriver bdrv_sheepdog_unix = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
-- 
2.13.6

[Qemu-devel] [PATCH v4 19/20] vvfat: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.  Note that we
can rely on the block driver having already clamped limits to our
block size, and simplify accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to earlier changes, simplify
---
 block/vvfat.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index a0f2335894..9142117fc6 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3082,15 +3082,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 return ret;
 }

-static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
+static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *n,
+  int64_t *map,
+  BlockDriverState **file)
 {
-*n = bs->total_sectors - sector_num;
-if (*n > nb_sectors) {
-*n = nb_sectors;
-} else if (*n < 0) {
-return 0;
-}
+*n = bytes;
 return BDRV_BLOCK_DATA;
 }

@@ -3251,7 +3249,7 @@ static BlockDriver bdrv_vvfat = {

 .bdrv_co_preadv = vvfat_co_preadv,
 .bdrv_co_pwritev= vvfat_co_pwritev,
-.bdrv_co_get_block_status = vvfat_co_get_block_status,
+.bdrv_co_block_status   = vvfat_co_block_status,
 };

 static void bdrv_vvfat_init(void)
-- 
2.13.6

[Qemu-devel] [PATCH v4 09/20] parallels: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak, R-b dropped
v3: no change
v2: rebase to mapping parameter; it is ignored, so R-b kept
---
 block/parallels.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2b6c6e5709..4a086f29e9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -278,23 +278,31 @@ static coroutine_fn int 
parallels_co_flush_to_os(BlockDriverState *bs)
 }


-static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn parallels_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t offset;
+int count;

+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
 qemu_co_mutex_lock(>lock);
-offset = block_status(s, sector_num, nb_sectors, pnum);
+offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, );
 qemu_co_mutex_unlock(>lock);

 if (offset < 0) {
 return 0;
 }

+*pnum = count * BDRV_SECTOR_SIZE;
+*map = offset;
 *file = bs->file->bs;
-return (offset << BDRV_SECTOR_BITS) |
-BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
@@ -781,7 +789,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_open = parallels_open,
 .bdrv_close= parallels_close,
 .bdrv_child_perm  = bdrv_format_default_perms,
-.bdrv_co_get_block_status = parallels_co_get_block_status,
+.bdrv_co_block_status = parallels_co_block_status,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
 .bdrv_co_readv  = parallels_co_readv,
-- 
2.13.6

[Qemu-devel] [PATCH v4 12/20] qed: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly, taking the opportunity
to inline qed_is_allocated_cb() into its lone caller (the callback
used to be important, until we switched qed to coroutines).  There is
no intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 

---
v4: rebase to interface change, inline pointless callback
v3: no change
v2: rebase to mapping flag, fix mask in qed_is_allocated_cb
---
 block/qed.c | 84 +
 1 file changed, 28 insertions(+), 56 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 28e2ec89e8..cbab5b6f83 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -688,74 +688,46 @@ finish:
 return ret;
 }

-typedef struct {
-BlockDriverState *bs;
-Coroutine *co;
-uint64_t pos;
-int64_t status;
-int *pnum;
-BlockDriverState **file;
-} QEDIsAllocatedCB;
-
-/* Called with table_lock held.  */
-static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t 
len)
-{
-QEDIsAllocatedCB *cb = opaque;
-BDRVQEDState *s = cb->bs->opaque;
-*cb->pnum = len / BDRV_SECTOR_SIZE;
-switch (ret) {
-case QED_CLUSTER_FOUND:
-offset |= qed_offset_into_cluster(s, cb->pos);
-cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
-*cb->file = cb->bs->file->bs;
-break;
-case QED_CLUSTER_ZERO:
-cb->status = BDRV_BLOCK_ZERO;
-break;
-case QED_CLUSTER_L2:
-case QED_CLUSTER_L1:
-cb->status = 0;
-break;
-default:
-assert(ret < 0);
-cb->status = ret;
-break;
-}
-
-if (cb->co) {
-aio_co_wake(cb->co);
-}
-}
-
-static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
+static int coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t pos, int64_t bytes,
+ int64_t *pnum, int64_t *map,
  BlockDriverState **file)
 {
 BDRVQEDState *s = bs->opaque;
-size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
-QEDIsAllocatedCB cb = {
-.bs = bs,
-.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
-.status = BDRV_BLOCK_OFFSET_MASK,
-.pnum = pnum,
-.file = file,
-};
+size_t len;
+int status;
 QEDRequest request = { .l2_table = NULL };
 uint64_t offset;
 int ret;

 qemu_co_mutex_lock(>table_lock);
-ret = qed_find_cluster(s, , cb.pos, , );
-qed_is_allocated_cb(, ret, offset, len);
+ret = qed_find_cluster(s, , pos, , );

-/* The callback was invoked immediately */
-assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
+*pnum = len;
+switch (ret) {
+case QED_CLUSTER_FOUND:
+*map = offset | qed_offset_into_cluster(s, pos);
+status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*file = bs->file->bs;
+break;
+case QED_CLUSTER_ZERO:
+status = BDRV_BLOCK_ZERO;
+break;
+case QED_CLUSTER_L2:
+case QED_CLUSTER_L1:
+status = 0;
+break;
+default:
+assert(ret < 0);
+status = ret;
+break;
+}

 qed_unref_l2_cache_entry(request.l2_table);
 qemu_co_mutex_unlock(>table_lock);

-return cb.status;
+return status;
 }

 static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
@@ -1595,7 +1567,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create  = bdrv_qed_create,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
+.bdrv_co_block_status = bdrv_qed_co_block_status,
 .bdrv_co_readv= bdrv_qed_co_readv,
 .bdrv_co_writev   = bdrv_qed_co_writev,
 .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes,
-- 
2.13.6

[Qemu-devel] [PATCH v4 08/20] null: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping parameter
---
 block/null.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index dd9c13f9ba..20c7eb0ee2 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState 
*reopen_state,
 return 0;
 }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int64_t ret = BDRV_BLOCK_OFFSET_VALID;

-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs;

 if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
 }
+return ret;
 }

 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
 .bdrv_co_flush_to_disk  = null_co_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_aio_flush = null_aio_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
-- 
2.13.6

[Qemu-devel] [PATCH v4 07/20] iscsi: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle a NULL map
and file parameter, even though the block layer passes non-NULL
values, because we also call the function directly.  For now, there
are no optimizations done based on the want_zero flag.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweaks
v3: no change
v2: rebase to mapping parameter
---
 block/iscsi.c | 64 +--
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 4896d50d6e..cb30e9652c 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -647,28 +647,31 @@ out_unlock:



-static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
-  int64_t sector_num,
-  int nb_sectors, int *pnum,
-  BlockDriverState **file)
+static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 IscsiLun *iscsilun = bs->opaque;
 struct scsi_get_lba_status *lbas = NULL;
 struct scsi_lba_status_descriptor *lbasd = NULL;
 struct IscsiTask iTask;
-int64_t ret;
+int ret;

 iscsi_co_init_iscsitask(iscsilun, );

-if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+if (!is_byte_request_lun_aligned(offset, bytes, iscsilun)) {
 ret = -EINVAL;
 goto out;
 }

 /* default to all sectors allocated */
-ret = BDRV_BLOCK_DATA;
-ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
-*pnum = nb_sectors;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+if (map) {
+*map = offset;
+}
+*pnum = bytes;

 /* LUN does not support logical block provisioning */
 if (!iscsilun->lbpme) {
@@ -678,7 +681,7 @@ static int64_t coroutine_fn 
iscsi_co_get_block_status(BlockDriverState *bs,
 qemu_mutex_lock(>mutex);
 retry:
 if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
-  sector_qemu2lun(sector_num, iscsilun),
+  offset / iscsilun->block_size,
   8 + 16, iscsi_co_generic_cb,
   ) == NULL) {
 ret = -ENOMEM;
@@ -717,12 +720,12 @@ retry:

 lbasd = >descriptors[0];

-if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+if (offset / iscsilun->block_size != lbasd->lba) {
 ret = -EIO;
 goto out_unlock;
 }

-*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
+*pnum = lbasd->num_blocks * iscsilun->block_size;

 if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
 lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
@@ -733,15 +736,13 @@ retry:
 }

 if (ret & BDRV_BLOCK_ZERO) {
-iscsi_allocmap_set_unallocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
-   *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum);
 } else {
-iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
- *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
 }

-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 out_unlock:
 qemu_mutex_unlock(>mutex);
@@ -749,7 +750,7 @@ out:
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
 }
-if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
+if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
 *file = bs;
 }
 return ret;
@@ -788,25 +789,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
*bs,
  nb_sectors * BDRV_SECTOR_SIZE) &&
 !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
  nb_sectors * BDRV_SECTOR_SIZE)) {
-int pnum;
-BlockDriverState *file;
+int64_t pnum;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
-int head;
-int64_t ret;
+int64_t head;
+int ret;

-assert(cluster_sectors);
-head = sector_num % cluster_sectors;
-ret = iscsi_co_get_block_status(bs, sector_num - head,
-

[Qemu-devel] [PATCH v4 05/20] iscsi: Switch cluster_sectors to byte-based

2017-10-12 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the cluster size in sectors, along with adding assertions that we
are not dividing by zero.

Signed-off-by: Eric Blake 

---
v2: no change
---
 block/iscsi.c | 56 +++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 4683f3b244..8f903d8370 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -84,7 +84,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap;
 unsigned long *allocmap_valid;
 long allocmap_size;
-int cluster_sectors;
+int cluster_size;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -427,9 +427,10 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 {
 iscsi_allocmap_free(iscsilun);

+assert(iscsilun->cluster_size);
 iscsilun->allocmap_size =
-DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
- iscsilun->cluster_sectors);
+DIV_ROUND_UP(iscsilun->num_blocks * iscsilun->block_size,
+ iscsilun->cluster_size);

 iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
 if (!iscsilun->allocmap) {
@@ -437,7 +438,7 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 }

 if (open_flags & BDRV_O_NOCACHE) {
-/* in case that cache.direct = on all allocmap entries are
+/* when cache.direct = on all allocmap entries are
  * treated as invalid to force a relookup of the block
  * status on every read request */
 return 0;
@@ -458,17 +459,19 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t 
sector_num,
   int nb_sectors, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-cl_num_expanded = sector_num / iscsilun->cluster_sectors;
+assert(cluster_sectors);
+cl_num_expanded = sector_num / cluster_sectors;
 nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   iscsilun->cluster_sectors) - 
cl_num_expanded;
+   cluster_sectors) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / iscsilun->cluster_sectors
+cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
+nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
   - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
@@ -532,9 +535,12 @@ iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t 
sector_num,
 if (iscsilun->allocmap == NULL) {
 return true;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
@@ -544,9 +550,12 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun 
*iscsilun,
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return (find_next_zero_bit(iscsilun->allocmap_valid, size,
-   sector_num / iscsilun->cluster_sectors) == 
size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static int coroutine_fn
@@ -781,16 +790,21 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
*bs,
 BlockDriverState *file;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int64_t ret = iscsi_co_get_block_status(bs,
-  sector_num - sector_num % iscsilun->cluster_sectors,
-  BDRV_REQUEST_MAX_SECTORS, , );
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
+int head;
+int64_t ret;
+
+assert(cluster_sectors);
+head = sector_num %

[Qemu-devel] [PATCH v4 10/20] qcow: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.  There is no
intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: rebase to master
v2: rebase to mapping flag
---
 block/qcow.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 9569deeaf0..f09661dc0c 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -513,23 +513,28 @@ static int get_cluster_offset(BlockDriverState *bs,
 return 1;
 }

-static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVQcowState *s = bs->opaque;
-int index_in_cluster, n, ret;
+int index_in_cluster, ret;
+int64_t n;
 uint64_t cluster_offset;

 qemu_co_mutex_lock(>lock);
-ret = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0, _offset);
+ret = get_cluster_offset(bs, offset, 0, 0, 0, 0, _offset);
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
 }
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-n = s->cluster_sectors - index_in_cluster;
-if (n > nb_sectors)
-n = nb_sectors;
+index_in_cluster = offset & (s->cluster_size - 1);
+n = s->cluster_size - index_in_cluster;
+if (n > bytes) {
+n = bytes;
+}
 *pnum = n;
 if (!cluster_offset) {
 return 0;
@@ -537,9 +542,9 @@ static int64_t coroutine_fn 
qcow_co_get_block_status(BlockDriverState *bs,
 if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypto) {
 return BDRV_BLOCK_DATA;
 }
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
@@ -,7 +1116,7 @@ static BlockDriver bdrv_qcow = {

 .bdrv_co_readv  = qcow_co_readv,
 .bdrv_co_writev = qcow_co_writev,
-.bdrv_co_get_block_status   = qcow_co_get_block_status,
+.bdrv_co_block_status   = qcow_co_block_status,

 .bdrv_make_empty= qcow_make_empty,
 .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
-- 
2.13.6

[Qemu-devel] [PATCH v4 03/20] file-posix: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping lseek(), and merely state that all bytes
are allocated.

Signed-off-by: Eric Blake 

---
v4: tweak commit message [Fam], rebase to interface tweak
v3: no change
v2: tweak comment, add mapping support
---
 block/file-posix.c | 59 +++---
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e940..536cd1db03 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2128,24 +2128,26 @@ static int find_allocation(BlockDriverState *bs, off_t 
start,
 }

 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * If 'offset' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped.
  */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
+BlockDriverState **file)
 {
-off_t start, data = 0, hole = 0;
+off_t data = 0, hole = 0;
 int64_t total_size;
 int ret;

@@ -2154,39 +2156,46 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
 return total_size;
-} else if (start >= total_size) {
+} else if (offset >= total_size) {
 *pnum = 0;
 return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+} else if (offset + bytes > total_size) {
+bytes = total_size - offset;
 }

-ret = find_allocation(bs, start, , );
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+}
+
+ret = find_allocation(bs, offset, , );
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
+*map = offset;
 *file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }

 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
@@ -2280,7 +2289,7 @@

[Qemu-devel] [PATCH v4 06/20] iscsi: Switch iscsi_allocmap_update() to byte-based

2017-10-12 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the allocmap (no semantic change).  Callers that already had bytes
available are simpler, and callers that now scale to bytes will be
easier to switch to byte-based in the future.

Signed-off-by: Eric Blake 

---
v2: rebase to count/bytes rename
---
 block/iscsi.c | 90 +--
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 8f903d8370..4896d50d6e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -455,24 +455,22 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 }

 static void
-iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
-  int nb_sectors, bool allocated, bool valid)
+iscsi_allocmap_update(IscsiLun *iscsilun, int64_t offset,
+  int64_t bytes, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-assert(cluster_sectors);
-cl_num_expanded = sector_num / cluster_sectors;
-nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   cluster_sectors) - cl_num_expanded;
+assert(iscsilun->cluster_size);
+cl_num_expanded = offset / iscsilun->cluster_size;
+nb_cls_expanded = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size)
+- cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
-  - cl_num_shrunk;
+cl_num_shrunk = DIV_ROUND_UP(offset, iscsilun->cluster_size);
+nb_cls_shrunk = (offset + bytes) / iscsilun->cluster_size - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
 } else {
@@ -495,26 +493,26 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t 
sector_num,
 }

 static void
-iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t sector_num,
- int nb_sectors)
+iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t offset,
+ int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, true, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, true, true);
 }

 static void
-iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
 /* Note: if cache.direct=on the fifth argument to iscsi_allocmap_update
  * is ignored, so this will in effect be an iscsi_allocmap_set_invalid.
  */
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, true);
 }

-static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, false);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, false);
 }

 static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
@@ -528,34 +526,30 @@ static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
 }

 static inline bool
-iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
-int nb_sectors)
+iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t offset,
+int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap == NULL) {
 return true;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num + nb_sectors,
-iscsilun->cluster_size >> BDRV_SECTOR_BITS);
+size = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num * BDRV_SECTOR_SIZE /
-   iscsilun->cluster_size) == size);
+   offset / iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
-   int64_t sector_num, int nb_sectors)
+   int64_t offset, int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num +

[Qemu-devel] [PATCH v4 02/20] block: Switch passthrough drivers to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the generic helpers, and all passthrough clients
(blkdebug, commit, mirror, throttle) accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: rebase to addition of throttle driver
v2: rebase to master, retitle while merging blkdebug, commit, and mirror
---
 include/block/block_int.h | 28 
 block/io.c| 36 
 block/blkdebug.c  | 20 +++-
 block/commit.c|  2 +-
 block/mirror.c|  2 +-
 block/throttle.c  |  2 +-
 6 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4153cd646d..7c8503f693 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1018,23 +1018,27 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
uint64_t *nperm, uint64_t *nshared);

 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
-BlockDriverState 
**file);
+int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
+int64_t *map,
+BlockDriverState **file);
 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their backing file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState 
*bs,
-   int64_t sector_num,
-   int nb_sectors,
-   int *pnum,
-   BlockDriverState 
**file);
+int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/io.c b/block/io.c
index ef9ea44667..6a2a2e1484 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1754,30 +1754,34 @@ typedef struct BdrvCoBlockStatusData {
 bool done;
 } BdrvCoBlockStatusData;

-int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
-BlockDriverState 
**file)
+int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
+int64_t *map,
+BlockDriverState **file)
 {
 assert(bs->file && bs->file->bs);
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

-int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState 
*bs,
-   int64_t sector_num,
-   int nb_sectors,
-   int

[Qemu-devel] [PATCH v4 04/20] gluster: Switch to .bdrv_co_block_status()

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping find_allocation(), and merely state that
all bytes are allocated.

Signed-off-by: Eric Blake 

---
v4: tweak commit message [Fam], rebase to interface tweak
v3: no change
v2: tweak comments [Prasanna], add mapping, drop R-b
---
 block/gluster.c | 67 +
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 0f4265a3a4..518f1984de 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1349,26 +1349,30 @@ exit:
 }

 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * If 'offset' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped.
  *
- * (Based on raw_co_get_block_status() from file-posix.c.)
+ * (Based on raw_co_block_status() from file-posix.c.)
  */
-static int64_t coroutine_fn qemu_gluster_co_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVGlusterState *s = bs->opaque;
-off_t start, data = 0, hole = 0;
+off_t data = 0, hole = 0;
 int64_t total_size;
 int ret = -EINVAL;

@@ -1376,41 +1380,48 @@ static int64_t coroutine_fn 
qemu_gluster_co_get_block_status(
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
 return total_size;
-} else if (start >= total_size) {
+} else if (offset >= total_size) {
 *pnum = 0;
 return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+} else if (offset + bytes > total_size) {
+bytes = total_size - offset;
 }

-ret = find_allocation(bs, start, , );
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+}
+
+ret = find_allocation(bs, offset, , );
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }

+*map = offset;
 *file = bs;

-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }


@@ -1438,7

Re: [Qemu-devel] proposed schedule for 2.11 release

2017-10-12 Thread Peter Maydell

On 10 October 2017 at 17:26, Paolo Bonzini  wrote:
> On 10/10/2017 18:10, Daniel P. Berrange wrote:
>> On Tue, Oct 10, 2017 at 03:07:26PM +0100, Peter Maydell wrote:
>>> OK, here's a strawman proposal for the 2.11 release cycle,
>>> based on aiming for a 5 Dec release date and working backwards
>>> with the same cadence as 2.10.
>>>
>>> 2017-10-31 Soft freeze -- all feature changes must already be in a
>>>pull request that's been sent to the mailing list
>>> 2017-11-07 Hard freeze, tag rc0
>>> 2017-11-14 Tag rc1
>>> 2017-11-21 Tag rc2
>>> 2017-11-28 Tag rc3
>>> 2017-12-05 Release (or rc4 if we need it)
>>> 2017-12-12 Release if we needed an rc4
>>>
>>> That makes softfreeze in 3 weeks time. I'd rather not push
>>> the times back any further because we end up in serious
>>> risk of running into the christmas holiday period.
>>
>> Soft freeze date is not great from a KVM Forum timing POV - the 31st is
>> likely to be a travel day for most of the Red Hat devs.
>
> Might as well have soft freeze on Oct 24 (and keep hard freeze on Nov
> 7)...  Non-maintainers posting patches on the Oct 24-31 week probably
> wouldn't have them included in 2.11 anyway.

Hmm, I'd forgotten that KVM Forum dates would mess this up.
I'm going to be travelling/at the conference for the 23-27th
week, so I can't guarantee that I'll be able to apply any pull
requests then. But I will be back for the 30th, so I'd
rather not officially set the soft freeze date to the 24th.
Maintainers who are going to be away then will just, as
Daniel suggests, want to plan ahead to have got their
pull requests on list.

thanks
-- PMM

[Qemu-devel] [PATCH v4 01/20] block: Add .bdrv_co_block_status() callback

2017-10-12 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that operates
on as small as byte boundaries. Subsequent patches will then update
individual drivers, then finally remove .bdrv_co_get_block_status().
The old code now uses a goto in order to minimize churn at that later
removal.  Update documentation in this patch so that the later
removal can be a straight delete.

The new code also passes through the 'want_zero' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated (mapping is false), rather
than full details about runs of zeroes and which offsets the
allocation actually maps to (mapping is true).

Note that most drivers give sector-aligned answers, except at
end-of-file, even when request_alignment is smaller than a sector.
However, bdrv_getlength() is sector-aligned (even though it gives a
byte answer), often by exceeding the actual file size.  If we were to
give back strict results, at least file-posix.c would report a
transition from DATA to HOLE at the end of a file even in the middle
of a sector, which can throw off callers; so we intentionally lie and
state that any partial sector at the end of a file has the same
status for the entire sector.  Maybe at some future day we can
report actual file size instead of rounding up, but not for this
series.

Signed-off-by: Eric Blake 

---
v4: rebase to master
v3: no change
v2: improve alignment handling, ensure all iotests still pass
---
 include/block/block.h |  9 -
 include/block/block_int.h | 12 +---
 block/io.c| 30 +-
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index fbc21daf62..c5d6b2c933 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -136,11 +136,10 @@ typedef struct HDGeometry {
  * that the block layer recompute the answer from the returned
  * BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
  *
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK) of
- * the return value (old interface) or the entire map parameter (new
- * interface) represent the offset in the returned BDS that is allocated for
- * the corresponding raw data.  However, whether that offset actually
- * contains data also depends on BDRV_BLOCK_DATA, as follows:
+ * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
+ * host offset within the returned BDS that is allocated for the
+ * corresponding raw guest data.  However, whether that offset
+ * actually contains data also depends on BDRV_BLOCK_DATA, as follows:
  *
  * DATA ZERO OFFSET_VALID
  *  ttt   sectors read as zero, returned file is zero at offset
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4b9b23a08d..4153cd646d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -206,13 +206,19 @@ struct BlockDriver {
  * bdrv_is_allocated[_above].  The driver should answer only
  * according to the current layer, and should not set
  * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
- * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
- * layer guarantees input aligned to request_alignment, as well as
- * non-NULL pnum and file.
+ * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  As a hint,
+ * the flag want_zero is true if the caller cares more about
+ * precise mappings (favor _OFFSET_VALID/_ZERO) or false for
+ * overall allocation (favor larger *pnum).  The block layer
+ * guarantees input aligned to request_alignment, as well as
+ * non-NULL pnum, map, and file.
  */
 int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, int *pnum,
 BlockDriverState **file);
+int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
+bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+int64_t *map, BlockDriverState **file);

 /*
  * Invalidate any cached meta-data.
diff --git a/block/io.c b/block/io.c
index e4caa4acf1..ef9ea44667 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1843,7 +1843,7 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 bytes = n;
 }

-if (!bs->drv->bdrv_co_get_block_status) {
+if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (offset + bytes == total_size) {
@@ -1860,13 +1860,14 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 bdrv_inc_in_flight(bs);

 /* Round out to request_alignment boundaries */
-/* TODO: until we

[Qemu-devel] [PATCH v4 00/20] add byte-based block_status driver callbacks

2017-10-12 Thread Eric Blake

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

Now that 2.11 is open, I'm rebasing/reposting the remaining patches.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (merged, commit 51b0a488)
part 2: dirty-bitmap (merged, commit ca759622)
part 3: bdrv_get_block_status (v6 is posted [1], partially reviewed)
part 4: .bdrv_co_block_status (this series, v3 was here [2])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-callback-v4

Based-on: <20171012034720.11947-1-ebl...@redhat.com>
([PATCH v6 00/24] make bdrv_get_block_status byte-based)

Since v3:
- rebase to series 3 tweak in get_status interface
- further simplify qed code
- better documentation of optimization flag (s/mapping/want_zero/)

001/20:[0033] [FC] 'block: Add .bdrv_co_block_status() callback'
002/20:[0077] [FC] 'block: Switch passthrough drivers to 
.bdrv_co_block_status()'
003/20:[0020] [FC] 'file-posix: Switch to .bdrv_co_block_status()'
004/20:[0019] [FC] 'gluster: Switch to .bdrv_co_block_status()'
005/20:[] [--] 'iscsi: Switch cluster_sectors to byte-based'
006/20:[] [--] 'iscsi: Switch iscsi_allocmap_update() to byte-based'
007/20:[0022] [FC] 'iscsi: Switch to .bdrv_co_block_status()'
008/20:[0013] [FC] 'null: Switch to .bdrv_co_block_status()'
009/20:[0017] [FC] 'parallels: Switch to .bdrv_co_block_status()'
010/20:[0014] [FC] 'qcow: Switch to .bdrv_co_block_status()'
011/20:[0019] [FC] 'qcow2: Switch to .bdrv_co_block_status()'
012/20:[0086] [FC] 'qed: Switch to .bdrv_co_block_status()'
013/20:[0015] [FC] 'raw: Switch to .bdrv_co_block_status()'
014/20:[0011] [FC] 'sheepdog: Switch to .bdrv_co_block_status()'
015/20:[] [--] 'vdi: Avoid bitrot of debugging code'
016/20:[0017] [FC] 'vdi: Switch to .bdrv_co_block_status()'
017/20:[0013] [FC] 'vmdk: Switch to .bdrv_co_block_status()'
018/20:[0019] [FC] 'vpc: Switch to .bdrv_co_block_status()'
019/20:[0010] [FC] 'vvfat: Switch to .bdrv_co_block_status()'
020/20:[0019] [FC] 'block: Drop unused .bdrv_co_get_block_status()'

Eric Blake (20):
  block: Add .bdrv_co_block_status() callback
  block: Switch passthrough drivers to .bdrv_co_block_status()
  file-posix: Switch to .bdrv_co_block_status()
  gluster: Switch to .bdrv_co_block_status()
  iscsi: Switch cluster_sectors to byte-based
  iscsi: Switch iscsi_allocmap_update() to byte-based
  iscsi: Switch to .bdrv_co_block_status()
  null: Switch to .bdrv_co_block_status()
  parallels: Switch to .bdrv_co_block_status()
  qcow: Switch to .bdrv_co_block_status()
  qcow2: Switch to .bdrv_co_block_status()
  qed: Switch to .bdrv_co_block_status()
  raw: Switch to .bdrv_co_block_status()
  sheepdog: Switch to .bdrv_co_block_status()
  vdi: Avoid bitrot of debugging code
  vdi: Switch to .bdrv_co_block_status()
  vmdk: Switch to .bdrv_co_block_status()
  vpc: Switch to .bdrv_co_block_status()
  vvfat: Switch to .bdrv_co_block_status()
  block: Drop unused .bdrv_co_get_block_status()

 include/block/block.h |   9 ++-
 include/block/block_int.h |  43 +++--
 block/io.c|  80 +++-
 block/blkdebug.c  |  20 +++---
 block/commit.c|   2 +-
 block/file-posix.c|  59 ++
 block/gluster.c   |  67 +++-
 block/iscsi.c | 154 +-
 block/mirror.c|   2 +-
 block/null.c  |  23 +++
 block/parallels.c |  22 ---
 block/qcow.c  |  27 
 block/qcow2.c |  24 
 block/qed.c   |  84 +
 block/raw-format.c|  16 ++---
 block/sheepdog.c  |  26 
 block/throttle.c  |   2 +-
 block/vdi.c   |  45 +++---
 block/vmdk.c  |  28 -
 block/vpc.c   |  42 +++--
 block/vvfat.c |  16 +++--
 21 files changed, 404 insertions(+), 387 deletions(-)

-- 
2.13.6

Re: [Qemu-devel] [PATCH v3 0/8] Add the ZynqMP PMU and IPI

2017-10-12 Thread Edgar E. Iglesias

On Oct 10, 2017 7:59 PM, "Alistair Francis" 
wrote:

On Tue, Oct 10, 2017 at 7:48 AM, Edgar E. Iglesias
 wrote:
> On Mon, Oct 09, 2017 at 05:12:39PM -0700, Alistair Francis wrote:
>> On Sun, Oct 8, 2017 at 3:20 PM, Edgar E. Iglesias
>>  wrote:
>> > On Wed, Sep 20, 2017 at 03:01:31PM -0700, Alistair Francis wrote:
>> >>
>> >> This series adds the ZynqMP Power Management Unit (PMU) machine with
basic
>> >> functionality.
>> >>
>> >> The machine only has the
>> >>  - CPU
>> >>  - Memory
>> >>  - Interrupt controller
>> >>  - IPI device
>> >>
>> >> connected, but that is enough to run some of the ROM and firmware
>> >> code on the machine
>> >>
>> >> The series also adds the IPI device and connects it to the ZynqMP ARM
>> >> side and the ZynqMP PMU. These IPI devices don't connect between the
ARM
>> >> and MicroBlaze instances though.
>> >>
>> >> v3:
>> >>  - Add the interrupt controller
>> >>  - Replace some of the error_fatals with errp
>> >>  - Fix the PMU CPU name
>> >
>> > Hi Alistair,
>> >
>> >
>> > Sorry for the super long delay...
>> >
>> > I think this mostly looks good but I was wondering if we really need
>> > to have a board specific (zcu102) PMU?
>>
>> It doesn't have to be board specific. What I wanted though was an SoC
>> and a machine so that maybe one day we could add the PMU SoC to the
>> ARM ZCU102 machine. After that it was hard to think of a name to
>> differentiate the SoC and the machine. Do you have a recommendation on
>> names?
>
>
> Hi Alistair,
>
> Yes, I agree with your approach but I got a little confused by the names.
>
> I think all the stuff that is inside the PMU subsystem architecture-wise
> should have generic PMU names (no ZCU102). I.e the ROM, the RAM, the
IOModule,
> interrupt controller etc.

Ok I can rename them to the ZynqMP PMU (the machine/board) and the
ZynqMP PMU SoC. Does that work?

>
> The IPI block can be outside of the PMU module and be instantiated by the
> board or perhaps better if we could reuse some of the ZynqMP modules
> instantiated by the ZCU102 machine to get a CPU-less PS for the PMU
> to interact with. Or something along those lines.
> How does that sound?>

I'm a little unclear what you mean here.

Thanks,
Alistair

> Best regards,
> Edgar
>


Hi Alistair,

To clarify, if you create a pmu object, try to only put stuff in that
object that belong to the pmu subsystem. The IPI block for example doesn't,
it belongs outside of the pmu object allthough you will want to instate an
ipi instance in the pmu machine.

Hope that helps
Edgar

Sent from my phone

Re: [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-12 Thread Konrad Rzeszutek Wilk

On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > 
> > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > Haozhong Zhang  wrote:
> > > > > 
> > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > > guest address space for vNVDIMM devices.
> > > > > > 
> > > > > > All patches can be found at
> > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > 
> > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > label data, as the Xen side support for labels is not implemented 
> > > > > > yet.
> > > > > > 
> > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a 
> > > > > > hotplug
> > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > device plugging path work on Xen.
> > > > > > 
> > > > > > Patch 4 - 10 build and cooy NFIT from QEMU to Xen guest, when QEMU 
> > > > > > is
> > > > > > used as the Xen device model.
> > > > > 
> > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > number of xen_enabled() invariants it introduced as well as
> > > > > with partial blobs it creates.
> > > > 
> > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > > more xen_enabled() if possible.
> > > > 
> > > > Haozhong, was there a design document thread on xen-devel about this? If
> > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > add a link to the design doc in the introductory email, so that
> > > > everybody can read it and be on the same page.
> > > 
> > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > the guest ACPI.
> > > 
> > > [1] 
> > > https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > 
> > Igor, did you have a chance to read it?
> > 
> > .. see below
> > > 
> > > > 
> > > > 
> > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > >  1. use fw_cfg
> > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > >  3. extract nvdim tables (which is trivial) and use them
> > > > > 
> > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > 
> > > > > So what's stopping xen from using it elsewhere?,
> > > > > instead of adding more xen specific code to do 'the same'
> > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > 
> > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > guest context and generates the ACPI tables. I have no opinions on
> > > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > > a full few of the system.
> > > > 
> > > > Here the question is: does it have to be QEMU the one to generate the
> > > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > > like the rest, instead of introducing this split-brain design about
> > > > ACPI. We need to see a design doc to fully understand this.
> > > >
> > > 
> > > hvmloader runs in the guest and is responsible to build/load guest
> > > ACPI. However, it's not capable to build AML at runtime (for the lack
> > > of AML builder). If any guest ACPI object is needed (e.g. by guest
> > > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > > then be loaded by hvmloader at runtime.
> > > 
> > > Xen includes an OperationRegion "BIOS" in the static generated guest
> > > DSDT, whose address is hardcoded and which contains a list of values
> > > filled by hvmloader at runtime. Other ACPI objects can refer to those
> > > values (e.g., the number of vCPUs). But it's not enough for generating
> > > guest NVDIMM ACPI objects at compile time and then being customized
> > > and loaded by hvmload, because its structure (i.e., the number of
> > > namespace devices) cannot be decided util the guest config is known.
> > > 
> > > Alternatively, we may introduce an AML builder in hvmloader and build
> > > all guest ACPI completely in hvmloader. Looking at the similar
> > > implementation in QEMU, it would not be small, compared to the current
> > > size

Re: [Qemu-devel] [Xen-devel] [PATCH 3/3 v4] xenfb: Add [feature|request]-raw-pointer

2017-10-12 Thread Stefano Stabellini

On Thu, 12 Oct 2017, Paul Durrant wrote:
> > -Original Message-
> > From: Gerd Hoffmann [mailto:kra...@redhat.com]
> > Sent: 12 October 2017 10:26
> > To: Paul Durrant ; 'Stefano Stabellini'
> > ; Anthony Perard 
> > Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Owen Smith
> > 
> > Subject: Re: [Xen-devel] [PATCH 3/3 v4] xenfb: Add [feature|request]-raw-
> > pointer
> > 
> >   Hi,
> > 
> > > It's probably OS specific though. I guess the behaviour changed
> > > because the OS favours absolute pointing devices over relative ones
> > > and how it has two absolute ones to choose from. How it reconciles
> > > those, who knows?
> > 
> > Typically hid emulation calls qemu_input_handler_activate() when the
> > guest initializes the device, which moves the device to the top of the
> > priority list.
> > 
> > Visible effect on a typical guest with ps/2 mouse and usb-tablet is
> > that qemu switches from relative mode (mouse) to absolute mode (tablet)
> >  when the guest loads the usb hid driver.
> > 
> > I suspect pvmouse is doing the same thing.  So it may simply depend on
> > guest driver load order whenever pvmouse or usb-tablet is used.
> > 
> > Simplest fix is probably to only attach the device you plan to use to
> > the guest.  If you can't turn off pvmouse for xen guests then you might
> > want drop the qemu_input_handler_activate() call, so it behaves simliar
> > to the ps/2 mouse (is used in case no other pointer device is present).
> 
> Avoiding the activate call sounds reasonable and should avoid the behavioural 
> change.

+1

Owen, are you up for resubmitting the series with this small change?

Re: [Qemu-devel] [PATCH 2/2] linux-user: Fix TARGET_MTIOCTOP/MTIOCGET/MTIOCPOS values

2017-10-12 Thread Laurent Vivier

Le 12/10/2017 à 18:53, Peter Maydell a écrit :
> On 12 October 2017 at 17:49, Laurent Vivier  wrote:
>> Le 12/10/2017 à 17:30, Peter Maydell a écrit :
>>> +#if defined(TARGET_SPARC) || defined(TARGET_MIPS)
>>> +typedef abi_long target_kernel_daddr_t;
>>> +#else
>>> +typedef abi_int target_kernel_daddr_t;
>>> +#endif
>>
>> Perhaps you can add these ones into include/exec/user/abitypes.h ?
> 
> I don't think they belong there -- that file is for basic
> CPU ABI dependent types, not things which are just part of
> the kernel interface.

I agree

Laurent

Re: [Qemu-devel] [PATCH] xen: Log errno rather than return value

2017-10-12 Thread Paul Durrant

> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+paul.durrant=citrix@nongnu.org] On Behalf Of Paul Durrant
> Sent: 12 October 2017 10:47
> To: Ross Lagerwall ; qemu-devel@nongnu.org
> Cc: Anthony Perard ; Ross Lagerwall
> ; Stefano Stabellini 
> Subject: Re: [Qemu-devel] [PATCH] xen: Log errno rather than return value
> 
> > -Original Message-
> > From: Qemu-devel [mailto:qemu-devel-
> > bounces+paul.durrant=citrix@nongnu.org] On Behalf Of Ross
> Lagerwall
> > Sent: 11 October 2017 16:52
> > To: qemu-devel@nongnu.org
> > Cc: Anthony Perard ; Ross Lagerwall
> > ; Stefano Stabellini 
> > Subject: [Qemu-devel] [PATCH] xen: Log errno rather than return value
> >
> > xen_modified_memory() sets errno to communicate what went wrong so
> > log
> > this rather than the return value which is not interesting.
> >
> > Signed-off-by: Ross Lagerwall 
> > ---
> >  hw/i386/xen/xen-hvm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> > index d9ccd5d..8028bed 100644
> > --- a/hw/i386/xen/xen-hvm.c
> > +++ b/hw/i386/xen/xen-hvm.c
> > @@ -1446,7 +1446,7 @@ void xen_hvm_modified_memory(ram_addr_t
> > start, ram_addr_t length)
> >  if (rc) {
> >  fprintf(stderr,
> >  "%s failed for "RAM_ADDR_FMT" ("RAM_ADDR_FMT"): %i,
> %s\n",
> > -__func__, start, nb_pages, rc, strerror(-rc));
> > +__func__, start, nb_pages, errno, strerror(errno));
> 
> I think this is actually a deeper problem. If QEMU is using compat code, which
> one way or another will go via xencall, then xen_modified_memory() will
> return a hypercall errno.

Having waded through the code now it turns out that (for linux at least) 
xencall returns the value returned from the underlying ioctl() call so it will 
actually return -1 with errno set correctly, so the above statement is 
incorrect. Thus there is no inconsistency and the patch DTRT.

Sorry for the noise.

  Paul

> However, if it goes via libxendevicemodel and an
> up-to-date privcmd, then it will return -1 and errno will be set. Thus I think
> the correct fix is not this patch, but a fix in xen_modified_memory() to 
> return
> -errno and a fix in libxendevicemodel and the compat code in libxencntrl to
> behave consistently. It's all rather horrible.
> 
>   Paul
> 
> >  }
> >  }
> >  }
> > --
> > 2.9.5
> >
>

Re: [Qemu-devel] [PATCH] tco: add trace events

2017-10-12 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1507816448-86665-1-git-send-email-pbonz...@redhat.com
Subject: [Qemu-devel] [PATCH] tco: add trace events

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1976413f8f tco: add trace events

=== OUTPUT BEGIN ===
Checking PATCH 1/1: tco: add trace events...
ERROR: code indent should never use tabs
#43: FILE: hw/acpi/tco.c:67:
+^I^I^Ilpc->pin_strap.spkr_hi,$

ERROR: code indent should never use tabs
#44: FILE: hw/acpi/tco.c:68:
+^I^I^I!!(gcs & ICH9_CC_GCS_NO_REBOOT));$

total: 2 errors, 0 warnings, 36 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH 2/2] linux-user: Fix TARGET_MTIOCTOP/MTIOCGET/MTIOCPOS values

2017-10-12 Thread Peter Maydell

On 12 October 2017 at 17:49, Laurent Vivier  wrote:
> Le 12/10/2017 à 17:30, Peter Maydell a écrit :
>> +#if defined(TARGET_SPARC) || defined(TARGET_MIPS)
>> +typedef abi_long target_kernel_daddr_t;
>> +#else
>> +typedef abi_int target_kernel_daddr_t;
>> +#endif
>
> Perhaps you can add these ones into include/exec/user/abitypes.h ?

I don't think they belong there -- that file is for basic
CPU ABI dependent types, not things which are just part of
the kernel interface.

>> +struct target_mtget {
>> +abi_long mt_type;
>> +abi_long mt_resid;
>> +abi_long mt_dsreg;
>> +abi_long mt_gstat;
>> +abi_long mt_erreg;
>> +target_kernel_daddr_t mt_fileno;
>> +target_kernel_daddr_t mt_blkno;
>> +};
>
> I think you need to update STRUCT(mtget, ...) in
> linux-user/syscall_types.h to reflect the size difference for MIPS and
> SPARC.

I thought about that but I wasn't feeling too enthusiastic
due to not having a test case... I suppose it's better to
change them both though.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/2] linux-user: Fix TARGET_MTIOCTOP/MTIOCGET/MTIOCPOS values

2017-10-12 Thread Laurent Vivier

Le 12/10/2017 à 17:30, Peter Maydell a écrit :
> The TARGET_MTIOCTOP/TARGET_MTIOCGET/TARGET_MTIOCPOS values
> were being defined in terms of host struct types, but
> these structures are such that their size might differ
> on different hosts. Switch to using a target struct
> definition instead.
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/syscall_defs.h | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index f7cc9f9..16d56fa 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -2706,9 +2706,34 @@ struct target_f_owner_ex {
>  #define TARGET_VFAT_IOCTL_READDIR_BOTHTARGET_IORU('r', 1)
>  #define TARGET_VFAT_IOCTL_READDIR_SHORT   TARGET_IORU('r', 2)
>  
> -#define TARGET_MTIOCTOPTARGET_IOW('m', 1, struct mtop)
> -#define TARGET_MTIOCGETTARGET_IOR('m', 2, struct mtget)
> -#define TARGET_MTIOCPOSTARGET_IOR('m', 3, struct mtpos)
> +struct target_mtop {
> +abi_short mt_op;
> +abi_int mt_count;
> +};
> +
> +#if defined(TARGET_SPARC) || defined(TARGET_MIPS)
> +typedef abi_long target_kernel_daddr_t;
> +#else
> +typedef abi_int target_kernel_daddr_t;
> +#endif

Perhaps you can add these ones into include/exec/user/abitypes.h ?

> +struct target_mtget {
> +abi_long mt_type;
> +abi_long mt_resid;
> +abi_long mt_dsreg;
> +abi_long mt_gstat;
> +abi_long mt_erreg;
> +target_kernel_daddr_t mt_fileno;
> +target_kernel_daddr_t mt_blkno;
> +};

I think you need to update STRUCT(mtget, ...) in
linux-user/syscall_types.h to reflect the size difference for MIPS and
SPARC.

Thanks,
Laurent

Re: [Qemu-devel] [PATCH 00/31] Allow configuring the qcow2 L2 cache entry size

2017-10-12 Thread Fam Zheng

On Thu, 10/12 09:11, Eric Blake wrote:
> Fam,
> 
> On 10/12/2017 09:02 AM, no-re...@patchew.org wrote:
> > Hi,
> > 
> > This series failed automatic build test. Please find the testing commands 
> > and
> > their output below. If you have docker installed, you can probably 
> > reproduce it
> > locally.
> > 
> 
> >   CC  block/qed.o
> >   CC  block/qed-l2-cache.o
> > [01m[K/tmp/qemu-test/src/block/qcow2.c:[m[K In function 
> > '[01m[Kqcow2_update_options_prepare[m[K':
> > [01m[K/tmp/qemu-test/src/block/qcow2.c:899:25:[m[K [01;31m[Kerror: 
> > [m[K'[01m[Kl2_cache_entry_size[m[K' may be used uninitialized in this 
> > function [[01;31m[K-Werror=maybe-uninitialized[m[K]
> >  r->l2_table_cache = [01;31m[Kqcow2_cache_create(bs, 
> > l2_cache_size,[m[K
> >  
> > [01;31m[K^[m[K
> >  [01;31m[K   
> > l2_cache_entry_size)[m[K;
> > 
> > [01;31m[K[m[K
> 
> Any chance we can tweak patchew to tell gcc to NOT use color escape
> sequences on errors?  While useful in an ANSI terminal, it tends to come
> across awkwardly in email clients that don't render ANSI escapes, and
> just adds bulk to the transcript.
> 

It's odd that three is "TERM=xterm" in this env. I'll check where it is from.

Fam

Re: [Qemu-devel] [PATCH] include/hw/or-irq.h: Drop unused in_irqs field

2017-10-12 Thread Alistair Francis

On Thu, Oct 12, 2017 at 6:17 AM, Peter Maydell  wrote:
> The struct OrIRQState has an unused member field in_irqs.
> This is a legacy of earlier versions of the patch; the
> code that used it was dropped from the final version of
> the code that went into master, but we forgot to delete
> the no-longer-used struct field. Do so now.
>
> Signed-off-by: Peter Maydell 

Good catch.

Reviewed-by: Alistair Francis 

Thanks,
Alistair

> ---
>  include/hw/or-irq.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/hw/or-irq.h b/include/hw/or-irq.h
> index d400a81..fd900fc 100644
> --- a/include/hw/or-irq.h
> +++ b/include/hw/or-irq.h
> @@ -38,7 +38,6 @@ struct OrIRQState {
>  DeviceState parent_obj;
>
>  qemu_irq out_irq;
> -qemu_irq *in_irqs;
>  bool levels[MAX_OR_LINES];
>  uint16_t num_lines;
>  };
> --
> 2.7.4
>
>

Re: [Qemu-devel] [PATCH 1/2] linux-user: Fix target FS_IOC_GETFLAGS and FS_IOC_SETFLAGS numbers

2017-10-12 Thread Laurent Vivier

Le 12/10/2017 à 17:30, Peter Maydell a écrit :
> We were defining TARGET_FS_IOC_GETFLAGS and TARGET_FS_IOC_SETFLAGS
> using the host 'long' type in the size field, which meant that
> they had the wrong values if the host and guest had different
> sized longs. Switch to abi_long instead.
> 
> This fixes a bug where these ioctls don't work on 32-bit guests
> on 64-bit hosts (and makes the LTP test 'setxattr03' pass
> where it did not previously.)
> 
> Reported-by: pgndev 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/syscall_defs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index 40c5027..f7cc9f9 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -1101,8 +1101,8 @@ struct target_pollfd {
>  /* Note that the ioctl numbers claim type "long" but the actual type
>   * used by the kernel is "int".
>   */
> -#define TARGET_FS_IOC_GETFLAGS TARGET_IOR('f', 1, long)
> -#define TARGET_FS_IOC_SETFLAGS TARGET_IOW('f', 2, long)
> +#define TARGET_FS_IOC_GETFLAGS TARGET_IOR('f', 1, abi_long)
> +#define TARGET_FS_IOC_SETFLAGS TARGET_IOW('f', 2, abi_long)
>  
>  #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
>  
> 

Reviewed-by: Laurent Vivier

[Qemu-devel] [PATCH v3 1/2] spapr_pci: fail gracefully with non-pseries machine types

2017-10-12 Thread Greg Kurz

QEMU currently crashes when the user tries to add an spapr-pci-host-bridge
on a non-pseries machine:

$ qemu-system-ppc64 -M ppce500 -device spapr-pci-host-bridge,index=1
hw/ppc/spapr_pci.c:1535:spapr_phb_realize:
Object 0x1003dacae60 is not an instance of type spapr-machine
Aborted (core dumped)

The same thing happens with the deprecated but still available child type
spapr-pci-vfio-host-bridge.

Fix both by checking the machine type with object_dynamic_cast().

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Greg Kurz 
---
v3: - make opencoded call to object_dynamic_cast()
- add detailed comment
---
 hw/ppc/spapr_pci.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5049ced4e8b4..5a3122a9f9f9 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1507,7 +1507,12 @@ static void spapr_pci_unplug_request(HotplugHandler 
*plug_handler,
 
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
 {
-sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+/* We don't use SPAPR_MACHINE() in order to exit gracefully if the user
+ * tries to add a sPAPR PHB to a non-pseries machine.
+ */
+sPAPRMachineState *spapr =
+(sPAPRMachineState *) object_dynamic_cast(qdev_get_machine(),
+  TYPE_SPAPR_MACHINE);
 SysBusDevice *s = SYS_BUS_DEVICE(dev);
 sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
 PCIHostState *phb = PCI_HOST_BRIDGE(s);
@@ -1519,6 +1524,11 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 const unsigned windows_supported =
 sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
 
+if (!spapr) {
+error_setg(errp, TYPE_SPAPR_PCI_HOST_BRIDGE " needs a pseries 
machine");
+return;
+}
+
 if (sphb->index != (uint32_t)-1) {
 sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 Error *local_err = NULL;

Re: [Qemu-devel] [PATCH 1/2] target/i386: introduce x86_ld*_code

2017-10-12 Thread Richard Henderson

On 10/12/2017 07:35 AM, Paolo Bonzini wrote:
> These take care of advancing s->pc, and will provide a unified point
> where to check for the 15-byte instruction length limit.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  target/i386/translate.c | 228 
> ++--
>  1 file changed, 125 insertions(+), 103 deletions(-)

Reviewed-by: Richard Henderson 


r~

1 2 3 >

1 - 100 of 293 matches

Mail list logo