Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote:
> Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26:
> > On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote:
> > > Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33:
> > > > On 14.07.14 16:38, Joakim Tjernlund wrote:
> > > > > The popular binfmt-wrapper patch adds an additional executable which
> > > > > mangles argv suitably for binfmt flag P. In a chroot you need both the
> > > > > (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is
> > > > > suboptimal; a better approach is to recognize the -binfmt-wrapper
> > > > > extension within linux-user (qemu-$arch) and mangle argv there. This
> > > > > produces one executable which can be either copied into the chroot or
> > > > > bind mounted with the appropriate -binfmt-wrapper suffix.
> > > > > Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
> > > > Please make sure to CC Riku on patches like this - he is the linux-user
> > > > maintainer.
> > > Doesn't he read the devel list? Anyhow, CC:ed.
> > I do - but CC goes directly to my inbox while qemu-devel goes to a folder.

I take it from this discussion that this patch has been superseded by the patch at
http://patchwork.ozlabs.org/patch/369770/ ?
> BTW, any chance qemu binfmt could fix up the ps output from within a container:
>
> jocke-ppc2 ~ # ps uaxww
> USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
> root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
> root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
> root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
> root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
> root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
> root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
> root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww
>
> As you can see, qemu-ppc is visible.

This isn't something binfmt could do. ps uses /proc/$pid/exe to map the right binary. However, qemu already fixes /proc/self/exe to point to the right file - as you can see from the last line, where ps uaxww doesn't have qemu shown. So it would be possible to make do_open() in syscall.c do a similar mapping for /proc/$pid/exe. Exactly how and when the qemu processes should be hidden within qemu needs careful thought, though. A naive approach would be to check whether /proc/$pid/exe points to the same binary as /proc/self/exe, hiding at least same-architecture qemu processes.

Riku
Re: [Qemu-devel] [PATCH] cadence_uart: check for serial backend before using it.
On 16/07/2014 05:48, Peter Crosthwaite wrote:
> On Wed, Jul 16, 2014 at 1:18 AM, fred.kon...@greensocs.com wrote:
> > From: KONRAD Frederic fred.kon...@greensocs.com
> >
> > This checks that s->chr is not NULL before using it.
> >
> > Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
> > ---
> >  hw/char/cadence_uart.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
> > index dbbc167..a5736cb 100644
> > --- a/hw/char/cadence_uart.c
> > +++ b/hw/char/cadence_uart.c
> > @@ -175,8 +175,10 @@ static void uart_send_breaks(UartState *s)
> >  {
> >      int break_enabled = 1;
> >
> > -    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
> > -                      &break_enabled);
> > +    if (s->chr) {
> > +        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
> > +                          &break_enabled);
> > +    }
> >  }
> >
> >  static void uart_parameters_setup(UartState *s)
> > @@ -227,7 +229,9 @@ static void uart_parameters_setup(UartState *s)
> >      packet_size += ssp.data_bits + ssp.stop_bits;
> >      s->char_tx_time = (get_ticks_per_sec() / ssp.speed) * packet_size;
> >
> > -    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
> > +    if (s->chr) {
> > +        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
> > +    }
> >  }
> >
> >  static int uart_can_receive(void *opaque)
> > @@ -295,6 +299,7 @@ static gboolean cadence_uart_xmit(GIOChannel *chan, GIOCondition cond,
> >      /* instant drain the fifo when there's no back-end */
> >      if (!s->chr) {
> >          s->tx_count = 0;
> > +        return FALSE;
>
> Is this needed? With s->tx_count forced to 0 in the NULL dev case, won't
> the next if trigger immediately and return?

True, I'll make this change and resend.

Thanks,
Fred

> 300     if (!s->tx_count) {
> 301         return FALSE;
> 302     }
>
> With this hunk dropped,
>
> Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
>
> And recommended for 2.1.
>
> Regards,
> Peter

> >      }
> >
> >      if (!s->tx_count) {
> > @@ -375,7 +380,9 @@ static void uart_read_rx_fifo(UartState *s, uint32_t *c)
> >          *c = s->rx_fifo[rx_rpos];
> >          s->rx_count--;
> >
> > -        qemu_chr_accept_input(s->chr);
> > +        if (s->chr) {
> > +            qemu_chr_accept_input(s->chr);
> > +        }
> >      } else {
> >          *c = 0;
> >      }
> > --
> > 1.9.0
Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 08:54:45:
> On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote:
> > Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26:
> > > On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote:
> > > > Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33:
> > > > > On 14.07.14 16:38, Joakim Tjernlund wrote:
> > > > > > The popular binfmt-wrapper patch adds an additional executable which
> > > > > > mangles argv suitably for binfmt flag P. In a chroot you need both the
> > > > > > (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is
> > > > > > suboptimal; a better approach is to recognize the -binfmt-wrapper
> > > > > > extension within linux-user (qemu-$arch) and mangle argv there. This
> > > > > > produces one executable which can be either copied into the chroot or
> > > > > > bind mounted with the appropriate -binfmt-wrapper suffix.
> > > > > > Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
> > > > > Please make sure to CC Riku on patches like this - he is the linux-user
> > > > > maintainer.
> > > > Doesn't he read the devel list? Anyhow, CC:ed.
> > > I do - but CC goes directly to my inbox while qemu-devel goes to a folder.
>
> I take it from this discussion that this patch has been superseded by the patch at
> http://patchwork.ozlabs.org/patch/369770/ ?
> > BTW, any chance qemu binfmt could fix up the ps output from within a container:
> >
> > jocke-ppc2 ~ # ps uaxww
> > USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
> > root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
> > root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
> > root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
> > root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
> > root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
> > root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
> > root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww
> >
> > As you can see, qemu-ppc is visible.
>
> This isn't something binfmt could do. ps uses /proc/$pid/exe to map the

Why not? I think it is the perfect place to do it in the Linux binfmt code. All the other interpreters (ELF ld.so and normal bash scripts) do it. Fixing this with no support from Linux amounts to hacks like:

    /*
     * Munge our argv so it will look like it would
     * if started without linux-user
     */
    if (execfd > 0) {
        unsigned long len;
        char *p = argv[0];

        for (i = 0; i < target_argc; i++, p += len) {
            len = strlen(target_argv[i]) + 1;
            memcpy(p, target_argv[i], len);
        }
        len = *envp - p;
        memset(p, 0, len);
    }

and it is still not perfect; apps like ps and top still do not work. The Linux binfmt code should at least pass a correct argv to us.

BTW, you forgot: [PATCH 4/4] ppc: remove excessive logging
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Andrey,
>
> Can you please provide instructions on how to create a reproducible environment?
>
> The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.

Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to migration code break these things again?

> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 272a88a..feb5fc5 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -17,7 +17,6 @@
>  #include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
> -#include "sysemu/cpus.h"
>  #include "hw/sysbus.h"
>  #include "hw/kvm/clock.h"
> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> -    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>      if (s->clock_valid) {
>          return;
>      }
> -
> -    cpu_synchronize_all_states();
>      ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>      if (ret < 0) {
>          fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> diff --git a/migration.c b/migration.c
> index 8d675b3..34f2325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>          old_vm_running = runstate_is_running();
>
> +        cpu_synchronize_all_states();
>          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>          if (ret >= 0) {
>              qemu_file_set_rate_limit(s->file, INT64_MAX);

--
mg
[Qemu-devel] Help on vm (winxp sp3) having no sound
Hi all, I have one vm (winxp sp3), but it has no sound. I was using intel-hda, but the guest could not find a driver for the "Audio Device on High Definition Audio Bus". The vm is created by qemu-kvm on a CentOS 6.4 host. The command is:

qemu-system-x86_64 -enable-kvm -device intel-hda,id=sound0 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 ...

Thank you for any help.
[Qemu-devel] [PATCH for-2.1] cadence_uart: check serial backend before using it.
From: KONRAD Frederic fred.kon...@greensocs.com

A segfault occurs when there are fewer than two serial backends with the zynq platform. This checks that s->chr is not NULL before using it.

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
---
 hw/char/cadence_uart.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
index dbbc167..958b6ac 100644
--- a/hw/char/cadence_uart.c
+++ b/hw/char/cadence_uart.c
@@ -175,8 +175,10 @@ static void uart_send_breaks(UartState *s)
 {
     int break_enabled = 1;

-    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
-                      &break_enabled);
+    if (s->chr) {
+        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
+                          &break_enabled);
+    }
 }

 static void uart_parameters_setup(UartState *s)
@@ -227,7 +229,9 @@ static void uart_parameters_setup(UartState *s)
     packet_size += ssp.data_bits + ssp.stop_bits;
     s->char_tx_time = (get_ticks_per_sec() / ssp.speed) * packet_size;

-    qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
+    if (s->chr) {
+        qemu_chr_fe_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
+    }
 }

 static int uart_can_receive(void *opaque)
@@ -375,7 +379,9 @@ static void uart_read_rx_fifo(UartState *s, uint32_t *c)
         *c = s->rx_fifo[rx_rpos];
         s->rx_count--;

-        qemu_chr_accept_input(s->chr);
+        if (s->chr) {
+            qemu_chr_accept_input(s->chr);
+        }
     } else {
         *c = 0;
     }
--
1.9.0
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > On 12.07.14 12:40, Peter Maydell wrote:
> > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > What do the other platforms do on illegal instructions during user
> > > > > > mode? Any way we can get consistency across the board?
> > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > Consistency has never been our strong point :-)
> > > > That means this patch brings things towards consistency? It's certainly
> > > > good for me then :)
> > > No, this just removes one use of this logging. If you wanted consistency
> > > we should remove all of them...
> > Agreed :)
> So can I infer from this discussion that you will apply the patch?

I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't. Even more dramatically, we could remove the whole EXCP_DUMP and its users, since none of the other archs output anything for SIGFPE/SIGSEGV either. After all, the Linux kernel (ppc or otherwise) doesn't output anything either, and it is the behaviour of Linux we try to match.

Riku
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 09:55:50:

> From: Riku Voipio riku.voi...@iki.fi
> To: Joakim Tjernlund joakim.tjernl...@transmode.se
> Cc: Alexander Graf ag...@suse.de, Peter Maydell peter.mayd...@linaro.org, qemu-...@nongnu.org, QEMU Developers qemu-devel@nongnu.org
> Date: 2014/07/16 09:55
> Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
>
> On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> > Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > > On 12.07.14 12:40, Peter Maydell wrote:
> > > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > > What do the other platforms do on illegal instructions during
> > > > > > > user mode? Any way we can get consistency across the board?
> > > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > > Consistency has never been our strong point :-)
> > > > > That means this patch brings things towards consistency? It's
> > > > > certainly good for me then :)
> > > > No, this just removes one use of this logging. If you wanted
> > > > consistency we should remove all of them...
> > > Agreed :)
> > So can I infer from this discussion that you will apply the patch?
>
> I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't.

Something like that. This is one step in that direction. We (or at least I) cannot do the consistency work for all cases/arches at once. With the patch we come one step closer to the Linux kernel, so I don't see why not apply it.

> Even more dramatically, we could remove the whole EXCP_DUMP and its users, since none of the other archs output anything for SIGFPE/SIGSEGV either. After all, the Linux kernel (ppc or otherwise) doesn't output anything either, and it is the behaviour of Linux we try to match.
Hmm, not clear to me what you mean here.

Jocke
Re: [Qemu-devel] Bogus struct stat64 for qemu-microblaze (user emulation)?
On 16 July 2014 05:02, Rich Felker dal...@libc.org wrote:
> The qemu-microblaze definition of struct stat64 seems to mismatch the kernel definition, which is using asm-generic/stat.h. See:
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469
> http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949
>
> This seems to be causing a truncated-to-32-bit inode number to be stored in the location where st_ino should reside, and a spurious copy of the inode number to be written in an unused slot at the end of the structure.

Sounds quite plausible -- we've had issues with other archs not having correct stat struct definitions in QEMU. I don't suppose anybody's done much testing of the microblaze linux-user code.

> Is my analysis correct? Stefan Kristiansson and I found this while working on the or1k port of musl libc, where it seems our structure for the existing microblaze port is wrongly aligned with the qemu definition rather than the definition the real kernel is using. Before I try correcting this on our side, I want to make sure we're working with the right version.

I would definitely trust the kernel definition, not QEMU's!

thanks
-- PMM
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> > On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> > > On 15/07/2014 23:25, Andrey Korolyov wrote:
> > > > On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > > > On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> > > > > > On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote:
> > > > > > > On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote:
> > > > > > > > On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> > > > > > > > > Hello, the issue is not specific to the iothread code, because generic virtio-blk also hangs up:
> > > > > > > > Do you know which version works well? If you could bisect, that'll help a lot.
> > > > > > > > Thanks, Amit
> > > > > > > Hi, 2.0 works definitely well. I'll try to finish the bisection today, though every step takes about 10 minutes to complete.
> > > > > > Yay! It is even outside of virtio-blk:
> > > > > >
> > > > > > commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> > > > > > Author: Marcelo Tosatti mtosa...@redhat.com
> > > > > > Date:   Tue Jun 3 13:34:48 2014 -0300
> > > > > >
> > > > > >     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> > > > > >
> > > > > >     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> > > > > >
> > > > > >     Reported-by: Marcin Gibuła m.gib...@beyond.pl
> > > > > >     Cc: qemu-sta...@nongnu.org
> > > > > >     Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> > > > > >     Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> > > > >
> > > > > Andrey,
> > > > >
> > > > > Can you please provide instructions on how to create a reproducible environment?
> > > > >
> > > > > The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.
> > > > > diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> > > > > index 272a88a..feb5fc5 100644
> > > > > --- a/hw/i386/kvm/clock.c
> > > > > +++ b/hw/i386/kvm/clock.c
> > > > > @@ -17,7 +17,6 @@
> > > > >  #include "qemu/host-utils.h"
> > > > >  #include "sysemu/sysemu.h"
> > > > >  #include "sysemu/kvm.h"
> > > > > -#include "sysemu/cpus.h"
> > > > >  #include "hw/sysbus.h"
> > > > >  #include "hw/kvm/clock.h"
> > > > > @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> > > > >      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> > > > >
> > > > > -    assert(time.tsc_timestamp <= migration_tsc);
> > > > >      delta = migration_tsc - time.tsc_timestamp;
> > > > >      if (time.tsc_shift < 0) {
> > > > >          delta >>= -time.tsc_shift;
> > > > > @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
> > > > >      if (s->clock_valid) {
> > > > >          return;
> > > > >      }
> > > > > -
> > > > > -    cpu_synchronize_all_states();
> > > > >      ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > >      if (ret < 0) {
> > > > >          fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > > diff --git a/migration.c b/migration.c
> > > > > index 8d675b3..34f2325 100644
> > > > > --- a/migration.c
> > > > > +++ b/migration.c
> > > > > @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> > > > >          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > > > >          old_vm_running = runstate_is_running();
> > > > >
> > > > > +        cpu_synchronize_all_states();
> > > > >          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > > > >          if (ret >= 0) {
> > > > >              qemu_file_set_rate_limit(s->file, INT64_MAX);
> > >
> > > It could also be useful to apply the above patch _and_ revert
> > > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> > >
> > > Paolo
> >
> > Yes, it solved the issue for me! (it took much time to check because most of the country's Debian mirrors went inconsistent for some reason)
> >
> > Also a trivial addition:
> >
> > diff --git a/migration.c b/migration.c
> > index 34f2325..65d1c88 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -25,6 +25,7 @@
> >  #include "qemu/thread.h"
> >  #include "qmp-commands.h"
> >  #include "trace.h"
> > +#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states?
The main reason for things to work for me is the revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, without adding any other patches. I tested two cases: with Alexander's patch completely reverted plus the suggestion from Marcelo, and with only 9b178682 reverted plus the same suggestion. The difference is that while Alexander's patch is not reverted, live migration always fails at the timeout value; when it is reverted, migration always succeeds in 8-10 seconds. The appropriate diffs are attached for reference.

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..93e1829 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -36,49 +35,6 @@ typedef struct KVMClockState {
Re: [Qemu-devel] hw/arm: add Lego NXT board
On 07/15/2014 10:09 PM, Paolo Bonzini wrote:
> BTW, sorry for the confusion. There is another frequent contributor to QEMU with the same name as yours. I only noticed now that the email address is different.

Oh right, I noticed that; I should have said something.

> > 4. Now, thanks to the help on this list, I know there is the -chardev functionality in qemu that basically achieves what I did by hand using pipes. Now my idea is to port my own proprietary implementation to the qemu way - chardevs. You said you think it is a bad idea to build a device that directly translates I/O memory access to a chardev. I still don't understand why. Is this all about legal issues or is there a technical reason?
>
> I think that there are two other ways to do it.

And what about the chardev way? Your silence in this regard is mysterious ;-)

> 1) [...]
>
> 2) [...]

Thanks a lot for your suggestions. I will play a bit and come back when I have some code to show.

Alexander
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On 16.07.2014, at 10:32, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 09:55:50:
> > From: Riku Voipio riku.voi...@iki.fi
> > To: Joakim Tjernlund joakim.tjernl...@transmode.se
> > Cc: Alexander Graf ag...@suse.de, Peter Maydell peter.mayd...@linaro.org, qemu-...@nongnu.org, QEMU Developers qemu-devel@nongnu.org
> > Date: 2014/07/16 09:55
> > Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
> >
> > On Sat, Jul 12, 2014 at 04:06:09PM +0200, Joakim Tjernlund wrote:
> > > Alexander Graf ag...@suse.de wrote on 2014/07/12 12:41:05:
> > > > On 12.07.14 12:40, Peter Maydell wrote:
> > > > > On 12 July 2014 10:39, Alexander Graf ag...@suse.de wrote:
> > > > > > On 12.07.14 10:58, Peter Maydell wrote:
> > > > > > > On 12 July 2014 01:39, Alexander Graf ag...@suse.de wrote:
> > > > > > > > What do the other platforms do on illegal instructions during
> > > > > > > > user mode? Any way we can get consistency across the board?
> > > > > > > Mostly it looks like they just silently generate the SIGILL.
> > > > > > > Consistency has never been our strong point :-)
> > > > > > That means this patch brings things towards consistency? It's
> > > > > > certainly good for me then :)
> > > > > No, this just removes one use of this logging. If you wanted
> > > > > consistency we should remove all of them...
> > > > Agreed :)
> > > So can I infer from this discussion that you will apply the patch?
> >
> > I think Peter and Alex suggest that the EXCP_DUMP() logging should be removed from all cases in the PPC code where TARGET_SIGILL is raised. Your patch removes just one case, and that would make the PPC code internally inconsistent, where some SIGILLs are logged and others aren't.
>
> Something like that. This is one step in that direction. We (or at least I) cannot do the consistency work for all cases/arches at once. With the patch we come one step closer to the Linux kernel, so I don't see why not apply it.

That's not how it will end up, though. If we apply this one patch now, it will stay that way forever and be even more confusing and inconsistent than today.
I think what we really want is proper -d int logging on all archs for linux-user. This patch is getting us nowhere close to it. Alex
Re: [Qemu-devel] hw/arm: add Lego NXT board
On 16/07/2014 10:40, Alexander Graf wrote:
> And what about the chardev way? Your silence in this regard is mysterious ;-)

The main problem with chardevs is that they are completely asynchronous, so they may not be a good match for your use case. (In fact, this is a problem for the i2c-over-chardev and especially spi-over-chardev ideas I shot out yesterday. For i2c-over-chardev one could add support for clock stretching in QEMU, but SPI is entirely synchronous.)

Note that -qmp and -qtest both use chardevs internally to connect the external program with QEMU.

Paolo

> 1) [...]
>
> 2) [...]
>
> Thanks a lot for your suggestions. I will play a bit and come back when I have some code to show.
[Qemu-devel] [PULL for-2.1] virtio-serial: bugfix for virtconsole port unplug
Hi, This is a small bugfix that prevents a port unplug from causing port 0 to lose its reservation for virtconsole use. The patch has been on-list for a few days and has been reviewed. Please pull. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-serial.git for-2.1 for you to fetch changes up to 57d84cf35302fe51789c18354bf09a521bb603df: virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug (2014-07-16 14:32:40 +0530) Amit Shah (1): virtio-serial-bus: keep port 0 reserved for virtconsole even on unplug hw/char/virtio-serial-bus.c | 14 +++++++++---- 1 file changed, 11 insertions(+), 3 deletions(-) Amit
[Qemu-devel] [PULL for-2.1] migration: detect section renames in vmstate static checker script
Hi, A minor enhancement to the static checker script that detects section renames. One such rename was added recently (ICH9-LPC -> ICH9 LPC). This doesn't affect the qemu code, and enhances a test program, so I recommend we pull it for 2.1. The patch has been on list for a few days. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git for-2.1 for you to fetch changes up to 79fe16c0489ca658f53796206067a551fc915ba2: vmstate static checker: detect section renames (2014-07-16 14:29:34 +0530) Amit Shah (1): vmstate static checker: detect section renames scripts/vmstate-static-checker.py | 27 ++++++++++++++++++--- 1 file changed, 22 insertions(+), 5 deletions(-) Amit
[Qemu-devel] [PULL for-2.1] virtio-rng: Fix abort on invalid input
Hi, This patch returns an error instead of aborting, which is desirable not just for cmdline invocation, but also prevents an abort in case of device hotplug. The patch is small, and reviewed on-list. Please pull. The following changes since commit 5a7348045091a2bc15d85bb177e5956aa6114e5a: Update version for v2.1.0-rc2 release (2014-07-15 18:55:37 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-rng.git for-2.1 for you to fetch changes up to 9ef6be93250e46d35062c84d5c75c7cb515dc27c: virtio-rng: Add human-readable error message for negative max-bytes parameter (2014-07-16 14:25:29 +0530) John Snow (1): virtio-rng: Add human-readable error message for negative max-bytes parameter hw/virtio/virtio-rng.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) Amit
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
Joakim Tjernlund writes:
> Alexander Graf ag...@suse.de wrote on 2014/07/12 02:39:21:
> > On 11.07.14 20:22, Peter Maydell wrote:
> > > On 11 July 2014 19:15, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > > Peter Maydell peter.mayd...@linaro.org wrote on 2014/07/11 19:14:25:
> > > > > On 11 July 2014 16:18, Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > > > > ppc logs every type of invalid instruction. This generates a lot
> > > > > Rather than just deleting this EXCP_DUMP, I would suggest changing the
> > > > > EXCP_DUMP macro so it only does anything if the user has passed the
> > > > > -d int debug logging flag:
> > > > I don't think ppc wants that. They want debug unconditionally on to get
> > > > relevant bug reports. This one is getting in the way of normal operations,
> > > > so I think it should be deleted.
> > > If the PPC maintainers want that behaviour then they need to defend it.
> > > No other architecture's linux-user code spews junk to stderr for
> > > exceptions, and PPC shouldn't either. The debug log switches are exactly
> > > for allowing us to say "please turn on debug logging" when bugs are
> > > reported, and those are what we should use.
> > I agree - and it's how we behave in system emulation mode already :).
> > What do the other platforms do on illegal instructions during user mode?
> > Any way we can get consistency across the board?
> In this case it is not an illegal insn; it is a valid insn, just not for the
> QEMU-emulated CPU.

On aarch64 we do:

#define unsupported_encoding(s, insn)                                    \
    do {                                                                 \
        qemu_log_mask(LOG_UNIMP,                                         \
                      "%s:%d: unsupported instruction encoding 0x%08x "  \
                      "at pc=%016" PRIx64 "\n",                          \
                      __FILE__, __LINE__, insn, s->pc - 4);              \
        unallocated_encoding(s);                                         \
    } while (0);

So we signal it's unimplemented before falling through to the signal generating code.

--
Alex Bennée
Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> On 07/07/2014 16:35, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > > On 04/07/2014 19:41, Dr. David Alan Gilbert (git) wrote:
> > > > From: Dr. David Alan Gilbert dgilb...@redhat.com
> > > >
> > > > Postcopy needs to have two migration streams loading concurrently;
> > > > one from memory (with the device state) and the other from the fd
> > > > with the memory transactions.
> > >
> > > Can you explain this? I would have thought the order is:
> > >
> > >    precopy RAM and everything
> > >    prepare postcopy RAM (send dirty bitmap)
> > >    finish precopy non-RAM
> > >    finish devices
> > >    postcopy RAM
> > >
> > > Why do you need all the packaging stuff and a separate memory-based
> > > migration stream for devices? I'm sure I'm missing something. :)
> >
> > The thing you're missing is the details of 'finish devices'. The device
> > emulation may access guest memory as part of loading its state, so you
> > can't successfully complete 'finish devices' without having the
> > 'postcopy RAM' available to provide pages.
>
> I see. Can you document the flow (preferably as a reply to this email
> _and_ in docs/ when you send v2 of the code :))?

I thought I documented enough in the docs/migration.txt stuff in the last patch (see the 'Postcopy states' section); however, let's see if the following is better:

Postcopy stream

Loading of device data may cause the device emulation to access guest RAM, which may trigger faults that have to be resolved by the source. As such, the migration stream has to be able to respond with page data *during* the device load, and hence the device data has to be read from the stream completely before the device load begins, to free the stream up. This is achieved by 'packaging' the device data into a blob that's read in one go.

Source behaviour

Until postcopy is entered, the migration stream is identical to a normal precopy stream, except for the addition of a 'postcopy advise' command at the beginning, to let the destination know that postcopy might happen.
When postcopy starts, the source sends the page discard data and then forms the 'package' containing:

   Command: 'postcopy ram listen'
   The device state
      A series of sections, identical to the precopy stream's device state
      stream, containing everything except postcopiable devices (i.e. RAM)
   Command: 'postcopy ram run'

The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the contents are formatted in the same way as the main migration stream.

Destination behaviour

Initially the destination looks the same as precopy, with a single thread reading the migration stream; the 'postcopy advise' and 'discard' commands are processed to change the way RAM is managed, but don't affect the stream processing.

                                       1      2   3     4 5              6   7
 main  -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE DEVICE DEVICE    RUN  )
 thread                             |       |
                                    |     (page request)
                                    |        \___
                                    v            \
 listen thread:                     --- page -- page -- page -- page -- page --

                                    a   b        c

On receipt of CMD_PACKAGED (1), all the data associated with the package - the ( ... ) section in the diagram - is read into memory (into a QEMUSizedBuffer), and the main thread recurses into qemu_loadvm_state_main to process the contents of the package (2), which contains commands (3,6) and devices (4...).

On receipt of 'postcopy ram listen' (3) - i.e. the 1st command in the package - a new thread (a) is started that takes over servicing the migration stream, while the main thread carries on loading the package. It loads normal background page data (b), but if during a device load a fault happens (5), the returned page (c) is loaded by the listen thread, allowing the main thread's device load to carry on.

The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination CPUs start running. At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour and is no longer used by migration, while the listen thread carries on servicing page data until the end of migration.

Is that any better?

Dave

P.S.
I know of at least one bug in this code at the moment: it happens on a VM that doesn't have many dirty pages, where all the pages are transmitted (and hence the listen thread finishes) before the main thread gets to 'run'. From my cursory read of the code it is something like this on the source:

   finish precopy non-RAM
   start RAM postcopy
   for each device
       pack up data
   send it to destination

and on the destination:

   while source sends packet
       pick up packet
       atomically pass
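The 'read the whole package into memory, then process it' step described above can be sketched in miniature. This is an illustrative model only, not QEMU's implementation: a hypothetical length-prefixed blob is drained from the stream (here a plain byte array) in one go, so the real stream would be free to serve page requests while the payload is processed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of CMD_PACKAGED: a package is a 32-bit length followed by
 * that many payload bytes.  read_package() copies the whole payload out
 * of the stream before anything in it is processed, so the caller can
 * hand the stream to another reader (the "listen thread"). */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} Stream;

static size_t stream_read(Stream *s, void *buf, size_t n)
{
    size_t avail = s->len - s->pos;
    if (n > avail) {
        n = avail;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Returns a malloc'd buffer holding the package payload, or NULL. */
static uint8_t *read_package(Stream *s, uint32_t *out_len)
{
    uint32_t len;
    uint8_t *buf;

    if (stream_read(s, &len, sizeof(len)) != sizeof(len)) {
        return NULL;
    }
    buf = malloc(len);
    if (!buf || stream_read(s, buf, len) != len) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The point of the shape is that once `read_package()` returns, nothing that processes the payload touches the stream again.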
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto: From: Dr. David Alan Gilbert dgilb...@redhat.com Postcopy needs a method to send messages from the destination back to the source, this is the 'return path'. snip

+/* Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
+ * touch read fd's status, so we dup the fd just to keep settings
+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how?

+    }
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                     strerror(errno));
+        return NULL;
+    }
+
+    qemu_set_nonblock(revfd);

Blocking/nonblocking is per-file *description*, not descriptor. So you're making the original QEMUFile nonblocking as well. Can you explain why this is needed before I reach the meat of the patch series? Yes, I went through that a few times until I got that it was per-entity not the fd itself, it still makes life easier for the rest of the QEMUFile code to have a separate fd for it (hence still worth the dup). In other words, can you draw a table with source/dest and read/write, and whether it should be blocking or non-blocking?
Sure; the non-blockingness is mostly on the source side; modifying the table in the docs patch a little:

Source side
  Forward path - written by migration thread: It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking.
  Return path - opened by main thread, read by fd_handler on main thread: Must be non-blocking so as not to block the main thread while waiting for a partially sent command.

Destination side
  Forward path - read by main thread
  Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex)

I think I'm OK with both these being blocking. Dave Paolo

+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (result) {
+        /* We are the reverse path of our reverse path (although I don't
+           expect this to be used, it would stop another dup if it was) */
+        result->return_path = qfs->file;
+    } else {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }

 static const QEMUFileOps socket_read_ops = {
-    .get_fd = socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close = socket_close
+    .get_fd = socket_get_fd,
+    .get_buffer = socket_get_buffer,
+    .close = socket_close,
+    .get_return_path = socket_dup_return_path
 };

 static const QEMUFileOps socket_write_ops = {
-    .get_fd = socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close = socket_close
+    .get_fd = socket_get_fd,
+    .writev_buffer = socket_writev_buffer,
+    .close = socket_close,
+    .get_return_path = socket_dup_return_path
 };

+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
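Paolo's point that blocking/non-blocking is a property of the shared open file *description* rather than of each descriptor - so qemu_set_nonblock() on the dup'd fd also affects the original - can be demonstrated in a few lines. This standalone POSIX sketch is illustrative only and unrelated to the QEMU code:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if setting O_NONBLOCK on one fd makes its dup() non-blocking
 * too, demonstrating that the status flags live in the shared open file
 * description, not in each descriptor; -1 on error. */
static int nonblock_is_shared(void)
{
    int fds[2];
    int dupfd, flags;

    if (pipe(fds) < 0) {
        return -1;
    }
    dupfd = dup(fds[0]);             /* shares the open file description */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    flags = fcntl(dupfd, F_GETFL);   /* read the flags via the duplicate */

    close(fds[0]);
    close(fds[1]);
    close(dupfd);
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

The dup() in the patch therefore separates bookkeeping (each QEMUFile gets its own fd number) but not the blocking mode itself.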
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:

+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do

    f1 = open_return_path(f0);
    f2 = open_return_path(f0);   /* now f1 == f2 */
    qemu_fclose(f1);             /* now f2 is dangling */

The remark about closing the original QEMUFile is also related to this part:

    if (result) {
        /* We are the reverse path of our reverse path (although I don't
           expect this to be used, it would stop another dup if it was) */
        result->return_path = qfs->file;

which has a similar bug

    f1 = open_return_path(f0);
    f2 = open_return_path(f1);   /* now f2 == f0 */
    qemu_fclose(f0);             /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization. Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? If it doesn't need that, you can just switch to blocking when you process the listen command (i.e.
when the postcopy thread starts). Paolo I think I'm OK with both these being blocking.
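One standard way to address the dangling-pointer scenario Paolo describes is a reference count on the file object, so the underlying resource is released only by the last close. QEMUFile had no refcount at the time; the sketch below is a generic illustration of the idea under hypothetical names, not QEMU code:

```c
#include <stdlib.h>

/* A file-like handle whose backing resource is released only when the
 * last reference is dropped, so open_return_path() could hand out the
 * same object twice without creating a dangling second handle. */
typedef struct RefFile {
    int refcount;
    int closed;        /* stands in for "backing fd actually closed" */
} RefFile;

static RefFile *reffile_new(void)
{
    RefFile *f = calloc(1, sizeof(*f));
    f->refcount = 1;
    return f;
}

static RefFile *reffile_get(RefFile *f)
{
    f->refcount++;
    return f;
}

/* Returns 1 when the backing resource was really released. */
static int reffile_close(RefFile *f)
{
    if (--f->refcount > 0) {
        return 0;
    }
    f->closed = 1;     /* the real close(fd) would go here */
    /* A real implementation would free(f) here too; the struct is kept
     * alive in this demo so callers can observe 'closed'. */
    return 1;
}
```

With this shape, qemu_fclose(f1) after a second open_return_path(f0) would merely drop a reference instead of invalidating f2.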
Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap
-----Original Message----- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Tuesday, July 15, 2014 11:00 PM To: Wangkai (Kevin,C) Cc: Stefan Hajnoczi; Lee yang; qemu-devel@nongnu.org; aligu...@amazon.com Subject: Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap On Mon, Jul 14, 2014 at 10:44:58AM +0000, Wangkai (Kevin,C) wrote: Here the detail network:

+------------------------------------------------------+
| The host: add tap1 and eth10 to bridge 'br1'         |
|                                                      |
|  +------+                                    send    |
|  |  VM  | eth1-+-tap1 --- bridge --- eth10 --+------ |
|  +------+                                    packets |
+------------------------------------------------------+

Qemu starts the vm by virtio, using a tap interface; the options are: -net nic,vlan=101,model=virtio -net tap,vlan=101,ifname=tap1,script=no,downscript=no Use the newer -netdev/-device syntax to get offload support and slightly better performance: -netdev tap,id=tap0,ifname=tap1,script=no,downscript=no \ -device virtio-net-pci,netdev=tap0 And add tap1 and eth10 to bridge br1 in the host: brctl addif br1 tap1 brctl addif br1 eth10 total recv 505387 time 2000925 us: means one call of tap_send dealt with 505387 packets, the packet payload was 300 bytes, and the time used by tap_send() was 2,000,925 micro-seconds; time was measured by recording a time stamp at the start and end of tap_send(). We just test the performance of VM. That is 150 MB of incoming packets in a single tap_send(). Network rx queues are maybe a few 1000 packets so I wonder what is going on here. Maybe more packets are arriving while QEMU is reading them and we keep looping. That's strange though because the virtio-net rx virtqueue should fill up (it only has 256 entries). Can you investigate more and find out exactly what is going on? It's not clear yet that adding a budget is the solution or just hiding a deeper problem. Stefan [Wangkai (Kevin,C)] Hi Stefan, Yes, I have one machine that keeps sending packets at 300M/second, about 1M packets per second. I am checking the code; you mean the virtqueue has only 256 entries, does this member keep the value? vq->vring.num I will check and add some debug info to check this problem.
I have got the most time taken by virtio_net_receive():

    virtio_net_receive
    qemu_deliver_packet
    qemu_net_queue_send
    net_hub_port_receive
    qemu_deliver_packet
    qemu_net_queue_send
    tap_send
    qemu_iohandler_poll
    main_loop_wait
    main_loop
    main

regards Wangkai
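The fix under discussion - stopping tap_send() from looping for as long as packets keep arriving - amounts to a receive budget per main-loop iteration. A minimal model of that idea, with hypothetical names rather than the actual patch:

```c
/* Drain at most 'budget' packets per call; anything left stays queued
 * for the next main-loop iteration, so one busy tap device cannot
 * monopolise the iothread for seconds and stall vcpu I/O. */
static int tap_send_budgeted(int *queued, int budget)
{
    int sent = 0;

    while (*queued > 0 && sent < budget) {
        (*queued)--;   /* stands in for reading one packet from the tap fd */
        sent++;
    }
    return sent;
}
```

The open question in the thread still applies: a budget bounds the latency per iteration, but it does not explain why the virtio-net rx virtqueue (256 entries) fails to apply backpressure first.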
[Qemu-devel] [Bug 1185228] Re: with 'monitor pty', it needs to flush pts device after sending command to it
This is not a bug. What happens is that QEMU echoes the command you type onto the PTY. If you do not do a cat /dev/pts/8, the PTY's buffer gets full and QEMU stops reading from the PTY until it is emptied. It doesn't happen if you use QMP. ** Changed in: qemu Status: New => Invalid -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1185228 Title: with 'monitor pty', it needs to flush pts device after sending command to it Status in QEMU: Invalid Bug description: Environment: Host OS (ia32/ia32e/IA64): ia32e Guest OS (ia32/ia32e/IA64): ia32e Guest OS Type (Linux/Windows): Linux kvm.git Commit: 660696d1d16a71e15549ce1bf74953be1592bcd3 qemu.git Commit: 64afc2b4d48fb21e085517c38a59a3f61a11283c Host Kernel Version: 3.9.0-rc3 Hardware: SandyBridge Bug detailed description: -- when creating guest with parameter -monitor pty, then send command (e.g. migrate, device_add) to the pts device. If I don't flush the pts device (e.g. using cat /dev/pts/8), the monitor command doesn't take effect. I know some old qemu (e.g. 1.3) didn't have this issue. create KVM guest: [root@vt-snb9 ~]# qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/cathy/rhel6u4.qcow -monitor pty char device redirected to /dev/pts/8 (label compat_monitor0) VNC server running on `::1:5900' hot plug a PCI device: [root@vt-snb9 ~]# echo "device_add pci-assign,host=05:10.0,id=nic" > /dev/pts/8 (here, the device cannot hot plug to the guest.) if you run command cat /dev/pts/8, the device will hot plug to the guest. hot remove the device: [root@vt-snb9 ~]# echo "device_del nic" > /dev/pts/8 (here, the device cannot hot remove from the guest.) if you run command cat /dev/pts/8 the device can hot remove from the guest Reproduce steps: 1. qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/cathy/rhel6u4.qcow -monitor pty 2.
echo "device_add pci-assign,host=05:10.0,id=nic" > /dev/pts/8 (or using 'migrate' command on the qemu monitor redirection device /dev/pts/8) Current result: device can not hot plug or hot remove from the guest before exec 'cat /dev/pts/8'. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1185228/+subscriptions
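Paolo's explanation - the monitor's echo lands in the PTY buffer, and until something reads /dev/pts/N that buffer can fill and stall the monitor - relies only on ordinary PTY plumbing, which this small Linux/POSIX sketch (unrelated to QEMU's chardev code) exercises:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimal PTY round-trip: whatever the slave side writes is readable on
 * the master.  QEMU's -monitor pty sits on the master; its echo of your
 * command goes into the master->slave buffer, and nothing drains that
 * buffer until something (e.g. 'cat /dev/pts/8') reads the slave. */
static int pty_roundtrip(char *buf, size_t len)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    int slave;
    ssize_t n;

    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        return -1;
    }
    slave = open(ptsname(master), O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }
    write(slave, "hi", 2);        /* stand-in for monitor output */
    n = read(master, buf, len);   /* what a reader would drain */

    close(slave);
    close(master);
    return (int)n;
}
```

Using QMP avoids the problem because QMP peers always read their end of the channel.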
[Qemu-devel] [PATCH 0/2] virtio-serial: Prevent addition of ports with same name
Hi, The 2nd patch prevents adding ports to the system with conflicts in the 'name' parameter. This is done by going through all the ports in the system, including ports on other virtio-serial devices. The first patch prepares for this by creating a linked list of all available virtio-serial devices, so they can be walked over to check their ports' names. Please review. Amit Shah (2): virtio-serial: create a linked list of all active devices virtio-serial: search for duplicate port names before adding new ports hw/char/virtio-serial-bus.c | 32 include/hw/virtio/virtio-serial.h | 2 ++ 2 files changed, 34 insertions(+) -- 1.9.3
[Qemu-devel] [PATCH 1/2] virtio-serial: create a linked list of all active devices
To ensure two virtserialports don't get added to the system with the same 'name' parameter, we need to access all the ports on all the devices added, and compare the names. We currently don't have a list of all VirtIOSerial devices added to the system. This commit adds a simple linked list in which devices are put when they're initialized, and removed when they go away. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/char/virtio-serial-bus.c | 10 ++ include/hw/virtio/virtio-serial.h | 2 ++ 2 files changed, 12 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 07bebc0..8c26f4e 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -26,6 +26,10 @@
 #include "hw/virtio/virtio-serial.h"
 #include "hw/virtio/virtio-access.h"

+struct VirtIOSerialDevices {
+    QLIST_HEAD(, VirtIOSerial) devices;
+} vserdevices;
+
 static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id)
 {
     VirtIOSerialPort *port;
@@ -975,6 +979,8 @@ static void virtio_serial_device_realize(DeviceState *dev, Error **errp)
      */
     register_savevm(dev, "virtio-console", -1, 3, virtio_serial_save,
                     virtio_serial_load, vser);
+
+    QLIST_INSERT_HEAD(&vserdevices.devices, vser, next);
 }

 static void virtio_serial_port_class_init(ObjectClass *klass, void *data)
@@ -1003,6 +1009,8 @@ static void virtio_serial_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOSerial *vser = VIRTIO_SERIAL(dev);

+    QLIST_REMOVE(vser, next);
+
     unregister_savevm(dev, "virtio-console", vser);

     g_free(vser->ivqs);
@@ -1027,6 +1035,8 @@ static void virtio_serial_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);

+    QLIST_INIT(&vserdevices.devices);
+
     dc->props = virtio_serial_properties;
     set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
     vdc->realize = virtio_serial_device_realize;
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index 4746312..a679e54 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -202,6 +202,8 @@ struct VirtIOSerial {

     QTAILQ_HEAD(, VirtIOSerialPort) ports;

+    QLIST_ENTRY(VirtIOSerial) next;
+
     /* bitmap for identifying active ports */
     uint32_t *ports_map;
-- 1.9.3
[Qemu-devel] [PATCH 2/2] virtio-serial: search for duplicate port names before adding new ports
Before adding new ports to VirtIOSerial devices, check if there's a conflict in the 'name' parameter. This ensures two virtserialports with identical names are not initialized. Reported-by: mazh...@redhat.com Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/char/virtio-serial-bus.c | 22 ++ 1 file changed, 22 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 8c26f4e..76e4228 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -56,6 +56,22 @@ static VirtIOSerialPort *find_port_by_vq(VirtIOSerial *vser, VirtQueue *vq)
     return NULL;
 }

+static VirtIOSerialPort *find_port_by_name(char *name)
+{
+    VirtIOSerial *vser;
+
+    QLIST_FOREACH(vser, &vserdevices.devices, next) {
+        VirtIOSerialPort *port;
+
+        QTAILQ_FOREACH(port, &vser->ports, next) {
+            if (!strcmp(port->name, name)) {
+                return port;
+            }
+        }
+    }
+    return NULL;
+}
+
 static bool use_multiport(VirtIOSerial *vser)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(vser);
@@ -847,6 +863,12 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
         return;
     }

+    if (find_port_by_name(port->name)) {
+        error_setg(errp, "virtio-serial-bus: A port already exists by name %s",
+                   port->name);
+        return;
+    }
+
     if (port->id == VIRTIO_CONSOLE_BAD_ID) {
         if (plugging_port0) {
             port->id = 0;
-- 1.9.3
Re: [Qemu-devel] [PATCH] linux-user: Add binfmt wrapper
Riku Voipio riku.voi...@iki.fi wrote on 2014/07/16 08:54:45: On Tue, Jul 15, 2014 at 05:11:48PM +0200, Joakim Tjernlund wrote: Riku Voipio riku.voi...@iki.fi wrote on 2014/07/15 16:12:26: On Mon, Jul 14, 2014 at 05:38:49PM +0200, Joakim Tjernlund wrote: Alexander Graf ag...@suse.de wrote on 2014/07/14 17:21:33: On 14.07.14 16:38, Joakim Tjernlund wrote: The popular binfmt-wrapper patch adds an additional executable which mangles argv suitably for binfmt flag P. In a chroot you need both the (statically linked) qemu-$arch and qemu-$arch-binfmt-wrapper. This is suboptimal and a better approach is to recognize the -binfmt-wrapper extension within linux-user (qemu-$arch) and mangle argv there. This just produces one executable which can be either copied to the chroot or bind mounted with the appropriate -binfmt-wrapper suffix. Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se Please make sure to CC Riku on patches like this - he is the linux-user maintainer. Doesn't he read the devel list? Anyhow CC:ed I do - but CC gets directly to my inbox while qemu-devel goes to a folder. I take from this discussion, that this patch has been superseded by the patch at: http://patchwork.ozlabs.org/patch/369770/ ?
BTW, any chance qemu binfmt could fixup the ps output from within a container:

jocke-ppc2 ~ # ps uaxww
USER  PID %CPU %MEM     VSZ  RSS TTY     STAT START TIME COMMAND
root    1  0.1  0.0 4138016 7600 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /sbin/init /sbin/init
root   79  0.0  0.0 4138016 5792 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /usr/sbin/sshd /usr/sbin/sshd
root  293  0.0  0.0 4137952 4072 ?       Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/busybox /bin/busybox udhcpc -x hostname:jocke-ppc2 --interface=eth0 --now --script=/lib/netifrc/sh/udhcpc-hook.sh --pidfile=/var/run/udhcpc-eth0.pid
root  334  0.3  0.0 4138016 5964 tty1    Ss+  17:02 0:00 /usr/bin/qemu-ppc /sbin/agetty /sbin/agetty 38400 tty1 linux
root  335  3.1  0.0 4138048 7064 console Ss   17:02 0:00 /usr/bin/qemu-ppc /bin/login /bin/login -- root
root  336  3.3  0.0 4138016 9764 console S    17:02 0:00 /usr/bin/qemu-ppc /bin/bash -bash
root  340  0.0  0.0 4138016 6336 ?       R+   Jul10 0:00 /bin/ps ps uaxww

As you can see, qemu-ppc is visible. This isn't something binfmt could do. ps uses /proc/$pid/exe to map the right binary. However, qemu already fixes /proc/self/exe to point to the right file - as you can see from the last line where ps uaxww doesn't have qemu shown. So it would be possible to make do_open() in syscall.c do similar mapping for /proc/$pid/exe. Exactly how and when the qemu processes should be hidden within qemu needs careful thought though. A naive approach would check if /proc/$pid/exe points to the same binary as /proc/self/exe, hiding at least same-architecture qemu processes. Took a look at do_open(), what horror to read what you do there. How on earth is this supposed to work? You don't separate normal qemu invocation from binfmt, only adjust self stuff, always only remove one entry (binfmt P flag is 2 entries). Reasonably this hiding should only be performed when started by binfmt? What are the other uses for ? Jocke
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:

+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do

    f1 = open_return_path(f0);
    f2 = open_return_path(f0);   /* now f1 == f2 */
    qemu_fclose(f1);             /* now f2 is dangling */

I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? The remark about closing the original QEMUFile is also related to this part:

    if (result) {
        /* We are the reverse path of our reverse path (although I don't
           expect this to be used, it would stop another dup if it was) */
        result->return_path = qfs->file;

which has a similar bug

    f1 = open_return_path(f0);
    f2 = open_return_path(f1);   /* now f2 == f0 */
    qemu_fclose(f0);             /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway.
Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); a return path thread on the source side, hmm yes that could do it - especially since the migration thread is already a separate thread from the main thread handling this and thus already needs locking paraphernalia. Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? Not much; the only things the main thread currently responds to are the ReqAck (ping-like) requests; those are turning out to be very useful during debug; I've also got the ability for the destination to send a migration result back to the source which seems useful to be able to 'fail' early. If it doesn't need that, you can just switch to blocking when you process the listen command (i.e. when the postcopy thread starts). Why don't I just do it anyway? Prior to postcopy starting we're in the same situation as we're in with precopy today, so we can already get main-thread blocking. Dave Paolo I think I'm OK with both these being blocking. -- Dr.
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 15/07/2014 23:25, Andrey Korolyov ha scritto: On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote: On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote: On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote: On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote: Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up: Do you know which version works well? If you could bisect, that'll help a lot. Thanks, Amit Hi, 2.0 works definitely well. I`ll try to finish bisection today, though every step takes about 10 minutes to complete. Yay! It is even outside of virtio-blk. commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 Author: Marcelo Tosatti mtosa...@redhat.com Date: Tue Jun 3 13:34:48 2014 -0300 kvmclock: Ensure proper env-tsc value for kvmclock_current_nsec calculation Ensure proper env-tsc value for kvmclock_current_nsec calculation. Reported-by: Marcin Gibuła m.gib...@beyond.pl Cc: qemu-sta...@nongnu.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Andrey, Can you please provide instructions on how to create reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. 
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));

-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
             qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
             old_vm_running = runstate_is_running();

+            cpu_synchronize_all_states();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce. Paolo Yes, it solved the issue for me! (it took much time to check because most of the country's debian mirrors went inconsistent for some reason) Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states ?
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases: one with Alexander's patch completely reverted plus the suggestion from Marcelo, and one with only a revert of 9b178682 plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails at the timeout value, and when it is reverted, migration always succeeds in 8-10 seconds. Appropriate diffs are attached for reference. Andrey, Can you please apply only the following attached patch to an upstream QEMU git tree (move_synchronize_all_states.patch), plus
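For reference, the tsc_shift/tsc_to_system_mul arithmetic that kvmclock_current_nsec() relies on follows the pvclock ABI: the TSC delta is first shifted by tsc_shift (negative means right shift), then scaled by a 32.32 fixed-point multiplier. A self-contained sketch of that conversion, not the QEMU function itself (the single 64-bit multiply here is a simplification that can overflow for very large deltas):

```c
#include <stdint.h>

/* Convert a guest TSC delta to nanoseconds per the pvclock ABI:
 * shift by tsc_shift, then apply the 32.32 fixed-point multiplier. */
static uint64_t pvclock_delta_to_ns(uint64_t delta, int8_t tsc_shift,
                                    uint32_t tsc_to_system_mul)
{
    if (tsc_shift >= 0) {
        delta <<= tsc_shift;
    } else {
        delta >>= -tsc_shift;
    }
    return (delta * (uint64_t)tsc_to_system_mul) >> 32;
}
```

This is also why the removed assertion mattered: if migration_tsc is behind time.tsc_timestamp, the unsigned delta wraps to a huge value before ever reaching this scaling step.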
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 09:35:16AM +0200, Marcin Gibuła wrote: Andrey, Can you please provide instructions on how to create reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to migration code break these things again? Yes - it's just for debug purposes.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] ppc: remove excessive logging
On 16 July 2014 10:17, Alex Bennée alex.ben...@linaro.org wrote: On aarch64 we do:

#define unsupported_encoding(s, insn)                                    \
    do {                                                                 \
        qemu_log_mask(LOG_UNIMP,                                         \
                      "%s:%d: unsupported instruction encoding 0x%08x "  \
                      "at pc=%016" PRIx64 "\n",                          \
                      __FILE__, __LINE__, insn, s->pc - 4);              \
        unallocated_encoding(s);                                         \
    } while (0);

So we signal it's unimplemented before falling through to the signal generating code. Yes, but that macro is only for instructions which exist but which QEMU doesn't implement, not for instructions which exist and which we do have an implementation of but which we generate an exception for because they aren't present on this particular CPU. That's a separate issue from whether you might want to optionally log every SIGILL a process generates. thanks -- PMM
[Qemu-devel] [RFC PATCH V4 3/6] icount: Make icount_time_shift available everywhere
icount_time_shift is used for calculating the delay qemu has to sleep in order to synchronise the host and guest clocks. Therefore, we need it in cpu-exec.c. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 8 ++-- include/qemu-common.h | 1 + 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 34cc4c8..de0a8b2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -105,8 +105,12 @@ static bool all_cpu_threads_idle(void)
 /* Compensate for varying guest execution speed. */
 static int64_t qemu_icount_bias;
 static int64_t vm_clock_warp_start;
-/* Conversion factor from emulated instructions to virtual clock ticks. */
-static int icount_time_shift;
+/* Conversion factor from emulated instructions to virtual clock ticks.
+ * icount_time_shift is defined as extern in include/qemu-common.h because
+ * it is used (in cpu-exec.c) for calculating the delay for sleeping
+ * qemu in order to align the host and virtual clocks.
+ */
+int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed. */
 #define MAX_ICOUNT_SHIFT 10
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 860bb15..d47aa02 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -109,6 +109,7 @@ static inline char *realpath(const char *path, char *resolved_path)
 void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;
 extern int icount_align_option;
+extern int icount_time_shift;

 #include "qemu/osdep.h"
 #include "qemu/bswap.h"
-- 2.0.0.rc2
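As a rough illustration of why cpu-exec.c needs icount_time_shift: virtual time advances 2^shift ns per executed instruction, and the delay to sleep is the amount by which that virtual time has run ahead of the host clock. This is a hypothetical helper capturing the arithmetic, not the actual align code:

```c
#include <stdint.h>

/* Virtual time advances 2^icount_time_shift ns per executed instruction.
 * If the guest has run ahead of the host clock, the host must sleep the
 * difference to keep the two aligned; if the guest is behind, no sleep
 * is needed. */
static int64_t icount_align_delay_ns(int64_t executed_insns,
                                     int icount_time_shift,
                                     int64_t host_elapsed_ns)
{
    int64_t virtual_ns = executed_insns << icount_time_shift;
    int64_t delay = virtual_ns - host_elapsed_ns;
    return delay > 0 ? delay : 0;
}
```

With shift=3, for example, 1000 instructions account for 8000 ns of virtual time.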
[Qemu-devel] [RFC PATCH V4 1/6] icount: Add QemuOpts for icount
Make icount parameter use QemuOpts style options in order to easily add other suboptions. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 10 +- include/qemu-common.h | 3 ++- qemu-options.hx | 4 ++-- qtest.c | 13 +++-- vl.c | 35 --- 5 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/cpus.c b/cpus.c
index 5e7f2cf..dcca96a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -440,13 +440,21 @@ static const VMStateDescription vmstate_timers = {
     }
 };

-void configure_icount(const char *option)
+void configure_icount(QemuOpts *opts, Error **errp)
 {
+    const char *option;
+
     seqlock_init(&timers_state.vm_clock_seqlock, NULL);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+    option = qemu_opt_get(opts, "shift");
     if (!option) {
         return;
     }
+    /* When using -icount shift, the shift option will be
+       misinterpreted as a boolean */
+    if (strcmp(option, "on") == 0 || strcmp(option, "off") == 0) {
+        error_setg(errp, "The shift option must be a number or auto");
+    }
     icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
                                      icount_warp_rt, NULL);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ae76197..cc346ec 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -41,6 +41,7 @@
 #include <assert.h>
 #include <signal.h>
 #include "glib-compat.h"
+#include "qemu/option.h"

 #ifdef _WIN32
 #include "sysemu/os-win32.h"
@@ -105,7 +106,7 @@ static inline char *realpath(const char *path, char *resolved_path)
 #endif

 /* icount */
-void configure_icount(const char *option);
+void configure_icount(QemuOpts *opts, Error **errp);
 extern int use_icount;

 #include "qemu/osdep.h"
diff --git a/qemu-options.hx b/qemu-options.hx
index 9e54686..143def4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3011,11 +3011,11 @@ re-inject them.
ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [N|auto]\n \ +-icount [shift=N|auto]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ instruction\n, QEMU_ARCH_ALL) STEXI -@item -icount [@var{N}|auto] +@item -icount [shift=@var{N}|auto] @findex -icount Enable virtual instruction counter. The virtual cpu will execute one instruction every 2^@var{N} ns of virtual time. If @code{auto} is specified diff --git a/qtest.c b/qtest.c index 04a6dc1..ef0d991 100644 --- a/qtest.c +++ b/qtest.c @@ -19,6 +19,9 @@ #include hw/irq.h #include sysemu/sysemu.h #include sysemu/cpus.h +#include qemu/config-file.h +#include qemu/option.h +#include qemu/error-report.h #define MAX_IRQ 256 @@ -509,10 +512,16 @@ static void qtest_event(void *opaque, int event) } } -int qtest_init_accel(MachineClass *mc) +static void configure_qtest_icount(const char *options) { -configure_icount(0); +QemuOpts *opts = qemu_opts_parse(qemu_find_opts(icount), options, 1); +configure_icount(opts, error_abort); +qemu_opts_del(opts); +} +int qtest_init_accel(MachineClass *mc) +{ +configure_qtest_icount(0); return 0; } diff --git a/vl.c b/vl.c index 41ddcd2..103027f 100644 --- a/vl.c +++ b/vl.c @@ -537,6 +537,20 @@ static QemuOptsList qemu_mem_opts = { }, }; +static QemuOptsList qemu_icount_opts = { +.name = icount, +.implied_opt_name = shift, +.merge_lists = true, +.head = QTAILQ_HEAD_INITIALIZER(qemu_icount_opts.head), +.desc = { +{ +.name = shift, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + /** * Get machine options * @@ -2896,13 +2910,12 @@ int main(int argc, char **argv, char **envp) { int i; int snapshot, linux_boot; -const char *icount_option = NULL; const char *initrd_filename; const char *kernel_filename, *kernel_cmdline; const char *boot_order; DisplayState *ds; int cyls, heads, secs, translation; -QemuOpts *hda_opts = NULL, *opts, *machine_opts; +QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL; QemuOptsList *olist; int optind; const 
char *optarg; @@ -2967,6 +2980,7 @@ int main(int argc, char **argv, char **envp) qemu_add_opts(qemu_msg_opts); qemu_add_opts(qemu_name_opts); qemu_add_opts(qemu_numa_opts); +qemu_add_opts(qemu_icount_opts); runstate_init(); @@ -3817,7 +3831,11 @@ int main(int argc, char **argv, char **envp) } break; case QEMU_OPTION_icount: -icount_option = optarg; +icount_opts = qemu_opts_parse(qemu_find_opts(icount), + optarg, 1); +if (!icount_opts) { +exit(1); +
[Qemu-devel] [RFC PATCH V4 4/6] cpu_exec: Add sleeping algorithm
The goal is to make QEMU sleep whenever the guest clock is in advance compared to the host clock (we use the monotonic clocks). The amount of time to sleep is calculated in the execution loop in cpu_exec. At first, we tried to approximate at each for loop the real time elapsed while searching for a TB (generating or retrieving from cache) and executing it. We would then approximate the virtual time corresponding to the number of virtual instructions executed. The difference between these 2 values would allow us to know if the guest is in advance or delayed. However, the function used for measuring the real time (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive. We had an added overhead of 13% of the total run time. Therefore, we modified the algorithm and only take into account the difference between the 2 clocks at the beginning of the cpu_exec function. During the for loop we try to reduce the advance of the guest only by computing the virtual time elapsed and sleeping if necessary. The overhead is thus reduced to 3%. Even though this method still has a noticeable overhead, it no longer is a bottleneck in trying to achieve a better guest frequency for which the guest clock is faster than the host one. As for the alignment of the 2 clocks, with the first algorithm the guest clock was oscillating between -1 and 1ms compared to the host clock. Using the second algorithm we notice that the guest is 5ms behind the host, which is still acceptable for our use case. The tests were conducted using fio and stress. The host machine is an i5 CPU at 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb built with buildroot. Currently, on our test machine, the lowest icount we can achieve that is suitable for aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are slower than the cpu tests (using stress).
Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpu-exec.c | 97 ++ 1 file changed, 97 insertions(+) diff --git a/cpu-exec.c b/cpu-exec.c index 38e5f02..9ed6ac1 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -22,6 +22,90 @@ #include tcg.h #include qemu/atomic.h #include sysemu/qtest.h +#include qemu/timer.h + +/* -icount align implementation. */ + +typedef struct SyncClocks { +int64_t diff_clk; +int64_t original_instr_counter; +} SyncClocks; + +#if !defined(CONFIG_USER_ONLY) +/* Allow the guest to have a max 3ms advance. + * The difference between the 2 clocks could therefore + * oscillate around 0. + */ +#define VM_CLOCK_ADVANCE 300 + +static int64_t delay_host(int64_t diff_clk) +{ +if (diff_clk VM_CLOCK_ADVANCE) { +#ifndef _WIN32 +struct timespec sleep_delay, rem_delay; +sleep_delay.tv_sec = diff_clk / 10LL; +sleep_delay.tv_nsec = diff_clk % 10LL; +if (nanosleep(sleep_delay, rem_delay) 0) { +diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 10LL; +diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec; +} else { +diff_clk = 0; +} +#else +Sleep(diff_clk / SCALE_MS); +diff_clk = 0; +#endif +} +return diff_clk; +} + +static int64_t instr_to_vtime(int64_t instr_counter, const CPUState *cpu) +{ +int64_t instr_exec_time; +instr_exec_time = instr_counter - + (cpu-icount_extra + + cpu-icount_decr.u16.low); +instr_exec_time = instr_exec_time icount_time_shift; + +return instr_exec_time; +} + +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +if (!icount_align_option) { +return; +} +sc-diff_clk += instr_to_vtime(sc-original_instr_counter, cpu); +sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +sc-diff_clk = delay_host(sc-diff_clk); +} + +static void init_delay_params(SyncClocks *sc, + const CPUState *cpu) +{ +static int64_t clocks_offset; +if (!icount_align_option) { +return; +} +int64_t realtime_clock_value, 
virtual_clock_value; +realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); +virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +/* Compute offset between the 2 clocks. */ +if (!clocks_offset) { +clocks_offset = realtime_clock_value - virtual_clock_value; +} +sc-diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset; +sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +} +#else +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +} + +static void init_delay_params(SyncClocks *sc, const CPUState *cpu) +{ +} +#endif /* CONFIG USER ONLY */ void cpu_loop_exit(CPUState *cpu) { @@ -227,6 +311,8 @@ int cpu_exec(CPUArchState *env)
[Qemu-devel] [RFC PATCH V4 5/6] cpu_exec: Print to console if the guest is late
If the align option is enabled, we print a message to the user whenever the guest clock is behind the host clock, so that they have a hint about the actual performance. The maximum print interval is 2s and we limit the number of messages to 100. If desired, this can be changed in cpu-exec.c Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Conflicts: cpu-exec.c --- cpu-exec.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 9ed6ac1..b62df15 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -29,6 +29,7 @@ typedef struct SyncClocks { int64_t diff_clk; int64_t original_instr_counter; +int64_t realtime_clock; } SyncClocks; #if !defined(CONFIG_USER_ONLY) @@ -37,6 +38,9 @@ typedef struct SyncClocks { * oscillate around 0. */ #define VM_CLOCK_ADVANCE 300 +#define THRESHOLD_REDUCE 1.5 +#define MAX_DELAY_PRINT_RATE 2 +#define MAX_NB_PRINTS 100 static int64_t delay_host(int64_t diff_clk) { @@ -80,6 +84,28 @@ static void align_clocks(SyncClocks *sc, const CPUState *cpu) sc-diff_clk = delay_host(sc-diff_clk); } +static void print_delay(const SyncClocks *sc) +{ +static float threshold_delay; +static int64_t last_realtime_clock; +static int nb_prints; + +if (icount_align_option +(sc-realtime_clock - last_realtime_clock) / 10LL += MAX_DELAY_PRINT_RATE nb_prints MAX_NB_PRINTS) { +if ((-sc-diff_clk / (float)10LL threshold_delay) || +(-sc-diff_clk / (float)10LL + (threshold_delay - THRESHOLD_REDUCE))) { +threshold_delay = (-sc-diff_clk / 10LL) + 1; +printf(Warning: The guest is now late by %.1f to %.1f seconds\n, + threshold_delay - 1, + threshold_delay); +nb_prints++; +last_realtime_clock = sc-realtime_clock; +} +} +} + static void init_delay_params(SyncClocks *sc, const CPUState *cpu) { @@ -87,15 +113,18 @@ static void init_delay_params(SyncClocks *sc, if (!icount_align_option) { return; } -int64_t realtime_clock_value, virtual_clock_value; -realtime_clock_value =
qemu_clock_get_ns(QEMU_CLOCK_REALTIME); -virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* Compute offset between the 2 clocks. */ if (!clocks_offset) { -clocks_offset = realtime_clock_value - virtual_clock_value; +clocks_offset = sc-realtime_clock - +qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } -sc-diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset; +sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - + sc-realtime_clock + clocks_offset; sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +/* Print every 2s max if the guest is late. We limit the number + of printed messages to NB_PRINT_MAX(currently 100) */ +print_delay(sc); } #else static void align_clocks(SyncClocks *sc, const CPUState *cpu) -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 2/6] icount: Add align option to icount
The align option is used for activating the align algorithm in order to synchronise the host clock and the guest clock. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr --- cpus.c| 19 --- include/qemu-common.h | 1 + qemu-options.hx | 15 +-- vl.c | 4 4 files changed, 30 insertions(+), 9 deletions(-) diff --git a/cpus.c b/cpus.c index dcca96a..34cc4c8 100644 --- a/cpus.c +++ b/cpus.c @@ -443,25 +443,30 @@ static const VMStateDescription vmstate_timers = { void configure_icount(QemuOpts *opts, Error **errp) { const char *option; +char *rem_str = NULL; seqlock_init(timers_state.vm_clock_seqlock, NULL); vmstate_register(NULL, 0, vmstate_timers, timers_state); option = qemu_opt_get(opts, shift); if (!option) { +if (qemu_opt_get(opts, align) != NULL) { +error_setg(errp, Please specify shift option when using align); +} return; } -/* When using -icount shift, the shift option will be - misinterpreted as a boolean */ -if (strcmp(option, on) == 0 || strcmp(option, off) == 0) { -error_setg(errp, The shift option must be a number or auto); -} - +icount_align_option = qemu_opt_get_bool(opts, align, false); icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME, icount_warp_rt, NULL); if (strcmp(option, auto) != 0) { -icount_time_shift = strtol(option, NULL, 0); +errno = 0; +icount_time_shift = strtol(option, rem_str, 0); +if (errno != 0 || *rem_str != '\0' || !strlen(option)) { +error_setg(errp, icount: Invalid shift value); +} use_icount = 1; return; +} else if (icount_align_option) { +error_setg(errp, shift=auto and align=on are incompatible); } use_icount = 2; diff --git a/include/qemu-common.h b/include/qemu-common.h index cc346ec..860bb15 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -108,6 +108,7 @@ static inline char *realpath(const char *path, char *resolved_path) /* icount */ void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; +extern int icount_align_option; 
#include qemu/osdep.h #include qemu/bswap.h diff --git a/qemu-options.hx b/qemu-options.hx index 143def4..379d932 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -3011,9 +3011,9 @@ re-inject them. ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [shift=N|auto]\n \ +-icount [shift=N|auto][,align=on|off]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ -instruction\n, QEMU_ARCH_ALL) +instruction and enable aligning the host and virtual clocks\n, QEMU_ARCH_ALL) STEXI @item -icount [shift=@var{N}|auto] @findex -icount @@ -3026,6 +3026,17 @@ Note that while this option can give deterministic behavior, it does not provide cycle accurate emulation. Modern CPUs contain superscalar out of order cores with complex cache hierarchies. The number of instructions executed often has little or no correlation with actual performance. + +@option{align=on} will activate the delay algorithm which will try +to synchronise the host clock and the virtual clock. The goal is to +have a guest running at the real frequency imposed by the shift option. +Whenever the guest clock is behind the host clock and if +@option{align=on} is specified then we print a message to the user +to inform them about the delay. +Currently this option does not work when @option{shift} is @code{auto}. +Note: The sync algorithm will work for those shift values for which +the guest clock runs ahead of the host clock. Typically this happens +when the shift value is high (how high depends on the host machine).
ETEXI DEF(watchdog, HAS_ARG, QEMU_OPTION_watchdog, \ diff --git a/vl.c b/vl.c index 103027f..d270070 100644 --- a/vl.c +++ b/vl.c @@ -183,6 +183,7 @@ uint8_t *boot_splash_filedata; size_t boot_splash_filedata_size; uint8_t qemu_extra_params_fw[2]; +int icount_align_option; typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { @@ -546,6 +547,9 @@ static QemuOptsList qemu_icount_opts = { { .name = shift, .type = QEMU_OPT_STRING, +}, { +.name = align, +.type = QEMU_OPT_BOOL, }, { /* end of list */ } }, -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'
Show in 'info jit' the current delay between the host clock and the guest clock. In addition, print the maximum advance and delay of the guest compared to the host. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr --- cpu-exec.c| 19 ++- cpus.c| 17 + include/qemu-common.h | 5 + monitor.c | 1 + 4 files changed, 37 insertions(+), 5 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index b62df15..055e0e3 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -109,19 +109,28 @@ static void print_delay(const SyncClocks *sc) static void init_delay_params(SyncClocks *sc, const CPUState *cpu) { -static int64_t clocks_offset; -if (!icount_align_option) { -return; +static int64_t realtime_clock_value; +if (icount_align_option || !realtime_clock_value) { +realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); } -sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* Compute offset between the 2 clocks. */ if (!clocks_offset) { -clocks_offset = sc-realtime_clock - +clocks_offset = realtime_clock_value - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } +if (!icount_align_option) { +return; +} +sc-realtime_clock = realtime_clock_value; sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - sc-realtime_clock + clocks_offset; sc-original_instr_counter = cpu-icount_extra + cpu-icount_decr.u16.low; +if (sc-diff_clk max_delay) { +max_delay = sc-diff_clk; +} +if (sc-diff_clk max_advance) { +max_advance = sc-diff_clk; +} /* Print every 2s max if the guest is late. 
We limit the number of printed messages to NB_PRINT_MAX(currently 100) */ print_delay(sc); diff --git a/cpus.c b/cpus.c index de0a8b2..3c3fb64 100644 --- a/cpus.c +++ b/cpus.c @@ -64,6 +64,9 @@ #endif /* CONFIG_LINUX */ static CPUState *next_cpu; +int64_t clocks_offset; +int64_t max_delay; +int64_t max_advance; bool cpu_is_stopped(CPUState *cpu) { @@ -1504,3 +1507,17 @@ void qmp_inject_nmi(Error **errp) error_set(errp, QERR_UNSUPPORTED); #endif } + +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf) +{ +cpu_fprintf(f, Host - Guest clock %ld(ms)\n, +(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset - + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL))/SCALE_MS); +if (icount_align_option) { +cpu_fprintf(f, Max guest delay %ld(ms)\n, -max_delay/SCALE_MS); +cpu_fprintf(f, Max guest advance %ld(ms)\n, max_advance/SCALE_MS); +} else { +cpu_fprintf(f, Max guest delay NA\n); +cpu_fprintf(f, Max guest advance NA\n); +} +} diff --git a/include/qemu-common.h b/include/qemu-common.h index d47aa02..fbc1bca 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -110,6 +110,11 @@ void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; extern int icount_align_option; extern int icount_time_shift; +extern int64_t clocks_offset; +/* drift information for info jit command */ +extern int64_t max_delay; +extern int64_t max_advance; +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf); #include qemu/osdep.h #include qemu/bswap.h diff --git a/monitor.c b/monitor.c index 5bc70a6..cdbaa60 100644 --- a/monitor.c +++ b/monitor.c @@ -1047,6 +1047,7 @@ static void do_info_registers(Monitor *mon, const QDict *qdict) static void do_info_jit(Monitor *mon, const QDict *qdict) { dump_exec_info((FILE *)mon, monitor_fprintf); +dump_drift_info((FILE *)mon, monitor_fprintf); } static void do_info_history(Monitor *mon, const QDict *qdict) -- 2.0.0.rc2
[Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
The icount option already implemented in QEMU allows the guest to run at a theoretical frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch is to have a real guest frequency close to the one imposed by using the icount option. The main idea behind the algorithm is that we compare the virtual monotonic clock and the host monotonic clock. For big icounts (on our test machine, an i5 CPU @ 3.10GHz, icounts starting at 6) the guest clock will be ahead of the host clock. In this case, we try to make QEMU sleep for the difference between the 2 clocks. Therefore, the guest would have executed for a period almost equal to the one imposed by icount. We should point out that the algorithm works only for those icounts that allow the guest clock to be in front of the host clock. The first patch adds support for QemuOpts for the 'icount' parameter. It also adds a suboption called 'shift' that will hold the value for 'icount'. Therefore we now have -icount shift=N|auto or -icount N|auto. The second patch adds the 'align' suboption for icount. The third patch exports 'icount_time_shift' so that it can be used in places other than cpus.c; we need it in cpu-exec.c for calculating how long we want QEMU to sleep. The fourth patch implements the algorithm used for calculating the delay we want to sleep. It uses the number of instructions executed by the virtual cpu and also 'icount_time_shift'. The fifth patch prints to the console whenever the guest clock runs behind the host clock. The fastest printing speed is every 2 seconds, and we only print if the align option is enabled. We also have a limit of 100 printed messages. The sixth patch adds information about the difference between the host and guest clocks (taking into account the offset) in the 'info jit' command. We also print the maximum delay and advance of the guest clock compared to the host clock.
v3 -> v4 * Add better error handling for 'strtol' in patch 2 * Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4 * Remove function pointers from patches 4 and 5 Sebastian Tanase (6): icount: Add QemuOpts for icount icount: Add align option to icount icount: Make icount_time_shift available everywhere cpu_exec: Add sleeping algorithm cpu_exec: Print to console if the guest is late monitor: Add drift info to 'info jit' cpu-exec.c| 135 ++ cpus.c| 44 ++-- include/qemu-common.h | 10 +++- monitor.c | 1 + qemu-options.hx | 17 +-- qtest.c | 13 - vl.c | 39 --- 7 files changed, 241 insertions(+), 18 deletions(-) -- 2.0.0.rc2
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto: * Paolo Bonzini (pbonz...@redhat.com) wrote: Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto: + +/* If it's already open, return it */ +if (qfs-file-return_path) { +return qfs-file-return_path; Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do f1 = open_return_path(f0); f2 = open_return_path(f0); /* now f1 == f2 */ qemu_fclose(f1); /* now f2 is dangling */ I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? Yes. The API is not well encapsulated; a random caller of open_return_path does not know (and cannot know) whether it should close the returned file or not. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). If it isn't used, why keep it? Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway. I don't really know, it's Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. 
Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); Ok. I'm not sure why it is more complicated since migration is essentially two-phase, one where the source drives it and one where the source just waits for requests, but I'll trust you on this. :) Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? It sure is. :) On the destination side, migration is done in a coroutine (see process_incoming_migration) so it's all transparent. Only socket_get_buffer has to know about this: len = qemu_recv(s->fd, buf, size, 0); if (len != -1) { break; } if (socket_error() == EAGAIN) { yield_until_fd_readable(s->fd); } else if (socket_error() != EINTR) { break; } If the socket is put in blocking mode recv will never return EAGAIN, so this code will only run if the socket is nonblocking. Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once it gets its hands on the QEMUFile. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write?
Not much; the only things the main thread currently responds to are the ReqAck (ping like) requests; those are turning out to be very useful during debug; I've also got the ability for the destination to send a migration result back to the source which seems useful to be able to 'fail' early. Why can't this be done in the listener thread? (Thus transforming it into a more general postcopy migration thread; later we could even change incoming migration from a coroutine to a thread). If it doesn't need that, you can just switch to blocking when you process the listen command (i.e. when the postcopy thread starts). Why don't I just do it anyway? Prior to postcopy starting we're in the same situation as we're in with precopy today, so can already get mainblock threading. See above for the explanation. Paolo
[Qemu-devel] [PATCH for-2.2 0/5] scsi: enable passthrough of vendor-specific commands
Right now scsi-generic is parsing the CDB, in order to compute the expected number of bytes to be transferred. This is necessary if DMA is done by the HBA via scsi_req_data, but it prevents executing vendor-specific commands via scsi-generic because we don't know how to parse them. If DMA is delegated to the SCSI layer via get_sg_list, we know in advance how many bytes the guest will want to receive and we can pass the information straight from the guest to SG_IO. In this case, it is unnecessary to parse the CDB to get the same information. scsi-disk needs it to detect underruns and overruns, but scsi-generic and scsi-block can just ask the HBA about the transfer direction and size. This series introduces a new parse_cdb callback in both the device and the HBA. The latter is called by scsi_bus_parse_cdb, which devices can call for passthrough requests in their implementation of parse_cdb. Tamuki-san, can you please test if these patches are okay for your usecase? Paolo Paolo Bonzini (5): scsi-bus: prepare scsi_req_new for introduction of parse_cdb scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo scsi-block: extract scsi_block_is_passthrough scsi-block, scsi-generic: implement parse_cdb virtio-scsi: implement parse_cdb hw/scsi/scsi-bus.c | 68 ++ hw/scsi/scsi-disk.c| 52 +- hw/scsi/scsi-generic.c | 7 ++ hw/scsi/virtio-scsi.c | 24 ++ include/hw/scsi/scsi.h | 7 ++ 5 files changed, 124 insertions(+), 34 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough
This will be used for both scsi_block_new_request and the scsi-block implementation of parse_cdb. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-disk.c | 38 ++ 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d47ecd6..81b7276 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev) return scsi_initfn(s-qdev); } -static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, - uint32_t lun, uint8_t *buf, - void *hba_private) +static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf) { -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); - switch (buf[0]) { case READ_6: case READ_10: @@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, case WRITE_VERIFY_12: case WRITE_VERIFY_16: /* If we are not using O_DIRECT, we might read stale data from the -* host cache if writes were made using other commands than these -* ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without -* O_DIRECT everything must go through SG_IO. + * host cache if writes were made using other commands than these + * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without + * O_DIRECT everything must go through SG_IO. */ if (!(bdrv_get_flags(s-qdev.conf.bs) BDRV_O_NOCACHE)) { break; @@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, * just make scsi-block operate the same as scsi-generic for them. 
*/ if (s-qdev.type != TYPE_ROM) { -return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, - hba_private); +return false; } +break; + +default: +break; } -return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, - hba_private); +return true; +} + + +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, + uint32_t lun, uint8_t *buf, + void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +if (scsi_block_is_passthrough(s, buf)) { +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, + hba_private); +} else { +return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, + hba_private); +} } #endif -- 1.9.3
[Qemu-devel] [Bug 1342704] [NEW] error: Crash of qemu-img/qemu-io on the qcow2 image with large values in 'incompatible features' field
Public bug reported: qemu-io and qemu-img fail with an assertion (see below) at an attempt to interact with a qcow2 image having large values in the 'incompatible features' header field. util/error.c:34: error_set: Assertion `*errp == ((void *)0)' failed. The backtrace file and the test image can be found in the attachment. The backtraces are for the following command: qemu-img check -f qcow2 test_image The image was generated by the qcow2 image fuzzer. qemu.git head: 5a7348045091a2bc15 ** Affects: qemu Importance: Undecided Status: New ** Attachment added: Backtraces and the test image https://bugs.launchpad.net/bugs/1342704/+attachment/4153944/+files/artifacts.tar.gz
[Qemu-devel] [PATCH 2/5] scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
These callbacks will let devices do their own request parsing, or defer it to the bus. If the bus does not provide an implementation either, fall back to the default parsing routine. Swap the first two arguments of scsi_req_parse, and rename it to scsi_req_parse_cdb, for consistency.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c     | 25 ++---
 include/hw/scsi/scsi.h |  6 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bf7ba8c..54fbf29 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,7 @@
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -54,6 +54,18 @@ static void scsi_device_destroy(SCSIDevice *s)
     }
 }
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                       void *hba_private)
+{
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
+
+    if (bus->info->parse_cdb) {
+        return bus->info->parse_cdb(dev, cmd, buf, hba_private);
+    } else {
+        return scsi_req_parse_cdb(dev, cmd, buf);
+    }
+}
+
 static SCSIRequest *scsi_device_alloc_req(SCSIDevice *s, uint32_t tag, uint32_t lun,
                                           uint8_t *buf, void *hba_private)
 {
@@ -562,8 +574,10 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
 {
     SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
     const SCSIReqOps *ops;
+    SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(d);
     SCSIRequest *req;
     SCSICommand cmd;
+    int ret;
 
     if ((d->unit_attention.key == UNIT_ATTENTION ||
          bus->unit_attention.key == UNIT_ATTENTION)
@@ -586,8 +600,13 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
         ops = NULL;
     }
 
-    if (scsi_req_parse(&cmd, d, buf) != 0) {
+    if (ops != NULL || !sc->parse_cdb) {
+        ret = scsi_req_parse_cdb(d, &cmd, buf);
+    } else {
+        ret = sc->parse_cdb(d, &cmd, buf, hba_private);
+    }
+
+    if (ret != 0) {
         trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
         req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
     } else {
@@ -1188,7 +1207,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
     return lba;
 }
 
-static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf)
+static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
     int rc;
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 1adb549..4a0b860 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -76,6 +76,8 @@ typedef struct SCSIDeviceClass {
     DeviceClass parent_class;
     int (*init)(SCSIDevice *dev);
     void (*destroy)(SCSIDevice *s);
+    int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                     void *hba_private);
     SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun,
                               uint8_t *buf, void *hba_private);
     void (*unit_attention_reported)(SCSIDevice *s);
@@ -131,6 +133,8 @@ struct SCSIReqOps {
 struct SCSIBusInfo {
     int tcq;
     int max_channel, max_target, max_lun;
+    int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                     void *hba_private);
     void (*transfer_data)(SCSIRequest *req, uint32_t arg);
     void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
     void (*cancel)(SCSIRequest *req);
@@ -244,6 +248,8 @@ void scsi_req_free(SCSIRequest *req);
 SCSIRequest *scsi_req_ref(SCSIRequest *req);
 void scsi_req_unref(SCSIRequest *req);
 
+int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
+                       void *hba_private);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3
[Qemu-devel] [PATCH 1/5] scsi-bus: prepare scsi_req_new for introduction of parse_cdb
The per-SCSIDevice parse_cdb callback must not be called if the request will go through special SCSIReqOps, so detect the special cases early enough.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4341754..bf7ba8c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -561,9 +561,32 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
                           uint8_t *buf, void *hba_private)
 {
     SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d->qdev.parent_bus);
+    const SCSIReqOps *ops;
     SCSIRequest *req;
     SCSICommand cmd;
 
+    if ((d->unit_attention.key == UNIT_ATTENTION ||
+         bus->unit_attention.key == UNIT_ATTENTION) &&
+        (buf[0] != INQUIRY &&
+         buf[0] != REPORT_LUNS &&
+         buf[0] != GET_CONFIGURATION &&
+         buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
+
+         /*
+          * If we already have a pending unit attention condition,
+          * report this one before triggering another one.
+          */
+         !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
+        ops = &reqops_unit_attention;
+    } else if (lun != d->lun ||
+               buf[0] == REPORT_LUNS ||
+               (buf[0] == REQUEST_SENSE && d->sense_len)) {
+        ops = &reqops_target_command;
+    } else {
+        ops = NULL;
+    }
+
     if (scsi_req_parse(&cmd, d, buf) != 0) {
         trace_scsi_req_parse_bad(d->id, lun, tag, buf[0]);
         req = scsi_req_alloc(&reqops_invalid_opcode, d, tag, lun, hba_private);
@@ -577,25 +600,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
         if (cmd.xfer > INT32_MAX) {
             req = scsi_req_alloc(&reqops_invalid_field, d, tag, lun,
                                  hba_private);
-        } else if ((d->unit_attention.key == UNIT_ATTENTION ||
-                    bus->unit_attention.key == UNIT_ATTENTION) &&
-                   (buf[0] != INQUIRY &&
-                    buf[0] != REPORT_LUNS &&
-                    buf[0] != GET_CONFIGURATION &&
-                    buf[0] != GET_EVENT_STATUS_NOTIFICATION &&
-
-                    /*
-                     * If we already have a pending unit attention condition,
-                     * report this one before triggering another one.
-                     */
-                    !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
-            req = scsi_req_alloc(&reqops_unit_attention, d, tag, lun,
-                                 hba_private);
-        } else if (lun != d->lun ||
-                   buf[0] == REPORT_LUNS ||
-                   (buf[0] == REQUEST_SENSE && d->sense_len)) {
-            req = scsi_req_alloc(&reqops_target_command, d, tag, lun,
-                                 hba_private);
+        } else if (ops) {
+            req = scsi_req_alloc(ops, d, tag, lun, hba_private);
         } else {
             req = scsi_device_alloc_req(d, tag, lun, buf, hba_private);
         }
-- 
1.9.3
[Qemu-devel] [PATCH 4/5] scsi-block, scsi-generic: implement parse_cdb
The callback lets the bus provide the direction and transfer count for passthrough commands, enabling passthrough of vendor-specific commands.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c     |  3 +--
 hw/scsi/scsi-disk.c    | 14 ++
 hw/scsi/scsi-generic.c |  7 +++
 include/hw/scsi/scsi.h |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 54fbf29..6167bea 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -9,7 +9,6 @@
 static char *scsibus_get_dev_path(DeviceState *dev);
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
@@ -1207,7 +1206,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd)
     return lba;
 }
 
-static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
     int rc;
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 81b7276..d55521d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2564,6 +2564,19 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
                               hba_private);
     }
 }
+
+static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
+                                uint8_t *buf, void *hba_private)
+{
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
+
+    if (scsi_block_is_passthrough(s, buf)) {
+        return scsi_bus_parse_cdb(&s->qdev, cmd, buf, hba_private);
+    } else {
+        return scsi_req_parse_cdb(&s->qdev, cmd, buf);
+    }
+}
+
 #endif
 
 #define DEFINE_SCSI_DISK_PROPERTIES()\
@@ -2672,6 +2685,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     sc->init         = scsi_block_initfn;
     sc->destroy      = scsi_destroy;
     sc->alloc_req    = scsi_block_new_request;
+    sc->parse_cdb    = scsi_block_parse_cdb;
     dc->fw_name = "disk";
     dc->desc = "SCSI block device passthrough";
     dc->reset = scsi_disk_reset;
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 3733d2c..0b2ff90 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -490,6 +490,12 @@ static Property scsi_generic_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_generic_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+                                  uint8_t *buf, void *hba_private)
+{
+    return scsi_bus_parse_cdb(dev, cmd, buf, hba_private);
+}
+
 static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -498,6 +504,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
     sc->init         = scsi_generic_initfn;
     sc->destroy      = scsi_destroy;
     sc->alloc_req    = scsi_new_request;
+    sc->parse_cdb    = scsi_generic_parse_cdb;
     dc->fw_name = "disk";
     dc->desc = "pass through generic scsi device (/dev/sg*)";
     dc->reset = scsi_generic_reset;
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 4a0b860..a7a28e6 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -250,6 +250,7 @@ void scsi_req_unref(SCSIRequest *req);
 
 int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
                        void *hba_private);
+int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
 void scsi_req_continue(SCSIRequest *req);
-- 
1.9.3
[Qemu-devel] [PATCH 5/5] virtio-scsi: implement parse_cdb
Enable passthrough of vendor-specific commands.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/virtio-scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0eb069a..e8f4c0c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -406,6 +406,29 @@ static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
     virtio_scsi_complete_cmd_req(req);
 }
 
+static int virtio_scsi_parse_cdb(SCSIDevice *dev, SCSICommand *cmd,
+                                 uint8_t *buf, void *hba_private)
+{
+    VirtIOSCSIReq *req = hba_private;
+
+    cmd->lba = -1;
+    cmd->len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE);
+    memcpy(cmd->buf, buf, cmd->len);
+
+    /* Extract the direction and mode directly from the request, for
+     * host device passthrough.
+     */
+    cmd->xfer = req->qsgl.size;
+    if (cmd->xfer == 0) {
+        cmd->mode = SCSI_XFER_NONE;
+    } else if (iov_size(req->elem.in_sg, req->elem.in_num) > req->resp_size) {
+        cmd->mode = SCSI_XFER_FROM_DEV;
+    } else {
+        cmd->mode = SCSI_XFER_TO_DEV;
+    }
+    return 0;
+}
+
 static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r)
 {
     VirtIOSCSIReq *req = r->hba_private;
@@ -658,6 +681,7 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .change = virtio_scsi_change,
     .hotplug = virtio_scsi_hotplug,
     .hot_unplug = virtio_scsi_hot_unplug,
+    .parse_cdb = virtio_scsi_parse_cdb,
     .get_sg_list = virtio_scsi_get_sg_list,
     .save_request = virtio_scsi_save_request,
     .load_request = virtio_scsi_load_request,
-- 
1.9.3
Re: [Qemu-devel] [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit'
Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

-static int64_t clocks_offset;
-if (!icount_align_option) {
-return;
+static int64_t realtime_clock_value;

Does this really need to be static?

+if (icount_align_option || !realtime_clock_value) {
+    realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
-sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

 /* Compute offset between the 2 clocks. */
 if (!clocks_offset) {
-    clocks_offset = sc->realtime_clock -
+    clocks_offset = realtime_clock_value -
                     qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }

Isn't clocks_offset roughly the same as -timers_state.cpu_clock_offset? If so, you could do some simplification in the code. Feel free to move the TimersState struct definition to include/sysemu/cpus.h and make timers_state public.

+cpu_fprintf(f, "Host - Guest clock %ld(ms)\n",
+            (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - clocks_offset -
+             qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) / SCALE_MS);

I think this is (cpu_get_clock() - cpu_get_icount()) / SCALE_MS.

Paolo
Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
Il 16/07/2014 14:18, Sebastian Tanase ha scritto:

v3 -> v4
* Add better error handling for 'strtol' in patch 2
* Add 'Sleep' instead of 'nanosleep' for Windows hosts in patch 4
* Remove function pointers from patches 4 and 5

Hi Sebastian,

I think we're getting really close. I asked a question about clocks_offset; I think it's not necessary to compute it in cpu-exec.c because the same information is available elsewhere. It is also probably not necessary to make it public. Can you please check this?

Once this is sorted out, the patch should be ready for inclusion in 2.2. Thanks for your effort!

Paolo
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote:
Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote:
On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote:
On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:

Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up:

Do you know which version works well? If you could bisect, that'll help a lot.

Thanks, Amit

Hi, 2.0 works definitely well. I'll try to finish bisection today, though every step takes about 10 minutes to complete.

Yay! It is even outside of virtio-blk.

commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
Author: Marcelo Tosatti mtosa...@redhat.com
Date: Tue Jun 3 13:34:48 2014 -0300

    kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation

    Ensure proper env->tsc value for kvmclock_current_nsec calculation.

    Reported-by: Marcin Gibuła m.gib...@beyond.pl
    Cc: qemu-sta...@nongnu.org
    Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
    Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Andrey, can you please provide instructions on how to create a reproducible environment?

The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
         if (s->clock_valid) {
             return;
         }
-
-        cpu_synchronize_all_states();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
             qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
             old_vm_running = runstate_is_running();
 
+            cpu_synchronize_all_states();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.

Paolo

Yes, it solved the issue for me! (it took much time to check because most of the country's debian mirrors went inconsistent for some reason)

Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states?
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases: with Alexander's patch completely reverted plus the suggestion from Marcelo, and with only 9b178682 reverted plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails by hitting the timeout value; when it is reverted, migration always succeeds in 8-10 seconds. Appropriate diffs are attached for reference.

Andrey, can you please apply only the following
Re: [Qemu-devel] [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks
On Wed, 16 Jul 2014 14:18:00 +0200 Sebastian Tanase sebastian.tan...@openwide.fr wrote: The icount option already implemented in QEMU allows the guest to run at a theoretical frequency of 1/(2^N) GHz (N is the icount parameter). The goal of this patch is to have a real guest frequency close to the one imposed by using the icount option. Is this really an RFC?
[Qemu-devel] [Bug 1342686] [NEW] Windows 95 setup hangs
Public bug reported:

Windows 95 (first version, not 95A or OSR2) setup hangs on QEMU version 2.0.0, compiled from the sources on Fedora 20. Setup starts without problems, goes through the first phase, and then hangs on the "Getting ready to run Windows 95 for the first time..." screen (http://www.guidebookgallery.org/screenshots/win95#win95startupshutdownsplash1correctaspect).

Steps to reproduce:

qemu-img create -f qcow2 win95.img 2G
qemu -L . -machine isapc -cpu pentium -m 32 -vga std -soundhw sb16 -hda win95.img -cdrom POLWIN95.ISO -fda BOOT95A.IMA -boot a

After this, select the default options everywhere. When it asks to remove the boot floppy, do "eject floppy0" and confirm. It displays the "Getting ready to run Windows 95 for the first time..." screen and hangs.

The same behavior can be observed on 2.1.0-rc2. On 1.7.1 it hangs immediately after this screen, displaying an underscore cursor. I managed to complete the setup on QEMU 1.6.2.

** Affects: qemu
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1342686

Title: Windows 95 setup hangs
Status in QEMU: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1342686/+subscriptions
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Thu, Jul 03, 2014 at 11:27:40PM +0300, Michael S. Tsirkin wrote:
On Thu, Jul 03, 2014 at 12:09:28PM -0700, Jesse Barnes wrote:
On Thu, 3 Jul 2014 14:26:12 -0400 Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote:
On Thu, Jul 03, 2014 at 10:32:12AM +0300, Michael S. Tsirkin wrote:
On Wed, Jul 02, 2014 at 12:23:37PM -0400, Konrad Rzeszutek Wilk wrote:
On Wed, Jul 02, 2014 at 04:50:15PM +0200, Paolo Bonzini wrote:
Il 02/07/2014 16:00, Konrad Rzeszutek Wilk ha scritto:

With this long thread I lost a bit of context about the challenges that exist. But let me try summarizing it here - which will hopefully get some consensus.

1). Fix IGD hardware to not use Southbridge magic addresses. We can moan and moan but I doubt it is going to change.

There are two problems:

- Northbridge (i.e. MCH i.e. PCI host bridge) configuration space addresses

Right. So in drivers/gpu/drm/i915/i915_dma.c:

1135 #define MCHBAR_I915 0x44
1136 #define MCHBAR_I965 0x48
1147     int reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 : MCHBAR_I915;
1152     if (INTEL_INFO(dev)->gen >= 4)
1153         pci_read_config_dword(dev_priv->bridge_dev, reg + 4, &temp_hi);
1154     pci_read_config_dword(dev_priv->bridge_dev, reg, &temp_lo);
1155     mchbar_addr = ((u64)temp_hi << 32) | temp_lo;

and

1139 #define DEVEN_REG 0x54
1193     int mchbar_reg = INTEL_INFO(dev)->gen >= 4 ? MCHBAR_I965 : MCHBAR_I915;
1202     if (IS_I915G(dev) || IS_I915GM(dev)) {
1203         pci_read_config_dword(dev_priv->bridge_dev, DEVEN_REG, &temp);
1204         enabled = !!(temp & DEVEN_MCHBAR_EN);
1205     } else {
1206         pci_read_config_dword(dev_priv->bridge_dev, mchbar_reg, &temp);
1207         enabled = temp & 1;
1208     }

- Southbridge (i.e. PCH i.e. ISA bridge) vendor/device ID; some versions of the driver identify it by class, some versions identify it by slot (1f.0).

Right. So in drivers/gpu/drm/i915/i915_drv.c the giant intel_detect_pch sets the pch_type based on:

432     if (pch->vendor == PCI_VENDOR_ID_INTEL) {
433         unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
434         dev_priv->pch_id = id;
435
436         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {

It checks for 0x3b00, 0x1c00, 0x1e00, 0x8c00 and 0x9c00. The INTEL_PCH_DEVICE_ID_MASK is 0xff00.

To solve the first, make a new machine type, PIIX4-based, and pass through the registers you need. The patch must document _exactly_ why the registers are safe to pass. If they are not reserved on PIIX4, the patch must document what the same offsets mean on PIIX4, and why it's sensible to assume that firmware for a virtual machine will not read/write them. Bonus points for also documenting the same for Q35.

OK. They look to be related to setting up an MBAR, but I don't understand why it is needed. Hopefully some of the i915 folks CC-ed here can answer.

In particular, I think setting up MBAR should move out of i915 and become the bridge driver tweak on Linux and Windows.

That is an excellent idea. However I am still curious as to _why_ it has to be done in the first place.

The display part of the GPU is partly on the PCH, and it's possible to mix and match them a bit, so we have this detection code to figure things out. In some cases, the PCH display portion may not even be present, so we look for that too. Practically speaking, we could probably assume specific CPU/PCH combos, as I don't think they're generally mixed across generations, though SNB and IVB did have compatible sockets, so there is the possibility of mixing CPT and PPT PCHs, but those are register identical as far as the graphics driver is concerned, so even that should be safe. Beyond that, the other MCH data we need to look at is mirrored into the GPU's
[Qemu-devel] [PATCH 0/2] Add SP810 to Versatile Express boards
Hi,

The Versatile Express emulation in QEMU currently does not have a model of the SP810 used in real hardware. The registers provided by this System Controller can be used to set the frequency of the SP804 timers. On newer releases of the board the SP804 is set to 32kHz by default and has to be increased by writing to the SCCTRL. See:
http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/038696.html

Since QEMU sets the SP804 timers to 1MHz by default, this should be reflected in the SCCTRL register. These two patches add a model of the SP810 to the vexpress boards and set the SCCTRL register so that software that queries this register behaves as expected.

Feedback is greatly appreciated!

Best,
Fabian

Fabian Aggeler (2):
  hw/misc/arm_sp810: Create SP810 device
  hw/arm/vexpress: add SP810 to the vexpress

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/vexpress.c               |  11 ++-
 hw/misc/Makefile.objs           |   1 +
 hw/misc/arm_sp810.c             | 184
 4 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/arm_sp810.c

-- 
1.8.3.2
[Qemu-devel] [PATCH 1/2] hw/misc/arm_sp810: Create SP810 device
This adds a device model for the PrimeXsys System Controller (SP810) which is present in the Versatile Express motherboards. It is so far read-only but allows setting the SCCTRL register.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/Makefile.objs           |   1 +
 hw/misc/arm_sp810.c             | 184
 3 files changed, 186 insertions(+)
 create mode 100644 hw/misc/arm_sp810.c

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index f3513fa..67ba99a 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -55,6 +55,7 @@ CONFIG_PL181=y
 CONFIG_PL190=y
 CONFIG_PL310=y
 CONFIG_PL330=y
+CONFIG_SP810=y
 CONFIG_CADENCE=y
 CONFIG_XGMAC=y
 CONFIG_EXYNOS4=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..49d023b 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o
 common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o
 common-obj-$(CONFIG_A9SCU) += a9scu.o
 common-obj-$(CONFIG_ARM11SCU) += arm11scu.o
+common-obj-$(CONFIG_SP810) += arm_sp810.o
 
 # PKUnity SoC devices
 common-obj-$(CONFIG_PUV3) += puv3_pm.o
diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c
new file mode 100644
index 000..82cbcf0
--- /dev/null
+++ b/hw/misc/arm_sp810.c
@@ -0,0 +1,184 @@
+/*
+ * ARM PrimeXsys System Controller (SP810)
+ *
+ * Copyright (C) 2014 Fabian Aggeler aggel...@ethz.ch
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/sysbus.h"
+
+#define LOCK_VALUE 0xa05f
+
+#define TYPE_ARM_SP810 "arm_sp810"
+#define ARM_SP810(obj) OBJECT_CHECK(arm_sp810_state, (obj), TYPE_ARM_SP810)
+
+typedef struct {
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+
+    uint32_t sc_ctrl; /* SCCTRL */
+} arm_sp810_state;
+
+#define TIMEREN0SEL (1 << 15)
+#define TIMEREN1SEL (1 << 17)
+#define TIMEREN2SEL (1 << 19)
+#define TIMEREN3SEL (1 << 21)
+
+static const VMStateDescription vmstate_arm_sysctl = {
+    .name = "arm_sp810",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(sc_ctrl, arm_sp810_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size)
+{
+    arm_sp810_state *s = (arm_sp810_state *)opaque;
+
+    switch (offset) {
+    case 0x000: /* SCCTRL */
+        return s->sc_ctrl;
+    case 0x004: /* SCSYSSTAT */
+    case 0x008: /* SCIMCTRL */
+    case 0x00C: /* SCIMSTAT */
+    case 0x010: /* SCXTALCTRL */
+    case 0x014: /* SCPLLCTRL */
+    case 0x018: /* SCPLLFCTRL */
+    case 0x01C: /* SCPERCTRL0 */
+    case 0x020: /* SCPERCTRL1 */
+    case 0x024: /* SCPEREN */
+    case 0x028: /* SCPERDIS */
+    case 0x02C: /* SCPERCLKEN */
+    case 0x030: /* SCPERSTAT */
+    case 0xEE0: /* SCSYSID0 */
+    case 0xEE4: /* SCSYSID1 */
+    case 0xEE8: /* SCSYSID2 */
+    case 0xEEC: /* SCSYSID3 */
+    case 0xF00: /* SCITCR */
+    case 0xF04: /* SCITIR0 */
+    case 0xF08: /* SCITIR1 */
+    case 0xF0C: /* SCITOR */
+    case 0xF10: /* SCCNTCTRL */
+    case 0xF14: /* SCCNTDATA */
+    case 0xF18: /* SCCNTSTEP */
+    case 0xFE0: /* SCPERIPHID0 */
+    case 0xFE4: /* SCPERIPHID1 */
+    case 0xFE8: /* SCPERIPHID2 */
+    case 0xFEC: /* SCPERIPHID3 */
+    case 0xFF0: /* SCPCELLID0 */
+    case 0xFF4: /* SCPCELLID1 */
+    case 0xFF8: /* SCPCELLID2 */
+    case 0xFFC: /* SCPCELLID3 */
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "arm_sp810_read: Register not yet implemented: "
+                      "0x%x\n",
+                      (int)offset);
+        return 0;
+    }
+}
+
+
+
+static void arm_sp810_write(void *opaque, hwaddr offset,
+                            uint64_t val, unsigned size)
+{
+    switch (offset) {
+    case 0x000: /* SCCTRL */
+    case 0x004: /* SCSYSSTAT */
+    case 0x008: /* SCIMCTRL */
+    case 0x00C: /* SCIMSTAT */
+    case 0x010: /* SCXTALCTRL */
+    case 0x014: /* SCPLLCTRL */
+    case 0x018: /* SCPLLFCTRL */
+    case 0x01C: /* SCPERCTRL0 */
+    case 0x020: /* SCPERCTRL1 */
+    case 0x024: /* SCPEREN */
+    case 0x028: /* SCPERDIS */
+    case 0x02C: /* SCPERCLKEN */
+    case 0x030: /* SCPERSTAT */
+    case 0xEE0: /* SCSYSID0 */
+    case 0xEE4: /* SCSYSID1 */
+    case 0xEE8: /* SCSYSID2 */
+    case 0xEEC: /* SCSYSID3 */
+    case 0xF00: /* SCITCR */
+    case 0xF04: /* SCITIR0 */
+    case 0xF08: /* SCITIR1 */
+    case
[Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1).

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 hw/arm/vexpress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index a88732c..b96c3fd 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name,
 static void vexpress_common_init(VEDBoardInfo *daughterboard,
                                  MachineState *machine)
 {
-    DeviceState *dev, *sysctl, *pl041;
+    DeviceState *dev, *sysctl, *pl041, *sp810;
     qemu_irq pic[64];
     uint32_t sys_id;
     DriveInfo *dinfo;
@@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard,
     qdev_init_nofail(sysctl);
     sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
 
-    /* VE_SP810: not modelled */
+    /* VE_SP810 */
+    sp810 = qdev_create(NULL, "arm_sp810");
+    /* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
+    qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
+                         | (1 << 19) | (1 << 21));
+    qdev_init_nofail(sp810);
+    sysbus_mmio_map(SYS_BUS_DEVICE(sp810), 0, map[VE_SP810]);
+
     /* VE_SERIALPCI: not modelled */
 
     pl041 = qdev_create(NULL, "pl041");
-- 
1.8.3.2
Re: [Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
Fabian Aggeler writes:

The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1).

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
---
 hw/arm/vexpress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index a88732c..b96c3fd 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name,
 static void vexpress_common_init(VEDBoardInfo *daughterboard,
                                  MachineState *machine)
 {
-    DeviceState *dev, *sysctl, *pl041;
+    DeviceState *dev, *sysctl, *pl041, *sp810;
     qemu_irq pic[64];
     uint32_t sys_id;
     DriveInfo *dinfo;
@@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard,
     qdev_init_nofail(sysctl);
     sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]);
 
-    /* VE_SP810: not modelled */
+    /* VE_SP810 */
+    sp810 = qdev_create(NULL, "arm_sp810");
+    /* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */
+    qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17)
+                         | (1 << 19) | (1 << 21));

snip

Could the #defines in the first patch be moved into a header and used here rather than manually setting these bits?

--
Alex Bennée
[Qemu-devel] [PATCH for-2.1 0/2] raw-posix: Fail gracefully if no working alignment is found
Kevin Wolf (2):
  block: Add Error argument to bdrv_refresh_limits()
  raw-posix: Fail gracefully if no working alignment is found

 block.c                   | 33 +++--
 block/iscsi.c             |  3 +--
 block/qcow2.c             |  4 +---
 block/qed.c               |  4 +---
 block/raw-posix.c         | 39 ---
 block/raw_bsd.c           |  3 +--
 block/stream.c            |  2 +-
 block/vmdk.c              |  4 +---
 include/block/block.h     |  2 +-
 include/block/block_int.h |  2 +-
 10 files changed, 59 insertions(+), 37 deletions(-)

-- 
1.8.3.1
[Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()
Signed-off-by: Kevin Wolf kw...@redhat.com --- block.c | 33 +++-- block/iscsi.c | 3 +-- block/qcow2.c | 4 +--- block/qed.c | 4 +--- block/raw-posix.c | 4 +--- block/raw_bsd.c | 3 +-- block/stream.c| 2 +- block/vmdk.c | 4 +--- include/block/block.h | 2 +- include/block/block_int.h | 2 +- 10 files changed, 32 insertions(+), 29 deletions(-) diff --git a/block.c b/block.c index 3e252a2..8cf519b 100644 --- a/block.c +++ b/block.c @@ -508,19 +508,24 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp) return ret; } -int bdrv_refresh_limits(BlockDriverState *bs) +void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) { BlockDriver *drv = bs-drv; +Error *local_err = NULL; memset(bs-bl, 0, sizeof(bs-bl)); if (!drv) { -return 0; +return; } /* Take some limits from the children as a default */ if (bs-file) { -bdrv_refresh_limits(bs-file); +bdrv_refresh_limits(bs-file, local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} bs-bl.opt_transfer_length = bs-file-bl.opt_transfer_length; bs-bl.opt_mem_alignment = bs-file-bl.opt_mem_alignment; } else { @@ -528,7 +533,11 @@ int bdrv_refresh_limits(BlockDriverState *bs) } if (bs-backing_hd) { -bdrv_refresh_limits(bs-backing_hd); +bdrv_refresh_limits(bs-backing_hd, local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} bs-bl.opt_transfer_length = MAX(bs-bl.opt_transfer_length, bs-backing_hd-bl.opt_transfer_length); @@ -539,10 +548,8 @@ int bdrv_refresh_limits(BlockDriverState *bs) /* Then let the driver override it */ if (drv-bdrv_refresh_limits) { -return drv-bdrv_refresh_limits(bs); +drv-bdrv_refresh_limits(bs, errp); } - -return 0; } /* @@ -993,7 +1000,13 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file, goto free_and_fail; } -bdrv_refresh_limits(bs); +bdrv_refresh_limits(bs, local_err); +if (local_err) { +error_propagate(errp, local_err); +ret = -EINVAL; +goto free_and_fail; +} + assert(bdrv_opt_mem_align(bs) != 0); 
assert((bs-request_alignment != 0) || bs-sg); return 0; @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd) bdrv_op_unblock(bs-backing_hd, BLOCK_OP_TYPE_COMMIT, bs-backing_blocker); out: -bdrv_refresh_limits(bs); +bdrv_refresh_limits(bs, NULL); } /* @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state) BDRV_O_CACHE_WB); reopen_state-bs-read_only = !(reopen_state-flags BDRV_O_RDWR); -bdrv_refresh_limits(reopen_state-bs); +bdrv_refresh_limits(reopen_state-bs, NULL); } /* diff --git a/block/iscsi.c b/block/iscsi.c index f3e83e2..a7bb697 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1450,7 +1450,7 @@ static void iscsi_close(BlockDriverState *bs) memset(iscsilun, 0, sizeof(IscsiLun)); } -static int iscsi_refresh_limits(BlockDriverState *bs) +static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp) { IscsiLun *iscsilun = bs-opaque; @@ -1475,7 +1475,6 @@ static int iscsi_refresh_limits(BlockDriverState *bs) } bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len, iscsilun); -return 0; } /* Since iscsi_open() ignores bdrv_flags, there is nothing to do here in diff --git a/block/qcow2.c b/block/qcow2.c index b0faa69..445ead4 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -855,13 +855,11 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags, return ret; } -static int qcow2_refresh_limits(BlockDriverState *bs) +static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVQcowState *s = bs-opaque; bs-bl.write_zeroes_alignment = s-cluster_sectors; - -return 0; } static int qcow2_set_key(BlockDriverState *bs, const char *key) diff --git a/block/qed.c b/block/qed.c index cd4872b..7944832 100644 --- a/block/qed.c +++ b/block/qed.c @@ -528,13 +528,11 @@ out: return ret; } -static int bdrv_qed_refresh_limits(BlockDriverState *bs) +static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVQEDState *s = bs-opaque; 
bs-bl.write_zeroes_alignment = s-header.cluster_size BDRV_SECTOR_BITS; - -return 0; } /* We have nothing to do for QED reopen, stubs just return diff --git a/block/raw-posix.c b/block/raw-posix.c index 2bcc73d..ef497b2 100644 --- a/block/raw-posix.c +++
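The conversion in this patch follows QEMU's usual Error propagation pattern: collect failures from a callee into a local_err, hand them to the caller's errp with error_propagate(), and return early. A minimal self-contained model of that control flow (with a toy Error type standing in for QEMU's real one, so this is a sketch, not the QEMU API) looks like:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* toy stand-ins for QEMU's Error API, just to show the control flow */
typedef struct Error { char msg[64]; } Error;

static void error_setg(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = malloc(sizeof(Error));
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

static void error_propagate(Error **dst, Error *src)
{
    if (dst && !*dst) {
        *dst = src;          /* hand ownership to the caller */
    } else {
        free(src);           /* caller passed NULL, drop the error */
    }
}

/* the child refresh fails; the parent must stop instead of merging
 * stale limits -- same shape as the bdrv_refresh_limits() recursion */
static void child_refresh(Error **errp)
{
    error_setg(errp, "child driver refused");
}

static void parent_refresh(Error **errp)
{
    Error *local_err = NULL;

    child_refresh(&local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;              /* early return, no limit merging */
    }
    /* would merge child limits into bs->bl here */
}
```

The key property the patch relies on is that a caller who does not care (passes NULL, as bdrv_set_backing_hd() now does) still gets correct behavior: the error is simply dropped.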
[Qemu-devel] [PATCH qemu] i386, linux-headers: Add support for kvm_get_rng_seed
This updates x86's kvm_para.h for the feature bit definition and target-i386/cpu.c for the feature name and default. Signed-off-by: Andy Lutomirski l...@amacapital.net --- linux-headers/asm-x86/kvm_para.h | 2 ++ target-i386/cpu.c| 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h index e41c5c1..a9b27ce 100644 --- a/linux-headers/asm-x86/kvm_para.h +++ b/linux-headers/asm-x86/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 8fd1497..4ea7e6c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = { static const char *kvm_feature_name[] = { kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock, kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt, -NULL, NULL, NULL, NULL, +kvm_get_rng_seed, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, @@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = { (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_PV_EOI) | -(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT), +(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | +(1 KVM_FEATURE_GET_RNG_SEED), [FEAT_1_ECX] = CPUID_EXT_X2APIC, }; -- 1.9.3
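The two constants the patch adds are enough to sketch how a guest discovers the feature: KVM advertises its paravirt feature bits in CPUID leaf 0x40000001 EAX, and bit 8 here means MSR 0x4b564d05 is safe to read. A small sketch of the bit-testing side only (the rdmsr itself requires ring 0, so the MSR number appears just as a constant):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_GET_RNG_SEED 8          /* from the patch */
#define MSR_KVM_GET_RNG_SEED 0x4b564d05     /* from the patch */

/* eax would come from CPUID(0x40000001) when running as a KVM guest */
static bool has_rng_seed(uint32_t eax)
{
    return (eax >> KVM_FEATURE_GET_RNG_SEED) & 1;
}
```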
Re: [Qemu-devel] [musl] Re: Bogus struct stat64 for qemu-microblaze (user emulation)?
On Wed, Jul 16, 2014 at 09:36:23AM +0100, Peter Maydell wrote:

On 16 July 2014 05:02, Rich Felker dal...@libc.org wrote: The qemu-microblaze definition of struct stat64 seems to mismatch the kernel definition, which uses asm-generic/stat.h. See:

http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469
http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949

This seems to be causing a truncated-to-32-bit inode number to be stored in the location where st_ino should reside, and a spurious copy of the inode number to be written in an unused slot at the end of the structure.

Sounds quite plausible -- we've had issues with other archs not having correct stat struct definitions in QEMU. I don't suppose anybody's done much testing of the microblaze linux-user code.

The bug seems to have been introduced here: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=a523eb06ec3fb2f4f4f4d362bb23704811d11379

I'm CC'ing the author/committer in case he has any input on why he did this. Is my analysis correct? Stefan Kristiansson and I found this while working on the or1k port of musl libc, where it seems our structure for the existing microblaze port is wrongly aligned with the qemu definition rather than with the definition the real kernel is using. Before I try correcting this on our side, I want to make sure we're working with the right version.

I would definitely trust the kernel definition, not QEMU's!

Yes.

Rich
[Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)
Am 16.07.2014 02:26, schrieb Michael Roth: Hello, On behalf of the QEMU Team, I'd like to announce the availability of the third release candidate for the QEMU 2.1 release. This release is meant for testing purposes and should not be used in a production environment. http://wiki.qemu.org/download/qemu-2.1.0-rc2.tar.bz2 You can help improve the quality of the QEMU 2.1 release by testing this release and reporting bugs on Launchpad: https://bugs.launchpad.net/qemu/ Hi, a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? Regards Stefan
[Qemu-devel] [PATCH RFC 00/14] dataplane: performance optimization and multi virtqueue
Hi,

These patches bring the following four changes:

- introduce a selective coroutine bypass mechanism to improve performance of virtio-blk dataplane with raw format images
- introduce an object allocation pool and apply it to virtio-blk dataplane to improve its performance
- linux-aio changes: fix the cases of -EAGAIN and partial completion, increase max events to 256, and remove one unused field in 'struct qemu_laiocb'
- support multiple virtqueues for virtio-blk dataplane

The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1], and the 3.17 Linux kernel[2] will support the feature in the virtio-blk driver. For those who want to play with it, the kernel-side patches can be found in either Jens's block tree[3] or linux-next[4].

The fio script below, run from inside the VM, is used to measure the improvement from these patches:

    [global]
    direct=1
    size=128G
    bsrange=4k-4k
    timeout=120
    numjobs=${JOBS}
    ioengine=libaio
    iodepth=64
    filename=/dev/vdc
    group_reporting=1

    [f]
    rw=randread

One quad-core VM is created in each of the two hosts below to run the fio test:

- laptop (quad core: 2 physical cores, 2 threads per physical core)
- server (16 cores: 8 physical cores, 2 threads per physical core)

Throughput improvement (IOPS) with this patchset (2 virtqueues per virtio-blk device) against QEMU 2.1.0-rc1:

    ---------------------------------------------------
            | VM in server host  | VM in laptop host
    ---------------------------------------------------
    JOBS=2  | +21%               | +30%
    ---------------------------------------------------
    JOBS=4  | +84%               | +85%
    ---------------------------------------------------

From the above results, we can see that both scalability and performance improve a lot. After commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O), the average time for submitting one single request increased a lot; in my traces it roughly doubled, even though the block plug/unplug mechanism was introduced to ease the effect. That is why this patchset first introduces the selective coroutine bypass mechanism and the object allocation pool, to save that time.
Based on QEMU 2.0, the single multi-virtqueue patch for virtio-blk dataplane alone achieved a much better improvement than the current result[5].

TODO:
- optimize the block layer for linux-aio, so that more time can be saved when submitting requests
- support more than one AioContext to improve virtio-blk performance

 async.c                         |    1 +
 block.c                         |  129 ++-
 block/linux-aio.c               |   93 +++-
 block/raw-posix.c               |   34 ++
 hw/block/dataplane/virtio-blk.c |  221 ++-
 hw/block/virtio-blk.c           |   30 +-
 include/block/aio.h             |   13 +++
 include/block/block.h           |    9 ++
 include/block/coroutine.h       |    7 ++
 include/block/coroutine_int.h   |    5 +
 include/hw/virtio/virtio-blk.h  |   15 +++
 include/qemu/obj_pool.h         |   64 +
 qemu-coroutine-lock.c           |    4 +-
 qemu-coroutine.c                |   33 ++
 14 files changed, 553 insertions(+), 105 deletions(-)

[1] http://marc.info/?l=linux-api&m=140486843317107&w=2
[2] http://marc.info/?l=linux-api&m=140418368421229&w=2
[3] http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ #for-3.17/drivers
[4] https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
[5] http://marc.info/?l=linux-api&m=140377573830230&w=2

Thanks,
--
Ming Lei
[Qemu-devel] [PATCH RFC 03/14] block: support bypassing the qemu coroutine
This patch adds bypass-mode support for the coroutine in bdrv_co_aio_rw_vector(), which is in the fast path of many block devices, especially virtio-blk dataplane.

Signed-off-by: Ming Lei ming@canonical.com
---
 block.c | 129 +++++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 105 insertions(+), 24 deletions(-)

diff --git a/block.c b/block.c
index 128a14f..db7ba37 100644
--- a/block.c
+++ b/block.c
@@ -55,6 +55,21 @@ struct BdrvDirtyBitmap {
     QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+typedef struct CoroutineIOCompletion {
+    Coroutine *coroutine;
+    int ret;
+    bool bypass;
+    QEMUIOVector *bounced_iov;
+} CoroutineIOCompletion;
+
+typedef struct BlockDriverAIOCBCoroutine {
+    BlockDriverAIOCB common;
+    BlockRequest req;
+    bool is_write;
+    bool *done;
+    QEMUBH *bh;
+} BlockDriverAIOCBCoroutine;
+
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
 #define COROUTINE_POOL_RESERVATION 64 /* number of coroutines to reserve */
@@ -122,6 +137,21 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+static CoroutineIOCompletion *bdrv_get_co_io_comp(BlockDriverAIOCBCoroutine
+                                                  *acb)
+{
+    return (CoroutineIOCompletion *)((void *)acb +
+                                     sizeof(BlockDriverAIOCBCoroutine));
+}
+
+static BlockDriverAIOCBCoroutine *bdrv_get_aio_co(CoroutineIOCompletion *co)
+{
+    assert(co->bypass);
+
+    return (BlockDriverAIOCBCoroutine *)((void *)co -
+                                         sizeof(BlockDriverAIOCBCoroutine));
+}
+
 /* throttling disk I/O limits */
 void bdrv_set_io_limits(BlockDriverState *bs,
                         ThrottleConfig *cfg)
@@ -3074,7 +3104,16 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 
         ret = drv->bdrv_co_readv(bs, sector_num, local_sectors,
                                  &local_qiov);
-        qemu_iovec_destroy(&local_qiov);
+
+        if (qemu_coroutine_self_bypassed()) {
+            CoroutineIOCompletion *pco = bdrv_get_co_io_comp(
+                    (BlockDriverAIOCBCoroutine *)
+                    qemu_coroutine_get_var());
+            pco->bounced_iov = g_malloc(sizeof(QEMUIOVector));
+            *pco->bounced_iov = local_qiov;
+        } else {
+            qemu_iovec_destroy(&local_qiov);
+        }
     } else {
         ret = 0;
     }
@@ -4652,15 +4691,6 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
     return bdrv_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 }
 
-typedef struct BlockDriverAIOCBCoroutine {
-    BlockDriverAIOCB common;
-    BlockRequest req;
-    bool is_write;
-    bool *done;
-    QEMUBH* bh;
-} BlockDriverAIOCBCoroutine;
-
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
     AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
@@ -4679,6 +4709,12 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
     .cancel = bdrv_aio_co_cancel_em,
 };
 
+static const AIOCBInfo bdrv_em_co_bypass_aiocb_info = {
+    .aiocb_size = sizeof(BlockDriverAIOCBCoroutine) +
+                  sizeof(CoroutineIOCompletion),
+    .cancel = bdrv_aio_co_cancel_em,
+};
+
 static void bdrv_co_em_bh(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
@@ -4698,6 +4734,12 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
     BlockDriverState *bs = acb->common.bs;
+    bool bypass = qemu_coroutine_self_bypassed();
+
+    if (bypass) {
+        qemu_coroutine_set_var(acb);
+        memset(bdrv_get_co_io_comp(acb), 0, sizeof(CoroutineIOCompletion));
+    }
 
     if (!acb->is_write) {
         acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
@@ -4707,8 +4749,10 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     }
 
-    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    if (!bypass) {
+        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
@@ -4722,8 +4766,18 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
 {
     Coroutine *co;
     BlockDriverAIOCBCoroutine *acb;
+    const AIOCBInfo *aiocb_info;
+    bool bypass;
 
-    acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
+    if (qemu_aio_get_bypass_co(bdrv_get_aio_context(bs))) {
+        aiocb_info = &bdrv_em_co_bypass_aiocb_info;
+        bypass = true;
+    } else {
+        aiocb_info = &bdrv_em_co_aiocb_info;
+        bypass = false;
+    }
+
+    acb = qemu_aio_get(aiocb_info, bs, cb, opaque);
     acb->req.sector = sector_num;
     acb->req.nb_sectors = nb_sectors;
     acb->req.qiov = qiov;
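The over-allocation trick the patch uses (aiocb_size = sizeof(BlockDriverAIOCBCoroutine) + sizeof(CoroutineIOCompletion), with bdrv_get_co_io_comp()/bdrv_get_aio_co() converting between the two halves by pointer arithmetic) can be shown in isolation. This is a simplified sketch with toy struct names, not the QEMU types; note that real code must also ensure the trailing struct's alignment requirement is satisfied at offset sizeof of the first struct:

```c
#include <assert.h>
#include <stdlib.h>

/* simplified stand-ins for the two structs in the patch */
typedef struct { int is_write; } AIOCB;
typedef struct { int ret; int bypass; } IOComp;

/* mirror bdrv_get_co_io_comp(): the completion lives right after the AIOCB */
static IOComp *get_io_comp(AIOCB *acb)
{
    return (IOComp *)((char *)acb + sizeof(AIOCB));
}

/* mirror bdrv_get_aio_co(): walk back from the completion to the AIOCB */
static AIOCB *get_aiocb(IOComp *co)
{
    return (AIOCB *)((char *)co - sizeof(AIOCB));
}

/* one allocation covers both, like the bypass AIOCBInfo's aiocb_size */
static AIOCB *alloc_bypass_aiocb(void)
{
    return calloc(1, sizeof(AIOCB) + sizeof(IOComp));
}
```

The benefit is that the bypass path needs no second allocation and no extra field linking the AIOCB to its completion record.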
[Qemu-devel] [PATCH RFC 14/14] dataplane: virtio-blk: support multi virtqueue
This patch adds support for handling host notifies from multiple virtqueues, while still processing/submitting I/O in the one iothread. Currently this patch brings the improvement below with two virtqueues (against a single virtqueue):

    ---------------------------------------------------
            | VM in server host  | VM in laptop host
    ---------------------------------------------------
    JOBS=2  | +8%                | -11%
    ---------------------------------------------------
    JOBS=4  | +64%               | +29%
    ---------------------------------------------------

The reason is that commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O) doubled the average submission time for a single request, so iothread performance decreased. Based on QEMU 2.0, this single patch alone can achieve a very good improvement: http://marc.info/?l=linux-api&m=140377573830230&w=2

So I hope the QEMU block layer can be optimized for linux-aio, or maybe a fast path is needed for linux-aio.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c |  209 ---
 include/hw/virtio/virtio-blk.h  |    1 +
 2 files changed, 153 insertions(+), 57 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 828fe99..bd66274 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -26,6 +26,11 @@
 
 #define REQ_POOL_SZ 128
 
+typedef struct {
+    EventNotifier notifier;
+    VirtIOBlockDataPlane *s;
+} VirtIOBlockNotifier;
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -35,9 +40,10 @@ struct VirtIOBlockDataPlane {
     VirtIOBlkConf *blk;
     VirtIODevice *vdev;
 
-    Vring vring;                    /* virtqueue vring */
-    EventNotifier *guest_notifier;  /* irq */
-    QEMUBH *bh;                     /* bh for guest notification */
+    Vring *vring;                    /* virtqueue vring */
+    EventNotifier **guest_notifier;  /* irq */
+    uint64_t pending_guest_notifier; /* pending guest notifier for vq */
+    QEMUBH *bh;                      /* bh for guest notification */
 
     /* Note that these EventNotifiers are assigned by value.  This is
      * fine as long as you do not call event_notifier_cleanup on them
@@ -47,7 +53,9 @@ struct VirtIOBlockDataPlane {
     IOThread *iothread;
     IOThread internal_iothread_obj;
     AioContext *ctx;
-    EventNotifier host_notifier;        /* doorbell */
+    VirtIOBlockNotifier *host_notifier; /* doorbell */
+    uint64_t pending_host_notifier;     /* pending host notifier for vq */
+    QEMUBH *host_notifier_bh;           /* for handling host notifier */
 
     /* Operation blocker on BDS */
     Error *blocker;
@@ -60,20 +68,26 @@ struct VirtIOBlockDataPlane {
 };
 
 /* Raise an interrupt to signal guest, if necessary */
-static void notify_guest(VirtIOBlockDataPlane *s)
+static void notify_guest(VirtIOBlockDataPlane *s, unsigned int qid)
 {
-    if (!vring_should_notify(s->vdev, &s->vring)) {
-        return;
+    if (vring_should_notify(s->vdev, &s->vring[qid])) {
+        event_notifier_set(s->guest_notifier[qid]);
     }
-
-    event_notifier_set(s->guest_notifier);
 }
 
 static void notify_guest_bh(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;
+    unsigned int qid;
+    uint64_t pending = s->pending_guest_notifier;
+
+    s->pending_guest_notifier = 0;
 
-    notify_guest(s);
+    while ((qid = ffsl(pending))) {
+        qid--;
+        notify_guest(s, qid);
+        pending &= ~(1 << qid);
+    }
 }
 
 static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
@@ -81,7 +95,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     VirtIOBlockDataPlane *s = req->dev->dataplane;
 
     stb_p(&req->in->status, status);
-    vring_push(&req->dev->dataplane->vring, &req->elem,
+    vring_push(&s->vring[req->qid], &req->elem,
                req->qiov.size + sizeof(*req->in));
 
     /* Suppress notification to guest by BH and its scheduled
@@ -90,17 +104,15 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
      * executed in dataplane aio context even after it is
      * stopped, so needn't worry about notification loss with BH.
      */
+    assert(req->qid < 64);
+    s->pending_guest_notifier |= (1 << req->qid);
     qemu_bh_schedule(s->bh);
 }
 
-static void handle_notify(EventNotifier *e)
+static void process_vq_notify(VirtIOBlockDataPlane *s, unsigned short qid)
 {
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           host_notifier);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-    event_notifier_test_and_clear(&s->host_notifier);
-    bdrv_io_plug(s->blk->conf.bs);
     for (;;) {
         MultiReqBuffer mrb = {
             .num_writes = 0,
         };
        int ret;
 
        /*
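The completion side above coalesces per-queue guest notifications into a 64-bit pending bitmap which the BH then drains with an ffs-style loop (which also explains the patch's assumption that qid stays below 64). The drain loop can be shown standalone; this sketch uses __builtin_ffsll instead of the glibc ffsl the patch calls, purely for portability:

```c
#include <assert.h>
#include <stdint.h>

/* record which queue ids were notified, in drain order */
static unsigned notified[64];
static unsigned n_notified;

static void notify_one(unsigned qid)
{
    notified[n_notified++] = qid;
}

/* same loop shape as notify_guest_bh() in the patch */
static void drain_pending(uint64_t pending)
{
    int qid;

    while ((qid = __builtin_ffsll(pending))) {
        qid--;                        /* ffs-style results are 1-based */
        notify_one(qid);
        pending &= ~(1ULL << qid);    /* clear the bit we just handled */
    }
}
```

Queues are therefore always drained in ascending qid order, and setting a bit twice before the BH runs costs only a single notification.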
[Qemu-devel] [PATCH RFC 08/14] linux-aio: fix submit aio as a batch
In the enqueue path we can't complete requests, otherwise the coroutine may be re-entered recursively, so this patch fixes the issue with the ideas below:

- for -EAGAIN or partial completion, retry the submission via an introduced event handler
- for partial completion, also update the io queue
- for other failures, return the failure if in the enqueue path; otherwise, abort all queued I/O

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |   90 -
 1 file changed, 68 insertions(+), 22 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 7ac7e8c..5eb9c92 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -51,6 +51,7 @@ struct qemu_laio_state {
 
     /* io queue for submit at batch */
     LaioQueue io_q;
+    EventNotifier retry; /* handle -EAGAIN and partial completion */
 };
 
@@ -154,45 +155,80 @@ static void ioq_init(LaioQueue *io_q)
     io_q->plugged = 0;
 }
 
-static int ioq_submit(struct qemu_laio_state *s)
+static void abort_queue(struct qemu_laio_state *s)
+{
+    int i;
+
+    for (i = 0; i < s->io_q.idx; i++) {
+        struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+                                                  struct qemu_laiocb,
+                                                  iocb);
+        laiocb->ret = -EIO;
+        qemu_laio_process_completion(s, laiocb);
+    }
+}
+
+static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
 {
     int ret, i = 0;
     int len = s->io_q.idx;
+    int j = 0;
 
-    do {
-        ret = io_submit(s->ctx, len, s->io_q.iocbs);
-    } while (i++ < 3 && ret == -EAGAIN);
+    if (!len) {
+        return 0;
+    }
 
-    /* empty io queue */
-    s->io_q.idx = 0;
+    ret = io_submit(s->ctx, len, s->io_q.iocbs);
+    if (ret == -EAGAIN) {
+        event_notifier_set(&s->retry);
+        return 0;
+    } else if (ret < 0) {
+        if (enqueue) {
+            return ret;
+        }
 
-    if (ret < 0) {
-        i = 0;
-    } else {
-        i = ret;
+        /* in non-queue path, all IOs have to be completed */
+        abort_queue(s);
+        ret = len;
+    } else if (ret == 0) {
+        goto out;
     }
 
-    for (; i < len; i++) {
-        struct qemu_laiocb *laiocb =
-            container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
-
-        laiocb->ret = (ret < 0) ? ret : -EIO;
-        qemu_laio_process_completion(s, laiocb);
+    for (i = ret; i < len; i++) {
+        s->io_q.iocbs[j++] = s->io_q.iocbs[i];
     }
+
+ out:
+    /* update io queue */
+    s->io_q.idx -= ret;
+
     return ret;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static void ioq_submit_retry(EventNotifier *e)
+{
+    struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, retry);
+
+    event_notifier_test_and_clear(e);
+    ioq_submit(s, false);
+}
+
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
     unsigned int idx = s->io_q.idx;
 
+    if (unlikely(idx == s->io_q.size)) {
+        return -1;
+    }
+
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
-    /* submit immediately if queue is full */
-    if (idx == s->io_q.size) {
-        ioq_submit(s);
+    /* submit immediately if queue depth is above 2/3 */
+    if (idx > s->io_q.size * 2 / 3) {
+        return ioq_submit(s, true);
     }
+
+    return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -214,7 +250,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
     }
 
     if (s->io_q.idx > 0) {
-        ret = ioq_submit(s);
+        ret = ioq_submit(s, false);
     }
 
     return ret;
@@ -258,7 +294,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
             goto out_free_aiocb;
         }
     } else {
-        ioq_enqueue(s, iocbs);
+        if (ioq_enqueue(s, iocbs) < 0) {
+            goto out_free_aiocb;
+        }
     }
 
     return &laiocb->common;
@@ -272,6 +310,7 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(old_context, &s->e, NULL);
+    aio_set_event_notifier(old_context, &s->retry, NULL);
 }
 
 void laio_attach_aio_context(void *s_, AioContext *new_context)
@@ -279,6 +318,7 @@ void laio_attach_aio_context(void *s_, AioContext *new_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+    aio_set_event_notifier(new_context, &s->retry, ioq_submit_retry);
 }
 
 void *laio_init(void)
@@ -295,9 +335,14 @@ void *laio_init(void)
     }
 
     ioq_init(&s->io_q);
+    if (event_notifier_init(&s->retry, false) < 0) {
+        goto out_notifer_init;
+    }
 
     return s;
 
+out_notifer_init:
+    io_destroy(s->ctx);
 out_close_efd:
     event_notifier_cleanup(&s->e);
 out_free_state:
@@ -310,6 +355,7 @@ void laio_cleanup(void *s_)
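A key detail the new ioq_submit() relies on is io_submit(2)'s partial-submission semantics: the kernel may accept only the first N of the iocbs it is given, and the remainder must be kept queued. The compaction step (the i/j copy loop plus the idx adjustment) can be modeled without libaio; this is a toy queue, not the QEMU LaioQueue:

```c
#include <assert.h>

#define QSZ 8

/* toy io queue: queue[0..idx) are pending, like LaioQueue in the patch */
static int queue[QSZ];
static int idx;

/* pretend the kernel accepted the first 'accepted' requests; shift the
 * leftovers to the front and fix up idx, mirroring the copy loop in the
 * patched ioq_submit() */
static int submit_partial(int accepted)
{
    int i, j = 0;

    for (i = accepted; i < idx; i++) {
        queue[j++] = queue[i];
    }
    idx -= accepted;
    return accepted;
}
```

The leftover entries stay in submission order, so a later retry (driven by the retry EventNotifier in the patch) re-submits them unchanged.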
[Qemu-devel] [PATCH RFC 12/14] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ
Prepare for supporting multiple virtqueues per virtio-blk device.

Signed-off-by: Ming Lei ming@canonical.com
---
 include/hw/virtio/virtio-blk.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 45f8894..ad70c9a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,6 +42,12 @@
 #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */
 #define VIRTIO_BLK_F_CONFIG_WCE 11 /* write cache configurable */
 
+/*
+ * support multi vqs, and virtio_blk_config.num_queues is only
+ * available when this feature is enabled
+ */
+#define VIRTIO_BLK_F_MQ 12
+
 #define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
 
 struct virtio_blk_config
@@ -58,6 +64,8 @@ struct virtio_blk_config
     uint16_t min_io_size;
     uint32_t opt_io_size;
     uint8_t wce;
+    uint8_t unused;
+    uint16_t num_queues; /* must be at the end */
 } QEMU_PACKED;
 
 /* These two define direction. */
-- 
1.7.9.5
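The `uint8_t unused` pad byte matters: without it, the new num_queues field would sit at an odd offset in the packed config space. The check below reproduces the full virtio_blk_config layout; only wce/unused/num_queues appear in the hunk above, so the earlier fields are filled in from the virtio-blk spec layout as I recall it (an assumption, not part of this patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* full virtio-blk config layout; earlier fields taken from the spec */
struct virtio_blk_config {
    uint64_t capacity;
    uint32_t size_max;
    uint32_t seg_max;
    struct {
        uint16_t cylinders;
        uint8_t heads;
        uint8_t sectors;
    } geometry;
    uint32_t blk_size;
    uint8_t physical_block_exp;
    uint8_t alignment_offset;
    uint16_t min_io_size;
    uint32_t opt_io_size;
    uint8_t wce;
    uint8_t unused;       /* pad so num_queues lands on an even offset */
    uint16_t num_queues;  /* must be at the end */
} __attribute__((packed));
```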
[Qemu-devel] [PATCH RFC 04/14] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used"
This reverts commit 76ef2cf5493a215efc351f48ae7094d6c183fcac.

Reintroduce the raw_get_aio_fd() helper to enable coroutine bypass mode for raw images.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/raw-posix.c     |   34 ++++++++++++++++++++++++++++++++++
 include/block/block.h |    9 +++++++++
 2 files changed, 43 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2bcc73d..98b9626 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2419,6 +2419,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+    BDRVRawState *s;
+
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (bs->drv == bdrv_find_format("raw")) {
+        bs = bs->file;
+    }
+
+    /* raw-posix has several protocols so just check for raw_aio_readv */
+    if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
+        return -ENOTSUP;
+    }
+
+    s = bs->opaque;
+    if (!s->use_aio) {
+        return -ENOTSUP;
+    }
+    return s->fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
     /*
diff --git a/include/block/block.h b/include/block/block.h
index 32d3676..8d15693 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -482,6 +482,15 @@ void bdrv_op_block_all(BlockDriverState *bs, Error *reason);
 void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason);
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+    return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
     BDRV_ACCT_READ,
     BDRV_ACCT_WRITE,
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 05/14] dataplane: enable selective bypassing coroutine
This patch enables selective bypassing of the coroutine in bdrv_co_aio_rw_vector() if the image format is raw.

With this patch, ~16% throughput improvement can be observed in my laptop-based VM test, and ~7% improvement in the server-based VM.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d6ba65c..2093e4a 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -28,6 +28,7 @@ struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
     bool stopping;
+    bool raw_format;
 
     VirtIOBlkConf *blk;
 
@@ -193,6 +194,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     error_setg(&s->blocker, "block device is in use by data plane");
     bdrv_op_block_all(blk->conf.bs, s->blocker);
 
+    s->raw_format = (raw_get_aio_fd(blk->conf.bs) >= 0);
+
     *dataplane = s;
 }
 
@@ -262,6 +265,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     /* Kick right away to begin processing requests already in vring */
     event_notifier_set(virtio_queue_get_host_notifier(vq));
 
+    if (s->raw_format) {
+        qemu_aio_set_bypass_co(s->ctx, true);
+    }
+
     /* Get this show started by hooking up our callbacks */
     aio_context_acquire(s->ctx);
     aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
@@ -291,6 +298,9 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     aio_context_release(s->ctx);
 
+    if (s->raw_format) {
+        qemu_aio_set_bypass_co(s->ctx, false);
+    }
     /* Sync vring state back to virtqueue so that non-dataplane request
      * processing can continue when we disable the host notifier below.
      */
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 13/14] hw/block/virtio-blk: create num_queues vqs if dataplane is enabled
Now we only support multi vqs for dataplane case. Signed-off-by: Ming Lei ming@canonical.com --- hw/block/virtio-blk.c | 16 +++- include/hw/virtio/virtio-blk.h |3 +++ 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index ab99156..160b021 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -556,6 +556,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) blkcfg.physical_block_exp = get_physical_block_exp(s-conf); blkcfg.alignment_offset = 0; blkcfg.wce = bdrv_enable_write_cache(s-bs); +stw_p(blkcfg.num_queues, s-blk.num_queues); memcpy(config, blkcfg, sizeof(struct virtio_blk_config)); } @@ -590,6 +591,12 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features) if (bdrv_is_read_only(s-bs)) features |= 1 VIRTIO_BLK_F_RO; +#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE +if (s-blk.num_queues 1) { +features |= 1 VIRTIO_BLK_F_MQ; +} +#endif + return features; } @@ -739,8 +746,13 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE Error *err = NULL; #endif +int i; static int virtio_blk_id; +#ifndef CONFIG_VIRTIO_BLK_DATA_PLANE +blk-num_queues = 1; +#endif + if (!blk-conf.bs) { error_setg(errp, drive property not set); return; @@ -765,7 +777,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) s-rq = NULL; s-sector_mask = (s-conf-logical_block_size / BDRV_SECTOR_SIZE) - 1; -s-vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output); +for (i = 0; i blk-num_queues; i++) +s-vqs[i] = virtio_add_queue(vdev, 128, virtio_blk_handle_output); +s-vq = s-vqs[0]; s-complete_request = virtio_blk_complete_request; #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE virtio_blk_data_plane_create(vdev, blk, s-dataplane, err); diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index ad70c9a..91489b0 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -50,6 
+50,8 @@ #define VIRTIO_BLK_ID_BYTES 20 /* ID string length */ +#define VIRTIO_BLK_MAX_VQS 16 /* max virtio queues supported now */ + struct virtio_blk_config { uint64_t capacity; @@ -132,6 +134,7 @@ typedef struct VirtIOBlock { VirtIODevice parent_obj; BlockDriverState *bs; VirtQueue *vq; +VirtQueue *vqs[VIRTIO_BLK_MAX_VQS]; void *rq; QEMUBH *bh; BlockConf *conf; -- 1.7.9.5
[Qemu-devel] [PATCH RFC 06/14] qemu/obj_pool.h: introduce object allocation pool
This patch introduces an object allocation pool for speeding up object allocation in the fast path.

Signed-off-by: Ming Lei ming@canonical.com
---
 include/qemu/obj_pool.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 include/qemu/obj_pool.h

diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
new file mode 100644
index 000..94b5f49
--- /dev/null
+++ b/include/qemu/obj_pool.h
@@ -0,0 +1,64 @@
+#ifndef QEMU_OBJ_POOL_HEAD
+#define QEMU_OBJ_POOL_HEAD
+
+typedef struct {
+    unsigned int size;
+    unsigned int cnt;
+
+    void **free_obj;
+    int free_idx;
+
+    char *objs;
+} ObjPool;
+
+static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
+                                 unsigned int obj_size, unsigned cnt)
+{
+    int i;
+
+    op->objs = (char *)objs_buf;
+    op->free_obj = free_objs;
+    op->size = obj_size;
+    op->cnt = cnt;
+
+    for (i = 0; i < op->cnt; i++) {
+        op->free_obj[i] = (void *)&op->objs[i * op->size];
+    }
+    op->free_idx = op->cnt;
+}
+
+static inline void *obj_pool_get(ObjPool *op)
+{
+    void *obj;
+
+    if (!op) {
+        return NULL;
+    }
+
+    if (op->free_idx <= 0) {
+        return NULL;
+    }
+
+    obj = op->free_obj[--op->free_idx];
+    return obj;
+}
+
+static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
+{
+    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
+           (unsigned long)obj <= (unsigned long)&op->objs[(op->cnt - 1) * op->size];
+}
+
+static inline void obj_pool_put(ObjPool *op, void *obj)
+{
+    if (!op || !obj_pool_has_obj(op, obj)) {
+        return;
+    }
+
+    assert(op->free_idx < op->cnt);
+
+    op->free_obj[op->free_idx++] = obj;
+}
+
+#endif
-- 
1.7.9.5
[Qemu-devel] [PATCH RFC 07/14] dataplane: use object pool to speed up allocation for virtio blk request
g_slice_new(VirtIOBlockReq), its free pair, and accessing the instance are a bit slow since sizeof(VirtIOBlockReq) is more than 40KB, so use an object pool to speed up its allocation and release. With this patch, ~20% throughput improvement can be observed in my laptop-based VM test, and ~5% improvement is observed in the server-based VM.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c | 12
 hw/block/virtio-blk.c           | 13 +++--
 include/hw/virtio/virtio-blk.h  |  2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2093e4a..828fe99 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -24,6 +24,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "qom/object_interfaces.h"
 
+#define REQ_POOL_SZ 128
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -51,6 +53,10 @@ struct VirtIOBlockDataPlane {
     Error *blocker;
     void (*saved_complete_request)(struct VirtIOBlockReq *req,
                                    unsigned char status);
+
+    VirtIOBlockReq reqs[REQ_POOL_SZ];
+    void *free_reqs[REQ_POOL_SZ];
+    ObjPool req_pool;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -238,6 +244,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
         return;
     }
 
+    vblk->obj_pool = &s->req_pool;
+    obj_pool_init(vblk->obj_pool, s->reqs, s->free_reqs,
+                  sizeof(VirtIOBlockReq), REQ_POOL_SZ);
+
     /* Set up guest notifier (irq) */
     if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
         fprintf(stderr, "virtio-blk failed to set guest notifier, 
@@ -298,6 +308,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     aio_context_release(s->ctx);
 
+    vblk->obj_pool = NULL;
+
     if (s->raw_format) {
         qemu_aio_set_bypass_co(s->ctx, false);
     }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..2a11bc4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -31,7 +31,11 @@
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-    VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
+    VirtIOBlockReq *req = obj_pool_get(s->obj_pool);
+
+    if (!req) {
+        req = g_slice_new(VirtIOBlockReq);
+    }
     req->dev = s;
     req->qiov.size = 0;
     req->next = NULL;
@@ -41,7 +45,11 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 void virtio_blk_free_request(VirtIOBlockReq *req)
 {
     if (req) {
-        g_slice_free(VirtIOBlockReq, req);
+        if (obj_pool_has_obj(req->dev->obj_pool, req)) {
+            obj_pool_put(req->dev->obj_pool, req);
+        } else {
+            g_slice_free(VirtIOBlockReq, req);
+        }
     }
 }
 
@@ -801,6 +809,7 @@ static void virtio_blk_instance_init(Object *obj)
 {
     VirtIOBlock *s = VIRTIO_BLK(obj);
 
+    s->obj_pool = NULL;
     object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
                              (Object **)&s->blk.iothread,
                              qdev_prop_allow_set_link_before_realize,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index afb7b8d..49ac234 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -18,6 +18,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "block/block.h"
+#include "qemu/obj_pool.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
 #define VIRTIO_BLK(obj) \
@@ -135,6 +136,7 @@ typedef struct VirtIOBlock {
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 #endif
+    ObjPool *obj_pool;
 } VirtIOBlock;
 
 typedef struct MultiReqBuffer {
-- 
1.7.9.5
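The ObjPool helper itself is introduced elsewhere in the series, so only its use is visible here. The idea — hand out preallocated slots from a free list and let the caller fall back to the regular allocator when the pool is exhausted — can be sketched in Python. This is an illustrative model, not QEMU's actual ObjPool API:

```python
class ObjPool(object):
    """Fixed-size pool: hand out preallocated slots, signal exhaustion with None."""

    def __init__(self, size):
        # Preallocate all objects up front (mirrors the reqs[] array)
        self.slots = [bytearray(64) for _ in range(size)]
        self.free = list(self.slots)  # free list starts full

    def get(self):
        # Pop a preallocated object, or return None when the pool is empty;
        # the caller then falls back to the general-purpose allocator
        return self.free.pop() if self.free else None

    def put(self, obj):
        self.free.append(obj)

    def has_obj(self, obj):
        # Mirrors obj_pool_has_obj(): was this object allocated from the pool?
        return any(obj is s for s in self.slots)

pool = ObjPool(2)
a = pool.get()
b = pool.get()
c = pool.get()   # pool exhausted: None, a real caller would g_slice_new() here
```

The fallback path is what keeps the patch safe: requests beyond REQ_POOL_SZ in flight are still served, just without the speedup.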
[Qemu-devel] [PATCH RFC 09/14] linux-aio: increase max event to 256
This patch increases max events to 256 for the coming virtio-blk multi virtqueue support.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 5eb9c92..c06a57d 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -23,7 +23,7 @@
  * than this we will get EAGAIN from io_submit which is communicated to
  * the guest as an I/O error.
  */
-#define MAX_EVENTS 128
+#define MAX_EVENTS 256
 
 #define MAX_QUEUED_IO 128
-- 
1.7.9.5
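The comment quoted in the hunk describes the behaviour this constant caps: once more requests are in flight than the event limit, io_submit fails with EAGAIN and the guest sees an I/O error. A toy model of that bounded-submission behaviour (illustrative names, not QEMU code):

```python
class BoundedQueue(object):
    """Toy model of a submission queue capped the way MAX_EVENTS caps linux-aio."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.in_flight = 0

    def submit(self):
        if self.in_flight >= self.max_events:
            return "EAGAIN"      # over the cap: surfaces as a guest I/O error
        self.in_flight += 1
        return "OK"

    def complete(self):
        # A completion frees one slot for future submissions
        self.in_flight -= 1

ctx = BoundedQueue(2)
results = [ctx.submit() for _ in range(3)]   # third submit exceeds the cap
```

Raising the cap from 128 to 256 simply moves the point at which the third-style submit starts failing, which matters once multiple virtqueues feed the same context.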
[Qemu-devel] [PATCH RFC 11/14] hw/virtio-pci: introduce num_queues property
This patch introduces the 'num_queues' parameter and prepares for supporting multiple vqs in virtio-blk.

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/virtio-blk.c          | 1 +
 include/hw/virtio/virtio-blk.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2a11bc4..ab99156 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -826,6 +826,7 @@ static Property virtio_blk_properties[] = {
 #endif
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     DEFINE_PROP_BIT("x-data-plane", VirtIOBlock, blk.data_plane, 0, false),
+    DEFINE_PROP_UINT32("num_queues", VirtIOBlock, blk.num_queues, 1),
 #endif
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 49ac234..45f8894 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -114,6 +114,7 @@ struct VirtIOBlkConf
     uint32_t scsi;
     uint32_t config_wce;
     uint32_t data_plane;
+    uint32_t num_queues;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.7.9.5
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases (was: [ANNOUNCE] QEMU 2.1.0-rc2 is now available)
On 16 July 2014 17:28, Stefan Weil s...@weilnetz.de wrote: a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? In general, we should try to avoid feature regressions, but it's going to be a case-by-case thing. In this particular instance, upstream libiscsi don't recommend pre-1.9 for production use (as the commit message documents), and I don't think we would be doing our users any favours by allowing them to build something that's likely to be broken. We should of course flag up this sort of "minimum version of our dependencies has been bumped" info in the release notes. I think that "does QEMU still build with all the features we need on our distro?" has to be testing done by the people who package and maintain QEMU for each distro -- they're the only people who know which configuration options they enable and which baseline versions of their distro they still care about packaging mainline QEMU for. thanks -- PMM
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
On 16/07/2014 18:28, Stefan Weil wrote: Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. That's a packaging bug. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? As mentioned in the e49ab19fcaa617ad6cdfe1ac401327326b6a2552 commit message, this was intentional. I was reluctant to do it, but ultimately Peter Lieven convinced me that it isn't just about using fancy new APIs; libiscsi was too buggy to be useful until release 1.8.0 (even 1.9.0 requires a patch to avoid segfaults, and many more if you want to run it on ARM). Paolo
Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
* Paolo Bonzini (pbonz...@redhat.com) wrote: On 16/07/2014 13:52, Dr. David Alan Gilbert wrote: * Paolo Bonzini (pbonz...@redhat.com) wrote: On 16/07/2014 11:37, Dr. David Alan Gilbert wrote:

+
+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call socket_dup_return_path twice, and then close the original QEMUFile? Hmm - how? The problem is that there is no reference count on QEMUFile, so if you do f1 = open_return_path(f0); f2 = open_return_path(f0); /* now f1 == f2 */ qemu_fclose(f1); /* now f2 is dangling */ I think from the way I'm using it, I can remove the optimisation, but I do need to check. I'm not too sure what your worry is about 'f2' in this case; I guess the caller needs to know that it should only close the return path once - is that your worry? Yes. The API is not well encapsulated; a random caller of open_return_path does not know (and cannot know) whether it should close the returned file or not. OK, then yes I'll give that a go taking out those optimisations. I'm more nervous about dropping that one, because the current scheme does provide a clean way of finding the forward path if you've got the reverse (although I don't think I make use of it). If it isn't used, why keep it? It just felt pleasantly symmetric being able to get the forward path by asking for the return path on the return path; but I can remove it. Source side Forward path - written by migration thread : It's OK for this to be blocking, but we end up with it being non-blocking, and modify the socket code to emulate blocking. This likely has a performance impact though. The first migration thread code drop from Juan already improved throughput a lot, even if it kept the iothread all the time and only converted from nonblocking writes to blocking. 
Can you give some more reasoning as to why you think this will hit the performance so much; I thought the output buffers were quite big anyway. I don't really know, it's Return path - opened by main thread, read by fd_handler on main thread : Must be non-blocking so as not to block the main thread while waiting for a partially sent command. Why can't you handle this in the migration thread (or a new postcopy thread on the source side)? Then it can stay blocking. Handling it within the migration thread would make it much more complicated (which would be bad since it's already complex enough); Ok. I'm not sure why it is more complicated since migration is essentially two-phase, one where the source drives it and one where the source just waits for requests, but I'll trust you on this. :) It's not as clean a split like that; during the postcopy phase we still do the linear page scan to send pages before they're requested, so the main migration thread code keeps going. (There's an 'interesting' balance here, send too many linear pages and they get in the way of the postcopy requests and increase the latency, but sending them means that you get a lot of the pages without having to request them which is 0 latency) Destination side Forward path - read by main thread This must be nonblocking so that the monitor keeps responding. Interesting, I suspect none of the code in there is set up for that at the moment, so how does that work during migration at the moment? It sure is. :) Oh so it is; I missed the 'qemu_set_nonblock(fd);' in process_incoming_migration On the destination side, migration is done in a coroutine (see process_incoming_migration) so it's all transparent. 
Only socket_get_buffer has to know about this:

    len = qemu_recv(s->fd, buf, size, 0);
    if (len != -1) {
        break;
    }
    if (socket_error() == EAGAIN) {
        yield_until_fd_readable(s->fd);
    } else if (socket_error() != EINTR) {
        break;
    }

If the socket is put in blocking mode recv will never return EAGAIN, so this code will only run if the socket is nonblocking. OK, yes. Actually, I see I missed something here; this should be: Destination side Forward path - read by main thread, and listener thread (see the separate mail that described that listener thread) and that means that once postcopy is going (and the listener thread started) it can't block the monitor. Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once it gets its hands on the QEMUFile. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) When does the main thread need to write? Not much; the only things the main thread currently responds to are the ReqAck (ping like) requests; those are turning out to be very useful during
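The retry loop socket_get_buffer implements — try a nonblocking read, yield until the fd is readable on EAGAIN, retry silently on EINTR, and give up on any other error — can be sketched generically. The callbacks here stand in for QEMU's socket read and coroutine yield; this is a model of the control flow, not QEMU code:

```python
import errno

def read_nonblocking(sock_recv, yield_until_readable):
    """Nonblocking read loop: yield and retry on EAGAIN, retry on EINTR,
    bail out on any other error.

    sock_recv() returns (data, None) on success or (None, errno_value)
    on failure; yield_until_readable() models the coroutine yield.
    """
    while True:
        data, err = sock_recv()
        if err is None:
            return data                 # successful read
        if err == errno.EAGAIN:
            yield_until_readable()      # nothing to read yet: let others run
        elif err != errno.EINTR:
            return None                 # hard error: give up
        # EINTR: simply retry the read
```

The key property, as the thread notes, is that a socket in blocking mode never produces EAGAIN, so the yield branch only ever runs for nonblocking sockets.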
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
Am 16.07.2014 19:23, schrieb ronnie sahlberg: On Wed, Jul 16, 2014 at 10:11 AM, Stefan Weil s...@weilnetz.de wrote: Am 16.07.2014 18:49, schrieb Paolo Bonzini: On 16/07/2014 18:28, Stefan Weil wrote: Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. That's a packaging bug. CC'ing Michael as he is the Debian maintainer of this package and Aurélien who maintains QEMU for Debian. Michael, should I send a Debian bug report for libiscsi-dev? Would an update of libiscsi for Debian stable be reasonable if versions older than 1.9 are too buggy to be used? If you ask Debian to upgrade, could you ask them to wait and upgrade after I have released the next version, hopefully, if all goes well, at the end of this week? If someone from Ubuntu reads here, this is also for you. Your latest LTS release also contains 1.4.0. It contains new functionality, thanks to plieven, to better handle cases where active/passive storage arrays perform failover. Yes, and it fixes yet another bug in serial logic. Not a big one, but it could cause protocol errors. I must admit that I'm a little bit surprised because iSCSI support worked for me quite well the last time I used it with Debian wheezy. I think, and plieven please correct me if I am wrong, earlier versions would work reasonably well for basic use, but there were bugs and gaps in functionality that made them ill suited for enterprise environments. It depends how you define basic use. It causes severe protocol violations, especially on reconnects, and it will simply stop sending PDUs if CmdSN wraps from 0xffffffff to 0x0. That's OK for "try and mount an iSCSI volume and see if it works", but it's not usable for production use, enterprise or not. I mentioned the most critical bugs in the commit message for the libiscsi version bump to 1.8.0 (which Paolo later increased to 1.9.0 to make the BUSY handling possible). Peter
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
Am 16.07.2014 18:46, schrieb Peter Maydell: On 16 July 2014 17:28, Stefan Weil s...@weilnetz.de wrote: a recent commit (e49ab19fcaa617ad6cdfe1ac401327326b6a2552) increased the requirements for libiscsi from 1.4.0 to 1.9.0. From a user's point of view, this change results in a regression for Debian and similar Linux distributions: QEMU no longer includes iscsi features. In this special case, the current Debian stable includes QEMU packages with libiscsi 1.4, but new builds won't include it because that version is now too old. Debian testing includes a brand new libiscsi, but it does not include libiscsi.pc, so pkg-config won't know that it is available and configure will disable libiscsi. I have a patch which fixes this, so QEMU for Debian testing could include libiscsi again. Is a feature regression like this one acceptable? Do we need additional testing (maybe run the build bots with --enable-xxx, so builds fail when xxx no longer works)? In general, we should try to avoid feature regressions, but it's going to be a case-by-case thing. In this particular instance, upstream libiscsi don't recommend pre-1.9 for production use (as the commit message documents), and I don't think we would be doing our users any favours by allowing them to build something that's likely to be broken. We should of course flag up this sort of "minimum version of our dependencies has been bumped" info in the release notes. I will update the Wiki. Peter I think that "does QEMU still build with all the features we need on our distro?" has to be testing done by the people who package and maintain QEMU for each distro -- they're the only people who know which configuration options they enable and which baseline versions of their distro they still care about packaging mainline QEMU for. thanks -- PMM
Re: [Qemu-devel] [PATCH 1/2] block: Add Error argument to bdrv_refresh_limits()
On 07/16/2014 09:48 AM, Kevin Wolf wrote:

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  block.c                   | 33 +++--
  block/iscsi.c             |  3 +--
  block/qcow2.c             |  4 +---
  block/qed.c               |  4 +---
  block/raw-posix.c         |  4 +---
  block/raw_bsd.c           |  3 +--
  block/stream.c            |  2 +-
  block/vmdk.c              |  4 +---
  include/block/block.h     |  2 +-
  include/block/block_int.h |  2 +-
  10 files changed, 32 insertions(+), 29 deletions(-)

     /* Take some limits from the children as a default */
     if (bs->file) {
-        bdrv_refresh_limits(bs->file);
+        bdrv_refresh_limits(bs->file, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }

Nice that you are no longer ignoring failure from the child.

 @@ -1154,7 +1167,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT,
                     bs->backing_blocker);
 out:
-    bdrv_refresh_limits(bs);
+    bdrv_refresh_limits(bs, NULL);
 }
 
 /*
 @@ -1778,7 +1791,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
                                BDRV_O_CACHE_WB);
     reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
-    bdrv_refresh_limits(reopen_state->bs);
+    bdrv_refresh_limits(reopen_state->bs, NULL);
 
 +++ b/block/stream.c
 @@ -76,7 +76,7 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
         bdrv_unref(unused);
     }
-    bdrv_refresh_limits(top);
+    bdrv_refresh_limits(top, NULL);
 }

Should these three callers be concerned about failure? If so, would error_abort be better than NULL? But as for this patch, you are preserving existing semantics, so you could save it for a later patch.

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
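The pattern under review — a parent's refresh first refreshes its child and propagates any child failure upward instead of silently continuing — can be illustrated outside of C. This is a Python sketch with made-up node names and an error list standing in for QEMU's Error** convention, not the actual block-layer API:

```python
def refresh_limits(node, errors):
    """Refresh a node's limits; a child failure aborts the parent refresh.

    `node` is a dict; its optional "file" entry models bs->file. `errors`
    collects failure messages, playing the role of an Error **errp.
    """
    child = node.get("file")
    if child is not None:
        refresh_limits(child, errors)
        if errors:
            return            # propagate the child's failure, stop here
    if node.get("broken"):
        errors.append("cannot detect alignment for %s" % node["name"])
        return
    # Only reached when the node and all its children refreshed cleanly
    node["limits"] = {"opt_mem_align": 4096}

errs = []
tree = {"name": "qcow2", "file": {"name": "raw", "broken": True}}
refresh_limits(tree, errs)    # child fails, so the parent gets no limits
```

This also shows why the reviewer asks about the NULL callers: passing a collector that is ignored would re-silence exactly the failures this structure surfaces.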
Re: [Qemu-devel] [PATCH 2/2] raw-posix: Fail gracefully if no working alignment is found
On 07/16/2014 09:48 AM, Kevin Wolf wrote: If qemu couldn't find out what O_DIRECT alignment to use with a given file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c and confuse users. This adds a more descriptive error message for such cases. Signed-off-by: Kevin Wolf kw...@redhat.com --- block/raw-posix.c | 35 +++ 1 file changed, 27 insertions(+), 8 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Wed, Jul 16, 2014 at 5:24 PM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 15/07/2014 23:25, Andrey Korolyov wrote: On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote: On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov and...@xdel.ru wrote: On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah amit.s...@redhat.com wrote: On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote: Hello, the issue is not specific to the iothread code because generic virtio-blk also hangs up: Do you know which version works well? If you could bisect, that'll help a lot. Thanks, Amit Hi, 2.0 works definitely well. I`ll try to finish bisection today, though every step takes about 10 minutes to complete. Yay! It is even outside of virtio-blk. commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 Author: Marcelo Tosatti mtosa...@redhat.com Date: Tue Jun 3 13:34:48 2014 -0300 kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation Ensure proper env->tsc value for kvmclock_current_nsec calculation. Reported-by: Marcin Gibuła m.gib...@beyond.pl Cc: qemu-sta...@nongnu.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Andrey, Can you please provide instructions on how to create a reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. 
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 272a88a..feb5fc5 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -17,7 +17,6 @@
 #include "qemu/host-utils.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
-#include "sysemu/cpus.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
 
@@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
-    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta >>= -time.tsc_shift;
@@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
     if (s->clock_valid) {
         return;
     }
-
-    cpu_synchronize_all_states();
     ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
     if (ret < 0) {
         fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff --git a/migration.c b/migration.c
index 8d675b3..34f2325 100644
--- a/migration.c
+++ b/migration.c
@@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
                 old_vm_running = runstate_is_running();
 
+                cpu_synchronize_all_states();
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);

It could also be useful to apply the above patch _and_ revert a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce. Paolo Yes, it solved the issue for me! (it took much time to check because most of the country's Debian mirrors went inconsistent for some reason) Also trivial addition:

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ? That is, test with a stock qemu.git tree and the patch sent today, on this thread, to move cpu_synchronize_all_states ? 
The main reason for things to work for me is a revert of 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other patches. I had tested two cases, one with Alexander's patch completely reverted plus the suggestion from Marcelo, and one with only the revert of 9b178682 plus the same suggestion. The difference is that until Alexander's patch is reverted, live migration always fails by the timeout value, and when it is reverted, migration always succeeds in 8-10 seconds. Appropriate
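For reference, the `tsc_shift` arithmetic visible in the kvmclock_current_nsec hunk above follows the pvclock convention: shift the TSC delta left or right by `tsc_shift`, then multiply by `tsc_to_system_mul` and take the top 32 bits to get nanoseconds. A sketch of that scaling with illustrative values (the multipliers here are chosen for readability, not taken from real hardware):

```python
def pvclock_scale(tsc_delta, tsc_shift, tsc_to_system_mul):
    """Scale a TSC delta to nanoseconds the way pvclock defines it.

    A negative tsc_shift shifts the delta right, a positive one left;
    tsc_to_system_mul is a 32.32 fixed-point multiplier.
    """
    if tsc_shift < 0:
        tsc_delta >>= -tsc_shift
    else:
        tsc_delta <<= tsc_shift
    return (tsc_delta * tsc_to_system_mul) >> 32

# mul = 1 << 32 means "1.0": 1000 ticks, shift -1 -> 500 ns
half_rate = pvclock_scale(1000, -1, 1 << 32)
```

The removed assert guarded the `migration_tsc - time.tsc_timestamp` subtraction: if the recorded timestamp ever exceeds the migration TSC, the unsigned delta wraps, which is exactly the kind of inconsistency the cpu_synchronize_all_states() placement debated in this thread is meant to prevent.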
[Qemu-devel] [PATCH V3 1/5] docs: Specification for the image fuzzer
The 'Overall fuzzer requirements' chapter contains the current product vision and features done and to be done. This chapter is still in progress. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/docs/image-fuzzer.txt | 239 +++ 1 file changed, 239 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt b/tests/image-fuzzer/docs/image-fuzzer.txt new file mode 100644 index 000..c1594a0 --- /dev/null +++ b/tests/image-fuzzer/docs/image-fuzzer.txt @@ -0,0 +1,239 @@ +# Specification for the fuzz testing tool +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + + +Image fuzzer + + +Description +--- + +The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img +by providing them randomly corrupted images. +Test images are generated from scratch and have a valid inner structure with some +elements, e.g. L1/L2 tables, having random invalid values. + + +Test runner +--- + +The test runner generates test images, executes tests utilizing generated +images, reports their results and collects all test related artifacts (logs, +core dumps, test images). +A test means execution of all available commands under test with the same +generated test image. +By default, the test runner generates new tests and executes them until +keyboard interruption. But if a test seed is specified via the '--seed' runner +parameter, then only one test with this seed will be executed, and after it +finishes the runner will exit. + +The runner uses an external image fuzzer to generate test images. An image +generator should be specified as a mandatory parameter of the test runner. +For details about interactions between the runner and fuzzers, see 'Module +interfaces'. + +The runner activates generation of core dumps during test executions, but it +assumes that core dumps will be generated in the current working directory. +For comprehensive test results, please set up your test environment +properly. + +Paths to binaries under test (SUTs) qemu-img and qemu-io are retrieved from +environment variables. If the environment check fails the runner will +use SUTs installed in system paths. +qemu-img is required for creation of backing files, so it's mandatory to set +the related environment variable if it's not installed in the system path. +For details about environment variables see qemu-iotests/check. + +The runner accepts via the '--config' argument JSON objects with lists of +fields expected to be fuzzed, e.g. + + '[["feature_name_table"], ["header", "l1_table_offset"]]' + +Each sublist can have one or two strings defining image structure elements. +In the latter case a parent element should be placed in the first position, +and a field name in the second one. + +The runner accepts lists of commands under test as JSON objects via +the '--command' argument. Each command is a list containing a SUT and all its +arguments, e.g. + + runner.py -c '[["qemu-io", "$test_img", "-c", "write $off $len"]]' + /tmp/test ../qcow2 + +For variable arguments the following aliases can be used: +- $test_img for a fuzzed img +- $off for an offset in the fuzzed image +- $len for a data size + +Values for the last two aliases will be generated based on the size of the virtual +disk in the fuzzed image. +If no commands are specified, the runner will execute commands from +the default list: +- qemu-img check +- qemu-img info +- qemu-img convert
- qemu-io -c read
- qemu-io -c write
- qemu-io -c aio_read
- qemu-io -c aio_write
- qemu-io -c flush
- qemu-io -c discard
- qemu-io -c truncate + + +Qcow2 image generator +- + +The 'qcow2' generator is a Python package providing the 'create_image' method as +a single public API. See details in 'Test runner/image fuzzer' in 'Module +interfaces'. + +Qcow2 contains two submodules: fuzz.py and layout.py. + +'fuzz.py' contains all fuzzing functions, one per image field. It's assumed +that after code analysis every field will have its own constraints for its value. +For now only universal potentially dangerous values are used, e.g. type limits +for integers or unsafe symbols as '%s' for strings. For bitmasks a random amount +of bits is set to ones. All fuzzed values are
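The '--seed' behaviour described above rests on one property of PRNG-driven fuzzing: recording the seed makes any generated image reproducible on demand. A minimal sketch of that pattern (the function name and the "image" it draws are illustrative, not the runner's actual code):

```python
import random

def generate_image(seed=None):
    """Seed the PRNG (recording the seed for later reproduction) and
    draw a fuzzed "image" as a list of random byte values."""
    if seed is None:
        # No seed given: pick one randomly, but remember it so the exact
        # failing image can be regenerated later
        seed = random.randint(0, 2**31 - 1)
    random.seed(seed)
    # Everything drawn from here on is fully determined by the seed
    return seed, [random.randrange(256) for _ in range(4)]

seed, image = generate_image(42)
_, image_again = generate_image(seed)   # same seed, identical image
```

This is why the runner can stop after a single test when '--seed' is given: one deterministic replay is all a crash investigation needs.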
[Qemu-devel] [PATCH V3 5/5] package: Public API for image-fuzzer/runner/runner.py
__init__.py provides the public API required by the test runner Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/image-fuzzer/qcow2/__init__.py diff --git a/tests/image-fuzzer/qcow2/__init__.py b/tests/image-fuzzer/qcow2/__init__.py new file mode 100644 index 000..e2ebe19 --- /dev/null +++ b/tests/image-fuzzer/qcow2/__init__.py @@ -0,0 +1 @@ +from layout import create_image -- 1.9.3
[Qemu-devel] [PATCH V3 4/5] layout: Generator of fuzzed qcow2 images
The layout submodule of the qcow2 package creates a random valid image, randomly selects some of its fields, fuzzes them and writes the fuzzed image to a file. The fuzzing process can be controlled by an external configuration. For now only a header, header extensions and a backing file name are generated.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/qcow2/layout.py | 359 +
 1 file changed, 359 insertions(+)
 create mode 100644 tests/image-fuzzer/qcow2/layout.py

diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py
new file mode 100644
index 000..f475639
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/layout.py
@@ -0,0 +1,359 @@
+# Generator of fuzzed qcow2 images
+#
+# Copyright (C) 2014 Maria Kustova mari...@catit.be
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see http://www.gnu.org/licenses/.
+#
+
+import random
+import struct
+import fuzz
+
+MAX_IMAGE_SIZE = 10 * 2**20
+# Standard sizes
+UINT32_S = 4
+UINT64_S = 8
+
+
+class Field(object):
+    """Atomic image element (field)
+
+    The class represents an image field as a quadruple of a data format
+    of its value necessary for packing to binary form, an offset from
+    the beginning of the image, a value and a name.
+
+    The field can be iterated as a list [format, offset, value]
+    """
+
+    __slots__ = ('fmt', 'offset', 'value', 'name')
+
+    def __init__(self, fmt, offset, val, name):
+        self.fmt = fmt
+        self.offset = offset
+        self.value = val
+        self.name = name
+
+    def __iter__(self):
+        return (x for x in [self.fmt, self.offset, self.value])
+
+    def __repr__(self):
+        return "Field(fmt='%s', offset=%d, value=%s, name=%s)" % \
+               (self.fmt, self.offset, str(self.value), self.name)
+
+
+class FieldsList(object):
+    """List of fields
+
+    The class allows access to a field in the list by its name and joins
+    several lists in one via in-place addition
+    """
+
+    def __init__(self, meta_data=None):
+        if meta_data is None:
+            self.data = []
+        else:
+            self.data = [Field(f[0], f[1], f[2], f[3])
+                         for f in meta_data]
+
+    def __getitem__(self, name):
+        return [x for x in self.data if x.name == name]
+
+    def __iter__(self):
+        return (x for x in self.data)
+
+    def __iadd__(self, other):
+        self.data += other.data
+        return self
+
+    def __len__(self):
+        return len(self.data)
+
+
+class Image(object):
+    """Qcow2 image object
+
+    This class allows to create qcow2 images with random valid structures
+    and values, fuzz them via the external qcow2.fuzz module and write the
+    result to a file
+    """
+
+    @staticmethod
+    def _size_params():
+        """Generate a random image size aligned to a random correct
+        cluster size
+        """
+        cluster_bits = random.randrange(9, 21)
+        cluster_size = 1 << cluster_bits
+        img_size = random.randrange(0, MAX_IMAGE_SIZE + 1,
+                                    cluster_size)
+        return [cluster_bits, img_size]
+
+    @staticmethod
+    def _header(cluster_bits, img_size, backing_file_name=None):
+        """Generate a random valid header"""
+        meta_header = [
+            ['>4s', 0, "QFI\xfb", 'magic'],
+            ['>I', 4, random.randint(2, 3), 'version'],
+            ['>Q', 8, 0, 'backing_file_offset'],
+            ['>I', 16, 0, 'backing_file_size'],
+            ['>I', 20, cluster_bits, 'cluster_bits'],
+            ['>Q', 24, img_size, 'size'],
+            ['>I', 32, 0, 'crypt_method'],
+            ['>I', 36, 0, 'l1_size'],
+            ['>Q', 40, 0, 'l1_table_offset'],
+            ['>Q', 48, 0, 'refcount_table_offset'],
+            ['>I', 56, 0, 'refcount_table_clusters'],
+            ['>I', 60, 0, 'nb_snapshots'],
+            ['>Q', 64, 0, 'snapshots_offset'],
+            ['>Q', 72, 0, 'incompatible_features'],
+            ['>Q', 80, 0, 'compatible_features'],
+            ['>Q', 88, 0, 'autoclear_features'],
+            # Only refcount_order = 4 is supported by the current (07.2014)
+            # implementation of QEMU
+            ['>I', 96, 4, 'refcount_order'],
+            ['>I', 100, 0, 'header_length']
+        ]
+        v_header = FieldsList(meta_header)
+
+        if v_header['version'][0].value == 2:
+            v_header['header_length'][0].value = 72
+        else:
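Each Field above pairs a struct format string with an offset, so serializing the image reduces to packing every value at its offset into a byte buffer. A toy version of that write-out step (not the actual layout.py code, which carries more per-field state):

```python
import struct

def pack_fields(fields, size):
    """Pack each (fmt, offset, value) triple into a bytearray image."""
    img = bytearray(size)   # zero-filled, like a sparse image
    for fmt, offset, value in fields:
        packed = struct.pack(fmt, value)
        # Overwrite exactly len(packed) bytes at the field's offset
        img[offset:offset + len(packed)] = packed
    return img

# Big-endian formats ('>') match the qcow2 on-disk byte order
img = pack_fields([('>4s', 0, b'QFI\xfb'),   # magic
                   ('>I', 4, 3)],            # version
                  8)
```

Making Field iterable as [format, offset, value] is what lets a packer like this unpack each field directly in the loop header.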
[Qemu-devel] [PATCH V3 3/5] fuzz: Fuzzing functions for qcow2 images
The fuzz submodule of the qcow2 image generator contains fuzzing functions for image fields. Each fuzzing function contains list of constraints and a call of a helper function that randomly selects a fuzzed value satisfied to one of constraints. For now constraints include only known as invalid or potentially dangerous values. But after investigation of code coverage by fuzz tests they will be expanded by heuristic values based on inner checks and flows of a program under test. Now fuzzing of a header, header extensions and a backing file name is supported. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/fuzz.py | 336 +++ 1 file changed, 336 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/fuzz.py diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py new file mode 100644 index 000..b576259 --- /dev/null +++ b/tests/image-fuzzer/qcow2/fuzz.py @@ -0,0 +1,336 @@ +# Fuzzing functions for qcow2 fields +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. 
+#
+
+import random
+
+
+UINT32 = 0xffffffff
+UINT64 = 0xffffffffffffffff
+# Most significant bit orders
+UINT32_M = 31
+UINT64_M = 63
+# Fuzz vectors
+UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1,
+            UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32]
+UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4,
+                       UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1,
+                       UINT64]
+STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
+            '%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n',
+            '%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p',
+            '%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C%S%08x%%',
+            '%s x 129', '%x x 257']
+
+
+def random_from_intervals(intervals):
+    """Select a random integer number from the list of specified intervals
+
+    Each interval is a tuple of lower and upper limits of the interval. The
+    limits are included. Intervals in a list should not overlap.
+    """
+    total = reduce(lambda x, y: x + y[1] - y[0] + 1, intervals, 0)
+    r = random.randint(0, total - 1) + intervals[0][0]
+    temp = zip(intervals, intervals[1:])
+    for x in temp:
+        r = r + (r > x[0][1]) * (x[1][0] - x[0][1] - 1)
+    return r
+
+
+def random_bits(bit_ranges):
+    """Generate a random binary mask with ones in the specified bit ranges
+
+    bit_ranges is a list of tuples of lower and upper limits of the bit
+    positions that will be fuzzed. The limits are included. A random amount
+    of bits within the range limits will be set to ones. The mask is
+    returned in decimal integer format.
+    """
+    bit_numbers = []
+    # Select a random amount of random positions in bit_ranges
+    for rng in bit_ranges:
+        bit_numbers += random.sample(range(rng[0], rng[1] + 1),
+                                     random.randint(0, rng[1] - rng[0] + 1))
+    val = 0
+    # Set bits on the selected positions to ones
+    for bit in bit_numbers:
+        val |= 1 << bit
+    return val
+
+
+def truncate_string(strings, length):
+    """Return strings truncated to the specified length"""
+    if type(strings) == list:
+        return [s[:length] for s in strings]
+    else:
+        return strings[:length]
+
+
+def int_validator(current, intervals):
+    """Return a random value from intervals not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random_from_intervals(intervals)
+    if val == current:
+        return int_validator(current, intervals)
+    else:
+        return val
+
+
+def bit_validator(current, bit_ranges):
+    """Return a random bit mask not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random_bits(bit_ranges)
+    if val == current:
+        return bit_validator(current, bit_ranges)
+    else:
+        return val
+
+
+def string_validator(current, strings):
+    """Return a random string value from the list not equal to the current.
+
+    This function is useful for selection from valid values except the
+    current one.
+    """
+    val = random.choice(strings)
+    if val == current:
+        return string_validator(current,
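The pattern described in the commit message — a per-field fuzzing function holding a list of constraints, delegating to a selector that randomly picks a value satisfying one of them — can be sketched as follows. This is an illustrative Python 3 sketch, not the patch's actual API; the function names and constraint values are assumptions for demonstration.

```python
import random


def selector(current, constraints):
    """Pick a random fuzzed value satisfying one of the constraints.

    A constraint is an inclusive (lower, upper) interval or an explicit
    list of values; the pick is re-rolled if it equals the field's
    current value, so the field is guaranteed to actually change.
    """
    c = random.choice(constraints)
    val = random.randint(*c) if isinstance(c, tuple) else random.choice(c)
    return selector(current, constraints) if val == current else val


def version(current):
    """Hypothetical fuzzing function for the qcow2 'version' field:
    everything except the supported versions 2 and 3 counts as invalid."""
    return selector(current, [(0, 1), (4, 0xffffffff)])
```

A field's fuzzing function thus stays declarative — it only states which values are interesting — while the selection logic lives in one shared helper, which matches the structure of the patch.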
Re: [Qemu-devel] [PATCH v6 0/3] s390: Support for Hotplug of Standby Memory
On 06/30/2014 10:00 AM, Matthew Rosato wrote: This patchset adds support in s390 for a pool of standby memory, which can be set online/offline by the guest (i.e., via chmem). The standby pool of memory is allocated as the difference between the initial memory setting and the maxmem setting. As part of this work, additional results are provided for the Read SCP Information SCLP, and a new implementation is added for the Read Storage Element Information, Attach Storage Element, Assign Storage and Unassign Storage SCLPs, which enables the s390 guest to manipulate the standby memory pool. This patchset is based on work originally done by Jeng-Fang (Nick) Wang. Sample qemu command snippet: qemu -machine s390-ccw-virtio -m 1024M,maxmem=2048M,slots=32 -enable-kvm This will allocate 1024M of active memory, and another 1024M of standby memory. Example output from s390-tools lsmem:

0x0000000000000000-0x000000000fffffff   256  online   no    0-127
0x0000000010000000-0x000000001fffffff   256  online   yes   128-255
0x0000000020000000-0x000000003fffffff   512  online   no    256-511
0x0000000040000000-0x000000007fffffff  1024  offline  -     512-1023

Memory device size : 2 MB Memory block size : 256 MB Total online memory : 1024 MB Total offline memory: 1024 MB The guest can dynamically enable part or all of the standby pool via the s390-tools chmem, for example: chmem -e 512M And can attempt to dynamically disable: chmem -d 512M Ping... Changes for v6: * Fix in sclp.h - DeviceState parent -> SysBusDevice parent in struct sclpMemoryHotplugDev. * Fix in assign_storage - int this_subregion_size should be uint64_t. * Added information on how to test in the cover letter. Changes for v5: * Since ACPI memory hotplug is now in, removed Igor's patches from this set. * Updated sclp.c to use object_resolve_path() instead of object_property_find(). Changes for v4: * Remove initialization code from get_sclp_memory_hotplug_dev() and place it in its own function, init_sclp_memory_hotplug_dev(). * Add a hint to qemu-options.hx to note the fact that the memory size specified via -m might be forced to a boundary.
* Account for the legacy s390 machine, which does not support memory hotplug. * Fix a bug in sclp.c - Change memory hotplug device parent to sysbus. * Pulled latest version of Igor's patch. Matthew Rosato (3): sclp-s390: Add device to manage s390 memory hotplug virtio-ccw: Include standby memory when calculating storage increment sclp-s390: Add memory hotplug SCLPs hw/s390x/s390-virtio-ccw.c | 46 +-- hw/s390x/sclp.c| 289 +++- include/hw/s390x/sclp.h| 20 +++ qemu-options.hx|3 +- target-s390x/cpu.h | 18 +++ target-s390x/kvm.c |5 + 6 files changed, 366 insertions(+), 15 deletions(-)
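The standby pool sizing described above is pure arithmetic on the memory options; a trivial sketch of the stated rule (the function name is ours, not QEMU's):

```python
def standby_pool_mb(initial_mb, maxmem_mb):
    """Size of the s390 standby memory pool, per the cover letter:
    the difference between the initial memory setting (-m) and maxmem."""
    if maxmem_mb < initial_mb:
        raise ValueError("maxmem must be >= initial memory")
    return maxmem_mb - initial_mb
```

For the sample command line (`-m 1024M,maxmem=2048M`) this yields the 1024M standby pool shown as offline in the lsmem output.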
[Qemu-devel] [Bug 1318830] Re: High CPU usage on windows virtual machine
Hi, could you please try adding hyperv to the vm options, as shown here: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308341/comments/10 Please let us know if that helps. ** Changed in: qemu (Ubuntu) Status: New = Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1318830 Title: High CPU usage on windows virtual machine Status in QEMU: New Status in “qemu” package in Ubuntu: Incomplete Bug description: I got Ubuntu 14.04, with Qemu 2.0 and moved my windows VM to this new box, and made sure that what this article indicates was achieved https://www.kraxel.org/blog/2014/03/qemu-and-usb-tablet-cpu-consumtion/ I can attest that it works following the instructions, erasing the registry, etc. Unfortunately, with 4 cpus as below, I still see 60% CPU outside as shown by Top versus 0% CPU inside. My Kernel is 3.15.0-031500rc4-generic If some developer wants to log in and take a look, I am happy to help. The box is not in production and I take full responsibility. Until this is solved, KVM is not going to compete with Hyper-V or Vmware. Basically KVM is not suitable for the Enterprise as of yet. 
qemu-system-x86_64 -enable-kvm -name Production -S -machine pc- i440fx-2.0,accel=kvm,usb=off -cpu kvm64,+rdtscp,+pdpe1gb,+x2apic,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,hv_relaxed,hv_vapic,hv_spinlocks=0xfff -m 4024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid e8701c5c-b542-0199-fd2a-1047df24770e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Production.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb- uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/Production.img,if=none,id=drive-virtio- disk0,format=raw -device virtio-blk- pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio- disk0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=00:16:3a:d2:cd:ea,bus=pci.0,addr=0x3 -netdev tap,fd=35,id=hostnet1,vhost=on,vhostfd=36 -device virtio-net- pci,netdev=hostnet1,id=net1,mac=52:54:00:70:fe:54,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa- serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device VGA,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda- duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon- pci,id=balloon0,bus=pci.0,addr=0x7 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1318830/+subscriptions
[Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Hi folks, I am going to make a patch, and need to find the pause and thaw (restore) functions invoked before and after taking a live QEMU snapshot, respectively. Basically, I'm taking the snapshot with # (qemu) snapshot_blkdev blockX snapshot-file format (see http://wiki.qemu.org/Features/Snapshots); my profiling tool didn't work when giving the command inside QMP. Does anyone know how to find the pause (freeze) and restore (thaw) functions before and after taking a snapshot? Or is there any way to use the snapshot_blkdev command outside the QEMU console? Thanks a lot in advance! Best, Yuanzhen
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
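The three-step sequence Eric describes can be written down as data, independent of transport. The command names below are the real guest-agent/QMP commands; the helper itself is only an illustrative sketch, not an actual client implementation:

```python
def snapshot_sequence(device, snapshot_file):
    """Return the ordered (transport, command) steps for a consistent
    external snapshot: freeze via the guest agent, snapshot via QMP,
    then thaw via the guest agent."""
    return [
        ("guest-agent", {"execute": "guest-fsfreeze-freeze"}),
        ("qmp", {"execute": "blockdev-snapshot-sync",
                 "arguments": {"device": device,
                               "snapshot-file": snapshot_file,
                               "format": "qcow2"}}),
        ("guest-agent", {"execute": "guest-fsfreeze-thaw"}),
    ]
```

In a real client the thaw step should run in a `finally` block, so a failed snapshot never leaves the guest's filesystems frozen.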
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on the same filesystem. This exact case could be fixed by extending the agent's logic, but it looks like putting a note in the documentation is enough.
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
Tested on iscsi pool, though there are no-cache requirement and rbd with disabled cache may survive one migration but iscsi backend hangs always. As it was before, just rolling back problematic commit fixes the problem and adding cpu_synchronize_all_states to migration.c has no difference at a glance in a VM` behavior. The problem consist at least two separate ones: the current hang and behavior with the unreverted patch from agraf - last one causes live migration with writeback cache to fail, cache=none works well in any variant which survives first condition. Marcin, would you mind to check the current state of the problem on your environments in a spare time? It is probably easier to reproduce on iscsi because of way smaller time needed to set it up, command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0). Ok, but what exactly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock: 1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where clock drift (?) caused kvmclock in the guest to report a time in the past, which caused the guest kernel to hang. This is hard to reproduce reliably (probably because it requires a long time for the drift to accumulate). 2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during non-migration operations (drive-mirror + pivot), and which also caused the guest kernel to hang. This is trivial to reproduce. I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none. So, I shall test vm-migration and drive-migration with 2.1.0-rc2 with no extra patches applied or reverted, on a VM that is running fio, am I correct? -- mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On Thu, Jul 17, 2014 at 1:28 AM, Marcin Gibuła m.gib...@beyond.pl wrote: Tested on iscsi pool, though there are no-cache requirement and rbd with disabled cache may survive one migration but iscsi backend hangs always. As it was before, just rolling back problematic commit fixes the problem and adding cpu_synchronize_all_states to migration.c has no difference at a glance in a VM` behavior. The problem consist at least two separate ones: the current hang and behavior with the unreverted patch from agraf - last one causes live migration with writeback cache to fail, cache=none works well in any variant which survives first condition. Marcin, would you mind to check the current state of the problem on your environments in a spare time? It is probably easier to reproduce on iscsi because of way smaller time needed to set it up, command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0). Ok, but what exacly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock: 1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes problem when clock drift (?) caused kvmclock in guest to report time in past which caused guest kernel to hang. This is hard to reproduce reliably (probably as it requires long time for drift to accumulate). 2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occured during non-migration operations (drive-mirror + pivot), which also caused guest kernel to hang. This is trival to reproduce. I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none. So, I shall test vm-migration and drive-migration with 2.1.0-rc2 with no extra patches applied or reverted, on VM that is running fio, am I correct? Yes, exactly. ISCSI-based setup can take some minutes to deploy, given prepared image, and I have one hundred percent hit rate for the original issue with it. 
-- mg
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Thank you very much, Eric! So do you mean that if I turn on libvirt debugging to trace the QMP/agent commands, I can find the pause and thaw functions in the source code, is that right? And is there any way to freeze and thaw outside the guest agent via the command line? Or any profiling tools you would suggest for the freeze and thaw in the guest agent? Thanks a lot! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:18 PM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. So do you mean, if I turn on libvirt debugging to trace QMP/agent commands, I can find the pause and thaw functions in the source code, is it right ? Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
Thanks very much, Andrey! How can I figure out whether fsfreeze did not succeed while taking a snapshot, if I run a simple application inside the VM, say mysql? Thanks! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:25 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on same filesystem. This exact case can be fixed by extending agent` logic but it looks that the putting a note is enough.
Re: [Qemu-devel] pause and restore function before and after QEMU Live Snapshot
On Thu, Jul 17, 2014 at 1:48 AM, Yuanzhen Gu yg...@cs.rutgers.edu wrote: Thanks very much Andrey! how can I figure out whether fsfreeze not succeed while taking snapshot, if I run a simple application inside VM, say mysql? thanks! Best, Yuanzhen On Wed, Jul 16, 2014 at 5:25 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Jul 17, 2014 at 1:18 AM, Eric Blake ebl...@redhat.com wrote: On 07/16/2014 03:13 PM, Yuanzhen Gu wrote: Hi folks, I am going to make a patch, and need to find the pause and thaw(restore) function before and after taking Live QEMU Snapshot respectively. Libvirt has the ability to do just that, when taking external disk-only snapshots. You can turn on libvirt debugging to trace what QMP/agent commands are sent during the overall snapshot operation. Basically, I'm using # (qemu) snapshot_blkdev blockX snapshot-fileformat taking snapshot, http://wiki.qemu.org/Features/Snapshotsmy profile tool didn't work when giving command inside QMP. Does anyone know how to find the pause (freeze) and restore(thaw) function before and after taking snapshot? Or anyway using snapshot_blkdev command outside QEMU console? Thanks a lot in advance! You have to coordinate multiple commands: freeze to the guest agent, then snapshot to QMP, then thaw to the guest agent. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org Just 2c: fsfreeze will not succeed if there are any container instances, for example Docker, running in the VM on same filesystem. This exact case can be fixed by extending agent` logic but it looks that the putting a note is enough. Just non-zero exit code from agent`s operation, not matter if you communicating directly via json exchange or via libvirt.
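Concretely, the "non-zero exit code" Andrey mentions surfaces as an error object in the agent's JSON reply, while a successful guest-fsfreeze-freeze returns the number of frozen filesystems. A hedged sketch of checking a raw agent reply (the helper name is ours; a real client would usually go through libvirt instead):

```python
import json


def fsfreeze_succeeded(agent_reply):
    """Decide from a raw guest-agent JSON reply whether
    guest-fsfreeze-freeze worked: success carries a 'return' field with
    the number of frozen filesystems, failure carries an 'error' object.
    A count of zero is treated as failure, since nothing was frozen."""
    reply = json.loads(agent_reply)
    return "error" not in reply and reply.get("return", 0) > 0
```

So for the mysql example in the question: if the reply carries an error (for instance because a container shares the filesystem), the snapshot should be aborted or retried rather than taken over an unfrozen filesystem.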
Re: [Qemu-devel] [PATCH 1/2] hw/misc/arm_sp810: Create SP810 device
On Thu, Jul 17, 2014 at 12:42 AM, Fabian Aggeler aggel...@ethz.ch wrote: This adds a device model for the PrimeXsys System Controller (SP810) which is present in the Versatile Express motherboards. It is so far read-only but allows to set the SCCTRL register. Signed-off-by: Fabian Aggeler aggel...@ethz.ch --- default-configs/arm-softmmu.mak | 1 + hw/misc/Makefile.objs | 1 + hw/misc/arm_sp810.c | 184 3 files changed, 186 insertions(+) create mode 100644 hw/misc/arm_sp810.c diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index f3513fa..67ba99a 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -55,6 +55,7 @@ CONFIG_PL181=y CONFIG_PL190=y CONFIG_PL310=y CONFIG_PL330=y +CONFIG_SP810=y CONFIG_CADENCE=y CONFIG_XGMAC=y CONFIG_EXYNOS4=y diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index 979e532..49d023b 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o common-obj-$(CONFIG_A9SCU) += a9scu.o common-obj-$(CONFIG_ARM11SCU) += arm11scu.o +common-obj-$(CONFIG_SP810) += arm_sp810.o # PKUnity SoC devices common-obj-$(CONFIG_PUV3) += puv3_pm.o diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c new file mode 100644 index 000..82cbcf0 --- /dev/null +++ b/hw/misc/arm_sp810.c @@ -0,0 +1,184 @@ +/* + * ARM PrimeXsys System Controller (SP810) + * + * Copyright (C) 2014 Fabian Aggeler aggel...@ethz.ch + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the
 + * GNU General Public License for more details.
 + *
 + */
 +
 +#include "hw/hw.h"
 +#include "hw/sysbus.h"
 +
 +#define LOCK_VALUE 0xa05f
 +
 +#define TYPE_ARM_SP810 "arm_sp810"
 +#define ARM_SP810(obj) OBJECT_CHECK(arm_sp810_state, (obj), TYPE_ARM_SP810)
 +
 +typedef struct {
 +    SysBusDevice parent_obj;
 +    MemoryRegion iomem;
 +
 +    uint32_t sc_ctrl; /* SCCTRL */
 +} arm_sp810_state;
 +
 +#define TIMEREN0SEL (1 << 15)
 +#define TIMEREN1SEL (1 << 17)
 +#define TIMEREN2SEL (1 << 19)
 +#define TIMEREN3SEL (1 << 21)
 +
Are these fields of a particular register? If so, they should have the register name as a prefix.
 +static const VMStateDescription vmstate_arm_sysctl = {
 +    .name = "arm_sp810",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(sc_ctrl, arm_sp810_state),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +    arm_sp810_state *s = (arm_sp810_state *)opaque;
Drop the cast and just rely on the implicit conversion.
 +
 +    switch (offset) {
 +    case 0x000: /* SCCTRL */
Define macros for the register offsets rather than switching on literal addresses + comments.
 +        return s->sc_ctrl;
 +    case 0x004: /* SCSYSSTAT */
 +    case 0x008: /* SCIMCTRL */
 +    case 0x00C: /* SCIMSTAT */
 +    case 0x010: /* SCXTALCTRL */
 +    case 0x014: /* SCPLLCTRL */
 +    case 0x018: /* SCPLLFCTRL */
 +    case 0x01C: /* SCPERCTRL0 */
 +    case 0x020: /* SCPERCTRL1 */
 +    case 0x024: /* SCPEREN */
 +    case 0x028: /* SCPERDIS */
 +    case 0x02C: /* SCPERCLKEN */
 +    case 0x030: /* SCPERSTAT */
 +    case 0xEE0: /* SCSYSID0 */
 +    case 0xEE4: /* SCSYSID1 */
 +    case 0xEE8: /* SCSYSID2 */
 +    case 0xEEC: /* SCSYSID3 */
 +    case 0xF00: /* SCITCR */
 +    case 0xF04: /* SCITIR0 */
 +    case 0xF08: /* SCITIR1 */
 +    case 0xF0C: /* SCITOR */
 +    case 0xF10: /* SCCNTCTRL */
 +    case 0xF14: /* SCCNTDATA */
 +    case 0xF18: /* SCCNTSTEP */
 +    case 0xFE0: /* SCPERIPHID0 */
 +    case 0xFE4: /* SCPERIPHID1 */
 +    case 0xFE8: /* SCPERIPHID2 */
 +    case 0xFEC: /* SCPERIPHID3 */
 +    case 0xFF0: /* SCPCELLID0 */
 +    case 0xFF4: /* SCPCELLID1 */
 +    case 0xFF8: /* SCPCELLID2 */
 +    case 0xFFC: /* SCPCELLID3 */
I would just let the default do its thing for the moment and not worry about these. As a general rule I prefer to just index an array for register reads/writes and limit the switch statement to only the regs with side effects. So most of these would disappear under that scheme. You could #define them all (as mentioned above for SCCTRL) if you are looking for completeness.
 +    default:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "arm_sp810_read: Register not yet implemented: 0x%x\n",
 +                      (int)offset);
HWADDR_PRIx will avoid the cast.
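The review advice — keep plainly-readable registers in a flat backing store and reserve explicit handling for registers with side effects — can be modeled compactly. This Python sketch is purely illustrative: the offsets match the SP810 map quoted above, but the read-to-clear side effect on SCSYSSTAT is invented for demonstration and is not the real hardware behavior.

```python
SCCTRL, SCSYSSTAT, SCPERIPHID0 = 0x000, 0x004, 0xFE0


class SP810Regs:
    """Read side of a register bank: a dict as the flat backing store,
    with only side-effecting offsets handled explicitly."""

    def __init__(self):
        self.regs = {SCCTRL: 0, SCSYSSTAT: 0, SCPERIPHID0: 0}
        self.unimp_log = []  # stands in for qemu_log_mask(LOG_UNIMP, ...)

    def read(self, offset):
        if offset == SCSYSSTAT:
            # Hypothetical side effect for illustration: read-to-clear
            val, self.regs[SCSYSSTAT] = self.regs[SCSYSSTAT], 0
            return val
        if offset in self.regs:
            return self.regs[offset]
        # The LOG_UNIMP path: record the unknown offset, read back as zero
        self.unimp_log.append(offset)
        return 0
```

Under this scheme, adding a new side-effect-free register is just one more backing-store entry plus an offset constant, which is what the reviewer means by "most of these would disappear".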
Re: [Qemu-devel] [PATCH 2/2] hw/arm/vexpress: add SP810 to the vexpress
On Thu, Jul 17, 2014 at 1:05 AM, Alex Bennée alex.ben...@linaro.org wrote: Fabian Aggeler writes: The SP810, which is present in the Versatile Express motherboards, allows setting the timing reference to either REFCLK or TIMCLK. QEMU currently sets the SP804 timer to 1MHz by default. To reflect this, we set the TimerEn0Sel, TimerEn1Sel, TimerEn2Sel, and TimerEn3Sel bits of the system control register (SCCTRL) to TIMCLK (1). Signed-off-by: Fabian Aggeler aggel...@ethz.ch --- hw/arm/vexpress.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c index a88732c..b96c3fd 100644 --- a/hw/arm/vexpress.c +++ b/hw/arm/vexpress.c @@ -512,7 +512,7 @@ static pflash_t *ve_pflash_cfi01_register(hwaddr base, const char *name, static void vexpress_common_init(VEDBoardInfo *daughterboard, MachineState *machine) { -DeviceState *dev, *sysctl, *pl041; +DeviceState *dev, *sysctl, *pl041, *sp810; qemu_irq pic[64]; uint32_t sys_id; DriveInfo *dinfo; @@ -575,7 +575,14 @@ static void vexpress_common_init(VEDBoardInfo *daughterboard, qdev_init_nofail(sysctl); sysbus_mmio_map(SYS_BUS_DEVICE(sysctl), 0, map[VE_SYSREGS]); -/* VE_SP810: not modelled */ +/* VE_SP810 */ +sp810 = qdev_create(NULL, "arm_sp810"); Move the type definition macro to the header as well. Regards, Peter +/* SP804 is already running at 1MHz (TIMCLK) so SCCTRL TimerEnXSel=1 */ +qdev_prop_set_uint32(sp810, "sc-ctrl", (1 << 15) | (1 << 17) + | (1 << 19) | (1 << 21)); snip Could the #defines in the first patch be moved into a header and used here rather than manually setting these bits? -- Alex Bennée
Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap
-Original Message- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Tuesday, July 15, 2014 11:00 PM To: Wangkai (Kevin,C) Cc: Stefan Hajnoczi; Lee yang; qemu-devel@nongnu.org; aligu...@amazon.com Subject: Re: [Qemu-devel] [PATCH] Tap: fix vcpu long time io blocking on tap On Mon, Jul 14, 2014 at 10:44:58AM +, Wangkai (Kevin,C) wrote: Here the detail network: ++ | The host add tap1 and eth10 to bridge 'br1'| +- ---+ | ++ | | send | | | VM eth1-+-tap1 --- bridge --- eth10 --+-+ | | packets| | ++ | | | ++ ++ ++ Qemu start vm by virtio, use tap interface, option is: -net nic,vlan=101, model=virtio -net tap,vlan=101,ifname=tap1,script=no,downscript=no Use the newer -netdev/-device syntax to get offload support and slightly better performance: -netdev tap,id=tap0,ifname=tap1,script=no,downscript=no \ -device virtio-net-pci,netdev=tap0 And add tap1 and eth10 to bridge br1 in the host: Brctl addif br1 tap1 Brctl addif br1 eth10 total recv 505387 time 2000925 us: means call tap_send once dealing 505387 packets, the packet payload was 300 bytes, and time use for tap_send() was 2,000,925 micro-seconds, time was measured by record time stamp at function tap_send() start and end. We just test the performance of VM. That is 150 MB of incoming packets in a single tap_send(). Network rx queues are maybe a few 1000 packets so I wonder what is going on here. Maybe more packets are arriving while QEMU is reading them and we keep looping. That's strange though because the virtio-net rx virtqueue should fill up (it only has 256 entries). Can you investigate more and find out exactly what is going on? It's not clear yet that adding a budget is the solution or just hiding a deeper problem. Stefan [Wangkai (Kevin,C)] Hi Stefan, I think I have found the problem, why 256 entries virtqueue cannot prevent packets receiving. 
I have started one SMP guest, which has 2 cores; one core was pending on I/O, and the other core was receiving the packets. QEMU was filling the virtqueue while the guest kernel was moving the packets out of the queue and processing them. They were racing: only when the guest got enough packets and was receiving more slowly than QEMU was sending did the virtqueue fill up and one receive pass finish. And I have tried the -netdev/-device syntax to start the guest again, and got very little improvement. Regards Wangkai
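The budget approach under discussion — cap how many packets one tap_send() call may drain before returning to the main loop — can be sketched like this. The cap value and names are illustrative, not the patch's actual ones:

```python
TAP_SEND_BUDGET = 256  # illustrative cap, not the patch's actual value


def tap_send(read_packet, deliver):
    """Drain at most TAP_SEND_BUDGET packets from the tap fd, then
    return so the event loop (and with it the vcpu) can make progress.
    The fd stays readable, so the loop calls us again for the rest."""
    sent = 0
    while sent < TAP_SEND_BUDGET:
        pkt = read_packet()  # returns None when the fd would block
        if pkt is None:
            break
        deliver(pkt)
        sent += 1
    return sent
```

This bounds the time a single tap_send() invocation can monopolize the I/O thread, regardless of how fast packets arrive — which is the failure mode the ps listing above illustrates.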
[Qemu-devel] [PATCH v4 0/2] add read-pattern for block quorum
v4: - swap the patch order - update comment for fifo pattern in qapi - use qapi enumeration in quorum driver instead of manual parsing v3: - separate patch into two, one for quorum and one for qapi for easier review - add enumeration for quorum read pattern - remove unrelated blank line fix from this patch set v2: - rename single as 'fifo' - rename read_all_children as read_quorum_children - fix quorum_aio_finalize() for fifo pattern This patch adds a single read pattern to the quorum driver; quorum vote remains the default pattern. For now we do a quorum vote on all reads; it is designed for unreliable underlying storage such as non-redundant NFS to ensure data integrity at the cost of read performance. Consider the following use case: (diagram: the VM writes to both storage nodes A and B) Both A and B have hardware RAID storage to justify the data integrity on their own, so it would help performance if we did a single read instead of reading from all the nodes. Further, if we run the VM on either of the storage nodes, we can make a local read request for better performance. This patch generalizes the above 2-node case to N nodes: the VM writes to all N nodes but reads from just one of them. If the single read fails, we try to read the next node in the FIFO order specified by the startup command. The 2-node case is very similar to DRBD[1], though it lacks auto-sync functionality on a single device/node failure for now. But compared with DRBD we still have some advantages: - Suppose we have 20 VMs running on one (assume A) of 2 nodes' DRBD-backed storage. If A crashes, we need to restart all the VMs on node B. In practice we can't, because B might not have enough resources to set up 20 VMs at once. If we instead run our 20 VMs with the quorum driver and scatter the replicated images over the data center, we can very likely restart the 20 VMs without any resource problem.
After all, I think we can build a more powerful replicated-image functionality on top of quorum and block jobs (block mirror) to meet various High Availability needs. E.g., enable the single read pattern on 2 children: -drive driver=quorum,children.0.file.filename=0.qcow2,\ children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1 [1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device Liu Yuan (2): qapi: add read-pattern support for quorum block/quorum: add simple read pattern support block/quorum.c | 181 +-- qapi/block-core.json | 19 +- 2 files changed, 149 insertions(+), 51 deletions(-) -- 1.9.1
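The fifo read pattern then reduces to: try the children in their command-line order and fall through on error. The real implementation is asynchronous C in block/quorum.c; this synchronous Python sketch only illustrates the ordering semantics, with illustrative names:

```python
def fifo_read(children, offset, length):
    """Try each child in command-line order; return the first successful
    read, and re-raise the last error if every child fails."""
    assert children, "quorum needs at least one child"
    last_err = None
    for child in children:
        try:
            return child(offset, length)  # child: callable doing the read
        except IOError as err:
            last_err = err  # fall through to the next child in FIFO order
    raise last_err
```

Note the contrast with the default vote pattern: vote issues the read to all children and compares the results, while fifo touches only one child on the happy path — the source of the read-performance win the cover letter describes.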
[Qemu-devel] [PATCH v4 2/2] block/quorum: add simple read pattern support
This patch adds a single read pattern to the quorum driver; quorum vote remains the default pattern. For now we do a quorum vote on all reads; it is designed for unreliable underlying storage such as non-redundant NFS to ensure data integrity at the cost of read performance. Consider the following use case: (diagram: the VM writes to both storage nodes A and B) Both A and B have hardware RAID storage to justify the data integrity on their own, so it would help performance if we did a single read instead of reading from all the nodes. Further, if we run the VM on either of the storage nodes, we can make a local read request for better performance. This patch generalizes the above 2-node case to N nodes: the VM writes to all N nodes but reads from just one of them. If the single read fails, we try to read the next node in the FIFO order specified by the startup command. The 2-node case is very similar to DRBD[1], though it lacks auto-sync functionality on a single device/node failure for now. But compared with DRBD we still have some advantages: - Suppose we have 20 VMs running on one (assume A) of 2 nodes' DRBD-backed storage. If A crashes, we need to restart all the VMs on node B. In practice we can't, because B might not have enough resources to set up 20 VMs at once. If we instead run our 20 VMs with the quorum driver and scatter the replicated images over the data center, we can very likely restart the 20 VMs without any resource problem. After all, I think we can build a more powerful replicated-image functionality on quorum and block jobs (block mirror) to meet various High Availability needs.
E.g., enable the single read pattern on 2 children:

  -drive driver=quorum,children.0.file.filename=0.qcow2,\
         children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1

[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device

Cc: Benoit Canet <ben...@irqsave.net>
Cc: Eric Blake <ebl...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Signed-off-by: Liu Yuan <namei.u...@gmail.com>
---
 block/quorum.c | 179 +
 1 file changed, 131 insertions(+), 48 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index d5ee9c0..ebf5c71 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -24,6 +24,7 @@
 #define QUORUM_OPT_VOTE_THRESHOLD "vote-threshold"
 #define QUORUM_OPT_BLKVERIFY      "blkverify"
 #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
+#define QUORUM_OPT_READ_PATTERN   "read-pattern"

 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -74,6 +75,8 @@ typedef struct BDRVQuorumState {
     bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
                             * block if Quorum is reached.
                             */
+
+    QuorumReadPattern read_pattern;
 } BDRVQuorumState;

 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -117,6 +120,7 @@ struct QuorumAIOCB {
     bool is_read;
     int vote_ret;
+    int child_iter;         /* which child to read in fifo pattern */
 };

 static bool quorum_vote(QuorumAIOCB *acb);
@@ -154,8 +158,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)

     if (acb->is_read) {
         for (i = 0; i < s->num_children; i++) {
-            qemu_vfree(acb->qcrs[i].buf);
-            qemu_iovec_destroy(&acb->qcrs[i].qiov);
+            if (i <= acb->child_iter) {
+                qemu_vfree(acb->qcrs[i].buf);
+                qemu_iovec_destroy(&acb->qcrs[i].qiov);
+            }
         }
     }

@@ -256,6 +262,21 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
     quorum_aio_finalize(acb);
 }

+static BlockDriverAIOCB *read_fifo_child(QuorumAIOCB *acb);
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+    int i;
+
+    assert(dest->niov == source->niov);
+    assert(dest->size == source->size);
+    for (i = 0; i < source->niov; i++) {
+        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+        memcpy(dest->iov[i].iov_base,
+               source->iov[i].iov_base,
+               source->iov[i].iov_len);
+    }
+}
+
 static void quorum_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
@@ -263,6 +284,21 @@ static void quorum_aio_cb(void *opaque, int ret)
     BDRVQuorumState *s = acb->common.bs->opaque;
     bool rewrite = false;

+    if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
+        /* We try to read next child in FIFO order if we fail to read */
+        if (ret < 0 && ++acb->child_iter < s->num_children) {
+            read_fifo_child(acb);
+            return;
+        }
+
+        if (ret == 0) {
+            quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
+        }
+        acb->vote_ret = ret;
+        quorum_aio_finalize(acb);
+        return;
+    }
+
     sacb->ret = ret;
     acb->count++;
     if (ret == 0) {
@@ -343,19 +379,6 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB
[Qemu-devel] [PATCH v4 1/2] qapi: add read-pattern enum for quorum
Cc: Eric Blake <ebl...@redhat.com>
Signed-off-by: Liu Yuan <namei.u...@gmail.com>
---
 qapi/block-core.json | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..42033d9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1384,6 +1384,19 @@
             'raw': 'BlockdevRef' } }

 ##
+# @QuorumReadPattern
+#
+# An enumeration of quorum read patterns.
+#
+# @quorum: read all the children and do a quorum vote on reads
+#
+# @fifo: read only from the first child that has not failed
+#
+# Since: 2.2
+##
+{ 'enum': 'QuorumReadPattern', 'data': [ 'quorum', 'fifo' ] }
+
+##
 # @BlockdevOptionsQuorum
 #
 # Driver specific block device options for Quorum
@@ -1398,12 +1411,17 @@
 # @rewrite-corrupted: #optional rewrite corrupted data when quorum is reached
 #                     (Since 2.1)
 #
+# @read-pattern: #optional choose read pattern and set to quorum by default
+#                (Since 2.2)
+#
 # Since: 2.0
 ##
 { 'type': 'BlockdevOptionsQuorum',
   'data': { '*blkverify': 'bool',
             'children': [ 'BlockdevRef' ],
-            'vote-threshold': 'int', '*rewrite-corrupted': 'bool' } }
+            'vote-threshold': 'int',
+            '*rewrite-corrupted': 'bool',
+            '*read-pattern': 'QuorumReadPattern' } }

 ##
 # @BlockdevOptions
--
1.9.1
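For illustration, a quorum options object built from this schema with the new optional field might look like the following. The filenames and child nesting here are made up for the example and mirror the 2-child fifo configuration from the cover letter; consult the schema's BlockdevRef definition for the exact accepted forms:

```json
{
    "driver": "quorum",
    "vote-threshold": 1,
    "read-pattern": "fifo",
    "children": [
        { "driver": "qcow2",
          "file": { "driver": "file", "filename": "0.qcow2" } },
        { "driver": "qcow2",
          "file": { "driver": "file", "filename": "1.qcow2" } }
    ]
}
```

Since `read-pattern` is optional (`*read-pattern`), omitting it keeps the existing quorum-vote behavior, so the change is backward compatible.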