Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Pekka Enberg
On Sun, Sep 25, 2011 at 2:11 PM, Sasha Levin levinsasha...@gmail.com wrote:
 This patch adds support for multiple network devices. The command line syntax
 changes to the following:

        --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]

 Each of the parameters is optional, and the config defaults to a TAP based
 networking with a sequential MAC.

 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Seems reasonable. Asias, is v2 OK to merge?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote:
 On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote:
  On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote:
   The code to generate basic SSDT code isn't that difficult (see
   build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
   patch the DSDT versus just generating the necessary blocks in an SSDT?
  
  I don't really care whether the code is in DSDT or SSDT,
  IMO there isn't much difference between build_ssdt and patching:
  main reason is build_ssdt uses offsets hardcoded to a specific binary
  (ssdt_proc and SD_OFFSET_* ) while I used
  a script to extract offsets.
 
 Yes - your script to extract the offsets is nice.

If you still have doubts,
it might make sense to merge just patch 1 -
acpi: generate and parse mixed asl/aml listing
- as the first step.
With the infrastructure in place it will be
easier to discuss the best way to use it.

  I think we should avoid relying on copy-pasted binary 
  because I see the related ASL code changing in the near future
  (with multifunction and bridge support among others).
 
 Can you expand on this?  Would multi-function and bridge support make
 patching easier than dynamic SSDT generation?

Just generic concerns:
1. We are going to have to modify DSL, so binary bytecode
would be hard to maintain. Specifically IMO the more
work can be done compile-time, the better, both iasl
and my script do compile-time checks to keep runtime simple.
2. There are going to be a lot of devices so one new table
with all of them would be ok, a table per device would be
expensive.

 I'm a little leary of patching the DSDT - doubly so when the DSDT is
 creating dummy devices that are then dynamically patched out.

A small correction, my specific code only patches out methods, not whole
devices.


  (For
 example, even with your patch series there are still two devices
 defined with _ADR 1.)

So this is a bug, I think.
I am not sure whether we just want ISA and VGA always
non-removable - is there some reason that
we have hotplug_slot macros for these slots?
If no we should just remove hotplug
macros for these two slots. Gleb, Marcelo, any preferences?

If we want to keep the option of making these slots hotpluggable
from qemu, it is still easy to fix the duplication. For example,
I could put ACPI_EXTRACT tags in VGA/ISA devices without
renaming them. But the easiest way is probably by using Scope.
See patch below.


  It seems more straight-forward to just create
 the devices that are needed.
 
 -Kevin

FWIW the acpi spec does mention that it's ok to describe
unoccupied slots.



Multiple device with the same _ADR relies on unspecified
behavior of ACPI. Fix DSDT so we always have a single device per slot.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

-

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 055202b..305d2e5 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -260,8 +260,8 @@ DefinitionBlock (
 }
 
 Scope(\_SB.PCI0) {
-Device (VGA) {
- Name (_ADR, 0x0002)
+   /* VGA device */
+Scope (S2) {
  OperationRegion(PCIC, PCI_Config, Zero, 0x4)
  Field(PCIC, DWordAcc, NoLock, Preserve) {
  VEND, 32
@@ -282,13 +282,10 @@ DefinitionBlock (
  Return (0x00)
  }
  }
- Method(_RMV) { Return (0x00) }
 }
 
/* PIIX3 ISA bridge */
-Device (ISA) {
-Name (_ADR, 0x0001)
-Method(_RMV) { Return (0x00) }
+Scope (S1) {
 
 
 /* PIIX PCI to ISA irq remapping */
@@ -486,7 +483,7 @@ DefinitionBlock (
 
 /* PCI IRQs */
 Scope(\_SB) {
- Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve)
+ Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve)
  {
  PRQ0,   8,
  PRQ1,   8,

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-26 Thread Rudolf Marek

Hi all,


defined with _ADR 1.)  It seems more straight-forward to just create
the devices that are needed.


Yes something like this:

http://review.coreboot.org/gitweb?p=coreboot.git;a=blob;f=src/arch/x86/boot/acpigen.c;hb=HEAD



Thanks
Rudolf
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm] Re: Questions about duplicate memory work

2011-09-26 Thread Robin Lee Powell
On Mon, Sep 26, 2011 at 01:49:18PM +0800, Emmanuel Noobadmin wrote:
 On 9/25/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote:
 
  OK, so I've got a Linux host, and a bunch of Linux VMs.
 
  This means that the host *and* all tho VMs do their own disk
  caches/buffers and do their own swap as well.
 
 If I'm not wrong, that's why the recommended and current default
 in libvirtd is to create storage devices with no caching to remove
 one layer of duplication.

How do you do that?  I have my VMs using LVs created on the host as
their disks, but I'm open to other methods if there are significant
advantages.

  I've considered turning off swap on the VMs so all the swapping
  at least happens in *one place*; I dunno if that's best.
 
 Not sure it's a good idea. If the VM needs more working memory
 than you allocated, I think it locks up dead if there is
 insufficient swap space. At least that appears to be what happened
 to one of mine.

Good to know, thanks.

-Robin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-26 Thread Zhi Yong Wu
On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this senario.
      2.) When dd command is issued in guest, if its option bs is set to a 
 large value such as bs=1024K, the result speed will slightly bigger than 
 the limits.

 For these problems, if you have nice thought, pls let us know.:)

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c |  259 
 ---
  block.h |    1 -
  2 files changed, 248 insertions(+), 12 deletions(-)

 One general comment: What about synchronous and/or coroutine I/O
 operations? Do you think they are just not important enough to consider
 here or were they forgotten?
For sync ops, we assume that it will be converse into async mode at
some point of future, right?
For coroutine I/O, it is introduced in image driver layer, and behind
bdrv_aio_readv/writev. I think that we need not consider them, right?


 Also, do I understand correctly that you're always submitting the whole
Right, when the block timer fire, it will flush whole request queue.
 queue at once? Does this effectively enforce the limit all the time or
 will it lead to some peaks and then no requests at all for a while until
In fact, it only try to submit those enqueued request one by one. If
fail to pass the limit, this request will be enqueued again.
 the average is right again?
Yeah, it is possible. Do you better idea?

 Maybe some documentation on how it all works from a high level
 perspective would be helpful.

 diff --git a/block.c b/block.c
 index cd75183..c08fde8 100644
 --- a/block.c
 +++ b/block.c
 @@ -30,6 +30,9 @@
  #include qemu-objects.h
  #include qemu-coroutine.h

 +#include qemu-timer.h
 +#include block/blk-queue.h
 +
  #ifdef CONFIG_BSD
  #include sys/types.h
  #include sys/stat.h
 @@ -72,6 +75,13 @@ static int coroutine_fn 
 bdrv_co_writev_em(BlockDriverState *bs,
                                           QEMUIOVector *iov);
  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);

 +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 +        double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, int64_t *wait);
 +
  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
      QTAILQ_HEAD_INITIALIZER(bdrv_states);

 @@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char 
 *filename, int flags,
              bs-change_cb(bs-change_opaque, CHANGE_MEDIA);
      }

 +    /* throttling disk I/O limits */
 +    if (bs-io_limits_enabled) {
 +        bdrv_io_limits_enable(bs);
 +    }
 +
      return 0;

  unlink_and_fail:
 @@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
          if (bs-change_cb)
              bs-change_cb(bs-change_opaque, CHANGE_MEDIA);
      }
 +
 +    /* throttling disk I/O limits */
 +    if (bs-block_queue) {
 +        qemu_del_block_queue(bs-block_queue);
 +        bs-block_queue = NULL;
 +    }
 +
 +    if (bs-block_timer) {
 +        qemu_del_timer(bs-block_timer);
 +        qemu_free_timer(bs-block_timer);
 +        bs-block_timer = NULL;
 +    }

 Why not io_limits_disable() instead of copying the code here?
Good point, thanks.

  }

  void bdrv_close_all(void)
 @@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState 
 *bs, int64_t sector_num,
                                   BlockDriverCompletionFunc *cb, void 
 *opaque)
  {
      BlockDriver *drv = bs-drv;
 -
 +    BlockDriverAIOCB *ret;
 +    int64_t wait_time = -1;
 +printf(sector_num=%ld, nb_sectors=%d\n, sector_num, nb_sectors);

 Debugging leftover (more of them follow, won't comment on each one)
Removed.

      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);

 -    if (!drv)
 -        return NULL;
 -    if (bdrv_check_request(bs, sector_num, nb_sectors))
 +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
          return NULL;
 +    }

 This part is unrelated.
Have changed it to original.

 +
 +    /* throttling disk read I/O */
 +    if (bs-io_limits_enabled) {
 +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, wait_time)) {
 +            ret = qemu_block_queue_enqueue(bs-block_queue, bs, 
 bdrv_aio_readv,
 +                           sector_num, qiov, nb_sectors, cb, opaque);
 +            printf(wait_time=%ld\n, wait_time);
 +            if (wait_time != -1) {
 +                printf(reset block timer\n);
 +                qemu_mod_timer(bs-block_timer,
 +                               wait_time + qemu_get_clock_ns(vm_clock));
 +            }
 +
 +            if (ret) {
 +                printf(ori ret is not 

Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-09-26 Thread Zhi Yong Wu
On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  Makefile.objs     |    2 +-
  block/blk-queue.c |  201 
 +
  block/blk-queue.h |   59 
  block_int.h       |   27 +++
  4 files changed, 288 insertions(+), 1 deletions(-)
  create mode 100644 block/blk-queue.c
  create mode 100644 block/blk-queue.h

 diff --git a/Makefile.objs b/Makefile.objs
 index 26b885b..5dcf456 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
 dmg.o bochs.o vpc.o vv
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
 qcow2-cache.o
  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
  block-nested-y += qed-check.o
 -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
 blk-queue.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 diff --git a/block/blk-queue.c b/block/blk-queue.c
 new file mode 100644
 index 000..adef497
 --- /dev/null
 +++ b/block/blk-queue.c
 @@ -0,0 +1,201 @@
 +/*
 + * QEMU System Emulator queue definition for block layer
 + *
 + * Copyright (c) IBM, Corp. 2011
 + *
 + * Authors:
 + *  Zhi Yong Wu  wu...@linux.vnet.ibm.com
 + *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a 
 copy
 + * of this software and associated documentation files (the Software), to 
 deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included 
 in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS 
 OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include block_int.h
 +#include block/blk-queue.h
 +#include qemu-common.h
 +
 +/* The APIs for block request queue on qemu block layer.
 + */
 +
 +struct BlockQueueAIOCB {
 +    BlockDriverAIOCB common;
 +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
 +    BlockRequestHandler *handler;
 +    BlockDriverAIOCB *real_acb;
 +
 +    int64_t sector_num;
 +    QEMUIOVector *qiov;
 +    int nb_sectors;
 +};

 The idea is that each request is first queued on the QTAILQ, and at some
 point it's removed from the queue and gets a real_acb. But it never has
 both at the same time. Correct?
NO. if block I/O throttling is enabled and I/O rate at runtime exceed
this limits, this request will be enqueued.
It represents the whole lifecycle of one enqueued request.


 Can we have the basic principle of operation spelled out as a comment
 somewhere near the top of the file?
OK, i will.

 +
 +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
 +
 +struct BlockQueue {
 +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
 +    bool req_failed;
 +    bool flushing;
 +};

 I find req_failed pretty confusing. Needs documentation at least, but
 most probably also a better name.
OK. request_has_failed?

 +
 +static void qemu_block_queue_dequeue(BlockQueue *queue,
 +                                     BlockQueueAIOCB *request)
 +{
 +    BlockQueueAIOCB *req;
 +
 +    assert(queue);
 +    while (!QTAILQ_EMPTY(queue-requests)) {
 +        req = QTAILQ_FIRST(queue-requests);
 +        if (req == request) {
 +            QTAILQ_REMOVE(queue-requests, req, entry);
 +            break;
 +        }
 +    }
 +}

 Is it just me or is this an endless loop if the request isn't the first
 element in the list?
queue-requests is only used to store requests which exceed the limits.
Why is the request not the first evlement?

 +
 +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
 +{
 +    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
 +    if (request-real_acb) {
 +        bdrv_aio_cancel(request-real_acb);
 +    } else {
 +        assert(request-common.bs-block_queue);
 +        qemu_block_queue_dequeue(request-common.bs-block_queue,
 +                                 request);
 +    }
 +
 +    qemu_aio_release(request);
 +}
 +
 +static AIOPool 

Re: [kvm] Re: Questions about duplicate memory work

2011-09-26 Thread Emmanuel Noobadmin
On 9/26/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote:
 On Mon, Sep 26, 2011 at 01:49:18PM +0800, Emmanuel Noobadmin wrote:
 On 9/25/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote:
 
  OK, so I've got a Linux host, and a bunch of Linux VMs.
 
  This means that the host *and* all tho VMs do their own disk
  caches/buffers and do their own swap as well.

 If I'm not wrong, that's why the recommended and current default
 in libvirtd is to create storage devices with no caching to remove
 one layer of duplication.

 How do you do that?  I have my VMs using LVs created on the host as
 their disks, but I'm open to other methods if there are significant
 advantages.

It's unrelated to what you're actually using as the disks, whether
file or block devices like LVs. I think it just makes KVM tell the
host not to cache I/O done on the storage device. To do so just use
the option cache=none when specify the storage. e.g. from
http://www.linux-kvm.org/page/Tuning_KVM
 qemu -drive file=/dev/mapper/ImagesVolumeGroup-Guest1,cache=none,if=virtio

or edit the cache attribute in the libvirt domain XML file if you're using that.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-26 Thread Zhi Yong Wu
On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
 On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
  On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
   Note:
        1.) When bps/iops limits are specified to a small value such as 
   511 bytes/s, this VM will hang up. We are considering how to handle 
   this senario.
  
   You can increase the length of the slice, if the request is larger than
   slice_time * bps_limit.
  Yeah, but it is a challenge for how to increase it. Do you have some nice 
  idea?
 
  If the queue is empty, and the request being processed does not fit the
  queue, increase the slice so that the request fits.
 Sorry for late reply. actually, do you think that this scenario is
 meaningful for the user?
 Since we implement this, if the user limits the bps below 512
 bytes/second, the VM can also not run every task.
 Can you let us know why we need to make such effort?

 It would be good to handle request larger than the slice.

 It is not strictly necessary, but in case its not handled, a minimum
 should be in place, to reflect maximum request size known. Being able to
 specify something which crashes is not acceptable.
HI, Marcelo,

any comments? I have post the implementation based on your suggestions






-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu: Fix inject-nmi

2011-09-26 Thread Avi Kivity

On 09/25/2011 08:22 PM, Jan Kiszka wrote:

On 2011-09-25 16:07, Avi Kivity wrote:
  On 09/23/2011 12:31 PM, Lai Jiangshan wrote:
 Moreover: wrong indention.
  
 You know that this won't work for qemu-kvm with in-kernel irqchip? You
 may want to provide a patch for that tree, emulating the unavailable
 LINT1 injection via testing the APIC configration and then raising an
 NMI as before if it is accepted.
  

  It works in my box but the NMI is not injected through the in-kernel
  irqchip,
  I will implement it as you suggested.

  Somewhat hacky; isn't it better to test LINT1 in the kernel (and
  redefine the KVM_NMI ioctl as toggle LINT1)?

KVM_NMI is required for user space IRQ chip as well.


We could define KVM_NMI as edging the core NMI input if 
!irqchip_in_kernel, and toggling LINT1 otherwise.  Hardly nice though.


The current KVM_NMI with irqchip_in_kernel is not meaningful, since it 
doesn't obey the rules of any NMI source.



Introducing some KVM_SET_LINT1 is an option though. But emulating it for
the NMI button on older kernels sounds worthwhile nevertheless.



Perhaps this is the best option to avoid confusion.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread David Gibson
On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
 Based on the discussions over the last couple of weeks
 I have updated the device fd file layout proposal and
 tried to specify it a bit more formally.
 
 ===
 
 1.  Overview
 
   This specification describes the layout of device files
   used in the context of vfio, which gives user space
   direct access to I/O devices that have been bound to
   vfio.
 
   When a device fd is opened and read, offset 0x0 contains
   a fixed sized header followed by a number of variable length
   records that describe different characteristics
   of the device-- addressable regions, interrupts, etc.
 
   0x0  +-+-+
| magic | u32  // identifies this as a vfio
 device file
+---+ and identifies the type of bus
| version   | u32  // specifies the version of this
+---+
| flags | u32  // encodes any flags
+---+
|  dev info record 0|
|type   | u32   // type of record
|rec_len| u32   // length in bytes of record
|   |  (including record header)
|flags  | u32   // type specific flags
|...content...  |   // record content, which could
+---+   // include sub-records
|  dev info record 1|
+---+
|  dev info record N|
+---+

I really should have chimed in on this earlier, but I've been very
busy.

Um, not to put too fine a point on it, this is madness.

Yes, it's very flexible and can thereby cover a very wide range of
cases.  But it's much, much too complex.  Userspace has to parse a
complex, multilayered data structure, with variable length elements
just to get an address at which to do IO.  I can pretty much guarantee
that if we went with this, most userspace programs using this
interface would just ignore this metadata and directly map the
offsets at which they happen to know the kernel will put things for
the type of device they care about.

_At least_ for PCI, I think the original VFIO layout of each BAR at a
fixed, well known offset is much better.  Despite its limitations,
just advertising a device type ID which describes one of a few fixed
layouts would be preferable to this.  I'm still hoping, that we can do
a bit better than that.  But we should try really hard to at the very
least force the metadata into a simple array of resources each with a
fixed size record describing it, even if it means some space wastage
with occasionally-used fields.  Anything more complex than that and
userspace is just never going to use it properly.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Weird console issue.

2011-09-26 Thread Robin Lee Powell

I'm running qemu-kvm on Fedora, with Fedora VMs, normally via
libvirt/virsh.

I have serial consoles setup, like so:

serial type='pty'
  target port='0'/
/serial
console type='pty'
  target type='serial' port='0'/
/console

$ sudo virsh ttyconsole stodi
/dev/pts/3

With both PuTTY and shellinabox (web-based shell) to the master,
connecting via either virsh console *or* simple screen
/dev/pts/3, with agetty running on the VM like so: sudo
/sbin/agetty 115200 ttyS0, I get the following problem:

Everything seems to work until it asks for a password.  As soon as
that happens, any character I type causes it to respond as though
I'd hit enter.  This continues until /bin/login completes and a
new getty starts, at which point it prints the right headers and I
can enter my userid, and then it asks for a password and it breaks
again in the same way.

Obviously, this makes the console not so useful.  Any help or
suggestions would be welcome.

-Robin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm] Re: Questions about duplicate memory work

2011-09-26 Thread Mihamina Rakotomandimby

On 09/26/2011 11:15 AM, Emmanuel Noobadmin wrote:

This means that the host *and* all tho VMs do their own disk
caches/buffers and do their own swap as well.

If I'm not wrong, that's why the recommended and current default
in libvirtd is to create storage devices with no caching to remove
one layer of duplication.

How do you do that?

or edit the cache attribute in the libvirt domain XML file if you're using that.


Just saw it:
http://libvirt.org/formatdomain.html#elementsDisks
disk ... 
  driver ... cache=none ... /
  ... /
/disk

Good to know.

--
RMA.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call agenda for September 26

2011-09-26 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Thanks, Juan.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for September 26

2011-09-26 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

I should learn to read the calendar

Call is for September 27th, of course.

Sorry for any inconvenience.

 Please send in any agenda items you are interested in covering.

 Thanks, Juan.

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
#secure method=pgpmime mode=sign
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Alexander Graf

Am 26.09.2011 um 09:51 schrieb David Gibson da...@gibson.dropbear.id.au:

 On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
 Based on the discussions over the last couple of weeks
 I have updated the device fd file layout proposal and
 tried to specify it a bit more formally.
 
 ===
 
 1.  Overview
 
  This specification describes the layout of device files
  used in the context of vfio, which gives user space
  direct access to I/O devices that have been bound to
  vfio.
 
  When a device fd is opened and read, offset 0x0 contains
  a fixed sized header followed by a number of variable length
  records that describe different characteristics
  of the device-- addressable regions, interrupts, etc.
 
  0x0  +-+-+
   | magic | u32  // identifies this as a vfio
 device file
   +---+ and identifies the type of bus
   | version   | u32  // specifies the version of this
   +---+
   | flags | u32  // encodes any flags
   +---+
   |  dev info record 0|
   |type   | u32   // type of record
   |rec_len| u32   // length in bytes of record
   |   |  (including record header)
   |flags  | u32   // type specific flags
   |...content...  |   // record content, which could
   +---+   // include sub-records
   |  dev info record 1|
   +---+
   |  dev info record N|
   +---+
 
 I really should have chimed in on this earlier, but I've been very
 busy.
 
 Um, not to put too fine a point on it, this is madness.
 
 Yes, it's very flexible and can thereby cover a very wide range of
 cases.  But it's much, much too complex.  Userspace has to parse a
 complex, multilayered data structure, with variable length elements
 just to get an address at which to do IO.  I can pretty much guarantee
 that if we went with this, most userspace programs using this
 interface would just ignore this metadata and directly map the
 offsets at which they happen to know the kernel will put things for
 the type of device they care about.
 
 _At least_ for PCI, I think the original VFIO layout of each BAR at a
 fixed, well known offset is much better.  Despite its limitations,
 just advertising a device type ID which describes one of a few fixed
 layouts would be preferable to this.  I'm still hoping, that we can do
 a bit better than that.  But we should try really hard to at the very
 least force the metadata into a simple array of resources each with a
 fixed size record describing it, even if it means some space wastage
 with occasionally-used fields.  Anything more complex than that and
 userspace is just never going to use it properly.

We will have 2 different types of user space. One wants to be as generic as 
possible and needs all this dynamic information. QEMU would fall into this 
category.

The other one is device specific drivers in user space. Here hardcoding might 
make sense.

For the generic interface, we need something that us as verbose as possible and 
lets us enumerate all the device properties, so we can properly map and forward 
them to the guest.

However, nothing keeps us from mapping BARs always at static offsets into the 
file. If you don't need the generic info, then you don't need it.

Also, if you can come up with an interface that does not have variable length 
descriptors but is still able to export all the required generic information, 
please send a proposal to the list :)


Alex
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] nVMX: Add KVM_REQ_IMMEDIATE_EXIT

2011-09-26 Thread Marcelo Tosatti
On Sun, Sep 25, 2011 at 11:13:06AM +0300, Nadav Har'El wrote:
 On Fri, Sep 23, 2011, Marcelo Tosatti wrote about Re: [PATCH 1/2] nVMX: Add 
 KVM_REQ_IMMEDIATE_EXIT:
  On Thu, Sep 22, 2011 at 01:52:56PM +0300, Nadav Har'El wrote:
   This patch adds a new vcpu-requests bit, KVM_REQ_IMMEDIATE_EXIT.
   This bit requests that when next entering the guest, we should run it only
   for as little as possible, and exit again.
   
   We use this new option in nested VMX: When L1 launches L2, but L0 wishes 
   L1
 ...
   @@ -5647,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_v
 }
 if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
 record_steal_time(vcpu);
   + req_immediate_exit =
   + kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);
 ...
  The immediate exit information can be lost if entry decides to bail out.
  You can do 
  
  req_immediate_exit = kvm_check_request(KVM_REQ_IMMEDIATE_EXIT)
  after preempt_disable()
  and then transfer back the bit in the bail out case in
  if (vcpu-mode == EXITING_GUEST_MODE || vcpu-requests
 
 Thanks.
 
 But thinking about this a bit, it seems to me that in my case *losing* this
 bit on a canceled entry is the correct thing to do, as turning on this bit was
 decided in the injection phase (in enable_irq_window()), and next time, if
 the reason to turn on this bit still exists (i.e., L0 has something to inject
 to L1, but L2 needs to run), we will turn it on again.

Correct, the loss is irrelevant.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-26 Thread Marcelo Tosatti
On Mon, Sep 26, 2011 at 10:04:13AM +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote:
  On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote:
   On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote:
The code to generate basic SSDT code isn't that difficult (see
build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
patch the DSDT versus just generating the necessary blocks in an SSDT?
   
   I don't really care whether the code is in DSDT or SSDT,
   IMO there isn't much difference between build_ssdt and patching:
   main reason is build_ssdt uses offsets hardcoded to a specific binary
   (ssdt_proc and SD_OFFSET_* ) while I used
   a script to extract offsets.
  
  Yes - your script to extract the offsets is nice.
 
 If you still have doubts,
 it might make sense to merge just patch 1 -
 acpi: generate and parse mixed asl/aml listing
 - as the first step.
 With the infrastructure in place it will be
 easier to discuss the best way to use it.
 
   I think we should avoid relying on copy-pasted binary 
   because I see the related ASL code changing in the near future
   (with multifunction and bridge support among others).
  
  Can you expand on this?  Would multi-function and bridge support make
  patching easier than dynamic SSDT generation?
 
 Just generic concerns:
 1. We are going to have to modify DSL, so binary bytecode
 would be hard to maintain. Specifically IMO the more
 work can be done compile-time, the better, both iasl
 and my script do compile-time checks to keep runtime simple.
 2. There are going to be a lot of devices so one new table
 with all of them would be ok, a table per device would be
 expensive.
 
  I'm a little leary of patching the DSDT - doubly so when the DSDT is
  creating dummy devices that are then dynamically patched out.
 
 A small correction, my specific code only patches out methods, not whole
 devices.

And note only the name of the method is changed to something which the
guests do not identify, its not as if the entire method is added or
removed (although IMO it would be interesting to patch out entirely with
NOPs).

   (For
  example, even with your patch series there are still two devices
  defined with _ADR 1.)
 
 So this is a bug, I think.
 I am not sure whether we just want ISA and VGA always
 non-removable - is there some reason that
 we have hotplug_slot macros for these slots?

Just so its a nice linear range.

 If no we should just remove hotplug
 macros for these two slots. Gleb, Marcelo, any preferences?

Currently they are hardcoded as not hotpluggable (as can be noted by
the RMV macros), but its nice to retrieve that information from QEMU
instead.

 If we want to keep the option of making these slots hotpluggable
 from qemu, it is still easy to fix the duplication. For example,
 I could put ACPI_EXTRACT tags in VGA/ISA devices without
 renaming them. But the easiest way is probably by using Scope.
 See patch below.
 
 
   It seems more straight-forward to just create
  the devices that are needed.
  
  -Kevin
 
 FWIW the acpi spec does mention that it's ok to describe
 unoccupied slots.
 
 
 
 Multiple device with the same _ADR relies on unspecified
 behavior of ACPI. Fix DSDT so we always have a single device per slot.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 -
 
 diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
 index 055202b..305d2e5 100644
 --- a/src/acpi-dsdt.dsl
 +++ b/src/acpi-dsdt.dsl
 @@ -260,8 +260,8 @@ DefinitionBlock (
  }
  
  Scope(\_SB.PCI0) {
 -Device (VGA) {
 - Name (_ADR, 0x0002)
 + /* VGA device */
 +Scope (S2) {
   OperationRegion(PCIC, PCI_Config, Zero, 0x4)
   Field(PCIC, DWordAcc, NoLock, Preserve) {
   VEND, 32
 @@ -282,13 +282,10 @@ DefinitionBlock (
   Return (0x00)
   }
   }
 - Method(_RMV) { Return (0x00) }
  }
  
   /* PIIX3 ISA bridge */
 -Device (ISA) {
 -Name (_ADR, 0x0001)
 -Method(_RMV) { Return (0x00) }
 +Scope (S1) {
  
  
  /* PIIX PCI to ISA irq remapping */
 @@ -486,7 +483,7 @@ DefinitionBlock (
  
  /* PCI IRQs */
  Scope(\_SB) {
 - Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve)
 + Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve)
   {
   PRQ0,   8,
   PRQ1,   8,
 
 -- 
 MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM updates for 3.1-rc7

2011-09-26 Thread Avi Kivity

Linus, please pull from:

  git://github.com/avikivity/kvm.git kvm-updates/3.1

to receive an emulator fix affecting shld/shrd and a softmmu issue on 
i386 hosts.


Avi Kivity (1):
  KVM: x86 emulator: fix Src2CL decode

Zhao Jin (1):
  KVM: MMU: fix incorrect return of spte

 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/mmu.c |3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Asias He
On 09/25/2011 07:11 PM, Sasha Levin wrote:
 This patch adds support for multiple network devices. The command line syntax
 changes to the following:
 
   --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]

This syntax is actually, no?

   --network/-n mode=tap,guest_ip=x.x.x.x

not

   --network/-n mode=tag guest_ip=x.x.x.x


This works for me:
$ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage  \
-n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1

This does not work for me:
$ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \
-n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1


 
 Each of the parameters is optional, and the config defaults to a TAP based
 networking with a sequential MAC.
 
 Signed-off-by: Sasha Levin levinsasha...@gmail.com
 ---
  tools/kvm/builtin-run.c |  146 
  tools/kvm/virtio/net.c  |  155 +-
  2 files changed, 191 insertions(+), 110 deletions(-)
 
 diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
 index 28dc95a..070b1b6 100644
 --- a/tools/kvm/builtin-run.c
 +++ b/tools/kvm/builtin-run.c
 @@ -87,9 +87,12 @@ static bool sdl;
  static bool balloon;
  static bool using_rootfs;
  static bool custom_rootfs;
 +static bool no_net;
  extern bool ioport_debug;
  extern int  active_console;
  extern int  debug_iodelay;
 +struct virtio_net_parameters *net_params;
 +int num_net_devices;
  
  bool do_debug_print = false;
  
 @@ -182,6 +185,89 @@ static int tty_parser(const struct option *opt, const 
 char *arg, int unset)
   return 0;
  }
  
 +static inline void str_to_mac(const char *str, char *mac)
 +{
 + sscanf(str, %hhx:%hhx:%hhx:%hhx:%hhx:%hhx,
 + mac, mac+1, mac+2, mac+3, mac+4, mac+5);
 +}
 +static int set_net_param(struct virtio_net_parameters *p, const char *param,
 + const char *val)
 +{
 + if (strcmp(param, guest_mac) == 0) {
 + str_to_mac(val, p-guest_mac);
 + } else if (strcmp(param, mode) == 0) {
 + if (!strncmp(val, user, 4)) {
 + int i;
 +
 + for (i = 0; i  num_net_devices; i++)
 + if (net_params[i].mode == NET_MODE_USER)
 + die(Only one usermode network device 
 allowed at a time);
 + p-mode = NET_MODE_USER;
 + } else if (!strncmp(val, tap, 3)) {
 + p-mode = NET_MODE_TAP;
 + } else if (!strncmp(val, none, 4)) {
 + no_net = 1;
 + return -1;
 + } else
 + die(Unkown network mode %s, please use user, tap or 
 none, network);
 + } else if (strcmp(param, script) == 0) {
 + p-script = val;
 + } else if (strcmp(param, guest_ip) == 0) {
 + p-guest_ip = val;
 + } else if (strcmp(param, host_ip) == 0) {
 + p-host_ip = val;
 + }
 +
 + return 0;
 +}
 +
 +static int netdev_parser(const struct option *opt, const char *arg, int 
 unset)
 +{
 + struct virtio_net_parameters p;
 + char *buf, *cmd = NULL, *cur = NULL;
 + bool on_cmd = true;
 +
 + if (arg) {
 + buf = strdup(arg);
 + if (buf == NULL)
 + die(Failed allocating new net buffer);
 + cur = strtok(buf, ,=);
 + }
 +
 + p = (struct virtio_net_parameters) {
 + .guest_ip   = DEFAULT_GUEST_ADDR,
 + .host_ip= DEFAULT_HOST_ADDR,
 + .script = DEFAULT_SCRIPT,
 + .mode   = NET_MODE_TAP,
 + };
 +
 + str_to_mac(DEFAULT_GUEST_MAC, p.guest_mac);
 + p.guest_mac[5] += num_net_devices;
 +
 + while (cur) {
 + if (on_cmd) {
 + cmd = cur;
 + } else {
 + if (set_net_param(p, cmd, cur)  0)
 + goto done;
 + }
 + on_cmd = !on_cmd;
 +
 + cur = strtok(NULL, ,=);
 + };
 +
 + num_net_devices++;
 +
 + net_params = realloc(net_params, num_net_devices * sizeof(*net_params));
 + if (net_params == NULL)
 + die(Failed adding new network device);
 +
 + net_params[num_net_devices - 1] = p;
 +
 +done:
 + return 0;
 +}
 +
  static int shmem_parser(const struct option *opt, const char *arg, int unset)
  {
   const u64 default_size = SHMEM_DEFAULT_SIZE;
 @@ -339,18 +425,9 @@ static const struct option options[] = {
   Kernel command line arguments),
  
   OPT_GROUP(Networking options:),
 - OPT_STRING('n', network, network, user, tap, none,
 - Network to use),
 - OPT_STRING('\0', host-ip, host_ip, a.b.c.d,
 - Assign this address to the host side networking),
 - OPT_STRING('\0', 

Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Sasha Levin
On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote:
 On 09/25/2011 07:11 PM, Sasha Levin wrote:
  This patch adds support for multiple network devices. The command
 line syntax
  changes to the following:
  
--network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]]
 [host_ip=
  [host_ip]] [guest_mac=[guest_mac]] [script=[script]]
 
 This syntax is actually, no?
 
--network/-n mode=tap,guest_ip=x.x.x.x
 
 not
 
--network/-n mode=tag guest_ip=x.x.x.x
 
 
 This works for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage  \
 -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1
 
 This does not work for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \
 -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1

Yes, with ',' ofcourse :)

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Asias He
On 09/26/2011 07:59 PM, Sasha Levin wrote:
 On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote:
 On 09/25/2011 07:11 PM, Sasha Levin wrote:
 This patch adds support for multiple network devices. The command
 line syntax
 changes to the following:

   --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]]
 [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]

 This syntax is actually, no?

--network/-n mode=tap,guest_ip=x.x.x.x

 not

--network/-n mode=tag guest_ip=x.x.x.x


 This works for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage  \
 -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1

 This does not work for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \
 -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1
 
 Yes, with ',' ofcourse :)

Sasha, Can you drop these []?  There are way too many []. Although []
tells user this option is optional.

--network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip=
[host_ip]] [guest_mac=[guest_mac]] [script=[script]]


Does this look ok?
--network/-n
mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh


-- 
Asias He
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Sasha Levin
On Mon, 2011-09-26 at 20:11 +0800, Asias He wrote:
 On 09/26/2011 07:59 PM, Sasha Levin wrote:
  On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote:
  On 09/25/2011 07:11 PM, Sasha Levin wrote:
  This patch adds support for multiple network devices. The command
  line syntax
  changes to the following:
 
--network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]]
  [host_ip=
  [host_ip]] [guest_mac=[guest_mac]] [script=[script]]
 
  This syntax is actually, no?
 
 --network/-n mode=tap,guest_ip=x.x.x.x
 
  not
 
 --network/-n mode=tag guest_ip=x.x.x.x
 
 
  This works for me:
  $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage  \
  -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1
 
  This does not work for me:
  $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \
  -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1
  
  Yes, with ',' ofcourse :)
 
 Sasha, Can you drop these []?  There are way too many []. Although []
 tells user this option is optional.
 
 --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]
 
 
 Does this look ok?
 --network/-n
 mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh

We really need them to specify they are optional.

How about this:

--network/-n
[mode=tap/user/none][,guest_ip=ip][,host_ip=ip][,guest_mac=mac][,script=file]

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Asias He
On 09/26/2011 08:14 PM, Sasha Levin wrote:
 On Mon, 2011-09-26 at 20:11 +0800, Asias He wrote:
 On 09/26/2011 07:59 PM, Sasha Levin wrote:
 On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote:
 On 09/25/2011 07:11 PM, Sasha Levin wrote:
 This patch adds support for multiple network devices. The command
 line syntax
 changes to the following:

   --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]]
 [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]

 This syntax is actually, no?

--network/-n mode=tap,guest_ip=x.x.x.x

 not

--network/-n mode=tag guest_ip=x.x.x.x


 This works for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage  \
 -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1

 This does not work for me:
 $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \
 -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1

 Yes, with ',' ofcourse :)

 Sasha, Can you drop these []?  There are way too many []. Although []
 tells user this option is optional.

 --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip=
 [host_ip]] [guest_mac=[guest_mac]] [script=[script]]


 Does this look ok?
 --network/-n
 mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh
 
 We really need them to specify they are optional.
 
 How about this:
 
 --network/-n
 [mode=tap/user/none][,guest_ip=ip][,host_ip=ip][,guest_mac=mac][,script=file]

OK.

-- 
Asias He
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virt.virt_env_process: Abstract screenshot production

2011-09-26 Thread Lukáš Doktor

Hi,

vm.screendump() doesn't have parameter 'debug'.

So you should either add debug parameter to kvm_vm.py or remove this 
parameter (and perhaps add debug=False into kvm_vm.py).


Regards,
Lukáš


Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a):

In order to ease work with other virtualization types,
make virt_env_process to call vm.screendump() instead
of vm.monitor.screendump().

Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com
---
  client/virt/virt_env_process.py |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py
index 789fa01..a2e 100644
--- a/client/virt/virt_env_process.py
+++ b/client/virt/virt_env_process.py
@@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name):
  scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name)
  try:
  if vm.monitor and params.get(take_regular_screendumps) == yes:
-vm.monitor.screendump(scrdump_filename, debug=False)
+vm.screendump(scrdump_filename, debug=False)
  except kvm_monitor.MonitorError, e:
  logging.warn(e)

@@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name):
  scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name)
  try:
  if vm.monitor and params.get(take_regular_screenshots) == yes:
-vm.monitor.screendump(scrdump_filename, debug=False)
+vm.screendump(scrdump_filename, debug=False)
  except kvm_monitor.MonitorError, e:
  logging.warn(e)

@@ -460,7 +460,7 @@ def _take_screendumps(test, params, env):
  if not vm.is_alive():
  continue
  try:
-vm.monitor.screendump(filename=temp_filename, debug=False)
+vm.screendump(filename=temp_filename, debug=False)
  except kvm_monitor.MonitorError, e:
  logging.warn(e)
  continue


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virt.virt_env_process: Abstract screenshot production

2011-09-26 Thread Lucas Meneghel Rodrigues

On 09/26/2011 09:27 AM, Lukáš Doktor wrote:

Hi,

vm.screendump() doesn't have parameter 'debug'.


My fault, the screendump method on both qmp and human monitors does take 
this parameter, and since the implementation on virt_env_process was 
using the monitor method directly, I forgot to add the param to screendump.


It's fixed now. debug=True by default, the only place where it is False 
is during screendump thread (to avoid polluting the logs).


https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6


So you should either add debug parameter to kvm_vm.py or remove this
parameter (and perhaps add debug=False into kvm_vm.py).

Regards,
Lukáš


Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a):

In order to ease work with other virtualization types,
make virt_env_process to call vm.screendump() instead
of vm.monitor.screendump().

Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com
---
client/virt/virt_env_process.py | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/virt/virt_env_process.py
b/client/virt/virt_env_process.py
index 789fa01..a2e 100644
--- a/client/virt/virt_env_process.py
+++ b/client/virt/virt_env_process.py
@@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name):
scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name)
try:
if vm.monitor and params.get(take_regular_screendumps) == yes:
- vm.monitor.screendump(scrdump_filename, debug=False)
+ vm.screendump(scrdump_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)

@@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name):
scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name)
try:
if vm.monitor and params.get(take_regular_screenshots) == yes:
- vm.monitor.screendump(scrdump_filename, debug=False)
+ vm.screendump(scrdump_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)

@@ -460,7 +460,7 @@ def _take_screendumps(test, params, env):
if not vm.is_alive():
continue
try:
- vm.monitor.screendump(filename=temp_filename, debug=False)
+ vm.screendump(filename=temp_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)
continue




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 08:36:41AM -0300, Marcelo Tosatti wrote:
  If no we should just remove hotplug
  macros for these two slots. Gleb, Marcelo, any preferences?
 
 Currently they are hardcoded as not hotpluggable (as can be noted by
 the RMV macros), but its nice to retrieve that information from QEMU
 instead.

OK, so you ACK the patch below?

  If we want to keep the option of making these slots hotpluggable
  from qemu, it is still easy to fix the duplication. For example,
  I could put ACPI_EXTRACT tags in VGA/ISA devices without
  renaming them. But the easiest way is probably by using Scope.
  See patch below.
  
  
It seems more straight-forward to just create
   the devices that are needed.
   
   -Kevin
  
  FWIW the acpi spec does mention that it's ok to describe
  unoccupied slots.
  
  
  
  Multiple device with the same _ADR relies on unspecified
  behavior of ACPI. Fix DSDT so we always have a single device per slot.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  -
  
  diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
  index 055202b..305d2e5 100644
  --- a/src/acpi-dsdt.dsl
  +++ b/src/acpi-dsdt.dsl
  @@ -260,8 +260,8 @@ DefinitionBlock (
   }
   
   Scope(\_SB.PCI0) {
  -Device (VGA) {
  - Name (_ADR, 0x0002)
  +   /* VGA device */
  +Scope (S2) {
OperationRegion(PCIC, PCI_Config, Zero, 0x4)
Field(PCIC, DWordAcc, NoLock, Preserve) {
VEND, 32
  @@ -282,13 +282,10 @@ DefinitionBlock (
Return (0x00)
}
}
  - Method(_RMV) { Return (0x00) }
   }
   
  /* PIIX3 ISA bridge */
  -Device (ISA) {
  -Name (_ADR, 0x0001)
  -Method(_RMV) { Return (0x00) }
  +Scope (S1) {
   
   
   /* PIIX PCI to ISA irq remapping */
  @@ -486,7 +483,7 @@ DefinitionBlock (
   
   /* PCI IRQs */
   Scope(\_SB) {
  - Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve)
  + Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve)
{
PRQ0,   8,
PRQ1,   8,
  
  -- 
  MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virt.virt_env_process: Abstract screenshot production

2011-09-26 Thread Lukáš Doktor

Dne 26.9.2011 15:10, Lucas Meneghel Rodrigues napsal(a):

On 09/26/2011 09:27 AM, Lukáš Doktor wrote:

Hi,

vm.screendump() doesn't have parameter 'debug'.


My fault, the screendump method on both qmp and human monitors does 
take this parameter, and since the implementation on virt_env_process 
was using the monitor method directly, I forgot to add the param to 
screendump.


It's fixed now. debug=True by default, the only place where it is 
False is during screendump thread (to avoid polluting the logs).


https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6 


We used the same fix ;-)

Acked-by: Lukáš Doktor ldok...@redhat.com




So you should either add debug parameter to kvm_vm.py or remove this
parameter (and perhaps add debug=False into kvm_vm.py).

Regards,
Lukáš


Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a):

In order to ease work with other virtualization types,
make virt_env_process to call vm.screendump() instead
of vm.monitor.screendump().

Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com
---
client/virt/virt_env_process.py | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/virt/virt_env_process.py
b/client/virt/virt_env_process.py
index 789fa01..a2e 100644
--- a/client/virt/virt_env_process.py
+++ b/client/virt/virt_env_process.py
@@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name):
scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name)
try:
if vm.monitor and params.get(take_regular_screendumps) == yes:
- vm.monitor.screendump(scrdump_filename, debug=False)
+ vm.screendump(scrdump_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)

@@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name):
scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name)
try:
if vm.monitor and params.get(take_regular_screenshots) == yes:
- vm.monitor.screendump(scrdump_filename, debug=False)
+ vm.screendump(scrdump_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)

@@ -460,7 +460,7 @@ def _take_screendumps(test, params, env):
if not vm.is_alive():
continue
try:
- vm.monitor.screendump(filename=temp_filename, debug=False)
+ vm.screendump(filename=temp_filename, debug=False)
except kvm_monitor.MonitorError, e:
logging.warn(e)
continue






--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virt.virt_env_process: Abstract screenshot production

2011-09-26 Thread Lucas Meneghel Rodrigues

On 09/26/2011 10:20 AM, Lukáš Doktor wrote:

Dne 26.9.2011 15:10, Lucas Meneghel Rodrigues napsal(a):

On 09/26/2011 09:27 AM, Lukáš Doktor wrote:

Hi,

vm.screendump() doesn't have parameter 'debug'.


My fault, the screendump method on both qmp and human monitors does
take this parameter, and since the implementation on virt_env_process
was using the monitor method directly, I forgot to add the param to
screendump.

It's fixed now. debug=True by default, the only place where it is
False is during screendump thread (to avoid polluting the logs).

https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6


We used the same fix ;-)


Yup :) Now, please rebase your topic branch, remove the fix and commit 
it with push --force. The pull request will be updated automatically, I 
will test it here and will let you know if there are more comments 
before I merge it with upstream.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Pekka Enberg

On Mon, 26 Sep 2011, Asias He wrote:

$ ./kvm run -n mode=tap


[1.490781] registered taskstats version 1
[1.492781] BUG: unable to handle kernel NULL pointer dereference at
001c
[1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408
[1.493781] *pde = 
[1.493781] Oops:  [#1] PREEMPT SMP
[1.493781] Modules linked in:
[1.493781]
[1.493781] Pid: 1, comm: swapper Tainted: GW   3.1.0-rc3+ #77
[1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1
[1.493781] EIP is at virtnet_poll+0x16e/0x408
[1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000
[1.493781] ESI:  EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64
[1.493781]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[1.493781] Process swapper (pid: 1, ti=db486000 task=db45
task.ti=db458000)
[1.493781] Stack:
[1.493781]  db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000
 0010
[1.493781]  0080 dbcebfe0 db5e1414 db5e1000  fffec005
db5e1414 db906dc0
[1.493781]  c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080
012c 
[1.493781] Call Trace:
[1.493781]  [c15e4869] net_rx_action+0x8e/0x177
[1.493781]  [c1066128] __do_softirq+0xa7/0x158
[1.493781]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
[1.493781]  IRQ
[1.493781]  [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86
[1.493781]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
[1.493781]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
[1.493781]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
[1.493781]  [c15e5a5f] ? __dev_open+0x96/0xa6
[1.493781]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
[1.493781]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
[1.493781]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
[1.493781]  [c1393c86] ? extract_entropy+0x45/0x71
[1.493781]  [c1059e35] ? get_parent_ip+0xb/0x31
[1.493781]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
[1.493781]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
[1.493781]  [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95
[1.493781]  [c1001159] ? do_one_initcall+0x71/0x114
[1.493781]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
[1.493781]  [c1a9c7ab] ? kernel_init+0xab/0x11d
[1.493781]  [c1a9c700] ? start_kernel+0x301/0x301
[1.493781]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
[1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc
8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2
ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02
[1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP
0068:db487f64
[1.493781] CR2: 001c
[1.549772] ---[ end trace 4eaa2a86a8e2da27 ]---
[1.550772] Kernel panic - not syncing: Fatal exception in interrupt
[1.551772] Pid: 1, comm: swapper Tainted: G  D W   3.1.0-rc3+ #77
[1.553771] Call Trace:
[1.553771]  [c169ca33] panic+0x58/0x156
[1.554771]  [c16a921a] oops_end+0x8c/0x9b
[1.555771]  [c169c4e7] no_context+0x116/0x120
[1.555771]  [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8
[1.557771]  [c169c5f6] bad_area_nosemaphore+0xd/0x10
[1.558771]  [c16aa4b5] do_page_fault+0x174/0x2fa
[1.559770]  [c107bad0] ? sched_clock_local+0x10/0x14b
[1.560770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
[1.561770]  [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7
[1.563770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
[1.564770]  [c16aa341] ? spurious_fault+0xa8/0xa8
[1.565770]  [c16a89d6] error_code+0x5a/0x60
[1.566769]  [c16aa341] ? spurious_fault+0xa8/0xa8
[1.567769]  [c14f3236] ? virtnet_poll+0x16e/0x408
[1.567769]  [c15e4869] net_rx_action+0x8e/0x177
[1.568769]  [c1066128] __do_softirq+0xa7/0x158
[1.569769]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
[1.569769]  IRQ  [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86
[1.570769]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
[1.571769]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
[1.571769]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
[1.572768]  [c15e5a5f] ? __dev_open+0x96/0xa6
[1.573768]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
[1.573768]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
[1.574768]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
[1.574768]  [c1393c86] ? extract_entropy+0x45/0x71
[1.575768]  [c1059e35] ? get_parent_ip+0xb/0x31
[1.576768]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
[1.576768]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
[1.577768]  [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95
[1.578768]  [c1001159] ? do_one_initcall+0x71/0x114
[1.578768]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
[1.579767]  [c1a9c7ab] ? kernel_init+0xab/0x11d
[1.580767]  [c1a9c700] ? start_kernel+0x301/0x301
[1.581767]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
[1.582767] Rebooting in 1 seconds..
 # KVM session ended normally.


This needs fixing before I can apply 

Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Asias He
On 09/26/2011 10:54 PM, Pekka Enberg wrote:
 On Mon, 26 Sep 2011, Asias He wrote:
 $ ./kvm run -n mode=tap


 [1.490781] registered taskstats version 1
 [1.492781] BUG: unable to handle kernel NULL pointer dereference at
 001c
 [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408
 [1.493781] *pde = 
 [1.493781] Oops:  [#1] PREEMPT SMP
 [1.493781] Modules linked in:
 [1.493781]
 [1.493781] Pid: 1, comm: swapper Tainted: GW   3.1.0-rc3+ #77
 [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1
 [1.493781] EIP is at virtnet_poll+0x16e/0x408
 [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000
 [1.493781] ESI:  EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64
 [1.493781]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 [1.493781] Process swapper (pid: 1, ti=db486000 task=db45
 task.ti=db458000)
 [1.493781] Stack:
 [1.493781]  db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000
  0010
 [1.493781]  0080 dbcebfe0 db5e1414 db5e1000  fffec005
 db5e1414 db906dc0
 [1.493781]  c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080
 012c 
 [1.493781] Call Trace:
 [1.493781]  [c15e4869] net_rx_action+0x8e/0x177
 [1.493781]  [c1066128] __do_softirq+0xa7/0x158
 [1.493781]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
 [1.493781]  IRQ
 [1.493781]  [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86
 [1.493781]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
 [1.493781]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
 [1.493781]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
 [1.493781]  [c15e5a5f] ? __dev_open+0x96/0xa6
 [1.493781]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
 [1.493781]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
 [1.493781]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
 [1.493781]  [c1393c86] ? extract_entropy+0x45/0x71
 [1.493781]  [c1059e35] ? get_parent_ip+0xb/0x31
 [1.493781]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
 [1.493781]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
 [1.493781]  [c1629173] ?
 tcp_set_default_congestion_control+0x8c/0x95
 [1.493781]  [c1001159] ? do_one_initcall+0x71/0x114
 [1.493781]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
 [1.493781]  [c1a9c7ab] ? kernel_init+0xab/0x11d
 [1.493781]  [c1a9c700] ? start_kernel+0x301/0x301
 [1.493781]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
 [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc
 8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2
 ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02
 [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP
 0068:db487f64
 [1.493781] CR2: 001c
 [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]---
 [1.550772] Kernel panic - not syncing: Fatal exception in interrupt
 [1.551772] Pid: 1, comm: swapper Tainted: G  D W   3.1.0-rc3+ #77
 [1.553771] Call Trace:
 [1.553771]  [c169ca33] panic+0x58/0x156
 [1.554771]  [c16a921a] oops_end+0x8c/0x9b
 [1.555771]  [c169c4e7] no_context+0x116/0x120
 [1.555771]  [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8
 [1.557771]  [c169c5f6] bad_area_nosemaphore+0xd/0x10
 [1.558771]  [c16aa4b5] do_page_fault+0x174/0x2fa
 [1.559770]  [c107bad0] ? sched_clock_local+0x10/0x14b
 [1.560770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
 [1.561770]  [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7
 [1.563770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
 [1.564770]  [c16aa341] ? spurious_fault+0xa8/0xa8
 [1.565770]  [c16a89d6] error_code+0x5a/0x60
 [1.566769]  [c16aa341] ? spurious_fault+0xa8/0xa8
 [1.567769]  [c14f3236] ? virtnet_poll+0x16e/0x408
 [1.567769]  [c15e4869] net_rx_action+0x8e/0x177
 [1.568769]  [c1066128] __do_softirq+0xa7/0x158
 [1.569769]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
 [1.569769]  IRQ  [c1065e82] ?
 _local_bh_enable_ip.isra.9+0x65/0x86
 [1.570769]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
 [1.571769]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
 [1.571769]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
 [1.572768]  [c15e5a5f] ? __dev_open+0x96/0xa6
 [1.573768]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
 [1.573768]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
 [1.574768]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
 [1.574768]  [c1393c86] ? extract_entropy+0x45/0x71
 [1.575768]  [c1059e35] ? get_parent_ip+0xb/0x31
 [1.576768]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
 [1.576768]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
 [1.577768]  [c1629173] ?
 tcp_set_default_congestion_control+0x8c/0x95
 [1.578768]  [c1001159] ? do_one_initcall+0x71/0x114
 [1.578768]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
 [1.579767]  [c1a9c7ab] ? kernel_init+0xab/0x11d
 [1.580767]  [c1a9c700] ? start_kernel+0x301/0x301
 [1.581767]  [c16acfb6] ? 

Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Avi Kivity

On 09/26/2011 05:54 PM, Pekka Enberg wrote:

On Mon, 26 Sep 2011, Asias He wrote:

$ ./kvm run -n mode=tap


[1.490781] registered taskstats version 1
[1.492781] BUG: unable to handle kernel NULL pointer dereference at
001c
[1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408
[1.493781] *pde = 
[1.493781] Oops:  [#1] PREEMPT SMP
[1.493781] Modules linked in:
[1.493781]
[1.493781] Pid: 1, comm: swapper Tainted: GW   3.1.0-rc3+ 
#77

[1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1
[1.493781] EIP is at virtnet_poll+0x16e/0x408
[1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000
[1.493781] ESI:  EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64
[1.493781]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[1.493781] Process swapper (pid: 1, ti=db486000 task=db45
task.ti=db458000)
[1.493781] Stack:
[1.493781]  db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000
 0010
[1.493781]  0080 dbcebfe0 db5e1414 db5e1000  fffec005
db5e1414 db906dc0
[1.493781]  c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080
012c 
[1.493781] Call Trace:
[1.493781]  [c15e4869] net_rx_action+0x8e/0x177
[1.493781]  [c1066128] __do_softirq+0xa7/0x158
[1.493781]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
[1.493781] IRQ
[1.493781]  [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86
[1.493781]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
[1.493781]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
[1.493781]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
[1.493781]  [c15e5a5f] ? __dev_open+0x96/0xa6
[1.493781]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
[1.493781]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
[1.493781]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
[1.493781]  [c1393c86] ? extract_entropy+0x45/0x71
[1.493781]  [c1059e35] ? get_parent_ip+0xb/0x31
[1.493781]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
[1.493781]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
[1.493781]  [c1629173] ? 
tcp_set_default_congestion_control+0x8c/0x95

[1.493781]  [c1001159] ? do_one_initcall+0x71/0x114
[1.493781]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
[1.493781]  [c1a9c7ab] ? kernel_init+0xab/0x11d
[1.493781]  [c1a9c700] ? start_kernel+0x301/0x301
[1.493781]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
[1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc
8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2
ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02
[1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP
0068:db487f64
[1.493781] CR2: 001c
[1.549772] ---[ end trace 4eaa2a86a8e2da27 ]---
[1.550772] Kernel panic - not syncing: Fatal exception in interrupt
[1.551772] Pid: 1, comm: swapper Tainted: G  D W   3.1.0-rc3+ 
#77

[1.553771] Call Trace:
[1.553771]  [c169ca33] panic+0x58/0x156
[1.554771]  [c16a921a] oops_end+0x8c/0x9b
[1.555771]  [c169c4e7] no_context+0x116/0x120
[1.555771]  [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8
[1.557771]  [c169c5f6] bad_area_nosemaphore+0xd/0x10
[1.558771]  [c16aa4b5] do_page_fault+0x174/0x2fa
[1.559770]  [c107bad0] ? sched_clock_local+0x10/0x14b
[1.560770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
[1.561770]  [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7
[1.563770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
[1.564770]  [c16aa341] ? spurious_fault+0xa8/0xa8
[1.565770]  [c16a89d6] error_code+0x5a/0x60
[1.566769]  [c16aa341] ? spurious_fault+0xa8/0xa8
[1.567769]  [c14f3236] ? virtnet_poll+0x16e/0x408
[1.567769]  [c15e4869] net_rx_action+0x8e/0x177
[1.568769]  [c1066128] __do_softirq+0xa7/0x158
[1.569769]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
[1.569769] IRQ  [c1065e82] ? 
_local_bh_enable_ip.isra.9+0x65/0x86

[1.570769]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
[1.571769]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
[1.571769]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
[1.572768]  [c15e5a5f] ? __dev_open+0x96/0xa6
[1.573768]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
[1.573768]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
[1.574768]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
[1.574768]  [c1393c86] ? extract_entropy+0x45/0x71
[1.575768]  [c1059e35] ? get_parent_ip+0xb/0x31
[1.576768]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
[1.576768]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
[1.577768]  [c1629173] ? 
tcp_set_default_congestion_control+0x8c/0x95

[1.578768]  [c1001159] ? do_one_initcall+0x71/0x114
[1.578768]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
[1.579767]  [c1a9c7ab] ? kernel_init+0xab/0x11d
[1.580767]  [c1a9c700] ? start_kernel+0x301/0x301
[1.581767]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
[1.582767] Rebooting in 1 seconds..
 # KVM session 

Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Sasha Levin
On Mon, 2011-09-26 at 19:21 +0300, Avi Kivity wrote:
 On 09/26/2011 05:54 PM, Pekka Enberg wrote:
  On Mon, 26 Sep 2011, Asias He wrote:
  $ ./kvm run -n mode=tap
 
 
  [1.490781] registered taskstats version 1
  [1.492781] BUG: unable to handle kernel NULL pointer dereference at
  001c
  [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408
  [1.493781] *pde = 
  [1.493781] Oops:  [#1] PREEMPT SMP
  [1.493781] Modules linked in:
  [1.493781]
  [1.493781] Pid: 1, comm: swapper Tainted: GW   3.1.0-rc3+ 
  #77
  [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1
  [1.493781] EIP is at virtnet_poll+0x16e/0x408
  [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000
  [1.493781] ESI:  EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64
  [1.493781]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
  [1.493781] Process swapper (pid: 1, ti=db486000 task=db45
  task.ti=db458000)
  [1.493781] Stack:
  [1.493781]  db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000
   0010
  [1.493781]  0080 dbcebfe0 db5e1414 db5e1000  fffec005
  db5e1414 db906dc0
  [1.493781]  c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080
  012c 
  [1.493781] Call Trace:
  [1.493781]  [c15e4869] net_rx_action+0x8e/0x177
  [1.493781]  [c1066128] __do_softirq+0xa7/0x158
  [1.493781]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
  [1.493781] IRQ
  [1.493781]  [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86
  [1.493781]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
  [1.493781]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
  [1.493781]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
  [1.493781]  [c15e5a5f] ? __dev_open+0x96/0xa6
  [1.493781]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
  [1.493781]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
  [1.493781]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
  [1.493781]  [c1393c86] ? extract_entropy+0x45/0x71
  [1.493781]  [c1059e35] ? get_parent_ip+0xb/0x31
  [1.493781]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
  [1.493781]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
  [1.493781]  [c1629173] ? 
  tcp_set_default_congestion_control+0x8c/0x95
  [1.493781]  [c1001159] ? do_one_initcall+0x71/0x114
  [1.493781]  [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91
  [1.493781]  [c1a9c7ab] ? kernel_init+0xab/0x11d
  [1.493781]  [c1a9c700] ? start_kernel+0x301/0x301
  [1.493781]  [c16acfb6] ? kernel_thread_helper+0x6/0xd
  [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc
  8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2
  ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02
  [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP
  0068:db487f64
  [1.493781] CR2: 001c
  [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]---
  [1.550772] Kernel panic - not syncing: Fatal exception in interrupt
  [1.551772] Pid: 1, comm: swapper Tainted: G  D W   3.1.0-rc3+ 
  #77
  [1.553771] Call Trace:
  [1.553771]  [c169ca33] panic+0x58/0x156
  [1.554771]  [c16a921a] oops_end+0x8c/0x9b
  [1.555771]  [c169c4e7] no_context+0x116/0x120
  [1.555771]  [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8
  [1.557771]  [c169c5f6] bad_area_nosemaphore+0xd/0x10
  [1.558771]  [c16aa4b5] do_page_fault+0x174/0x2fa
  [1.559770]  [c107bad0] ? sched_clock_local+0x10/0x14b
  [1.560770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
  [1.561770]  [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7
  [1.563770]  [c15db33f] ? __netdev_alloc_skb+0x17/0x34
  [1.564770]  [c16aa341] ? spurious_fault+0xa8/0xa8
  [1.565770]  [c16a89d6] error_code+0x5a/0x60
  [1.566769]  [c16aa341] ? spurious_fault+0xa8/0xa8
  [1.567769]  [c14f3236] ? virtnet_poll+0x16e/0x408
  [1.567769]  [c15e4869] net_rx_action+0x8e/0x177
  [1.568769]  [c1066128] __do_softirq+0xa7/0x158
  [1.569769]  [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b
  [1.569769] IRQ  [c1065e82] ? 
  _local_bh_enable_ip.isra.9+0x65/0x86
  [1.570769]  [c1065eab] ? local_bh_enable_ip+0x8/0xa
  [1.571769]  [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a
  [1.571769]  [c15e59c5] ? dev_set_rx_mode+0x22/0x26
  [1.572768]  [c15e5a5f] ? __dev_open+0x96/0xa6
  [1.573768]  [c15e5c23] ? __dev_change_flags+0x97/0x10e
  [1.573768]  [c15e5cfe] ? dev_change_flags+0x13/0x3f
  [1.574768]  [c1acfe6f] ? ip_auto_config+0x160/0xcf8
  [1.574768]  [c1393c86] ? extract_entropy+0x45/0x71
  [1.575768]  [c1059e35] ? get_parent_ip+0xb/0x31
  [1.576768]  [c16aa6b7] ? sub_preempt_count+0x7c/0x89
  [1.576768]  [c16a7d24] ? _raw_spin_unlock+0x1c/0x27
  [1.577768]  [c1629173] ? 
  tcp_set_default_congestion_control+0x8c/0x95
  [1.578768]  [c1001159] ? do_one_initcall+0x71/0x114
  [1.578768]  [c1acfd0f] ? 

Re: [PATCH v2] kvm tools: Support multiple net devices

2011-09-26 Thread Pekka Enberg
On Mon, Sep 26, 2011 at 7:37 PM, Sasha Levin levinsasha...@gmail.com wrote:
  This needs fixing before I can apply the patch, right?

 Looks like a guest kernel bug, no?

 It's a kernel bug and should be fixed there, but it's caused by us not
 passing sane values to virtio-net, which we can fix on our side as well.

 So my plan is to prevent triggering it from within kvm tools while
 working on a kernel patch.

Yup, that's what I meant as well. Even if it is a kernel issue, we
should do enough validation on our side not to trigger it so easily.

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-26 Thread Avi Kivity

On 09/20/2011 07:49 PM, Jan Kiszka wrote:

Upstream's version is about to be signal-free and will stop handling
SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
from signalfd to a pipe.



Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NFS share

2011-09-26 Thread M. Rodrigo Monteiro
Hi!

I just installed KVM (yum) on a fresh CentOS 6 (64 bits).
Everything works great with virt-manager.
The only thing I'm having problem is accessing the HD from an NFS share.
virt-manager mount fine the share. Locally I can write on the share.
But when I'm trying to create a HD on the pool, it just don't do
anything. Returns to the pool and don't create the file nor gives me
an error.
What should I do?


Regards,
Rodrigo.


M. Rodrigo Monteiro

Free as in Freedom, not free as in free beer
As we are liberated from our own fear, our presence automatically
liberates others
Linux User # 403730
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-26 Thread Jan Kiszka
On 2011-09-21 10:06, Stefan Hajnoczi wrote:
 On Tue, Sep 20, 2011 at 06:49:49PM +0200, Jan Kiszka wrote:
 Upstream's version is about to be signal-free and will stop handling
 SIGUSR2 specially. So it's time to adopt its implementation, ie. switch
 from signalfd to a pipe.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---

 This should help pulling upstream into qemu-kvm when block: avoid
 SIGUSR2 is merged. And will help merging further cleanups of this code
 I'm working on.

  posix-aio-compat.c |   73 
 ++-
  1 files changed, 32 insertions(+), 41 deletions(-)
 
 Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 
 Perhaps qemu_eventfd() can be used in the future instead of an explicit
 pipe.  Then Linux will do eventfd while other OSes will fall back to
 pipes.

Basically simpler code, or does this also have runtime benefits?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-26 Thread Avi Kivity

On 09/26/2011 08:23 PM, Jan Kiszka wrote:


  Perhaps qemu_eventfd() can be used in the future instead of an explicit
  pipe.  Then Linux will do eventfd while other OSes will fall back to
  pipes.

Basically simpler code, or does this also have runtime benefits?



In corner cases, the completion can block on the write() with pipes, but 
not eventfd.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] nVMX injection corrections

2011-09-26 Thread Avi Kivity

On 09/22/2011 01:52 PM, Nadav Har'El wrote:

The following two patches solve two injection-related nested VMX issues:

  1. When we must run L2 next (namely on L1's VMLAUNCH/VMRESUME), injection
 into L1 was delayed for an unknown amount of time - until L2 exits.
 We now force (using a self IPI) an exit immediately after entry to L2,
 so that the injection into L1 happens promptly.

  2. unexpected, valid vectoring info warnings appeared in L1.
 These are fixed by correcting the emulation of concurrent L0-L1 and
 L1-L2 injections: We cannot inject into L1 until the injection into L2
 has been processed.




Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Sasha Levin
This patch verifies that the length of a buffer stored in a linked list
of pages is small enough to fit into a skb.

If the size is larger than a max size of a skb, it means that we shouldn't
go ahead building skbs anyway since we won't be able to send the buffer as
the user requested.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 drivers/net/virtio_net.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0c7321c..64e0717 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
unsigned int copy, hdr_len, offset;
char *p;
 
+   if (len  MAX_SKB_FRAGS * PAGE_SIZE)
+   return NULL;
+
p = page_address(page);
 
/* copy small packet so we can reuse these pages for small data */
-- 
1.7.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] virtio-net: Prevent NULL dereference

2011-09-26 Thread Sasha Levin
This patch prevents a NULL dereference when the user has passed a length
longer than an actual buffer to virtio-net.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 drivers/net/virtio_net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64e0717..8d32c1e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -198,7 +198,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
len -= copy;
offset += copy;
 
-   while (len) {
+   while (len  page) {
set_skb_frag(skb, page, offset, len);
page = (struct page *)page-private;
offset = 0;
-- 
1.7.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] kvm tools: Support multiple net devices

2011-09-26 Thread Sasha Levin
This patch adds support for multiple network devices. The command line syntax
changes to the following:

--network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip=
ip][,guest_mac=mac][,script=file]

Each of the parameters is optional, and the config defaults to a TAP based
networking with a sequential MAC.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c |  146 
 tools/kvm/virtio/net.c  |  157 ++-
 2 files changed, 193 insertions(+), 110 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 28dc95a..070b1b6 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -87,9 +87,12 @@ static bool sdl;
 static bool balloon;
 static bool using_rootfs;
 static bool custom_rootfs;
+static bool no_net;
 extern bool ioport_debug;
 extern int  active_console;
 extern int  debug_iodelay;
+struct virtio_net_parameters *net_params;
+int num_net_devices;
 
 bool do_debug_print = false;
 
@@ -182,6 +185,89 @@ static int tty_parser(const struct option *opt, const char 
*arg, int unset)
return 0;
 }
 
+static inline void str_to_mac(const char *str, char *mac)
+{
+   sscanf(str, %hhx:%hhx:%hhx:%hhx:%hhx:%hhx,
+   mac, mac+1, mac+2, mac+3, mac+4, mac+5);
+}
+static int set_net_param(struct virtio_net_parameters *p, const char *param,
+   const char *val)
+{
+   if (strcmp(param, guest_mac) == 0) {
+   str_to_mac(val, p-guest_mac);
+   } else if (strcmp(param, mode) == 0) {
+   if (!strncmp(val, user, 4)) {
+   int i;
+
+   for (i = 0; i  num_net_devices; i++)
+   if (net_params[i].mode == NET_MODE_USER)
+   die(Only one usermode network device 
allowed at a time);
+   p-mode = NET_MODE_USER;
+   } else if (!strncmp(val, tap, 3)) {
+   p-mode = NET_MODE_TAP;
+   } else if (!strncmp(val, none, 4)) {
+   no_net = 1;
+   return -1;
+   } else
+   die(Unkown network mode %s, please use user, tap or 
none, network);
+   } else if (strcmp(param, script) == 0) {
+   p-script = val;
+   } else if (strcmp(param, guest_ip) == 0) {
+   p-guest_ip = val;
+   } else if (strcmp(param, host_ip) == 0) {
+   p-host_ip = val;
+   }
+
+   return 0;
+}
+
+static int netdev_parser(const struct option *opt, const char *arg, int unset)
+{
+   struct virtio_net_parameters p;
+   char *buf, *cmd = NULL, *cur = NULL;
+   bool on_cmd = true;
+
+   if (arg) {
+   buf = strdup(arg);
+   if (buf == NULL)
+   die(Failed allocating new net buffer);
+   cur = strtok(buf, ,=);
+   }
+
+   p = (struct virtio_net_parameters) {
+   .guest_ip   = DEFAULT_GUEST_ADDR,
+   .host_ip= DEFAULT_HOST_ADDR,
+   .script = DEFAULT_SCRIPT,
+   .mode   = NET_MODE_TAP,
+   };
+
+   str_to_mac(DEFAULT_GUEST_MAC, p.guest_mac);
+   p.guest_mac[5] += num_net_devices;
+
+   while (cur) {
+   if (on_cmd) {
+   cmd = cur;
+   } else {
+   if (set_net_param(p, cmd, cur)  0)
+   goto done;
+   }
+   on_cmd = !on_cmd;
+
+   cur = strtok(NULL, ,=);
+   };
+
+   num_net_devices++;
+
+   net_params = realloc(net_params, num_net_devices * sizeof(*net_params));
+   if (net_params == NULL)
+   die(Failed adding new network device);
+
+   net_params[num_net_devices - 1] = p;
+
+done:
+   return 0;
+}
+
 static int shmem_parser(const struct option *opt, const char *arg, int unset)
 {
const u64 default_size = SHMEM_DEFAULT_SIZE;
@@ -339,18 +425,9 @@ static const struct option options[] = {
Kernel command line arguments),
 
OPT_GROUP(Networking options:),
-   OPT_STRING('n', network, network, user, tap, none,
-   Network to use),
-   OPT_STRING('\0', host-ip, host_ip, a.b.c.d,
-   Assign this address to the host side networking),
-   OPT_STRING('\0', guest-ip, guest_ip, a.b.c.d,
-   Assign this address to the guest side networking),
-   OPT_STRING('\0', host-mac, host_mac, aa:bb:cc:dd:ee:ff,
-   Assign this address to the host side NIC),
-   OPT_STRING('\0', guest-mac, guest_mac, aa:bb:cc:dd:ee:ff,
-   Assign this address to the guest side NIC),
-   OPT_STRING('\0', tapscript, script, Script path,
-Assign a script to process created tap device),

Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream

2011-09-26 Thread Anthony Liguori

On 09/26/2011 12:24 PM, Avi Kivity wrote:

On 09/26/2011 08:23 PM, Jan Kiszka wrote:


 Perhaps qemu_eventfd() can be used in the future instead of an explicit
 pipe. Then Linux will do eventfd while other OSes will fall back to
 pipes.

Basically simpler code, or does this also have runtime benefits?



In corner cases, the completion can block on the write() with pipes, but not
eventfd.


The pipe is O_NONBLOCK and the only thing the caller cares about is whether 
there is *some* data in the PIPE so it can happily ignore EAGAIN.


So the pipe implementation never blocks on write() either.  It may require 
multiple reads to drain the queue though whereas eventfd would only require a 
single read to drain the queue.


Regards,

Anthony Liguori





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Alex Williamson
On Mon, 2011-09-26 at 12:04 +0200, Alexander Graf wrote:
 Am 26.09.2011 um 09:51 schrieb David Gibson da...@gibson.dropbear.id.au:
 
  On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
  Based on the discussions over the last couple of weeks
  I have updated the device fd file layout proposal and
  tried to specify it a bit more formally.
  
  ===
  
  1.  Overview
  
   This specification describes the layout of device files
   used in the context of vfio, which gives user space
   direct access to I/O devices that have been bound to
   vfio.
  
   When a device fd is opened and read, offset 0x0 contains
   a fixed sized header followed by a number of variable length
   records that describe different characteristics
   of the device-- addressable regions, interrupts, etc.
  
   0x0  +-+-+
| magic | u32  // identifies this as a vfio
  device file
+---+ and identifies the type of bus
| version   | u32  // specifies the version of this
+---+
| flags | u32  // encodes any flags
+---+
|  dev info record 0|
|type   | u32   // type of record
|rec_len| u32   // length in bytes of record
|   |  (including record header)
|flags  | u32   // type specific flags
|...content...  |   // record content, which could
+---+   // include sub-records
|  dev info record 1|
+---+
|  dev info record N|
+---+
  
  I really should have chimed in on this earlier, but I've been very
  busy.
  
  Um, not to put too fine a point on it, this is madness.
  
  Yes, it's very flexible and can thereby cover a very wide range of
  cases.  But it's much, much too complex.  Userspace has to parse a
  complex, multilayered data structure, with variable length elements
  just to get an address at which to do IO.  I can pretty much guarantee
  that if we went with this, most userspace programs using this
  interface would just ignore this metadata and directly map the
  offsets at which they happen to know the kernel will put things for
  the type of device they care about.
  
  _At least_ for PCI, I think the original VFIO layout of each BAR at a
  fixed, well known offset is much better.  Despite its limitations,
  just advertising a device type ID which describes one of a few fixed
  layouts would be preferable to this.  I'm still hoping, that we can do
  a bit better than that.  But we should try really hard to at the very
  least force the metadata into a simple array of resources each with a
  fixed size record describing it, even if it means some space wastage
  with occasionally-used fields.  Anything more complex than that and
  userspace is just never going to use it properly.
 
 We will have 2 different types of user space. One wants to be as
 generic as possible and needs all this dynamic information. QEMU would
 fall into this category.
 
 The other one is device specific drivers in user space. Here
 hardcoding might make sense.
 
 For the generic interface, we need something that us as verbose as
 possible and lets us enumerate all the device properties, so we can
 properly map and forward them to the guest.
 
 However, nothing keeps us from mapping BARs always at static offsets
 into the file. If you don't need the generic info, then you don't need
 it.
 
 Also, if you can come up with an interface that does not have variable
 length descriptors but is still able to export all the required
 generic information, please send a proposal to the list :)
 

Hi,

The other obvious possibility is a pure ioctl interface.  To match what
this proposal is trying to describe, plus the runtime interfaces, we'd
need something like:

/* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
#define VFIO_DEVICE_GET_FLAGS   _IOR(, , u64)


/* Return number of mmio/iop/config regions.
 * For PCI this is always 8 (BAR0-5 + ROM + Config) */
#define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int)

/* Return length for region index (may be zero) */
#define VFIO_DEVICE_GET_REGION_LEN  _IOWR(, , u64)

/* Return flags for region index
 * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
#define VFIO_DEVICE_GET_REGION_FLAGS_IOR(, , u64)

/* Return file offset for region index */
#define VFIO_DEVICE_GET_REGION_OFFSET   _IOWR(, , u64)

/* Return physical address for region index - not implemented for PCI */
#define VFIO_DEVICE_GET_REGION_PHYS_ADDR_IOWR(, , u64)



/* Return number of IRQs (Not including MSI/MSI-X for PCI) 

Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote:
 This patch verifies that the length of a buffer stored in a linked list
 of pages is small enough to fit into a skb.
 
 If the size is larger than a max size of a skb, it means that we shouldn't
 go ahead building skbs anyway since we won't be able to send the buffer as
 the user requested.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: virtualizat...@lists.linux-foundation.org
 Cc: net...@vger.kernel.org
 Cc: kvm@vger.kernel.org
 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Interesting.  This is a theoretical issue, correct?
Not a crash you actually see.

This crash would mean device is giving us packets
that are way too large. Avoiding crashes even in the face of
a misbehaved device is a good idea, but should
we print a diagnostic to a system log?
Maybe rate-limited or print once to avoid filling
up the disk. Other places in driver print with pr_debug
I'm not sure that's right but better than nothing.

 ---
  drivers/net/virtio_net.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 0c7321c..64e0717 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
 *vi,
   unsigned int copy, hdr_len, offset;
   char *p;
  
 + if (len  MAX_SKB_FRAGS * PAGE_SIZE)

unlikely()?

Also, this seems too aggressive: at this point len includes the header
and the linear part. The right place for this
test is probably where we fill in the frags, just before
while (len)

The whole can only happen when mergeable buffers
are disabled, right?


 + return NULL;
 +
   p = page_address(page);
  
   /* copy small packet so we can reuse these pages for small data */
 -- 
 1.7.6.1
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] virtio-net: Prevent NULL dereference

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 08:41:09PM +0300, Sasha Levin wrote:
 This patch prevents a NULL dereference when the user has passed a length
 longer than an actual buffer to virtio-net.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: virtualizat...@lists.linux-foundation.org
 Cc: net...@vger.kernel.org
 Cc: kvm@vger.kernel.org
 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Hmm, another protection against a buggy device, right?
No problem with that, but let's discard the packet
and print a disgnostic, so that the user can discover
what happened.

 ---
  drivers/net/virtio_net.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 64e0717..8d32c1e 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -198,7 +198,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
 *vi,
   len -= copy;
   offset += copy;
  
 - while (len) {
 + while (len  page) {
   set_skb_frag(skb, page, offset, len);
   page = (struct page *)page-private;
   offset = 0;
 -- 
 1.7.6.1
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Sasha Levin
On Mon, 2011-09-26 at 21:44 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote:
  This patch verifies that the length of a buffer stored in a linked list
  of pages is small enough to fit into a skb.
  
  If the size is larger than a max size of a skb, it means that we shouldn't
  go ahead building skbs anyway since we won't be able to send the buffer as
  the user requested.
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: virtualizat...@lists.linux-foundation.org
  Cc: net...@vger.kernel.org
  Cc: kvm@vger.kernel.org
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
 
 Interesting.  This is a theoretical issue, correct?
 Not a crash you actually see.

Actually it was an actual crash caused when our virtio-net driver in kvm
tools did funny things and passed '(u32)-1' length as a buffer length to
the guest kernel.

 This crash would mean device is giving us packets
 that are way too large. Avoiding crashes even in the face of
 a misbehaved device is a good idea, but should
 we print a diagnostic to a system log?
 Maybe rate-limited or print once to avoid filling
 up the disk. Other places in driver print with pr_debug
 I'm not sure that's right but better than nothing.

Yup, I'll add some debug info.

  ---
   drivers/net/virtio_net.c |3 +++
   1 files changed, 3 insertions(+), 0 deletions(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 0c7321c..64e0717 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
  *vi,
  unsigned int copy, hdr_len, offset;
  char *p;
   
  +   if (len  MAX_SKB_FRAGS * PAGE_SIZE)
 
 unlikely()?
 
 Also, this seems too aggressive: at this point len includes the header
 and the linear part. The right place for this
 test is probably where we fill in the frags, just before
 while (len)
 
 The whole can only happen when mergeable buffers
 are disabled, right?

From what I understand it can happen whenever you're going to build a
skb longer than PAGE_SIZE.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Pekka Enberg
On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote:
 Interesting.  This is a theoretical issue, correct?
 Not a crash you actually see.

 Actually it was an actual crash caused when our virtio-net driver in kvm
 tools did funny things and passed '(u32)-1' length as a buffer length to
 the guest kernel.

I'm not sure what Michael means with theoretical issue here. Can the guest
driver assume that the hypervisor doesn't attempt to do nasty things?

  Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 10:37:22PM +0300, Sasha Levin wrote:
 On Mon, 2011-09-26 at 21:44 +0300, Michael S. Tsirkin wrote:
  On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote:
   This patch verifies that the length of a buffer stored in a linked list
   of pages is small enough to fit into a skb.
   
   If the size is larger than a max size of a skb, it means that we shouldn't
   go ahead building skbs anyway since we won't be able to send the buffer as
   the user requested.
   
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: virtualizat...@lists.linux-foundation.org
   Cc: net...@vger.kernel.org
   Cc: kvm@vger.kernel.org
   Signed-off-by: Sasha Levin levinsasha...@gmail.com
  
  Interesting.  This is a theoretical issue, correct?
  Not a crash you actually see.
 
 Actually it was an actual crash caused when our virtio-net driver in kvm
 tools did funny things and passed '(u32)-1' length as a buffer length to
 the guest kernel.
 
  This crash would mean device is giving us packets
  that are way too large. Avoiding crashes even in the face of
  a misbehaved device is a good idea, but should
  we print a diagnostic to a system log?
  Maybe rate-limited or print once to avoid filling
  up the disk. Other places in driver print with pr_debug
  I'm not sure that's right but better than nothing.
 
 Yup, I'll add some debug info.
 
   ---
drivers/net/virtio_net.c |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
   
   diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
   index 0c7321c..64e0717 100644
   --- a/drivers/net/virtio_net.c
   +++ b/drivers/net/virtio_net.c
   @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct 
   virtnet_info *vi,
 unsigned int copy, hdr_len, offset;
 char *p;

   + if (len  MAX_SKB_FRAGS * PAGE_SIZE)
  
  unlikely()?
  
  Also, this seems too aggressive: at this point len includes the header
  and the linear part. The right place for this
  test is probably where we fill in the frags, just before
  while (len)
  
  The whole can only happen when mergeable buffers
  are disabled, right?
 
 From what I understand it can happen whenever you're going to build a
 skb longer than PAGE_SIZE.

Hmm how exactly?  With mergeable buffers this only gets
the length of the 1st chunk which is up to 4K unless the driver
is buggy ...

 -- 
 
 Sasha.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Michael S. Tsirkin
On Mon, Sep 26, 2011 at 10:45:35PM +0300, Pekka Enberg wrote:
 On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote:
  Interesting.  This is a theoretical issue, correct?
  Not a crash you actually see.
 
  Actually it was an actual crash caused when our virtio-net driver in kvm
  tools did funny things and passed '(u32)-1' length as a buffer length to
  the guest kernel.
 
 I'm not sure what Michael means with theoretical issue here. Can the guest
 driver assume that the hypervisor doesn't attempt to do nasty things?
 
   Pekka

IMO yes, hypervisor has full access to guest memory so it's a safe
assumption. But surviving in the face of hypervisor bugs
is laudable goal, bugs do happen.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Stuart Yoder
On Mon, Sep 26, 2011 at 2:51 AM, David Gibson
da...@gibson.dropbear.id.au wrote:
 On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
 Based on the discussions over the last couple of weeks
 I have updated the device fd file layout proposal and
 tried to specify it a bit more formally.

 ===

 1.  Overview

   This specification describes the layout of device files
   used in the context of vfio, which gives user space
   direct access to I/O devices that have been bound to
   vfio.

   When a device fd is opened and read, offset 0x0 contains
   a fixed sized header followed by a number of variable length
   records that describe different characteristics
   of the device-- addressable regions, interrupts, etc.

   0x0  +-+-+
        |         magic             | u32  // identifies this as a vfio
 device file
        +---+         and identifies the type of bus
        |         version           | u32  // specifies the version of this
        +---+
        |         flags             | u32  // encodes any flags
        +---+
        |  dev info record 0        |
        |    type                   | u32   // type of record
        |    rec_len                | u32   // length in bytes of record
        |                           |          (including record header)
        |    flags                  | u32   // type specific flags
        |    ...content...          |       // record content, which could
        +---+       // include sub-records
        |  dev info record 1        |
        +---+
        |  dev info record N        |
        +---+

 I really should have chimed in on this earlier, but I've been very
 busy.

 Um, not to put too fine a point on it, this is madness.

 Yes, it's very flexible and can thereby cover a very wide range of
 cases.  But it's much, much too complex.  Userspace has to parse a
 complex, multilayered data structure, with variable length elements
 just to get an address at which to do IO.  I can pretty much guarantee
 that if we went with this, most userspace programs using this
 interface would just ignore this metadata and directly map the
 offsets at which they happen to know the kernel will put things for
 the type of device they care about.

 _At least_ for PCI, I think the original VFIO layout of each BAR at a
 fixed, well known offset is much better.  Despite its limitations,
 just advertising a device type ID which describes one of a few fixed
 layouts would be preferable to this.  I'm still hoping, that we can do
 a bit better than that.  But we should try really hard to at the very
 least force the metadata into a simple array of resources each with a
 fixed size record describing it, even if it means some space wastage
 with occasionally-used fields.  Anything more complex than that and
 userspace is just never going to use it properly.

So, is your issue really the variable length nature of what was
proposed?

I don't think it would be that hard to make the different resources
fixed length.   I think we have 2 types of resources now-- address
regions and interrupts.

The only thing that get's a bit tricky is device tree paths, which
are obviously variable length.

We could put a description of all the resources in an array with
each element being something like 4KB??

Stuart
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Sasha Levin
On Mon, 2011-09-26 at 22:45 +0300, Pekka Enberg wrote:
 On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote:
  Interesting.  This is a theoretical issue, correct?
  Not a crash you actually see.
 
  Actually it was an actual crash caused when our virtio-net driver in kvm
  tools did funny things and passed '(u32)-1' length as a buffer length to
  the guest kernel.
 
 I'm not sure what Michael means with theoretical issue here. Can the guest
 driver assume that the hypervisor doesn't attempt to do nasty things?

afaik if the hypervisor can access the vcpus and the memory of the
guest, this shouldn't be a security issue - more of a bug prevention
issue.

I guess it'll be interesting the other way around, when it's the guest
that passes this buggy information to the hypervisor.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Stuart Yoder

 The other obvious possibility is a pure ioctl interface.  To match what
 this proposal is trying to describe, plus the runtime interfaces, we'd
 need something like:

 /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
 #define VFIO_DEVICE_GET_FLAGS                   _IOR(, , u64)


 /* Return number of mmio/iop/config regions.
  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
 #define VFIO_DEVICE_GET_NUM_REGIONS             _IOR(, , int)

 /* Return length for region index (may be zero) */
 #define VFIO_DEVICE_GET_REGION_LEN              _IOWR(, , u64)

 /* Return flags for region index
  * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
 #define VFIO_DEVICE_GET_REGION_FLAGS            _IOR(, , u64)

 /* Return file offset for region index */
 #define VFIO_DEVICE_GET_REGION_OFFSET           _IOWR(, , u64)

 /* Return physical address for region index - not implemented for PCI */
 #define VFIO_DEVICE_GET_REGION_PHYS_ADDR        _IOWR(, , u64)



 /* Return number of IRQs (Not including MSI/MSI-X for PCI) */
 #define VFIO_DEVICE_GET_NUM_IRQ                 _IOR(, , int)

 /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
 #define VFIO_DEVICE_SET_IRQ_EVENTFD             _IOW(, , int)

 /* Unmask IRQ index */
 #define VFIO_DEVICE_UNMASK_IRQ                  _IOW(, , int)

 /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
 #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD      _IOW(, , int)


 /* Return the device tree path for type/index into the user
  * allocated buffer */
 struct dtpath {
        u32     type; (0 = region, 1 = IRQ)
        u32     index;
        u32     buf_len;
        char    *buf;
 };
 #define VFIO_DEVICE_GET_DTPATH                  _IOWR(, , struct dtpath)

 /* Return the device tree index for type/index */
 struct dtindex {
        u32     type; (0 = region, 1 = IRQ)
        u32     index;
        u32     prop_type;
        u32     prop_index;
 };
 #define VFIO_DEVICE_GET_DTINDEX                 _IOWR(, , struct dtindex)


 /* Reset the device */
 #define VFIO_DEVICE_RESET                       _IO(, ,)


 /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
 #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS        _IOW(, , int)
 #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS       _IOW(, , int)

 Hope that covers it.  Something I prefer about this interface is that
 everything can easily be generated on the fly, whereas reading out a
 table from the device means we really need to have that table somewhere
 in kernel memory to easily support reading random offsets.  Thoughts?

I think this could work, but I'm not sure it makes the problem David
had any better--  you substitute the complexity of parsing the
variable length regions with invoking a set of APIs.

Stuart
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb

2011-09-26 Thread Pekka Enberg
On Mon, Sep 26, 2011 at 10:45:35PM +0300, Pekka Enberg wrote:
 I'm not sure what Michael means with theoretical issue here. Can the guest
 driver assume that the hypervisor doesn't attempt to do nasty things?

On Mon, Sep 26, 2011 at 10:57 PM, Michael S. Tsirkin m...@redhat.com wrote:
 IMO yes, hypervisor has full access to guest memory so it's a safe
 assumption. But surviving in the face of hypervisor bugs
 is laudable goal, bugs do happen.

I was thinking of a compromised guest that is able to trick the hypervisor into
doing nasty things to other guests without taking over the hypervisor
completely. So for something like virtio networking, that's by
definition exposed
to rest of the world, I think it's very important not to be robust
against hypervisor
bugs.

In any case, we were able to trigger this particular case rather easily with our
buggy tool, so it's definitely worth fixing. ;-)

FWIW,

Acked-by: Pekka Enberg penb...@kernel.org

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-26 Thread Scott Wood
On 09/24/2011 02:47 AM, Alexander Graf wrote:
 
 On 22.09.2011, at 08:50, Liu Yu-B13201 wrote:
 


 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org 
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
 Sent: Tuesday, September 20, 2011 7:36 AM
 To: kvm-...@vger.kernel.org
 Cc: kvm@vger.kernel.org
 Subject: [PATCH] KVM: PPC: E500: Support hugetlbfs

 With hugetlbfs support emerging on e500, we should also support KVM
 backing its guest memory by it.

 This patch adds support for hugetlbfs into the e500 shadow mmu code.

 Signed-off-by: Alexander Graf ag...@suse.de
 ---
 arch/powerpc/kvm/e500_tlb.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

 diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
 index ec17148..64f75eb 100644
 --- a/arch/powerpc/kvm/e500_tlb.c
 +++ b/arch/powerpc/kvm/e500_tlb.c
 @@ -24,6 +24,7 @@
 #include linux/sched.h
 #include linux/rwsem.h
 #include linux/vmalloc.h
 +#include linux/hugetlb.h
 #include asm/kvm_ppc.h
 #include asm/kvm_e500.h

 @@ -673,13 +674,34 @@ static inline void 
 kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 pfn = ~(tsize_pages - 1);
 break;
 }
 +   } else if (vma  hva = vma-vm_start 
 +   (vma-vm_flags  VM_HUGETLB)) {

 Why check (vma  hva = vma-vm_start) twice?
 
 What would you do? :)

I think he's just complaining about doing the check twice, in which case
the answer could be it avoids extra indentation and the compiler should
be able to factor out the common subexpression.

 In fact, I only copied the vm_start condition from the pfn code.
 Scott, why do we have to check this in the first place? We're calling
 find_vma. Can that return a vma that does not cover the hva we're
 passing in?

Yes, it can.  From find_vma():

/* Look up the first VMA which satisfies  addr  vm_end,  NULL if none. */

You'll find similar checks in a lot of other places where find_vma() is
used.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-26 Thread Scott Wood
On 09/24/2011 02:44 AM, Alexander Graf wrote:
 On 20.09.2011, at 19:54, Scott Wood wrote:
 On 09/19/2011 06:35 PM, Alexander Graf wrote:
 +   asm (PPC_CNTLZL %0,%1 : =r (lz) : r (psize));
 +   tsize = min(21 - lz, tsize);

 No need to open-code the cntlz and subtract-from-21:

  tsize = min(ilog2(psize) - 10, tsize);

  /*
   * e500 doesn't implement the lowest tsize bit,
   * or 1K pages.
   */
  tsize = max(BOOK3E_PAGESZ_4K, tsize  ~1);

 There's still an open-coded subtraction of 10, but that relates more
 straightforwardly to the definition of tsize (and could be factored out
 into a size-to-tsize function).
 
 Yeah, no need to micro-optimized those few bits. The reason I used the asm 
 statement was that I copied the hugetlbfs code which does it that way :).

The 21 - lz thing is broken on 64-bit, FWIW.  It works by chance in
the hugetlbfs code (and in settlbcam) because it results in tsize being
too low by 32, which only affects bits that subsequently get masked off.

 }

 up_read(current-mm-mmap_sem);
 }

 if (likely(!pfnmap)) {
 +   unsigned long tsize_pages = 1  (tsize - 2);

 1  (tsize + 10 - PAGE_SHIFT);
 
 Are we getting variable page sizes anytime soon? Will change it nevertheless, 
 just curious :).

Nothing imminent on our chips AFAIK, but there's some 64K page support
in Linux for IBM's book3e chips, and it's not nice to hardcode
(especially in a hidden way) regardless.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Alex Williamson
On Mon, 2011-09-26 at 15:03 -0500, Stuart Yoder wrote:
 
  The other obvious possibility is a pure ioctl interface.  To match what
  this proposal is trying to describe, plus the runtime interfaces, we'd
  need something like:
 
  /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
  #define VFIO_DEVICE_GET_FLAGS   _IOR(, , u64)
 
 
  /* Return number of mmio/iop/config regions.
   * For PCI this is always 8 (BAR0-5 + ROM + Config) */
  #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int)
 
  /* Return length for region index (may be zero) */
  #define VFIO_DEVICE_GET_REGION_LEN  _IOWR(, , u64)
 
  /* Return flags for region index
   * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
  #define VFIO_DEVICE_GET_REGION_FLAGS_IOR(, , u64)
 
  /* Return file offset for region index */
  #define VFIO_DEVICE_GET_REGION_OFFSET   _IOWR(, , u64)
 
  /* Return physical address for region index - not implemented for PCI */
  #define VFIO_DEVICE_GET_REGION_PHYS_ADDR_IOWR(, , u64)
 
 
 
  /* Return number of IRQs (Not including MSI/MSI-X for PCI) */
  #define VFIO_DEVICE_GET_NUM_IRQ _IOR(, , int)
 
  /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
  #define VFIO_DEVICE_SET_IRQ_EVENTFD _IOW(, , int)
 
  /* Unmask IRQ index */
  #define VFIO_DEVICE_UNMASK_IRQ  _IOW(, , int)
 
  /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
  #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD  _IOW(, , int)
 
 
  /* Return the device tree path for type/index into the user
   * allocated buffer */
  struct dtpath {
 u32 type; (0 = region, 1 = IRQ)
 u32 index;
 u32 buf_len;
 char*buf;
  };
  #define VFIO_DEVICE_GET_DTPATH  _IOWR(, , struct dtpath)
 
  /* Return the device tree index for type/index */
  struct dtindex {
 u32 type; (0 = region, 1 = IRQ)
 u32 index;
 u32 prop_type;
 u32 prop_index;
  };
  #define VFIO_DEVICE_GET_DTINDEX _IOWR(, , struct dtindex)
 
 
  /* Reset the device */
  #define VFIO_DEVICE_RESET   _IO(, ,)
 
 
  /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
  #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS_IOW(, , int)
  #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS   _IOW(, , int)
 
  Hope that covers it.  Something I prefer about this interface is that
  everything can easily be generated on the fly, whereas reading out a
  table from the device means we really need to have that table somewhere
  in kernel memory to easily support reading random offsets.  Thoughts?
 
 I think this could work, but I'm not sure it makes the problem David
 had any better--  you substitute the complexity of parsing the
 variable length regions with invoking a set of APIs.

I read it as mostly the complexity problem, which I think this makes
fairly trivial.  It also eliminates a lot of complexity on the kernel
side of supporting the table driven interface.  Thanks,

Alex

if (!(GET_FLAGS  PCI))
return error;

if (GET_NUM_REGIONS  8)
return error;

GET_REGION_LEN(7)
GET_REGION_OFFSET(7) // setup config access

for (i = 0; i  6; i++) {
if (GET_REGION_LEN(i)) {
GET_REGION_OFFSET(i)
setup mmap/rw
}
}

if (GET_REGION_LEN(6)) {
GET_REGION_OFFSET(6)
setup ROM access
}

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-26 Thread Christian Benvenuti (benve)
 -Original Message-
 From: Roopa Prabhu (roprabhu)
 Sent: Thursday, September 15, 2011 6:47 AM
 To: Michael S. Tsirkin
 Cc: Sridhar Samudrala; net...@vger.kernel.org;
 dragos.tatu...@gmail.com; a...@arndb.de; David Wang (dwang2);
Christian
 Benvenuti (benve); ka...@trash.net; da...@davemloft.net;
 eric.duma...@gmail.com; mc...@broadcom.com; kvm@vger.kernel.org
 Subject: Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address
 filtering support for passthru mode
 
 
 
 The netlink patch is still in the works. I will post the patches after
 I
 clean it up a bit and also accommodate or find answers to most
 questions
 discussed for non-passthru case. Thought I will post the netlink
 interface
 here to see if anyone has any early comments. I have a
 rtnl_link_ops-set_rx_filter defined.
 
 [IFLA_RX_FILTER] = {
 [IFLA_ADDRESS_FILTER] = {
 [IFLA_ADDRESS_FILTER_FLAGS]
 [IFLA_ADDRESS_LIST] = {
 [IFLA_ADDRESS_LIST_ENTRY]
 }
 }
 [IFLA_VLAN_FILTER] = {
 [IFLA_VLAN_LIST] = {
 [IFLA_VLAN]
 }
 }
 }
 
 Some open questions:
 - The VLAN filter above shows a VLAN list. It could also be a
 bitmap or
 the interface could provide both a bitmap and VLAN list for more
 flexibility
 . Like the below
 
 [IFLA_RX_FILTER] = {
 [IFLA_ADDRESS_FILTER] = {
 [IFLA_ADDRESS_FILTER_FLAGS]
 [IFLA_ADDRESS_LIST] = {
 [IFLA_ADDRESS_LIST_ENTRY]
 }
 }
 [IFLA_VLAN_FILTER] = {
 [IFLA_VLAN_BITMAP]
 [IFLA_VLAN_LIST] = {
 [IFLA_VLAN]
 }
 }
 }

The simplest interface probably is to stick to a bitmap (knowing that in
the worst
case it will take 256 bytes, but we can compress it ...), because
sending
a vlan list may end up requiring much more than that (on interfaces
configured as trunks).
This regardless of whether the most common use case is that of a server
configured
with just few vlans or that of a switch configured with few trunks.

Another option would be a list of ranges, but that one would not work
well
in those cases where trunks are configured, for example, to carry big
numbers
of odd or even vlan IDs or other groups of vlans IDs that cannot be
grouped
into ranges. Probably an acceptable compromise, if we care about the
size
of this attribute, would be:
- to use a list of IDs for less than 256 vlans (or a list of ranges)
- to use a bitmap for more than 256 vlan.

I would recommend the two attributes (IFLA_VLAN_BITMAP and
IFLA_VLAN_LIST)
to be mutually exclusive to reduce the complexity of the merging and
error/misconfig detection code.

 - Do you see any advantage in keeping Unicast and multicast
address
 list
 separate ? Something like the below :
 [IFLA_RX_FILTER] = {
 [IFLA_ADDRESS_FILTER_FLAGS]
 [IFLA_UC_ADDRESS_FILTER] = {
 [IFLA_ADDRESS_LIST] = {
 [IFLA_ADDRESS_LIST_ENTRY]
 }
 }
 [IFLA_MC_ADDRESS_FILTER] = {
 [IFLA_ADDRESS_LIST] = {
 [IFLA_ADDRESS_LIST_ENTRY]
 }
 }
 [IFLA_VLAN_FILTER] = {
 [IFLA_VLAN_LIST] = {
 [IFLA_VLAN]
 }
 }
 }

I personally like the idea of grouping UC and MC addresses into two
distinct
attributes/groups.
The receiver (the kernel) would have to check them anyway (I suppose),
but
I like the idea of having the kernel being able to detect the case
where, for
example, the user configures a MC address thinking he is actually
configuring
a UC address.

Most probably the iproute2 commands used to configure/add/del UC and MC
address
will be assigned two different keywords.
BTW, once this code will be in, it will be possible for ip link show
to show
all UC/MC MAC addresses; right now ip link only shows dev-dev_addr.
This output is useful for debugging.

The output from ip maddr only shows the MC list and anyway I think the
best
place for the list of MAC addresses is ip link show.

Would ip link show also show the list of vlans?
Probably, best would be to add new flags (to ask for the extended
output) or
simply add the extra output (uc/mc/vlan lists) under the already
existent -s flag ?

 - Is there any need to keep address and vlan filters separate. And
 have
 two rtnl_link_ops, set_rx_address_filter, set_rx_vlan_filter ?. I
don't
 see
 one .
 
 [IFLA_RX_ADDRESS_FILTER] = {
 [IFLA_ADDRESS_FILTER_FLAGS]
 [IFLA_ADDRESS_LIST] = {
 [IFLA_ADDRESS_LIST_ENTRY]
 }
 }
 [IFLA_RX_VLAN_FILTER] = {
 [IFLA_VLAN_LIST] = {
 [IFLA_VLAN]
 }
 }

I think both approaches are good.
Anyway, given that you can have/configure nested vlans, having
IFLA_RX_VLAN_FILTER
inside IFLA_RX_FILTER would be syntactically correct too.

/Chris

 
 Thanks,
 Roopa
 
 
 
 On 9/12/11 10:02 AM, Roopa Prabhu ropra...@cisco.com wrote:
 
 
 
 
  On 9/11/11 12:03 PM, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Sun, Sep 11, 

Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Scott Wood
On 09/26/2011 01:34 PM, Alex Williamson wrote:
 The other obvious possibility is a pure ioctl interface.  To match what
 this proposal is trying to describe, plus the runtime interfaces, we'd
 need something like:
 
 /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
 #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64)
 
 
 /* Return number of mmio/iop/config regions.
  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
 #define VFIO_DEVICE_GET_NUM_REGIONS   _IOR(, , int)

How do you handle BARs that a particular device doesn't use?  Zero-length?

 /* Return the device tree path for type/index into the user
  * allocated buffer */
 struct dtpath {
   u32 type; (0 = region, 1 = IRQ)
   u32 index;
   u32 buf_len;
   char*buf;
 };
 #define VFIO_DEVICE_GET_DTPATH_IOWR(, , struct dtpath)

So now the user needs to guess a buffer length in advance... and what
happens if it's too small?

 /* Reset the device */
 #define VFIO_DEVICE_RESET _IO(, ,)

What generic way do we have to do this?  We should probably have a way
to determine whether it's possible, without actually asking to do it.

 /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
 #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS  _IOW(, , int)
 #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int)
 
 Hope that covers it.

It could be done this way, but I predict that the code (both kernel and
user side) will be larger.  Maybe not much more complex, but more
boilerplate.

How will you manage extensions to the interface?  With the table it's
simple, you see a new (sub)record type and you either understand it or
you skip it.  With ioctls you need to call every information-gathering
ioctl you know and care about (or are told is present via some feature
advertisement), and see if there's anything there.

 Something I prefer about this interface is that
 everything can easily be generated on the fly, whereas reading out a
 table from the device means we really need to have that table somewhere
 in kernel memory to easily support reading random offsets.  Thoughts?

The table should not be particularly large, and you'll need to keep the
information around in some form regardless.  Maybe in the PCI case you
could produce it dynamically (though I probably wouldn't), but it really
wouldn't make sense in the device tree case.

You also lose the ability to easily have a human look at the hexdump for
debugging; you'll need a special lsvfio tool.  You might want one
anyway to pretty-print the info, but with ioctls it's mandatory.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-26 Thread Kevin O'Connor
On Mon, Sep 26, 2011 at 10:04:13AM +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote:
  On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote:
   On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote:
The code to generate basic SSDT code isn't that difficult (see
build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
patch the DSDT versus just generating the necessary blocks in an SSDT?
   
   I don't really care whether the code is in DSDT or SSDT,
   IMO there isn't much difference between build_ssdt and patching:
   main reason is build_ssdt uses offsets hardcoded to a specific binary
   (ssdt_proc and SD_OFFSET_* ) while I used
   a script to extract offsets.
  
  Yes - your script to extract the offsets is nice.
 
 If you still have doubts,
 it might make sense to merge just patch 1 -
 acpi: generate and parse mixed asl/aml listing
 - as the first step.
 With the infrastructure in place it will be
 easier to discuss the best way to use it.

I'm okay with your first patch.  However, I wish to tag a release
before committing ACPI changes.  There was a concern raised with
two-pass PCI initialization that I need to follow up on before
tagging.

-Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Scott Wood
On 09/26/2011 02:57 PM, Stuart Yoder wrote:
 On Mon, Sep 26, 2011 at 2:51 AM, David Gibson
 Um, not to put too fine a point on it, this is madness.

 Yes, it's very flexible and can thereby cover a very wide range of
 cases.  But it's much, much too complex.  Userspace has to parse a
 complex, multilayered data structure, with variable length elements
 just to get an address at which to do IO.  I can pretty much guarantee
 that if we went with this, most userspace programs using this
 interface would just ignore this metadata and directly map the
 offsets at which they happen to know the kernel will put things for
 the type of device they care about.

Then they deserve to break when those offsets change, just like with all
the other bad assumptions poorly written code tends to make.

Really, it should not be that hard to parse this.  A loop with a switch
statement for toplevel records, and another loop with a switch statement
for each context in which subrecords can appear.

 _At least_ for PCI, I think the original VFIO layout of each BAR at a
 fixed, well known offset is much better.

This is what grew out of attempting to address the needs of assigning
non-PCI devices based on the device tree.  Personally, I'd rather be
dealing with this for PCI devices as well.

  Despite its limitations,
 just advertising a device type ID which describes one of a few fixed
 layouts would be preferable to this.

That's already more information than the original VFIO layout provided.

 So, is your issue really the variable length nature of what was
 proposed?
 
 I don't think it would be that hard to make the different resources
 fixed length.   I think we have 2 types of resources now-- address
 regions and interrupts.
 
 The only thing that get's a bit tricky is device tree paths, which
 are obviously variable length.
 
 We could put a description of all the resources in an array with
 each element being something like 4KB??

I'm not sure what that would improve.  If the user wants to put a limit
on the size of a certain field it's willing to handle, so be it.  It'll
break in the same set of cases that would be unexpressable with such a
limitation in the interface, except that since the breakage is not in
the interface, it's more easily fixable.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How many threads should a kvm vm be starting?

2011-09-26 Thread Thomas Fjellstrom
I just noticed something interesting, a virtual machine on one of my servers 
seems to have 69 threads (including the main thread). Other guests on the 
machine only have a couple threads.

Is this normal? or has something gone horribly wrong?

-- 
Thomas Fjellstrom
tho...@fjellstrom.ca
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Alex Williamson
On Mon, 2011-09-26 at 18:59 -0500, Scott Wood wrote:
 On 09/26/2011 01:34 PM, Alex Williamson wrote:
  The other obvious possibility is a pure ioctl interface.  To match what
  this proposal is trying to describe, plus the runtime interfaces, we'd
  need something like:
  
  /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
  #define VFIO_DEVICE_GET_FLAGS   _IOR(, , u64)
  
  
  /* Return number of mmio/iop/config regions.
   * For PCI this is always 8 (BAR0-5 + ROM + Config) */
  #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int)
 
 How do you handle BARs that a particular device doesn't use?  Zero-length?

Yep

  /* Return the device tree path for type/index into the user
   * allocated buffer */
  struct dtpath {
  u32 type; (0 = region, 1 = IRQ)
  u32 index;
  u32 buf_len;
  char*buf;
  };
  #define VFIO_DEVICE_GET_DTPATH  _IOWR(, , struct dtpath)
 
 So now the user needs to guess a buffer length in advance... and what
 happens if it's too small?

-ENOSPC.  Call with buf_len = 0 and it could indicate the size.

  /* Reset the device */
  #define VFIO_DEVICE_RESET   _IO(, ,)
 
 What generic way do we have to do this?  We should probably have a way
 to determine whether it's possible, without actually asking to do it.

It's not generic, it could be a VFIO_DEVICE_PCI_RESET or we could add a
bit to the device flags to indicate if it's available or we could add a
probe arg to the ioctl to either check for existence or do it.

  /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
  #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS_IOW(, , int)
  #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS   _IOW(, , int)
  
  Hope that covers it.
 
 It could be done this way, but I predict that the code (both kernel and
 user side) will be larger.  Maybe not much more complex, but more
 boilerplate.
 
 How will you manage extensions to the interface?

I would assume we'd do something similar to the kvm capabilities checks.
I don't know if that's just bits of GET_FLAGS or a different ioctl.

 With the table it's
 simple, you see a new (sub)record type and you either understand it or
 you skip it.  With ioctls you need to call every information-gathering
 ioctl you know and care about (or are told is present via some feature
 advertisement), and see if there's anything there.

I don't really see much difference between the interfaces here.  You'd
pick and choose which table entries you care about and pick and choose
ioctls.  For one you see it in the table, for the other there's a bit
indicating the capability exists.

  Something I prefer about this interface is that
  everything can easily be generated on the fly, whereas reading out a
  table from the device means we really need to have that table somewhere
  in kernel memory to easily support reading random offsets.  Thoughts?
 
 The table should not be particularly large, and you'll need to keep the
 information around in some form regardless.  Maybe in the PCI case you
 could produce it dynamically (though I probably wouldn't), but it really
 wouldn't make sense in the device tree case.

It would be entirely dynamic for PCI, there's no advantage to caching
it.  Even for device tree, if you can't fetch it dynamically, you'd have
to duplicate it between an internal data structure and a buffer reading
the table.

 You also lose the ability to easily have a human look at the hexdump for
 debugging; you'll need a special lsvfio tool.  You might want one
 anyway to pretty-print the info, but with ioctls it's mandatory.

I don't think this alone justifies duplicating the data and making it
difficult to parse on both ends.  Chances are we won't need such a tool
for the ioctl interface because it's easier to get it right the first
time ;)

Note that I'm not stuck on this interface, I was just thinking about how
to generate the table last week, it seemed like a pain so I thought I'd
spend a few minutes outlining an ioctl interface... turns out it's not
so bad.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kvm tools: Support multiple net devices

2011-09-26 Thread Asias He
On 09/27/2011 01:44 AM, Sasha Levin wrote:
 This patch adds support for multiple network devices. The command line syntax
 changes to the following:
 
   --network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip=
 ip][,guest_mac=mac][,script=file]
 
 Each of the parameters is optional, and the config defaults to a TAP based
 networking with a sequential MAC.
 
 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Acked-by: Asias He asias.he...@gmail.com

-- 
Asias He
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-26 Thread Scott Wood
On 09/24/2011 02:44 AM, Alexander Graf wrote:
 On 20.09.2011, at 19:54, Scott Wood wrote:
 On 09/19/2011 06:35 PM, Alexander Graf wrote:
 +   asm (PPC_CNTLZL %0,%1 : =r (lz) : r (psize));
 +   tsize = min(21 - lz, tsize);

 No need to open-code the cntlz and subtract-from-21:

  tsize = min(ilog2(psize) - 10, tsize);

  /*
   * e500 doesn't implement the lowest tsize bit,
   * or 1K pages.
   */
  tsize = max(BOOK3E_PAGESZ_4K, tsize  ~1);

 There's still an open-coded subtraction of 10, but that relates more
 straightforwardly to the definition of tsize (and could be factored out
 into a size-to-tsize function).
 
 Yeah, no need to micro-optimized those few bits. The reason I used the asm 
 statement was that I copied the hugetlbfs code which does it that way :).

The 21 - lz thing is broken on 64-bit, FWIW.  It works by chance in
the hugetlbfs code (and in settlbcam) because it results in tsize being
too low by 32, which only affects bits that subsequently get masked off.

 }

 up_read(current-mm-mmap_sem);
 }

 if (likely(!pfnmap)) {
 +   unsigned long tsize_pages = 1  (tsize - 2);

 1  (tsize + 10 - PAGE_SHIFT);
 
 Are we getting variable page sizes anytime soon? Will change it nevertheless, 
 just curious :).

Nothing imminent on our chips AFAIK, but there's some 64K page support
in Linux for IBM's book3e chips, and it's not nice to hardcode
(especially in a hidden way) regardless.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html