[Qemu-devel] Virtualization DevRoom at FOSDEM 2013

2012-11-16 Thread Chris Wright
Following on the heels of a successful KVM Forum and oVirt Workshop,
FOSDEM will be hosting a Virtualization DevRoom in February.  If you've
been to FOSDEM before, you know this is about developers and code, not
products.

Presentation proposals are due by December 16th 2012.

The full details are here:

 http://osvc.v2.cs.unibo.it/index.php/Main_Page

With the relevant topics being:

Topics covered will include, but are not limited to:
 - machine virtualization (e.g. KVM, Xen, VirtualBox,...)
 - network virtualization (e.g. openvstack, vale, vde, Open vSwitch,...)
 - process level virtualization, flexible kernels (e.g. rump anykernel,
   view-os, ...)
 - virt management (e.g. ganeti, libvirt, ovirt, XCP, ...)

thanks,
-chris



Re: [Qemu-devel] [PATCH] Add nvram to default boot device list

2012-10-11 Thread Chris Wright
* Alexander Graf (ag...@suse.de) wrote:
 On 12.10.2012, at 02:28, David Gibson wrote:
  On Fri, Oct 12, 2012 at 02:03:00AM +0200, Alexander Graf wrote:
  On 12.10.2012, at 00:59, David Gibson wrote:
  On Thu, Oct 11, 2012 at 07:34:42AM +0530, Avik Sil wrote:
  This patch adds nvram specified boot device into qemu default
  boot_devices list. This helps firmware to boot from nvram specified
  boot device if no -boot option is specified.
  
  I really don't think this is a good idea, it extends an already
  deprecated mechanism in a fuzzy way and requires careful checking to
  see if it could break anything.  On all platforms the boot sequence
  should be:
      if bootindex is specified:
          boot according to bootindex
      else if -boot is specified:
          boot according to -boot sequence
      else:
          use platform firmware default sequence
  
  The last will of course vary by platform, and could depend on platform
  details like the contents of NVRAM.  Your original idea of making it
  clear to the guest when -boot has been specified (as opposed to when
  it contains its default value) was the right one, and this x in
  -boot is going the wrong direction.
  
  Given that this is a fundamental direction for a bunch of machines,
  how about we talk about it on the weekly QEMU call?
  
  Uh, is this a call I know about?
 
 I would hope so. Chris / Juan, who is in charge of the phone numbers
 these days?

Added David to the invite which contains the call details (very
unfriendly time for .au I'm afraid, 14:00 UTC).
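
For anyone catching up before the call, the precedence David outlines
above can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: the function and parameter names
are hypothetical, not QEMU code, and "None" is used to mean "-boot was
not given on the command line", as opposed to -boot carrying its default
value.

```python
def choose_boot_order(bootindex_devices, boot_option, firmware_default):
    """Pick the boot sequence using the precedence discussed in the thread.

    bootindex_devices: devices that carry an explicit bootindex, already
                       sorted by that index (empty if none were given).
    boot_option:       the -boot drive letters, or None if -boot was not
                       specified at all (hypothetical representation).
    firmware_default:  the platform firmware's own sequence, which may in
                       turn come from platform details such as NVRAM.
    """
    if bootindex_devices:
        # 1. An explicit bootindex always wins.
        return list(bootindex_devices)
    if boot_option is not None:
        # 2. Otherwise honour -boot, but only when the user supplied it.
        return list(boot_option)
    # 3. Otherwise defer entirely to the platform firmware.
    return list(firmware_default)
```

The point of the third branch is that QEMU does not need to know what is
in NVRAM; it only has to tell the firmware that no order was requested.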



[Qemu-devel] KVM Forum 2012 Call For Participation

2012-08-02 Thread Chris Wright
For some reason I'm not seeing this on the qemu list, so here's a fwd

----- Forwarded message from KVM Forum 2012 Program Committee
kvm-forum-2012...@redhat.com -----

Date: Fri, 27 Jul 2012 16:31:45 -0700
From: KVM Forum 2012 Program Committee kvm-forum-2012...@redhat.com
To: k...@vger.kernel.org, libvir-l...@redhat.com, qemu-devel@nongnu.org,
virtualizat...@lists.linux-foundation.org
Cc: kvm-forum-2012...@redhat.com
Subject: KVM Forum 2012 Call For Participation

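=================================================================
KVM Forum 2012: Call For Participation
November 7-9, 2012 - Hotel Fira Palace - Barcelona, Spain

(All submissions must be received before midnight Aug 31st, 2012)
=================================================================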

KVM is an industry leading open source hypervisor that provides
an ideal platform for datacenter virtualization, virtual desktop
infrastructure, and cloud computing.  Once again, it's time to bring
together the community of developers and users that define the KVM
ecosystem for our annual technical conference.  We will discuss the
current state of affairs and plan for the future of KVM, its surrounding
infrastructure, and management tools.  We are also excited to announce
the oVirt Workshop will run in parallel with the KVM Forum, bringing in
a community focused on enterprise datacenter virtualization management
built on KVM.  For topics which overlap we will have shared sessions.
So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon.
Based on feedback from last year, this time it's LinuxCon Europe!
KVM Forum attendees will be able to attend oVirt Workshop sessions and
are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2012.

http://events.linuxfoundation.org/cfp

Suggested topics:

 KVM
 - Scaling and performance
 - Nested virtualization
 - I/O improvements
 - PCI device assignment
 - Driver domains
 - Time keeping
 - Resource management (cpu, memory, i/o)
 - Memory management (page sharing, swapping, huge pages, etc)
 - VEPA, VN-Link, vswitch
 - Security
 - Architecture ports
 
 QEMU
 - Device model improvements
 - New devices and chipsets
 - Scaling and performance
 - Desktop virtualization
 - Spice
 - Increasing robustness and hardening
 - Security model
 - Management interfaces
 - QMP protocol and implementation
 - Image formats
 - Firmware (SeaBIOS, OVMF, UEFI, etc)
 - Live migration
 - Live snapshots and merging
 - Fault tolerance, high availability, continuous backup
 - Real-time guest support
 
 Virtio
 - Speeding up existing devices
 - Alternatives
 - Virtio on non-Linux or non-virtualized
 
 Management infrastructure
 - oVirt (shared track w/ oVirt Workshop)
 - Libvirt
 - KVM autotest
 - OpenStack
 - Network virtualization management
 - Enterprise storage management
 
 Cloud computing
 - Scalable storage
 - Virtual networking
 - Security
 - Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: Aug 31st, 2012
Notification: Sep 14th, 2012

Please submit a short abstract (~150 words) describing your presentation
proposal.  In your submission please note how long your talk will take.
Slots vary in length up to 45 minutes.  Also include in your proposal
the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:

http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal
was accepted by Sep 14th.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how
people actually use our software.  We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production you are encouraged to submit a speaking
proposal.  Simply mark it as an end-user collaboration proposal.  As an
end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks, for birds of a feather (BOF) sessions. These sessions will be
less formal than presentation tracks and targeted at people who would
like to discuss specific issues with other developers and/or users.
If you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

LIGHTNING TALKS

In addition to submitted talks we will also have some room for lightning
talks. These are short (5 minute) discussions to highlight new work or
ideas that aren't complete enough to warrant a full presentation slot.
Lightning talk submissions and scheduling will be handled on-site at
KVM Forum.

HOTEL / TRAVEL

The KVM Forum 2012 will be held in Barcelona, Spain at the Hotel Fira 

Re: [Qemu-devel] QEMU hacking session/day at KVM Forum 2012?

2012-07-30 Thread Chris Wright
* Anthony Liguori (aligu...@us.ibm.com) wrote:
 Peter Maydell peter.mayd...@linaro.org writes:
  Last year at KVM Forum, in addition to the scheduled talks we also
  had an informal hacking session on one of the following days, since
  we were colocated with LinuxCon NA and most people were still around
  afterwards.
 
  I thought this was really useful and I think it would be good if we
  could arrange something similar this year. This year we're colo'd
  with LinuxCon Europe (which is 5th-7th November with KVM Forum being
  7th-9th), so I guess that one of the two days beforehand might
  be usable.
 
 Do you think we can get a room at some point before/after the main
 conference for a hackathon?  Would be interesting to try and combine it
 with an oVirt hackathon too and get everyone in the same room.

We have one extra room (moderate capacity) that can be used Thur/Fri
ad-hoc.  I think we can get that space earlier in the week as well.

thanks,
-chris



Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year

2012-03-17 Thread Chris Wright
* Natalia Portillo (clau...@claunia.com) wrote:
 QEMU hosted on Haiku would be interesting.

The fun of Haiku
especially when it is
hosting QEMU



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 02/07/2012 07:18 AM, Avi Kivity wrote:
 On 02/07/2012 02:51 PM, Anthony Liguori wrote:
 On 02/07/2012 06:40 AM, Avi Kivity wrote:
 On 02/07/2012 02:28 PM, Anthony Liguori wrote:
 
 It's a potential source of exploits
 (from bugs in KVM or in hardware). I can see people wanting to be
 selective with access because of that.
 
 As is true of the rest of the kernel.
 
 If you want finer grain access control, that's exactly why we have things
 like LSM and SELinux. You can add the appropriate LSM hooks into the KVM
 infrastructure and setup default SELinux policies appropriately.
 
 LSMs protect objects, not syscalls. There isn't an object to protect here
 (except the fake /dev/kvm object).
 
 A VM can be an object.
 
 Not really, it's not accessible in a namespace. How would you label it?

A VM, vcpu, etc are all objects.  The labelling can be implicit based on
the security context of the process creating the object.  You could create
simplistic rules such as a process may have the ability KVM__VM_CREATE
(this is roughly analogous to the PROC__EXECMEM policy control that
allows some processes to create executable writable memory mappings, or
SHM__CREATE for a process that can create a shared memory segment).
Adding some label mgmt to the object (add ->security and some callbacks to
do ->alloc/init/free), and then checks on the object itself would allow
for finer grained protection.  If there was any VM lookup (although the
original example explicitly ties a process to a vm and a thread to a
vcpu) the finer grained check would certainly be useful to verify that
the process can access the VM.
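
A minimal sketch of that idea, as a toy model in Python rather than
kernel code.  Everything here is hypothetical: the class names, the
permission strings (modelled loosely on the KVM__VM_CREATE example), and
the policy table stand in for real LSM hooks and a real policy.  It shows
the two levels of check: a coarse "may this context create a VM" rule,
and a finer check against the label stored on the object itself.

```python
from dataclasses import dataclass

# Hypothetical permission names, for illustration only.
VM_CREATE = "vm_create"
VM_ACCESS = "vm_access"


@dataclass(frozen=True)
class Process:
    pid: int
    context: str      # security context (label) of the process


@dataclass
class VM:
    label: str        # stands in for a per-object security field


class Policy:
    """Toy policy: a set of (source context, target label, permission)."""

    def __init__(self):
        self._allowed = set()

    def allow(self, source, target, perm):
        self._allowed.add((source, target, perm))

    def check(self, source, target, perm):
        return (source, target, perm) in self._allowed


def create_vm(policy, proc):
    # Coarse check: is this context allowed to create VMs at all?
    if not policy.check(proc.context, proc.context, VM_CREATE):
        raise PermissionError("context may not create a VM")
    # Implicit labelling: the object inherits the creator's context.
    return VM(label=proc.context)


def access_vm(policy, proc, vm):
    # Finer-grained check against the label carried by the object.
    if not policy.check(proc.context, vm.label, VM_ACCESS):
        raise PermissionError("context may not access this VM")
    return vm
```

The second check only matters if a VM can be looked up by something
other than its creator, which is the caveat noted above.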

 Labels can originate from userspace, IIUC, so I think it's possible for QEMU
 (or whatever the userspace is) to set the label for the VM while it's
 creating it. I think this is how most of the labeling for X and things of
 that nature works.

For X, the policy enforcement is done in the X server.  There is
assistance from the kernel for doing policy server queries (can foo do
bar?), but it's up to the X server to actually care enough to ask and
then fail a request that doesn't comply.  I'm not sure that's the model
here.

thanks,
-chris



Re: [Qemu-devel] [RFC] QEMU Code Audit Team

2012-01-06 Thread Chris Wright
* Corey Bryant (cor...@linux.vnet.ibm.com) wrote:
 Count me in for step 2.  A good approach may be to run a static
 analysis tool against the code, followed by a manual scan of the
 code for common vulnerabilities that static analysis can't find.

Good idea.  Folks are already running things like Coverity.  The false
positive rate is high enough that it's a lot to wade through at first
(so extra eyes could be quite helpful here).  Perhaps the people who
are involved in this could share some of their findings.

thanks,
-chris



Re: [Qemu-devel] [RFC] QEMU Code Audit Team

2012-01-06 Thread Chris Wright
* Anthony Liguori (aligu...@us.ibm.com) wrote:
 2) Two people walk through a particular piece of code and
 independently flag anything that looks like a potential security
 issue.

Auditing is always helpful, but won't ever get full coverage.  qtest +
fuzz is another great way to identify problems.  Also improving any
annotations to help static analysis tools is useful.  And both of those
are development efforts rather than code review.  Trouble with code
review is that security bugs can be subtle and easy to miss.

 I'd want to focus initially on the common PC devices.   The list
 isn't all that large and a review like this should only take a few
 hours to complete each step.

I definitely agree on the initial scope.

thanks,
-chris



Re: [Qemu-devel] KVM Call Agenda for 12/6 (Tuesday) @ 10am US/Eastern

2011-12-05 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 * Anthony Liguori (aligu...@us.ibm.com) wrote:
  1. A short introduction to each of the guest agents, what guests they
  support, and what verbs they support.
 
 I think we did this once before w/ Matahari.  Can we please capture
 these things in email before the call, so people actually have time
 to ponder the details.
 
  2. A short description of key requirements from each party (oVirt, libvirt,
  QEMU) for a guest agent
 
 Same here...call this the abstract/intro of the above detailed list of
 verbs and guest support, and send it by Friday this week.
 
 I know there's plenty of details buried in the current thread and old
 discussions of Matahari.  But that's just it...buried...

It's past Friday.  Barak's links are all we have so far...

thanks,
-chris



Re: [Qemu-devel] KVM Call Agenda for 12/6 (Tuesday) @ 10am US/Eastern

2011-11-30 Thread Chris Wright
* Anthony Liguori (aligu...@us.ibm.com) wrote:
 Hi,
 
 I'd like to propose that we discuss guest agent convergence in our next KVM
 call.  I've CC'd folks from oVirt and libvirt to join the discussion.
 
 I think we should probably attempt to have some structure to the discussion.
 I would suggest:
 
 1. A short introduction to each of the guest agents, what guests they
 support, and what verbs they support.

I think we did this once before w/ Matahari.  Can we please capture
these things in email before the call, so people actually have time
to ponder the details.

 2. A short description of key requirements from each party (oVirt, libvirt,
 QEMU) for a guest agent

Same here...call this the abstract/intro of the above detailed list of
verbs and guest support, and send it by Friday this week.

I know there's plenty of details buried in the current thread and old
discussions of Matahari.  But that's just it...buried...

 3. An open discussion about possible ways to collaborate/converge.

That should really help facilitate this item ;)

thanks,
-chris



Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-30 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Wed, 2011-11-30 at 21:52 +0530, Dipankar Sarma wrote:
  
  Also, if at all topology changes due to migration or host kernel decisions,
  we can make use of something like VPHN (virtual processor home node)
  capability on Power systems to have guest kernel update its topology
  knowledge. You can refer to that in
  arch/powerpc/mm/numa.c. 
 
 I think that fail^Wfeature of PPC is terminally broken. You simply
 cannot change the topology after the fact. 

Agreed, there's too many things that consult topology once and never
look back.



Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-21 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
  
  In the original post of this mail thread, I proposed a way to export
  guest RAM ranges (Guest Physical Address-GPA) and their corresponding
  host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU
  monitor).  The idea was to use these GPA to HVA mappings from tools like
  libvirt to bind specific parts of the guest RAM to different host nodes.
  This needed an extension to existing mbind() to allow binding memory of
  a process (QEMU) from a different process (libvirt).  This was needed
  since we wanted to do all this from libvirt.
  
  Hence I was coming from that background when I asked for extending
  ms_mbind() to take a tid parameter. If QEMU community thinks that NUMA
  binding should all be done from outside of QEMU, it is needed, otherwise
  what you have should be sufficient. 
 
 That's just retarded, and no you won't get such extensions. Poking at
 another process's virtual address space is just daft. Esp. if there's no
 actual reason for it.

Need to separate the binding vs the policy mgmt.  The policy mgmt could
still be done outside, whereas the binding could still be done from w/in
QEMU.  A simple monitor interface to rebalance vcpu memory allocations
to different nodes could very well schedule vcpu thread work in QEMU.

So, I agree, even if there is some external policy mgmt, it could still
easily work w/ QEMU to use Peter's proposed interface.

thanks,
-chris



Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-08 Thread Chris Wright
* Alexander Graf (ag...@suse.de) wrote:
 On 29.10.2011, at 20:45, Bharata B Rao wrote:
  As guests become NUMA aware, it becomes important for the guests to
  have correct NUMA policies when they run on NUMA aware hosts.
  Currently limited support for NUMA binding is available via libvirt
  where it is possible to apply a NUMA policy to the guest as a whole.
  However multinode guests would benefit if guest memory belonging to
  different guest nodes are mapped appropriately to different host NUMA nodes.
  
  To achieve this we would need QEMU to expose information about
  guest RAM ranges (Guest Physical Address - GPA) and their host virtual
  address mappings (Host Virtual Address - HVA). Using GPA and HVA, any
  external tool like libvirt would be able to divide the guest RAM as per
  the guest NUMA node geometry and bind guest memory nodes to corresponding
  host memory nodes using HVA. This needs both QEMU (and libvirt) changes
  as well as changes in the kernel.
 
 Ok, let's take a step back here. You are basically growing libvirt into a
 memory resource manager that knows how much memory is available on which
 nodes and how these nodes would possibly fit into the host's memory layout.
 
 Shouldn't that be the kernel's job? It seems to me that architecturally the
 kernel is the place I would want my memory resource controls to be in.

I think that both Peter and Andrea are looking at this.  Before we commit
an API to QEMU that has a different semantic than a possible new kernel
interface (that perhaps QEMU could use directly to inform kernel of the
binding/relationship between vcpu thread and its memory at VM startup)
it would be useful to see what these guys are working on...
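
To make the proposal under discussion concrete, here is a rough Python
sketch of the bookkeeping Bharata describes: take the GPA-to-HVA
mappings, cut them along the guest NUMA node boundaries, and group the
resulting host virtual ranges by target host node.  All names are
hypothetical and the data layout is an assumption, not what the QEMU
monitor would actually report; the binding step itself (an mbind()-style
call on each range) is deliberately left out.

```python
from typing import NamedTuple


class RamRegion(NamedTuple):
    gpa: int      # guest physical start address
    hva: int      # host virtual address backing that GPA
    size: int     # length in bytes


class GuestNode(NamedTuple):
    node_id: int
    gpa_start: int
    size: int


def hva_ranges_for_node(node, regions):
    """Return [(hva_start, length), ...] backing one guest node's GPA range."""
    ranges = []
    node_end = node.gpa_start + node.size
    for region in regions:
        # Intersect the node's GPA range with this RAM region.
        start = max(node.gpa_start, region.gpa)
        end = min(node_end, region.gpa + region.size)
        if start < end:
            ranges.append((region.hva + (start - region.gpa), end - start))
    return ranges


def plan_bindings(guest_nodes, regions, guest_to_host):
    """Group HVA ranges by the host node each guest node should live on."""
    plan = {}
    for node in guest_nodes:
        host_node = guest_to_host[node.node_id]
        plan.setdefault(host_node, []).extend(hva_ranges_for_node(node, regions))
    return plan
```

Whoever ends up owning the policy, the plan is the same data; the open
question in this thread is only which component applies it.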

thanks,
-chris



Re: [Qemu-devel] Memory API code review

2011-09-14 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
 I would like to carry out an online code review of the memory API so that
 more people are familiar with the internals, and perhaps even to catch some
 bugs or deficiency.  I'd like to use the next kvm conference call slot for
 this (Tuesday 1400 UTC) since many people already have it reserved in the
 schedule.
 
 It would be great if people from the wider qemu community be present, rather
 than the usual x86 is everything crowd (+Jan) that usually participates in
 the kvm weekly call.
 
 Juan, Chris, can we dedicate next week's call to this?

Yup, sounds like a good idea.



Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote:
 On 8/26/11 7:07 AM, Alexander Graf ag...@suse.de wrote:
  Forget the KVM case for a moment and think of a user space device driver.
  I as a user am not root. But I as a user when having access to /dev/vfioX
  want to be able to access the device and manage it - and only it. The
  admin of that box needs to set it up properly for me to be able to access
  it.
  
  So having two steps is really the correct way to go:
  
* create VFIO group
* use VFIO group
  
  because the two are done by completely different users.
 
 This is not the case for my userspace drivers using VFIO today.
 
 Each process will open vfio devices on the fly, and they need to be able to
 share IOMMU resources.

How do you share IOMMU resources w/ multiple processes, are the processes
sharing memory?

 So I need the ability to dynamically bring up devices and assign them to a
 group.  The number of actual devices and how they map to iommu domains is
 not known ahead of time.  We have a single piece of silicon that can expose
 hundreds of pci devices.

This does not seem fundamentally different from the KVM use case.

We have 2 kinds of groupings.

1) low-level system or topology grouping

   Some may have multiple devices in a single group

   * the PCIe-PCI bridge example
   * the POWER partitionable endpoint

   Many will not

   * singleton group, e.g. typical x86 PCIe function (majority of
 assigned devices)

   Not sure it makes sense to have these administratively defined as
   opposed to system defined.

2) logical grouping

   * multiple low-level groups (singleton or otherwise) attached to same
 process, allowing things like single set of io page tables where
 applicable.

   These are nominally administratively defined.  In the KVM case, there
   is likely a privileged task (i.e. libvirtd) involved w/ making the
   device available to the guest and can do things like group merging.
   In your userspace case, perhaps it should be directly exposed.
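
As a rough illustration of the two levels, here is a toy Python model.
The class and method names are made up for this sketch and are not the
VFIO interface: a low-level group is a fixed set of devices, and a
logical grouping is a collection of such groups that share one set of
I/O page tables and can be merged.

```python
class LowLevelGroup:
    """System-defined group: its device set is fixed by topology."""

    def __init__(self, name, devices):
        self.name = name
        self.devices = frozenset(devices)   # immutable on purpose


class IommuDomain:
    """Logical grouping: low-level groups sharing one set of I/O page tables."""

    def __init__(self, *groups):
        self.groups = list(groups)
        self.mappings = {}                  # iova -> process virtual address

    def merge(self, other):
        """Absorb another domain's groups and mappings into this one."""
        self.groups.extend(other.groups)
        self.mappings.update(other.mappings)
        other.groups = []
        other.mappings = {}
        return self

    def devices(self):
        """All devices reachable through this domain."""
        found = set()
        for group in self.groups:
            found |= group.devices
        return found

    def map(self, iova, vaddr):
        """Record one mapping; every attached device sees it."""
        self.mappings[iova] = vaddr
```

Merging only moves whole low-level groups around; nothing in the model
lets a caller add a device to, or remove one from, a system-defined
group.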

 In my case, the only administrative task would be to give my processes/users
 access to the vfio groups (which are initially singletons), and the
 application actually opens them and needs the ability to merge groups
 together to conserve IOMMU resources (assuming we're not going to expose
 uiommu).

I agree, we definitely need to expose _some_ way to do this.

thanks,
-chris



Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote:
 On 8/26/11 12:35 PM, Chris Wright chr...@sous-sol.org wrote:
  * Aaron Fabbri (aafab...@cisco.com) wrote:
  Each process will open vfio devices on the fly, and they need to be able to
  share IOMMU resources.
  
  How do you share IOMMU resources w/ multiple processes, are the processes
  sharing memory?
 
 Sorry, bad wording.  I share IOMMU domains *within* each process.

Ah, got it.  Thanks.

 E.g. If one process has 3 devices and another has 10, I can get by with two
 iommu domains (and can share buffers among devices within each process).
 
 If I ever need to share devices across processes, the shared memory case
 might be interesting.
 
  
  So I need the ability to dynamically bring up devices and assign them to a
  group.  The number of actual devices and how they map to iommu domains is
  not known ahead of time.  We have a single piece of silicon that can expose
  hundreds of pci devices.
  
  This does not seem fundamentally different from the KVM use case.
  
  We have 2 kinds of groupings.
  
  1) low-level system or topology grouping
  
 Some may have multiple devices in a single group
  
 * the PCIe-PCI bridge example
 * the POWER partitionable endpoint
  
 Many will not
  
 * singleton group, e.g. typical x86 PCIe function (majority of
   assigned devices)
  
 Not sure it makes sense to have these administratively defined as
 opposed to system defined.
  
  2) logical grouping
  
 * multiple low-level groups (singleton or otherwise) attached to same
   process, allowing things like single set of io page tables where
   applicable.
  
 These are nominally administratively defined.  In the KVM case, there
 is likely a privileged task (i.e. libvirtd) involved w/ making the
 device available to the guest and can do things like group merging.
 In your userspace case, perhaps it should be directly exposed.
 
 Yes.  In essence, I'd rather not have to run any other admin processes.
 Doing things programmatically, on the fly, from each process, is the
 cleanest model right now.

I don't see an issue w/ this.  As long as it cannot add devices to the
system defined groups, it's not a privileged operation.  So we still
need the iommu domain concept exposed in some form to logically put
groups into a single iommu domain (if desired).  In fact, I believe Alex
covered this in his most recent recap:

  ...The group fd will provide interfaces for enumerating the devices
  in the group, returning a file descriptor for each device in the group
  (the device fd), binding groups together, and returning a file
  descriptor for iommu operations (the iommu fd).

thanks,
-chris



[Qemu-devel] [Bug 807893] Re: qemu privilege escalation

2011-07-12 Thread Chris Wright
This bug is being tracked as CVE-2011-2527

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2011-2527

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/807893

Title:
  qemu privilege escalation

Status in QEMU:
  Confirmed

Bug description:
  If qemu is started as root, with -runas, the extra groups are not
  dropped correctly

  /proc/`pidof qemu`/status
  ..
  Uid:    100 100 100 100
  Gid:    100 100 100 100
  FDSize: 32
  Groups: 0 1 2 3 4 6 10 11 26 27 
  ...

  The fix is to add initgroups() or setgroups(1, [gid]) where
  appropriate to os-posix.c.

  The extra gid's allow read or write access to other files (such as
  /dev etc).

  Emulating the qemu code:

  # python
  ...
  >>> import os
  >>> os.setgid(100)
  >>> os.setuid(100)
  >>> os.execve("/bin/sh", ["/bin/sh"], os.environ)
  sh-4.1$ xxd /dev/sda | head -n2
  000: eb48 9000        .H..
  010:          
  sh-4.1$ ls -l /dev/sda
  brw-rw---- 1 root disk 8, 0 Jul  8 11:54 /dev/sda
  sh-4.1$ id
  uid=100(qemu00) gid=100(users) groups=100(users),0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),26(tape),27(video)
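
  For comparison, a minimal sketch of a drop that also resets the
  supplementary groups, written in Python like the reproduction above.
  This is not the qemu fix itself, only one common ordering: reset the
  groups while still privileged, then change gid, then uid last.  The
  function name and the "osmod" parameter are hypothetical; "osmod"
  exists only so the call order can be exercised without being root.

```python
import os


def drop_privileges(uid, gid, username=None, osmod=os):
    """Drop from root to uid/gid without keeping root's extra groups."""
    if username is not None:
        # Load the target user's own supplementary groups from the
        # group database (this is what initgroups() does).
        osmod.initgroups(username, gid)
    else:
        # No user name to look up: keep only the primary group.
        osmod.setgroups([gid])
    osmod.setgid(gid)
    # setuid goes last; once it succeeds the process can no longer
    # change its group list.
    osmod.setuid(uid)
```

  Run as root, drop_privileges(100, 100) followed by the same "id" check
  should no longer list root's supplementary groups.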

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/807893/+subscriptions



[Qemu-devel] [Bug 807893] Re: qemu privilege escalation

2011-07-12 Thread Chris Wright
Requesting CVE.  Tools like libvirt deprivilege themselves before
launching qemu as an unprivileged user (no use of -runas), so aren't
vulnerable.




[Qemu-devel] [Bug 807893] Re: [PATCH] os-posix: set groups properly for -runas

2011-07-12 Thread Chris Wright
* Stefan Hajnoczi (stefa...@linux.vnet.ibm.com) wrote:
 @@ -199,6 +200,11 @@ static void change_process_uid(void)
              fprintf(stderr, "Failed to setgid(%d)\n", user_pwd->pw_gid);
              exit(1);
          }
 +        if (initgroups(user_pwd->pw_name, user_pwd->pw_gid) < 0) {
 +            fprintf(stderr, "Failed to initgroups(\"%s\", %d)\n",
 +                    user_pwd->pw_name, user_pwd->pw_gid);
 +            exit(1);
 +        }

Does initgroups need access to /etc/group?  How does this combine w/
-chroot?

Added bonus...this will fail when the initial user is not privileged
_and_ is the same user as -runas user (probably not what a user intended,
but would've worked before).  Something like:

[doh@laptop qemu]$ qemu -runas doh




Re: [Qemu-devel] [PATCH] os-posix: set groups properly for -runas

2011-07-12 Thread Chris Wright
* Chris Wright (chr...@sous-sol.org) wrote:
 * Stefan Hajnoczi (stefa...@linux.vnet.ibm.com) wrote:
  @@ -199,6 +200,11 @@ static void change_process_uid(void)
               fprintf(stderr, "Failed to setgid(%d)\n", user_pwd->pw_gid);
               exit(1);
           }
  +        if (initgroups(user_pwd->pw_name, user_pwd->pw_gid) < 0) {
  +            fprintf(stderr, "Failed to initgroups(\"%s\", %d)\n",
  +                    user_pwd->pw_name, user_pwd->pw_gid);
  +            exit(1);
  +        }
 
 Does initgroups need access to /etc/group?  How does this combine w/
 -chroot?

Tested this on Linux, and w/out /etc/group it simply fails to add any
supplementary groups (doesn't fail completely, just fails safely).
Appears similar from solaris manpages.

Given that...

Acked-by: Chris Wright chr...@sous-sol.org



Re: [Qemu-devel] [PATCHv2] vhost: fix double free on device stop

2011-06-21 Thread Chris Wright
* Michael S. Tsirkin (m...@redhat.com) wrote:
 vhost dev stop failed to clear the log field.
 Typically not an issue as dev start overwrites this field,
 but if logging gets disabled before the following start,
 it doesn't so this causes a double free.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Acked-by: Chris Wright chr...@redhat.com

thanks,
-chris
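
For readers who have not looked at the code, the failure mode Michael
describes can be reproduced with a toy model.  This is a sketch in
Python, not the vhost code: the class and field names are invented, and
a "freed" flag stands in for the allocator noticing a second free.

```python
class LogBuffer:
    """Stand-in for a heap allocation that must be freed exactly once."""

    def __init__(self):
        self.freed = False

    def free(self):
        if self.freed:
            raise RuntimeError("double free")
        self.freed = True


class Device:
    def __init__(self, clear_log_on_stop):
        self.log = None
        self.clear_log_on_stop = clear_log_on_stop

    def start(self, logging_enabled):
        if logging_enabled:
            # Overwrites whatever was there, which usually hides the bug.
            self.log = LogBuffer()
        # With logging disabled the field is left untouched.

    def stop(self):
        if self.log is not None:
            self.log.free()
        if self.clear_log_on_stop:
            # The fix: forget the buffer once it has been freed.
            self.log = None
```

With clear_log_on_stop=False, the sequence start(True), stop(),
start(False), stop() frees the same buffer twice; with it set to True
the second stop() finds nothing to free.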



[Qemu-devel] KVM call agenda for June 7

2011-06-06 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Apr 26

2011-04-26 Thread Chris Wright
Tools for resource accounting of virtual machines.
- Luis Castro was not on the call

Status of glib tree - next steps?
- full conversion done in tree
- still targeting 0.15

status of QCFG
- code generator rewritten to be more generic and useful
- merge core infrastructure first
  - to not block other work waiting on full conversion
- still need to complete full conversion

qemu-kvm merge
- status
  - review and merge/feedback pending from Avi on current outstanding patches
  - still have some 60 patches
- break them into a few smaller series
- next steps, specifically:
  - upstreaming in-kernel irqchip support
  - MSI/MSI-X (cleanup and make mergeable)
  - this is a decent amount of work, Jan is solo...anyone want to help?
- need to be careful of regressions
- add tests to avi's autotest run (e.g., cpu hotplug)
  - cpu hotplug test initiated from host side
  - online needs some cooperation in linux
  - still unclear on what's supported, windows apparently only supports online

autotest
- had autotest test day, feedback coming on list
- some issues with getting set up
- having basic common config could be useful

KVM Forum reminder
- send in your proposals



Re: [Qemu-devel] [PATCH] vfio: Add an ioctl to reset the device

2011-04-19 Thread Chris Wright
* Randy Dunlap (rdun...@xenotime.net) wrote:
 I can't find include/linux/vfio.h in linux-next or mainline git, but
 ioctls need to be documented in Documentation/ioctl/ioctl-number.txt

It is in the full patchset: https://github.com/pugs/vfio-linux-2.6



Re: [Qemu-devel] [PATCH] vfio: Add an ioctl to reset the device

2011-04-19 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 When using VFIO to assign a device to a guest, we want to make sure
 the device is quiesced on VM reset to stop all DMA within the guest
 mapped memory.  Add an ioctl which just calls pci_reset_function()
 and returns whether it succeeds.

Shouldn't there be a reset when binding/unbinding vfio to/from a pci
device?



Re: [Qemu-devel] [PATCH] vfio: Add an ioctl to reset the device

2011-04-19 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Tue, 2011-04-19 at 15:07 -0700, Chris Wright wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
   When using VFIO to assign a device to a guest, we want to make sure
   the device is quiesced on VM reset to stop all DMA within the guest
   mapped memory.  Add an ioctl which just calls pci_reset_function()
   and returns whether it succeeds.
  
  Shouldn't there be a reset when binding/unbinding vfio to/from a pci
  device?
 
 There's already one when the /dev/vfioX file is opened, we should add
 another on release, and probably add the same PCI save state store/load
 that I'm proposing for KVM across those.  Thanks,

Hmm, I looked and didn't see it, hence the question.



Re: [Qemu-devel] [PATCH] vfio: Add an ioctl to reset the device

2011-04-19 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Tue, 2011-04-19 at 15:26 -0700, Chris Wright wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
   On Tue, 2011-04-19 at 15:07 -0700, Chris Wright wrote:
* Alex Williamson (alex.william...@redhat.com) wrote:
 When using VFIO to assign a device to a guest, we want to make sure
 the device is quiesced on VM reset to stop all DMA within the guest
 mapped memory.  Add an ioctl which just calls pci_reset_function()
 and returns whether it succeeds.

Shouldn't there be a reset when binding/unbinding vfio to/from a pci
device?
   
   There's already one when the /dev/vfioX file is opened, we should add
   another on release, and probably add the same PCI save state store/load
   that I'm proposing for KVM across those.  Thanks,
  
  Hmm, I looked and didn't see it, hence the question.
 
 vfio_open() -> pci_reset_function()
 https://github.com/pugs/vfio-linux-2.6/blob/vfio/drivers/vfio/vfio_main.c

Got it, thanks Alex.



[Qemu-devel] KVM call minutes for Apr 5

2011-04-05 Thread Chris Wright
KVM Forum
- save the date is out, cfp will follow later this week
- abstracts due in 6wks, 2wk review period, notifications by end of May

Improving process to scale project
- Trivial patch bot
- Sub-maintainership

Trivial patch monkeys^Wteam
- small/simple patches posted can fall through the cracks (esp. for
  areas that aren't well maintained)
- patches should be simple, easy to review (
- aiming to gather a team, so that the position can rotate
- patch submitter can rest assured
- Stefan and possibly Mike Roth are volunteering to get this started
- Cc: qemu-triv...@nongnu.org to send patches to the Trivial patch monkey
- details here:
  
  http://wiki.qemu.org/Contribute/TrivialPatches

Sub-maintainership
- have MAINTAINERS file
  - need to add git tree URLs
  - needs another pass to make sure there are no missing subsystems
  - make it clearer how maintained the subsystems are
- adding a wiki page to show how to become a subsystem maintainer
  - one valuable step...write testing around the subsystem
    - means you've had to learn the subsystem (builds expertise)
    - allows for regression testing the subsystem (esp. validating new patches)
- sub-maintainers sometimes disappear
  - can add another maintainer
  - actively poke the maintainer when patches are languishing
  - if you're going to be away, be sure to let list or backup know
- systematic patch tracking would help, patchwork doesn't quite cut it
- who receives pull request
  - list + blue swirl/aurelien for tcg, anthony picking up plenty of
    other bits
- infrastructure subsystems (qdev, migration, etc..)
  - big invasive changes done externally, effective flag day for full merge
  - subsystem localized change (e.g. vmstate fix for a specific device)
    maintainers can work it out, be sure to have both
- facilitating patch review and hopefully improving subsystem over time

kvm-autotest
- roadmap...refactor to centralize testing (handle the xen-autotest split off)
- internally at RH, lmr and cleber maintain autotest server to test
  branches (testing qemu.git daily)
  - have good automation for installs and testing
- seems more QA focused than developers
  - plenty of benefit for developers, so lack of developer use partly
    cultural/visibility...
  - kvm-autotest team always looking for feedback to improve for
    developer use case
- kvm-autotest day to have folks use it, write test, give feedback?
  - startup cost is/was steep, the day might be too much handholding
  - install-fest? (to get it installed and up and running)
- buildbot or autotest for testing patches to verify building and working
- one goal is to reduce mailing list load (patch resubmission because
  they haven't handled basic cases that buildbot or autotest would have
  caught)
- fedora-virt test day coming up on April 14th.  lucas will be on hand and
  we can piggy back on that to include kvm-autotest install and virt testing
- kvm autotest run before qemu pull request and post merge to track
  regressions, more frequent testing helps developers see breakage
  quickly
  - qemu.git daily testing already, only the sanity test subset 
- run more comprehensive stable set of tests on weekends
- one issue is the large number of known failures, need to make these
  easier to identify (and fix the failures one way or another)
- create database and verify (regressions) against that
  - red/yellow/green (yellow shows area was already broken)
- autotest can be run against server, not just on laptop
- how to do remote client display testing (e.g. spice client)
  - dogtail and LDTP
  - graphics could be tested w/ screenshot compares
- WHQL testing automated as well
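
The red/yellow/green idea above (compare each result against a database of known failures, so that only new breakage shows red) can be sketched as follows. This is a minimal illustration, not kvm-autotest code; the demo_* names and the in-memory "database" are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_status { DEMO_GREEN, DEMO_YELLOW, DEMO_RED };

/* Hypothetical baseline: names of tests that are already known to fail. */
struct demo_baseline {
    const char *const *known_failures;
    size_t count;
};

static bool demo_is_known_failure(const struct demo_baseline *base,
                                  const char *test)
{
    for (size_t i = 0; i < base->count; i++) {
        if (strcmp(base->known_failures[i], test) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * green  = test passed
 * yellow = test failed, but the area was already broken in the baseline
 * red    = test failed and was not a known failure (a regression)
 */
static enum demo_status demo_classify(const struct demo_baseline *base,
                                      const char *test, bool passed)
{
    assert(base != NULL && test != NULL);
    if (passed) {
        return DEMO_GREEN;
    }
    return demo_is_known_failure(base, test) ? DEMO_YELLOW : DEMO_RED;
}
```

Keeping yellow separate from red is what makes the large number of known failures tolerable: a developer only has to look at red.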



[Qemu-devel] KVM call minutes for Mar 15

2011-03-15 Thread Chris Wright
QAPI -- http://wiki.qemu.org/Features/QAPI
- please review!
- Anthony would like to see feedback and plans to commit in a week
  (assuming agreement and no major issues in review)
- some concern about the maintainability of code generation
  - but still nothing concrete on the list, need to review and discuss
    on the list
- some concern that implementation details may change the wire protocol
  - introduces a new mechanism for new signals (mask by default and
    enabled explicitly)
  - disagreement over when/how to introduce new extensions
- libvirt feedback?
  - no protocol level changes
- old and new versions are testable with test suite and proves this
- c library implementation is critical to have unit tests and test
  driven development
  - thread safe?
    - no shared state, no statics.
    - threading model requires lock for the qmp session
  - licensing?
    - LGPL
  - forwards/backwards compat?
    - designed with that in mind, see wiki:
  
  http://wiki.qemu.org/Features/QAPI

QCFG -- http://wiki.qemu.org/Features/QCFG
- command line args translation to objects is complex and buggy
- schema + code generator to formalize this
- formally describe each command line option and generate code
  to build and validate objects
- provides systematic way to document command line options
- automatically 
- device_add does multiple conversions to go from qmp to qemuopts to
  objects
- move to basic c structures, and autogenerated marshalling code
- no plan to do this work soon, late in 0.15 cycle
  - same as qapi, fork a tree, do mass conversion and merge for 0.16 cycle
- qmp server mode to take all configuration commands before actually
  starting the guest
- can provide a config file 
- qdev...
  - could just bridge to setting and getting qdev properties
  - OR get to point where device objects go directly to qdev device init
- why not move command line to qmp instead of new schema?
  - single schema
- considerations for -M (didn't capture all of these)
- for all the details:
  
  http://wiki.qemu.org/Features/QCFG
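
As a rough illustration of "formally describe each command line option and validate against that description", here is a small table-driven sketch. It is not QCFG: QCFG proposes a schema plus a code generator, whereas this uses a hand-written table, and the option names and demo_* identifiers are illustrative assumptions only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum demo_opt_type { DEMO_OPT_STRING, DEMO_OPT_INT, DEMO_OPT_BOOL };

/* One schema entry: the help text doubles as documentation for the option. */
struct demo_opt_desc {
    const char *name;
    enum demo_opt_type type;
    const char *help;
};

/* Illustrative schema; not claimed to match any real QEMU option set. */
static const struct demo_opt_desc demo_schema[] = {
    { "file",     DEMO_OPT_STRING, "path to the image" },
    { "index",    DEMO_OPT_INT,    "position of the drive" },
    { "readonly", DEMO_OPT_BOOL,   "on or off" },
};

static const struct demo_opt_desc *demo_find(const struct demo_opt_desc *schema,
                                             size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(schema[i].name, name) == 0) {
            return &schema[i];
        }
    }
    return NULL;
}

/* Returns true when name is described by the schema and value fits its type. */
static bool demo_validate(const struct demo_opt_desc *schema, size_t n,
                          const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    const struct demo_opt_desc *desc = demo_find(schema, n, name);
    if (desc == NULL) {
        return false;                    /* unknown option */
    }
    switch (desc->type) {
    case DEMO_OPT_STRING:
        return value[0] != '\0';
    case DEMO_OPT_BOOL:
        return strcmp(value, "on") == 0 || strcmp(value, "off") == 0;
    case DEMO_OPT_INT: {
        char *end;
        errno = 0;
        (void)strtol(value, &end, 10);
        /* Valid only if the whole string was consumed without overflow. */
        return errno == 0 && end != value && *end == '\0';
    }
    }
    return false;
}
```

The attraction of a single description per option is that parsing, validation and documentation all come from the same place instead of being re-implemented per option.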

Merging big changes
- in the past, evolving in tree has not worked well, leaving partial
  conversions
- QAPI/QCFG method of doing changes in external tree hopes to set new precedent
  - preserve patch/review on list
  - do full conversion
  - provide strong testing to show it works

Kemari merge plans
- just needs some ACKs
- Juan, Anthony, anybody else who is familiar with migration to review?

switch from gpxe to ipxe
- possible 0.15 release w/ ipxe (Alex looking into it)
- Michael Brown been helpful in fixing bugs, so compat
- Alex will send out mail soon on the details
- ipxe releases?  not yet, there are plans for it, should be coming RSN
- Stefan volunteers to help test



Re: [Qemu-devel] KVM call minutes for Mar 15

2011-03-15 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 03/15/2011 09:53 AM, Chris Wright wrote:
  QAPI
snip
 - c library implementation is critical to have unit tests and test
driven development
- thread safe?
  - no shared state, no statics.
  - threading model requires lock for the qmp session
- licensiing?
  - LGPL
- forwards/backwards compat?
  - designed with that in mind see wiki:
 
http://wiki.qemu.org/Features/QAPI
 
 One neat feature of libqmp is that once libvirt has a better QMP
 passthrough interface, we can create a QmpSession that uses libvirt.
 
 It would look something like:
 
 QmpSession *libqmp_session_new_libvirt(virDomainPtr dom);

Looks like you mean this?

       -- request QmpSession -->
client                            libvirt
       <-- return QmpSession  --

client -> QmpSession -> QMP -> QEMU

So bypassing libvirt completely to actually use the session?

Currently, it's more like:

client -> QemuMonitorCommand -> libvirt -> QMP -> QEMU

 The QmpSession returned by this call can then be used with all of
 the libqmp interfaces.  This means we can still exercise our test
 suite with a guest launched through libvirt.  It also should make
 the libvirt pass through interface a bit easier to consume by third
 parties.

This sounds like it's something libvirt folks should be involved with.
At the very least, this mode is there now and considered basically
unstable/experimental/developer use:

 Qemu monitor command '%s' executed; libvirt results may be unpredictable!

So likely some concern about making it easier to use, esp. assuming
that third parties above are mgmt apps, not just developers.

thanks,
-chris



[Qemu-devel] Re: [PATCH] Fix performance regression in qemu_get_ram_ptr

2011-03-10 Thread Chris Wright
* Vincent Palatin (vpala...@chromium.org) wrote:
 When the commit f471a17e9d869df3c6573f7ec02c4725676d6f3a converted the
 ram_blocks structure to QLIST, it also removed the conditional check before
 switching the current block at the beginning of the list.

Nice catch.

 In the common use case where ram_blocks has a few blocks with only one
 frequently accessed (the main RAM), this has a performance impact as it
 performs the useless list operations on each call (which are on a really
 hot path).
 
 On my machine emulation (ARM on amd64), this patch reduces the
 percentage of CPU time spent in qemu_get_ram_ptr from 6.3% to 2.1% in the
 profiling of a full boot.

Hopefully this is back on par with before the QLIST switchover.

 Signed-off-by: Vincent Palatin vpala...@chromium.org

Acked-by: Chris Wright chr...@redhat.com

 ---
  exec.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/exec.c b/exec.c
 index d611100..81f08b7 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -2957,8 +2957,11 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
  
      QLIST_FOREACH(block, &ram_list.blocks, next) {
          if (addr - block->offset < block->length) {
 -            QLIST_REMOVE(block, next);
 -            QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
 +            /* Move this entry to to start of the list.  */
 +            if (block != QLIST_FIRST(&ram_list.blocks)) {
 +                QLIST_REMOVE(block, next);
 +                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
 +            }

Pretty close to self-documenting code now.  Not sure if it's subtle enough
to warrant change to the comment like:
 
  /* Move block to head of list if it's not there already */

thanks,
-chris
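
A self-contained sketch of the same move-to-front idea, for anyone who wants to experiment outside the tree. It deliberately uses a hand-rolled singly linked list rather than QEMU's QLIST macros, and demo_block, demo_block_list and the moves counter are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_block {
    uint64_t offset;
    uint64_t length;
    struct demo_block *next;
};

struct demo_block_list {
    struct demo_block *head;
    unsigned long moves;    /* number of relinks, kept only for illustration */
};

/* Find the block containing addr and keep the hottest block at the head. */
static struct demo_block *demo_lookup(struct demo_block_list *list,
                                      uint64_t addr)
{
    struct demo_block *prev = NULL;

    assert(list != NULL);
    for (struct demo_block *b = list->head; b != NULL; prev = b, b = b->next) {
        /*
         * Unsigned subtraction wraps to a huge value when addr < offset,
         * so this single comparison checks both bounds.
         */
        if (addr - b->offset < b->length) {
            /* Move block to head of list if it's not there already. */
            if (prev != NULL) {
                prev->next = b->next;
                b->next = list->head;
                list->head = b;
                list->moves++;
            }
            return b;
        }
    }
    return NULL;
}
```

Repeated lookups that hit the head block do no list writes at all, which is the saving the patch is after on the hot path.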



[Qemu-devel] KVM call minutes for Feb 22

2011-02-22 Thread Chris Wright
0.14 recap
- keeping schedule on wiki was helpful
- changelog was helpful
- testing (could use even more emphasis, could be improved)
- -rc cycles
  - -rc2 and final release were just hours apart

0.15
- tentative date July 1st
- qapi
- qed features
- virtagent?
  - depends on whether to terminate in qemu vs external
- terminating w/in qemu is close to feature complete
- using QMP (kinda, QObject - JSON marshalling, still use HTTP)
- QMP is not bi-directional XMLRPC, one way with event posting
- XMLRPC + server logic add to the basic QEMU side attack surface
  - splitting out to external process
- state associated with guest in external process complicates live migration
  - e.g. handling in-process command in server
  - guest client reconnects during migration
  - can virtagent features be stateless 
- Avi's favorite Lua based extension language coming RSN ;)
  - let's use copy and paste as a concrete example
- usecase to help define the requirements and expose
  architectural
- Jes will do this, make concrete counter proposal to hosting
  virtagent server in qemu
  - splitting QEMU into more modular components is a large architectural
step, but better step

Block format acceptance
- qcow3 wiki starting

GSoC projects
- only 3 so far, mentoring organization applications Feb 28th
  - can update app 
- please add your thoughts here so that we can have a successful
- Luiz will send out a note as more explicit reminder

gpxe vs ipxe
- gpxe still stagnant
- ipxe accepting patches (e.g. igbvf)
- perhaps switch in 0.15 (Alex take a look)



[Qemu-devel] KVM call minutes for Feb 15

2011-02-15 Thread Chris Wright
QAPI and QMP
- Anthony adding a new wiki page to describe all of this
- specified in formal schema using JSON
  - includes documentation in javadoc-like syntax
  - can generate api (possibly protocol) docs
  - documenting each command and expected errors
- creates marshalling functions and C interfaces
- can generate C library
  - facilitates unit tests/regression tests
- new and old code both exist in Anthony's tree
  - allows unit tests to run on both to verify
  - will remove old and force a flag day on merging in for 0.15
- still need to convert human monitor commands
  - goal to convert all of human monitor to QMP
- events?
  - still not consumable from internal use
  - model signals and slots
- similar to notifier lists, but can pass arbitrary data
- client connects to signal via QMP
  - how to extend?
- optional parameters (ABI bump)
  - no way to know if client is aware of and consuming the optional
    parameters
- add new events
  - client required to register for new events when they know about
    them, server can generate different logic based on client's
    capability
- first release may not include shared library (lack of libconf/autotool)
  - could 
- QMP session in default well-known location
  - allows iteration of all running QMP sessions
  - per-user directory to handle user-level isolation
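
The "signals and slots" model mentioned under events (like a notifier list, but carrying arbitrary data, with clients connecting explicitly) can be sketched roughly as below. This is an assumption-laden illustration, not the QAPI implementation; demo_signal, demo_signal_connect, demo_signal_emit and the fixed slot limit are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 8

/* A slot receives its own opaque pointer plus the data carried by the event. */
typedef void (*demo_slot_fn)(void *opaque, const void *event_data);

struct demo_signal {
    struct {
        demo_slot_fn fn;
        void *opaque;
    } slots[DEMO_MAX_SLOTS];
    size_t count;
};

/* Connect a slot; nothing is delivered to a client that never connects. */
static int demo_signal_connect(struct demo_signal *sig, demo_slot_fn fn,
                               void *opaque)
{
    assert(sig != NULL && fn != NULL);
    if (sig->count == DEMO_MAX_SLOTS) {
        return -1;
    }
    sig->slots[sig->count].fn = fn;
    sig->slots[sig->count].opaque = opaque;
    sig->count++;
    return 0;
}

/* Deliver event_data to every connected slot; returns how many were called. */
static size_t demo_signal_emit(const struct demo_signal *sig,
                               const void *event_data)
{
    for (size_t i = 0; i < sig->count; i++) {
        sig->slots[i].fn(sig->slots[i].opaque, event_data);
    }
    return sig->count;
}

/* Example slot: adds the integer payload to a running total. */
static void demo_sum_slot(void *opaque, const void *event_data)
{
    *(long *)opaque += *(const int *)event_data;
}
```

Because delivery depends on an explicit connect, a new event can be added without disturbing clients that do not know about it.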

qdev future
- have an object model, but can't do polymorphism (i.e. bus level)
- could use more oop style, use GObject, use C++...no great ideas
- no major qdev plans for 0.15
- would be useful to have the ability to do device level unit testing
  - cleaner device model, better encapsulation
  - this is both the device side interfaces, but also interfaces back to qemu
  - ability to do something like a virtual PCI bus to be a test harness
    to interact with a device
  - back to the GObject, oop, C++ questions?
- IDL based code generation to generate VMState in effort to make
  migration more verifiable
- VMState
  - need to focus on serialized guest visible state
- start with all state and remove obviously internal only state
- start with only guest visible state (structure separation)
  - verifiable
- need a qdev tree maintainer?
- some disagreement on exactly how much 
- qdev autodoc patches? (posted and ack'd multiple times)

bad patches committed that are not on list
- please report specific incidents, this should not be happening

SeaBIOS update?
- w/out we will have features that can't be used 
- need a release..
  - 0.15 will need good planning and dates and communication with Kevin

0.14-rc2 tagged please review for any missing patches, 0.14.0 likely
tagged late today

revisit new -> old migration
- Amit offers virtio-serial patches and some legwork
- tabled discussion to list, possibly next week's call



[Qemu-devel] KVM call agenda for Feb 15

2011-02-14 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Feb 8

2011-02-08 Thread Chris Wright
Automated builds and testing
- found broken 32-bit
- luiz suggested running against maintainer trees
- daniel gollub offered to take on maintenance
- integration with kvm-autotest?
  - lucas, daniel, stefan...
  - testing each git commit is probably overkill and too expensive
  - current autotest run (each 48-hours to batch it up)
  - stefan currently running once a day, autotest run is 3 hours, so
    daily should work
- need an integration tree to run build test on?
  - probably still too early

QEMU testing
- kvm unit tests
  - small standalone kernel that exercises paths that have shown bugs
    http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=summary
- Michael Roth recently sent an RFC for qtest
  (http://www.mail-archive.com/qemu-devel@nongnu.org/msg54191.html)
  - test module (->init(), ->run()) which runs in place of vcpu threads to
    set up a test framework to do targeted testing, for example, of devices
  - normal C code, access to qemu internal functions
  - not just functional device testing, but can also do fuzz testing
  - looking for feedback/users/test developers/etc
- PPC (just kernel + initrd to boot, and verify boots are identical)
  - full install in many cases is too long, and can trigger other issues
    (alex had examples of emulation being slow enough that login screen
    times out)
- tcg basic testing to verify qemu-kvm patch isn't breaking tcg
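
To make the "test module with init and run hooks" idea concrete, here is a minimal sketch of what such a module interface could look like. It is not the qtest RFC's API; every name below (demo_test_module, demo_run_module, the fake device) is hypothetical and only shows the shape of the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test module: two hooks plus a name for reporting. */
struct demo_test_module {
    const char *name;
    int (*init)(void *state);   /* returns 0 on success */
    int (*run)(void *state);    /* returns 0 when the test passes */
};

/* Call init, then run; run is skipped when init fails. Returns 0 on pass. */
static int demo_run_module(const struct demo_test_module *mod, void *state)
{
    assert(mod != NULL && mod->init != NULL && mod->run != NULL);
    int rc = mod->init(state);
    if (rc != 0) {
        return rc;
    }
    return mod->run(state);
}

/* Example module exercising a fake one-register device. */
struct demo_fake_dev {
    unsigned int reg;
};

static int demo_fake_init(void *state)
{
    struct demo_fake_dev *dev = state;
    dev->reg = 0;               /* put the device into a known state */
    return 0;
}

static int demo_fake_run(void *state)
{
    struct demo_fake_dev *dev = state;
    dev->reg |= 0x1;            /* poke the device ... */
    return dev->reg == 0x1 ? 0 : -1;   /* ... and check what it did */
}

static const struct demo_test_module demo_fake_module = {
    "fake-dev", demo_fake_init, demo_fake_run,
};
```

Since the module is plain C with direct access to the device state, the same hooks could drive fuzzing as well as functional checks.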

Cross version migration (new-old version migration thread)
- downstreams want this, support this upstream?
- versions vs. subsections (subsections should allow this to work)
- (as usual) more vmstate conversion needed
- qdev/vmstate both examples of partially completed work that need more
  attention 



[Qemu-devel] KVM call agenda for Feb 8

2011-02-07 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Feb 1

2011-02-01 Thread Chris Wright
KVM upstream merge: status, plans, coordination
- Jan has a git tree, consolidating
- qemu-kvm io threading is still an issue
- Anthony wants to just merge
  - concerns with non-x86 arch and merge
  - concerns with big-bang patch merge and following stability
- post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
  a problem if it's not there by then
- testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
- qemu-kvm still evolving, needs to get sync'd or it will keep diverging
- 2 implementations of main init, cpu init, Jan has merged them into one
  - qemu-kvm-x86.c file that's only a few hundred lines
- review as one patch to see the fundamental difference

QMP support status for 0.14
- declare QMP fully supported
  - caveats: specific errors aren't guaranteed yet (primarily documentation)
  - human monitor passthrough command is best effort
- device tree structure is not reliable, use name not path
- will send out patch to update qmp-commands.hx to document this (and Cc
  libvirt)
- schema file (json subset which is python) and code generator to
  generate code with C structures, also generates client library for
  test cases (can test against new and old qmp server to verify hasn't
  changed)
  - HMP implemented in terms of QMP only
  - at the end should have a test framework to test all commands
  - glib/gtest framework

0.14 stable fork today
already posted 0.14 patches?
- will pick up all those patches before forking, fork at the end of the day
- will grab latest SeaBIOS and vgabios

SeaBIOS update for 0.14 (AHCI boot capable version)
- need to check if (and why) AHCI is disabled by default 
  - assuming no fundamental issues, could be enabled and become an
    experimental new 0.14 feature

Summer of code 2011
- http://wiki.qemu.org/Google_Summer_of_Code_2011
- update wiki page with project ideas (let Anthony or Luiz know if you
  want to be a mentor)
- application is due at end of the month
- mentors...be prepared that projects may take longer than just the
  summer of code to complete
- join #qemu-gsoc on OFTC for gsoc discussions

Going to FOSDEM?  agraf will be there...



[Qemu-devel] KVM call agenda for Jan 25

2011-01-24 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] Re: KVM call agenda for Jan 18

2011-01-18 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, this week's call is cancelled.

thanks,
-chris



[Qemu-devel] KVM call agenda for Jan 18

2011-01-17 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Jan 11

2011-01-11 Thread Chris Wright
KVM Forum 2011
- expand the scope? yes, continue up the stack
- how long?  2 days (maybe 2 1/2 - 3 space permitting)
- where?  Vancouver with LinuxCon

Spice guest agent:
- virt agent, matahari, spice agent...what is in spice agent?
- spice char device
  - mouse, copy 'n paste, screen resolution change
- could be generic (at least input and copy/paste)
  - send protocol details of what is being sent
- need to look at how difficult it is to split it out from spice
  (how to split out in qemu vs. libspice)
- goal to converge on common framework
- more discussion on char device vs. protocol
  - eg. mouse_set breaks if mouse channel is part pv and part spice specific
- Alon will send link to protocol and try to propose new interfaces

migration and block devices:
- need to invalidate data after first read on target,
  because it can be stale
- close + reopen is what was done for NFS
- iscsi: can issue ioctl(BLKFLSBUF) to flush, but it's CAP_SYS_ADMIN only
- O_DIRECT to avoid cache (concerns that it's not guaranteed)
- agree change the default (cache=none for 

qemu patch queue is long:
- slow to return from break
- patience and more patch review will help make sure things are applied
  and don't fall through cracks



[Qemu-devel] Re: KVM call agenda for Dec 21

2010-12-21 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, today's call is cancelled.

Also, given people's holiday and vacation schedules, next week's call is
cancelled.  Talk again after the New Year.

thanks,
-chris



[Qemu-devel] KVM call agenda for Dec 21

2010-12-20 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] Re: KVM call agenda for Dec 14

2010-12-14 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, today's call is cancelled.



Re: [Qemu-devel] KVM call agenda for Dec 14

2010-12-14 Thread Chris Wright
* Jes Sorensen (jes.soren...@redhat.com) wrote:
 Any chance you could fix your cronjob to send out the CFA a day earlier?
 15 hrs before is a bit short notice.

Sure.



[Qemu-devel] KVM call agenda for Dec 14

2010-12-13 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



Re: [Qemu-devel] KVM call agenda for Dec 7

2010-12-07 Thread Chris Wright
* Jes Sorensen (jes.soren...@redhat.com) wrote:
 On 12/07/10 00:51, Chris Wright wrote:
  Please send in any agenda items you are interested in covering.
  
  thanks,
  -chris
  
 
 No agenda, no replies
 
 Call canceled I presume?

Indeed, next week, then pick up next year...



[Qemu-devel] KVM call agenda for Dec 7

2010-12-06 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Nov 30

2010-12-01 Thread Chris Wright
2011 KVM Conference
- together with LF event like LinuxCon Vancouver BC (Aug), KS Prague (Nov)
- wider audience
  - include qemu (tcg)
  - include libvirt
  - include xen

0.14.0 release plan
- could push things out, mainly want to keep on track for

infrastructure changes (irc channel migration, git tree migration)
- savannah down
- git.qemu.org was mirror, will start pushing there
- when savannah is back up, will become mirror (so git users should
  still work)
- plan on moving #qemu to OFTC

nested VMX
- no progress, future plans are unclear

qemu users forum in grenoble
- worth having someone there
- goal to get embedded forks to push changes back to qemu

migration with large memory
- switching to 50ms cap likely to cause regression in terms of vcpu runtime
- 50ms qemu mutex contention, brief period of mutex access
  - this has the effect of speeding up migration but giving too little vcpu
    access to qemu mutex (network connections could terminate, for example)
- only fixes to this are to use bw limit or not holding qemu mutex during
  migration
- run Anthony's test load and discuss on list



[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Wed, 2010-12-01 at 21:42 +0530, Srivatsa Vaddagiri wrote:
 
  Not if yield() remembers what timeslice was given up and adds that back when
  thread is finally ready to run. Figure below illustrates this idea:
  
  
  [timeline figure for physical cpu0 (p0): vcpus A0, C0 and D0 run in
   rotation as A0/4 C0/4 D0/4, separated by periods of lock contention (L);
   vcpu B0 appears in between as B0/2[2], B0/0[6], B0/0[10], B0/14[0]]
  
   
  where,
  p0  - physical cpu0
  L   - denotes period of lock contention
  A0/4- means vcpu A0 (of guest A) ran for 4 ms
  B0/2[6] - means vcpu B0 (of guest B) ran for 2 ms (and has given up
 6ms worth of its timeslice so far). In reality, we should
 not see too much of given up timeslice for a vcpu.
 
 /me fails to parse
 
   Regarding directed yield, do we have any reliable mechanism to find
   target of directed yield in this (unmodified/non-paravirtualized guest)
   case? IOW how do we determine the vcpu thread to which cycles need to be
   yielded upon contention?
   
   My idea was to yield to a random starved vcpu of the same guest.
   There are several cases to consider:
   
   - we hit the right vcpu; lock is released, party.
   - we hit some vcpu that is doing unrelated work.  yielding thread
   doesn't make progress, but we're not wasting cpu time.
   - we hit another waiter for the same lock.  it will also PLE exit
   and trigger a directed yield.  this increases the cost of directed
   yield by a factor of count_of_runnable_but_not_running_vcpus, which
   could be large, but not disastrously so (i.e. don't run a 64-vcpu
   guest on a uniprocessor host with this)
   
 So if you were to test something similar running with a 20% vcpu
 cap, I'm sure you'd run into similar issues.  It may show with fewer
 vcpus (I've only tested 64).
   
 Are you assuming the existence of a directed yield and the
 specific concern is what happens when a directed yield happens
 after a PLE and the target of the yield has been capped?
   
 Yes.  My concern is that we will see the same kind of problems
 directed yield was designed to fix, but without allowing directed
 yield to fix them.  Directed yield was designed to fix lock holder
 preemption under contention,
   
   For modified guests, something like [2] seems to be the best approach to
   fix lock-holder preemption (LHP) problem, which does not require any sort
   of directed yield support. Essentially upon contention, a vcpu registers
   its lock of interest and goes to sleep (via hypercall) waiting for
   lock-owner to wake it up (again via another hypercall).
   
   Right.
  
  We don't have these hypercalls for KVM atm, which I am working on now.
  
   For unmodified guests, IMHO a plain yield (or slightly enhanced yield [1])
   should fix the LHP problem.
   
   A plain yield (ignoring no-opiness on Linux) will penalize the
   running guest wrt other guests.  We need to maintain fairness.
  
  Agreed on the need to maintain fairness.
 
 Directed yield and fairness don't mix well either. You can end up
 feeding the other tasks more time than you'll ever get back.

If the directed yield is always to another task in your cgroup then
inter-guest scheduling fairness should be maintained.

   Fyi, Xen folks also seem to be avoiding a directed yield for some of the
   same reasons [3].
   
   I think that fails for unmodified guests, where you don't know when
   the lock is released and so you don't have a wake_up notification.
   You lost a large timeslice and you can't gain it back, whereas with
   pv the wakeup means you only lose as much time as the lock was held.
   
   Given this line of thinking, hard-limiting guests (either in user-space
   or kernel-space, latter being what I prefer) should not have adverse
   interactions with LHP-related solutions.
   
   If you hard-limit a vcpu that holds a lock, any waiting vcpus are
   also halted.
  
  This can happen in normal case when lock-holders are preempted as well. So
  not a new problem that hard-limits is introducing!
 
 No, but hard limits make it _much_ worse.
 
With directed yield you can let the lock holder make
   some progress at the expense of another vcpu.  A regular yield()
   will simply stall the waiter.
  
  Agreed. Do you see any problems with slightly enhanced version of yield
  described above (rather than directed yield)? It has some advantage over
  directed yield in that it preserves not only fairness between VMs but also
  fairness between VCPUs of a VM. Also it avoids the need for a guessing game
  mentioned above and bad interactions with hard-limits.
  
  CCing other scheduler experts for their opinion of proposed yield()
  extensions.
 
 sys_yield() usage for anything other but two FIFO 

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Wed, 2010-12-01 at 09:17 -0800, Chris Wright wrote:
  Directed yield and fairness don't mix well either. You can end up
  feeding the other tasks more time than you'll ever get back.
 
 If the directed yield is always to another task in your cgroup then
 inter-guest scheduling fairness should be maintained. 
 
 Yes, but not the inter-vcpu fairness.

That same vcpu doesn't get fair scheduling if it spends its entire
timeslice spinning on a lock held by a de-scheduled vcpu.
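
For reference, the "yield() remembers what timeslice was given up" proposal quoted earlier in this thread can be modelled with a few lines of accounting. This is only a sketch of the bookkeeping, not scheduler code: the demo_* names are hypothetical, and the 4 ms base timeslice is an assumption read off the A0/4, C0/4, D0/4 labels in the quoted figure.

```c
#include <assert.h>

/* Per-vcpu bookkeeping for the "enhanced yield" idea. */
struct demo_vcpu_acct {
    unsigned int given_up_ms;   /* timeslice yielded so far and not yet repaid */
};

/* The vcpu was granted granted_ms but only ran ran_ms before yielding. */
static void demo_yield(struct demo_vcpu_acct *acct, unsigned int granted_ms,
                       unsigned int ran_ms)
{
    assert(ran_ms <= granted_ms);
    acct->given_up_ms += granted_ms - ran_ms;
}

/*
 * The vcpu is finally ready to run: it gets its base slice plus everything
 * it gave up earlier, and the debt is cleared.
 */
static unsigned int demo_next_slice(struct demo_vcpu_acct *acct,
                                    unsigned int base_ms)
{
    unsigned int slice = base_ms + acct->given_up_ms;
    acct->given_up_ms = 0;
    return slice;
}
```

Feeding in the sequence from the figure (run 2 ms, then 0 ms, then 0 ms of a 4 ms slice) accumulates 2, 6 and 10 ms of given-up time, and the next slice comes out at 14 ms, matching the B0/2[2], B0/0[6], B0/0[10], B0/14[0] labels.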



Re: [Qemu-devel] KVM call agenda for Nov 23

2010-11-22 Thread Chris Wright
* Juan Quintela (quint...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

usb-ccid



[Qemu-devel] Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-22 Thread Chris Wright
* Anthony Liguori (aligu...@us.ibm.com) wrote:
 qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
 them to respond to these signals, introduce monitor commands that stop and 
 start
 individual vcpus.

In the past SIGSTOP has introduced time skew.  Have you verified this
isn't an issue?

thanks,
-chris



[Qemu-devel] Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-22 Thread Chris Wright
* Anthony Liguori (aligu...@linux.vnet.ibm.com) wrote:
 On 11/22/2010 05:04 PM, Chris Wright wrote:
 * Anthony Liguori (aligu...@us.ibm.com) wrote:
 qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of 
 teaching
 them to respond to these signals, introduce monitor commands that stop and 
 start
 individual vcpus.
 In the past SIGSTOP has introduced time skew.  Have you verified this
 isn't an issue?
 
 Time skew is a big topic.  Are you talking about TSC drift,
 pit/rtc/hpet drift, etc?

Sorry to be vague, but it's been long enough that I don't recall
the details.  The guest kernel's clocksource affected how timekeeping
progressed across STOP/CONT (was probably missing qemu based timer ticks).
While this is not the same, it made me wonder if you'd tested against that.

 It's certainly going to stress periodic interrupt catch up code.

Heh, call it a feature for autotest ;)

thanks,
-chris



[Qemu-devel] KVM call agenda for Nov 16

2010-11-15 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] Re: [PATCH v3] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-14 Thread Chris Wright
* Hidetoshi Seto (seto.hideto...@jp.fujitsu.com) wrote:
 This patch introduce a fallback mechanism for old systems that do not
 support utimensat().  This fix build failure with following warnings:
 
 hw/virtio-9p-local.c: In function 'local_utimensat':
 hw/virtio-9p-local.c:479: warning: implicit declaration of function 
 'utimensat'
 hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
 
 and:
 
 hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
 hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this 
 function)
 hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
 hw/virtio-9p.c:1410: error: for each function it appears in.)
 hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this 
 function)
 hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
 hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this 
 function)
 
 v3:
   - Use better alternative handling for UTIME_NOW/OMIT
   - Move qemu_utimensat() to cutils.c
 V2:
   - Introduce qemu_utimensat()
 
 Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com

Looks good to me (no strong opinion on the cutils vs oslib-posix that
Jes mentioned).

Acked-by: Chris Wright chr...@sous-sol.org



[Qemu-devel] Re: [PATCH] virtio-9p: fix build on !CONFIG_UTIMENSAT v2

2010-11-13 Thread Chris Wright
* Hidetoshi Seto (seto.hideto...@jp.fujitsu.com) wrote:
 +/*
 + * Fallback: use utimes() instead of utimensat().
 + * See commit 74bc02b2d2272dc88fb98d43e631eb154717f517 for known problem.
 + */
 +struct timeval tv[2];
 +int i;
 +
 +for (i = 0; i < 2; i++) {
 +if (times[i].tv_nsec == UTIME_OMIT || times[i].tv_nsec == UTIME_NOW) 
 {
 +tv[i].tv_sec = 0;
 +tv[i].tv_usec = 0;

I don't think this is accurate in either case.  It will set the
atime, mtime, or both to 0.

For UTIME_NOW (in both) we'd simply pass NULL to utimes(2).  For
UTIME_OMIT (in both) we'd simply skip the call to utimes(2) altogether.

The harder part is a mixed mode (i.e. the truncate fix mentioned in the
above commit).  I think the only way to handle UTIME_NOW in one is to
call gettimeofday (or clock_gettime for better resolution) to find out
what current time is.  And for UTIME_OMIT call stat to find out what the
current setting is and reset to that value.  Both of those cases can
possibly zero out the extra precision (providing only seconds
resolution).
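
A minimal sketch of the mixed-mode handling described above, not the code
from the patch.  The helper names resolve_time() and utimens_fallback() are
made up for illustration, and the UTIME_* values are the Linux ones, defined
here only for hosts whose headers lack them:

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>

/* Assumption: Linux values, only used when the host headers lack them. */
#ifndef UTIME_NOW
#define UTIME_NOW  ((1L << 30) - 1L)
#endif
#ifndef UTIME_OMIT
#define UTIME_OMIT ((1L << 30) - 2L)
#endif

/*
 * Hypothetical helper: turn one utimensat()-style timespec into the
 * timeval that utimes() expects.
 *   UTIME_NOW  -> the current time passed in as 'now'
 *   UTIME_OMIT -> the file's existing timestamp (seconds resolution only)
 *   otherwise  -> the requested time, truncated to microseconds
 */
void resolve_time(const struct timespec *req, time_t old_sec,
                  const struct timeval *now, struct timeval *out)
{
    if (req->tv_nsec == UTIME_NOW) {
        *out = *now;
    } else if (req->tv_nsec == UTIME_OMIT) {
        out->tv_sec = old_sec;
        out->tv_usec = 0;
    } else {
        out->tv_sec = req->tv_sec;
        out->tv_usec = req->tv_nsec / 1000;
    }
}

/* Hypothetical fallback for hosts without utimensat(). */
int utimens_fallback(const char *path, const struct timespec times[2])
{
    struct timeval tv[2], now;
    struct stat st;

    /* Both "now": utimes(path, NULL) already means exactly that. */
    if (times == NULL ||
        (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW)) {
        return utimes(path, NULL);
    }
    /* Both "omit": nothing to change, so skip the call altogether. */
    if (times[0].tv_nsec == UTIME_OMIT && times[1].tv_nsec == UTIME_OMIT) {
        return 0;
    }
    /* Mixed mode: look up the current time and the current settings. */
    if (stat(path, &st) < 0 || gettimeofday(&now, NULL) < 0) {
        return -1;
    }
    resolve_time(&times[0], st.st_atime, &now, &tv[0]);
    resolve_time(&times[1], st.st_mtime, &now, &tv[1]);
    return utimes(path, tv);
}
```

Note the window between stat() and utimes() in the mixed case: a concurrent
update to the omitted timestamp would be overwritten with the stale value.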

thanks,
-chris



[Qemu-devel] KVM call minutes for Nov 9

2010-11-09 Thread Chris Wright
linux plumbers
- qemu talks, including xen folks efforts to get patches upstream
  - considering virtio
  - considering seabios
  - even some xenner interest
- seabios presentation
- uefi
  - still needs CSM support (lot of work) to be the only BIOS
  - otherwise need legacy BIOS and UEFI and users choose
  - who's interested?
- host numa support (guest placement and home noding)
  - static pinning fine for benchmarking
  - -mempath
  - migrate_pages(2) can be called from userspace to push pages around
  - like to push this framework into the kernel
  - wiki page for planning coming RSN
- ways to improve qemu ...
  - C++, RFC patches to show value of language object support and device model
  - like to get to devices modular enough to have unit tests, etc.
  - any security interest in using a stronger typed language?

kernel summit
- showed up in many talks (mainstream part of the kernel)
- KVM for ARM hallways discussion
- more developers interested in joining the project (need to prepare for this)

USB 2.0 ehci support
- anyone looking at this?
  - Jes has it on his todo list (albeit not near the top priority)
- should send out a note summarizing status of that tree (Jan will send note)
  - is USB passthrough fully working with this?
- USB 3.0 coming, plan for that (maybe go straight there)

openfirmware tree
- problem stable/unique names for devices (like UUID generated at dev creation)
- addressing device externally from QEMU is still hard
- also can't touch the user defined namespace



[Qemu-devel] KVM call minutes for Oct 26

2010-10-26 Thread Chris Wright
Guest Agents
- need to get to guest userspace for many actions
- virtio for userspace
- host backend needs to terminate in QEMU
- portable across guest OS's
- virt-agent
  - bi-directional RPC (XML-RPC just since it's easy)
  - cmd: shutdown, reboot, dmesg, execute command, read/write file
- query guest type
- Matahari
  - consolidate agent proliferation
  - w/ or w/out networking (virtio-serial is fine)
  - may or may not have access to host
  - using amqp
- single transport
- use of qpid (C++, not C friendly)
  - put qemu bits in library and wrap (similar to libvirt, netcf, etc)
- e.g. shutdown, where does it live?
  - ACPI shutdown can trigger dbus, dialog, etc.
  - can already do ACPI, agent is for direct shutdown -h now
  - is there an async notification on shutdown (know it's been successful)
  - perhaps another library like libsysconfig
- openvmtools
  - useful as reference, GPL (requires copyright assignment)
  - uses PIO (or socket)

0.13.1
- vmmouse is broken
- assertion failure in block layer
  - just qemu-img: https://bugzilla.redhat.com/show_bug.cgi?id=646538
- patch posted, thanks Stefan
- hotplug fixes
- fix for seabios SCI level triggered (broken host initiated powerdown on 
FreeBSD)
  - regression -- any regression needs to be considered seriously
  - was planning to move to 0.6.1 (vs. latest git snapshot)
  - kevin indicated ok with stable/tagging/branch for seabios

bootindex patch series
- qdev name vs specific name
- fine for seabios interface
- migration needs stable address too
  - worth holding up series for this?
  - one more try, then fallback to plan b (new callback)

migration issues
- keep using network after VM has stopped
- sent hacky patch for virtio-net, but need a generic sol'n
- virtio-block flushes after stop
- need a similar stop/flush for other devices
- unclear how anything is running w/out getting back to main loop
  - happening after migration completes

networking interfaces
- old vlan vs new netdev...be nice to finish off and simplify internal
  interfaces



[Qemu-devel] Re: KVM call agenda for Oct 26

2010-10-25 Thread Chris Wright
* Juan Quintela (quint...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

Guest agents




Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-22 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 The first step is just identifying what interfaces we need in a
 guest agent.  So far, I think we can get away with a very small
 number of interfaces (mainly read/write files, execute command).

Could you elaborate here?  I can't imagine you mean:

vm_system(target_vm, shutdown -r now)

But from other post, it does seem you want complexity in the host side
not guest side of agent.

Seems vm_reboot(target_vm) as the RPC makes more sense with the guest
side implementing that in whatever guest-specific appropriate way.

thanks,
-chris



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-22 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 10/22/2010 12:29 PM, Chris Wright wrote:
 * Anthony Liguori (anth...@codemonkey.ws) wrote:
 The first step is just identifying what interfaces we need in a
 guest agent.  So far, I think we can get away with a very small
 number of interfaces (mainly read/write files, execute command).
 Could you elaborate here?  I can't imagine you mean:
 
 vm_system(target_vm, shutdown -r now)
 
 But from other post, it does seem you want complexity in the host side
 not guest side of agent.
 
 Seems vm_reboot(target_vm) as the RPC makes more sense with the guest
 side implementing that in whatever guest-specific appropriate way.
 
 What I really want is a vm_system API that a guest agent MUST
 implement and then APIs like vm_reboot that a guest agent MAY
 implement.
 
 In my mind, the guest agent lives in the distros even though it's
 built from QEMU source tree.  We don't install it ourselves.
 
 That means we might have a new funky fresh version of Fedora 21
 version of QEMU but running an old Fedora 14 guest with a really
 back-level guest agent.
 
 Having very low level APIs with logic that primarily lives in QEMU
 gives us the ability to support new features in QEMU with older
 guests.

I'm not sure about that.  That same new shiny Fedora 21 QEMU has no idea
what the right OS specific command to run in guest is.  Granted, it's
not likely that reboot or shutdown -r now are likely to change for
Linux guests, do we assume cygwin for Windows guests?  Really seems to
make more sense to have a stable ABI and negotiate version.

Also, from the point of view of a cloud where a VM agent is awfully
close to provider having backdoor into VM...a freeform vm_system()
doesn't seem like it'd be real popular.
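
To make the two positions concrete, here is a rough sketch of what a
negotiated interface could look like.  Nothing here is an existing QEMU or
agent API: the command numbers, struct agent_caps and both functions are
invented for illustration.  The guest advertises a protocol version and a
bitmask of commands; the host uses a typed command when the guest offers it
and only falls back to the freeform one otherwise:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command numbers; each one is a bit position in cmd_mask. */
enum agent_cmd {
    AGENT_CMD_SYSTEM   = 0,  /* freeform "run this command line" */
    AGENT_CMD_REBOOT   = 1,  /* guest reboots in its own OS-specific way */
    AGENT_CMD_SHUTDOWN = 2
};

/* What a guest agent would report during the initial handshake. */
struct agent_caps {
    uint32_t version;   /* protocol version the agent speaks */
    uint32_t cmd_mask;  /* bit N set means command N is implemented */
};

/* True if the agent advertised support for cmd. */
bool agent_supports(const struct agent_caps *caps, enum agent_cmd cmd)
{
    return ((caps->cmd_mask >> cmd) & 1u) != 0;
}

/*
 * Pick the command the host should send to reboot the guest: the typed
 * reboot if available, otherwise the freeform system command, for which
 * the host has to guess a guest-specific command line.
 */
enum agent_cmd pick_reboot_cmd(const struct agent_caps *caps)
{
    return agent_supports(caps, AGENT_CMD_REBOOT) ? AGENT_CMD_REBOOT
                                                  : AGENT_CMD_SYSTEM;
}
```

With this shape an old agent keeps working against a new host as long as the
bits it advertises keep their meaning, and a provider can ship an agent that
never sets the AGENT_CMD_SYSTEM bit at all.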

thanks,
-chris



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-22 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 10/22/2010 01:20 PM, Chris Wright wrote:
 I'm not sure about that.  That same new shiny Fedora 21 QEMU has no idea
 what the right OS specific command to run in guest is.  Granted, it's
 not likely that reboot or shutdown -r now are likely to change for
 Linux guests, do we assume cygwin for Windows guests?
 
 No, but I'll wave my hands and say that I'm sure Windows has some
 appropriate mechanism to do the same thing (like PowerShell).

OK (bleh), but it's still specific to the guest OS.

Really seems to
 make more sense to have a stable ABI and negotiate version.
 
 I guess the point is: we can always teach QEMU about how to work
 around older guests.  We (usually) can't control the software that's
 present on the guest itself.

I don't understand why we'd work around an older guest if the host -
guest interface is stable.  Sure it can be extended, but old interfaces
should keep on Just Working (TM).

 The more logic we have in QEMU, the less we have to change the
 software in the guest which means the more likely things will work.

Maybe you're saying the advantage of injecting the raw commands into
the guest is that a host rev will automagically give an old guest new
functionality?

 Also, from the point of view of a cloud where a VM agent is awfully
 close to provider having backdoor into VM...a freeform vm_system()
 doesn't seem like it'd be real popular.
 
 This is the best (irrational) argument against this practice.
 Obviously, there's no real security concern here, but the end-user
 view may be troubling.

Heh, cloud + security == irrational fear, basic axiom

 That said, VMware has an interface for exactly this at least it's an
 established practice.

OK, what about other bits of API?  I recall seeing things like cut'n
paste, reboot, ballooning, time, few bits that spice would care about...
Are you thinking that as well, or all in terms of read/write/exec?

thanks,
-chris



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 So there's no doubt in my mind that if you need a way to inventory
 physical and virtual systems, something like Matahari becomes a very
 appealing option to do that.
 
 But that's not the problem space I'm trying to tackle.
 
 An example of the problem I'm trying to tackle is guest reboot.

Matahari already has shutdown and reboot methods.

Inventory, reboot, filesystem freeze, cut'n paste, etc.. all are
communicating between host and guest.  Main point is to consolidate
effort to keep from having some sprawl of agents (which agent do I
install to do reboot?).

thanks,
-chris



[Qemu-devel] KVM call minutes for Oct 19

2010-10-19 Thread Chris Wright
0.13.X -stable
- Anthony will send note to qemu-devel on this
- move 0.13.X -stable to a separate tree
- driven independently of main qemu tree
- challenge is always in the porting and testing of backported fixes
- looking for volunteers

0.14
- would like to do this before end of the year
- 0.13 forked off a while back (~July), 
- 0.14 features
  - QMP stabilized
- 0.13.0 - 0.14 QMP
- hard attempt not to break compatibility
- new commands, rework, async, human monitor passthrough
- goal getting to libvirt not needing human monitor at all
- QMP KVM autotest test suite submitted
- in-kernel apic, tpr patching still outstanding
- QED coroutine concurrency

Live snapshots
- merge snapshot?
  - already supported, question about mgmt of snapshot chain
- integrate with fsfreeze (and windows alternative)

Guest Agent
- have one coming RSN (poke Anthony for details)
- works over legacy or virtio serial
- simple RPC mechanism between host/guest
- allows host initiated reboot, for example
- can be place to do host driven snapshot too
- Matahari?
  - can deal w/ block/net/UUID/cluster/etc
  - too heavyweight?
  - be sure to coordinate

threadlets and concurrency model (for virtfs)
- prior discussions
  - 1st model: http://www.mail-archive.com/qemu-devel@nongnu.org/msg43838.html
  - 2nd model: http://www.mail-archive.com/qemu-devel@nongnu.org/msg43921.html
  - threadlets: http://www.mail-archive.com/qemu-devel@nongnu.org/msg43842.html
- please review and continue discussion on the list
- concurrency model questions
  - async state machine is easiest to merge right now
  - future work could push it to cooperative coroutines

usb-ccid (aka external device modules such as vtpm)
- isolating device specific interface from qemu device internals is hard
- usb-ccid description...(go read the patches) 
- technical complexity with external emulation
  - version skew
  - live migration
  - same complexity as full plug-in



[Qemu-devel] KVM Forum 2010: videos online [was Re: KVM Forum 2010: presentations online]

2010-10-19 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 We were also able to video the speakers, and will send a note when the
 videos are available.
 (and thanks again to Andrew Cathrow for making this happen)

I don't think a note went out yet.  The videos are available as well.

thanks,
-chris



[Qemu-devel] Re: KVM call agenda for Oct 19

2010-10-18 Thread Chris Wright
* Juan Quintela (quint...@redhat.com) wrote:
 
 Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals



[Qemu-devel] Re: KVM call agenda for Oct 5

2010-10-05 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, call cancelled.

thanks,
-chris



Re: [Qemu-devel] [patch uq/master 0/8] port qemu-kvm's MCE support

2010-10-05 Thread Chris Wright
* Andreas Färber (andreas.faer...@web.de) wrote:
 Am 04.10.2010 um 20:54 schrieb Marcelo Tosatti:
 
 I assume something went wrong with your cover letter here. It
 would've been nice to see MCE spelled out or summarized for those of
 us that don't speak x86.

It would help.  The acronym is Machine Check Exception.  The patchset
should allow (on newer Intel x86 hw with a newer linux kernel) a class of
memory errors delivered to the host OS as MCEs to be propagated to the
guest OS.  Without the patchset, the qemu process associated with the
memory where the error took place would be killed.  With the patchset,
qemu can propagate the error into the guest and allow the guest to kill
only the process within the guest that is associated with the memory error.



[Qemu-devel] KVM call agenda for Oct 5

2010-10-04 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call agenda for Sept 28

2010-09-27 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Sept 21

2010-09-21 Thread Chris Wright
Nested VMX
- looking for forward progress and better collaboration between the
  Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
  - merge baseline patch
- looks pretty good
- review is finding mostly small things at this point
- need some correctness verification (both review from Intel and testing)
  - need a test suite
- test suite harness will help here
  - a few dozen nested SVM tests are there, can follow for nested VMX
  - nested EPT
  - optimize (reduce vmreads and vmwrites)
- has long term maintan

Hotplug
- command...guest may or may not respond
- guest can't be trusted to be direct part of request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a
  single unplug)
  - should be a GUI interface design decision, human monitor is not a
good design point
- digression into GUI interface

Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
  - live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
  - O_DSYNC needed based on whether disk cache is available
  - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
  - how to manage w/out needing to flush every write, slow
- perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching to guests disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
  - works well with fully allocated file, dependent on disk cache disable
(or fs specific flushing)
- filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
  - set up host backend accordingly
- be nice to preserve write cache settings over boot (outgrowing cmos storage)
- maybe some host fs-level optimization possible
  - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion
  - one direct user tunable, use host page cache or not
  - one guest OS tunable, enable disk cache
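
A rough sketch of how the two tunables in the conclusion could map onto the
open(2) flags discussed above.  This is one reading of the minutes, not an
agreed design: struct backend_mode and backend_mode_for() are made-up names,
and the booleans stand in for whether O_DIRECT and O_DSYNC would be passed
when opening the backing file.

```c
#include <stdbool.h>

/* Which open(2) flags the host backend would use for the image file. */
struct backend_mode {
    bool o_direct;  /* bypass the host page cache */
    bool o_dsync;   /* each write is stable before it completes */
};

/*
 * use_host_page_cache: the user tunable (host side).
 * guest_write_cache:   the guest OS tunable (disk write cache enabled).
 *
 * If the guest believes the disk has no write cache, it will not send
 * flushes, so every write must be stable on completion: O_DSYNC.
 * If the guest sees a write cache, it is expected to flush explicitly.
 */
struct backend_mode backend_mode_for(bool use_host_page_cache,
                                     bool guest_write_cache)
{
    struct backend_mode mode;

    mode.o_direct = !use_host_page_cache;
    mode.o_dsync = !guest_write_cache;
    return mode;
}
```

The two settings stay independent, which is the point of the conclusion: the
host-side choice says nothing about what the guest is told.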



[Qemu-devel] KVM call agenda for Sept 21

2010-09-20 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Chris Wright
* Cam Macdonell (c...@cs.ualberta.ca) wrote:
 After fixing the resource_size_t return value with
 pci_resource_alignment, I see one other strange behaviour only when
 using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
 of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.
 
 ACPI Error: The DSDT has been corrupted or replaced - old, new headers
 below (20100702/tbutils-372)
 ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
 ACPI:  (null) 0 (v00   )
 ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
 If system does not work as expected, please boot with acpi=copy_dsdt
 (20100702/tbutils-378)
 ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
 ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
 (20100702/pci_link-283)
 ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
 or acpi=off
 virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
 Non-volatile memory driver v1.3
 Linux agpgart interface v0.103
 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled

IIRC,  the pci hole is only 1.5G in the BIOS, can you verify that
seabios is doing the right thing?

thanks,
-chris



Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Chris Wright
* Cam Macdonell (c...@cs.ualberta.ca) wrote:
 On Fri, Sep 17, 2010 at 2:52 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
  On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote:
  * Cam Macdonell (c...@cs.ualberta.ca) wrote:
  After fixing the resource_size_t return value with
  pci_resource_alignment, I see one other strange behaviour only when
  using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
  of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.
 
  ACPI Error: The DSDT has been corrupted or replaced - old, new headers
  below (20100702/tbutils-372)
  ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
  ACPI:      (null) 0 (v00                       )
  ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
  If system does not work as expected, please boot with acpi=copy_dsdt
  (20100702/tbutils-378)
  ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
  ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
  (20100702/pci_link-283)
  ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
  or acpi=off
  virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
  Non-volatile memory driver v1.3
  Linux agpgart interface v0.103
  Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
 
  IIRC,  the pci hole is only 1.5G in the BIOS, can you verify that
  seabios is doing the right thing?
 
  I'm not sure what the right thing for seabios to do is.  Here is the
  seabios output related to the device.
 
  PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
  region 0: 0xf204
  init smm
  init boot device ordering
 
  snip
 
  Attempting to init PCI bdf 00:04.0 (vd 1af4:1110)
  Attempting to map option rom on dev 00:04.0
  Option rom sizing returned 0 0
  Checking rom 0x000c9800 (sig aa55 size 17)
  Checking rom 0x000cc000 (sig aa55 size 2)
  Checking rom 0x000c9000 (sig aa55 size 4)
  Checking rom 0x000c9800 (sig aa55 size 17)
  Checking rom 0x000cc000 (sig aa55 size 2)
  Mapping hd drive 0x000fdb50 to 0
  Running option rom at c980:0003
  Running option rom at cc00:0003
  pmm_malloc zone=0x000f515c handle= size=36 align=10
  ret=0x000fdaf0 (detail=0x7ffefca0)
  ebda moved from 9fc00 to 9f400
  pmm_malloc zone=0x000f5154 handle= size=2048 align=10
  ret=0x0009f800 (detail=0x7ffefb40)
  finalize PMM
  malloc finalize
 
  when using a BAR of 2GB or less, there is an additional write to the
  PCI space of the device, which may be from the bios
 
  pci_write_config: (val) 0x - 0x18 (addr)
  pci_read_config: (val) 0x8004 - 0x18 (addr)
  pci_write_config: (val) 0x4 - 0x18 (addr)
  pci_write_config: (val) 0x3 - 0x4 (addr)
  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x - 0x1c (addr)
  IVSHMEM: guest pci addr = , guest h/w addr =
  4312137728, size = 8000
 
  so is it succeeding with smaller sizes ( 2GB) because it fits in the
  bios' pci hole?
 
 sorry that should be  2GB.

It seems most likely... 2GB also means = 1GB (which would fit in
the hole).  Although, I have to admit, I'm not sure how seabios handles
the hole nowadays.

What about 2GB with 32bit BAR?
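
For reference, a small sketch of the arithmetic behind the "fits in the hole"
question.  It assumes the hole is simply the last hole_size bytes below 4GB
and that a BAR must be naturally aligned to its own (power of two) size; a
real BIOS also reserves parts of that range for other devices, which is
ignored here.  The function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB 0x100000000ULL

/* Round addr up to a multiple of align (align: nonzero power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Can a naturally aligned BAR of bar_size bytes be placed inside a PCI
 * hole covering [4GB - hole_size, 4GB)?
 */
bool bar_fits_below_4g(uint64_t hole_size, uint64_t bar_size)
{
    uint64_t hole_start = FOUR_GB - hole_size;
    uint64_t base = align_up(hole_start, bar_size);

    return base + bar_size <= FOUR_GB;
}
```

Under those assumptions a 1.5G hole starts at 0xa0000000.  A 1GB BAR can sit
at 0xc0000000, but the only 2GB-aligned addresses below 4GB are 0 and
0x80000000, and both lie outside the hole.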

thanks,
-chris



[Qemu-devel] KVM call minutes for Sept 14

2010-09-14 Thread Chris Wright
0.13
- if all goes well...tomorrow

stable tree
- please look at -stable to see what is missing (bugfixes)
  - esp. regressions from 0.12
- looking for dedicated stable maintainer/release manager
  - pick this discussion up next week

qed/qcow2
- increase concurrency, performance
- threading vs state machine
- avi doesn't like qed reliance on fsck
  - undermines value of error checking (errors become normal)
  - prefer preallocation and fsck just checks for leaked blocks
- just load and validate metadata
- options for correctness are
  - fsync at every data allocation
  - leak data blocks
  - scan
- qed is pure statemachine
  - state on stack, control flow vs function call
- common need to separate handle requests concurrently, issue async i/o
- most disk formats have similar metadata and methods
  - lookup cluster, read/write data
  - qed could be a path to cleaning up other formats (reusing)
- need an incremental way to improve qcow2 performance
  - threading doesn't seem to be the way to achieve this (incrementally)
- coroutines vs. traditional threads discussion
  - parallel (and locking) vs few well-defined preemption points
- plan for qed...attempt to merge in 0.14
  - online fsck support is all that's missing
  - add bdrv check callback, look for new patch series over the next week
- back to list with discussion...



[Qemu-devel] KVM call agenda for Sept 14

2010-09-13 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for Sept 7

2010-09-07 Thread Chris Wright
0.13 schedule
- RSN
- rc1 uploaded, tagged in git (and tag should actually be there now)
- announcement once it propagates
- 0.13.0 should be 1 week after rc1 announcement
- please check rc1 for any missing critical patches

qed
- concession that qcow2 is complicated and hard to get right
- it's much more efficient than qcow2
- not had data integrity testing, but simple enough design to
  rationalize the format and meta-data updates
- formal spec planned...documented on wiki http://wiki.qemu.org/Features/QED
  - design doc written first, code written to design doc
- defragmentation supportable and important (not done now)
- defragmented image should be as fast as raw
- concern about splitting install base (doubles qa effort, etc)
  - should be possible to do an in-place qcow2-qed update
  - even live update could be doable
- what about vmdk or vhd?
  - controlled externally
  - specification license implications are unclear
  - too close to NIH?
- qed and async model could put pressure to improve other formats and
  push code out of qed to core
- another interest for qed...streaming images (fault in image extents
  via network)
  - want to design this as starting from mgmt interface discussion



[Qemu-devel] [PATCH] pci: fix pci_resource_alignment prototype

2010-09-07 Thread Chris Wright
From: Cam Macdonell c...@cs.ualberta.ca

* Cam Macdonell (c...@cs.ualberta.ca) wrote:
 It seems it was the alignment value being passed back from
 pci_resource_alignment().  The return type is an int, which was
 causing value of 2GB to be sign extended to 0x8000.
 Changing the return type to resource_size_t allows BAR values = 2GB
 to be successfully assigned.
snip
 -static inline int pci_resource_alignment(struct pci_dev *dev,
 +static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
  struct resource *res)

Yes, that's my mistake.  Thanks for debugging the issue Cam.
This fixes the prototype for both pci_resource_alignment() and
pci_sriov_resource_alignment().
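
The effect can be reproduced outside the kernel.  The sketch below is not
kernel code: uint64_t stands in for a 64-bit resource_size_t, the function
names are invented, and it assumes a 32-bit int with the wrapping
out-of-range conversion that GCC and Clang implement.

```c
#include <stdint.h>

/* Old prototype: the 64-bit size is squeezed through an int on return. */
int alignment_as_int(uint64_t size)
{
    /* Out-of-range conversion is implementation-defined; GCC/Clang wrap. */
    return (int)size;
}

/* Fixed prototype: the value keeps its full width. */
uint64_t alignment_as_resource_size(uint64_t size)
{
    return size;
}

/*
 * What the caller sees once the returned value is widened back to 64 bits.
 * A negative int is sign extended, so the upper 32 bits become all ones.
 */
uint64_t widen_alignment(int alignment)
{
    return (uint64_t)alignment;
}
```

For a 2GB BAR, widen_alignment(alignment_as_int(0x80000000ULL)) yields
0xffffffff80000000, while alignment_as_resource_size() returns 0x80000000
unchanged.  Sizes below 2GB are unaffected, which matches the bug only
showing up at that size.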

Patch started as debugging effort from Cam Macdonell.

Cc: Cam Macdonell c...@cs.ualberta.ca
Cc: Avi Kivity a...@redhat.com
Cc: Jesse Barnes jbar...@virtuousgeek.org
[chrisw: add iov bits]
Signed-off-by: Chris Wright chr...@sous-sol.org
---
 drivers/pci/iov.c |2 +-
 drivers/pci/pci.h |5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ce6a366..553d8ee 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -608,7 +608,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
  * the VF BAR size multiplied by the number of VFs.  The alignment
  * is just the VF BAR size.
  */
-int pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
+resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
struct resource tmp;
enum pci_bar_type type;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 679c39d..5d0aeb1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -262,7 +262,8 @@ extern int pci_iov_init(struct pci_dev *dev);
 extern void pci_iov_release(struct pci_dev *dev);
 extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
-extern int pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
+extern resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev,
+   int resno);
 extern void pci_restore_iov_state(struct pci_dev *dev);
 extern int pci_iov_bus_range(struct pci_bus *bus);
 
@@ -318,7 +319,7 @@ static inline int pci_ats_enabled(struct pci_dev *dev)
 }
 #endif /* CONFIG_PCI_IOV */
 
-static inline int pci_resource_alignment(struct pci_dev *dev,
+static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 struct resource *res)
 {
 #ifdef CONFIG_PCI_IOV




[Qemu-devel] KVM call minutes for August 31

2010-08-31 Thread Chris Wright
QMP/QPI
- declare what's in 0.13 supported
  - means reasonable effort to avoid breaking something (deprecation is
possible)
- things that are left, shallow patch or human monitor conversion
- how to move forward?
  - need to improve interfaces (no argument there)
  - internal vs. external interfaces
  - QMP == external, stable, extensible, discoverable, fwd/back compat
  - internal == C, data structures, changeable, non-stable
  - redefine internal interfaces and work up?
- this addresses concern that most changes are in monitor.c
- and addresses the concern that we aren't defining proper
  extensible top level interfaces
  - work top down from external?
- map to internal details...fix internals when external is
  hard/impossible based on internals
- need to focus on QMP command addition in the face of internal details
  - no disagreement there
- decouple monitor and QMP
- sane interfaces (proposal for migration from Anthony)
- error issues...

0.13 schedule
- rc1 tagged locally and under test, once testing completes, upload,
  then one week to fix any outstanding issues
  - will push tag later today and upload, announce once propagates (24hrs-ish)

qemu-kvm integration
- not getting a lot of cycles
- nothing major that anthony won't pull
  - extboot still
- performance
  - i/o threading model (merge both and fix in-tree)
- in-kernel apic
- device assignment (vfio against qemu tree)
- disable ia64
- avi will look at doing the pull request



Re: [Qemu-devel] [PATCH 1/5] virtio-net: Make tx_timer timeout configurable

2010-08-31 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index 075f72d..9ef29f0 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -36,6 +36,7 @@ typedef struct VirtIONet
  VirtQueue *ctrl_vq;
  NICState *nic;
  QEMUTimer *tx_timer;
 +uint32_t tx_timeout;
  int tx_timer_active;
  uint32_t has_vnet_hdr;
  uint8_t has_ufo;
 @@ -702,7 +703,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, 
 VirtQueue *vq)
  virtio_net_flush_tx(n, vq);
  } else {
  qemu_mod_timer(n->tx_timer,
 -   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
 +   qemu_get_clock(vm_clock) + n->tx_timeout);
  n->tx_timer_active = 1;
  virtio_queue_set_notification(vq, 0);
  }
 @@ -842,7 +843,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int 
 version_id)
  
  if (n->tx_timer_active) {
  qemu_mod_timer(n->tx_timer,
 -   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
 +   qemu_get_clock(vm_clock) + n->tx_timeout);

I think I'm missing where this is stored?  Looks like migration
would revert a changed tx_timeout back to 150us.

thanks,
-chris



Re: [Qemu-devel] [PATCH 1/5] virtio-net: Make tx_timer timeout configurable

2010-08-31 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Tue, 2010-08-31 at 11:00 -0700, Chris Wright wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
   diff --git a/hw/virtio-net.c b/hw/virtio-net.c
   index 075f72d..9ef29f0 100644
   --- a/hw/virtio-net.c
   +++ b/hw/virtio-net.c
   @@ -36,6 +36,7 @@ typedef struct VirtIONet
VirtQueue *ctrl_vq;
NICState *nic;
QEMUTimer *tx_timer;
   +uint32_t tx_timeout;
int tx_timer_active;
uint32_t has_vnet_hdr;
uint8_t has_ufo;
   @@ -702,7 +703,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, 
   VirtQueue *vq)
virtio_net_flush_tx(n, vq);
} else {
qemu_mod_timer(n->tx_timer,
   -   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
   +   qemu_get_clock(vm_clock) + n->tx_timeout);
n->tx_timer_active = 1;
virtio_queue_set_notification(vq, 0);
}
   @@ -842,7 +843,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, 
   int version_id)

if (n->tx_timer_active) {
qemu_mod_timer(n->tx_timer,
   -   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
   +   qemu_get_clock(vm_clock) + n->tx_timeout);
  
  I think I'm missing where this is stored?  Looks like migration
  would revert a changed tx_timeout back to 150us.
 
 It's not stored, it can be instantiated on the migration target any way
 you please and we can migrate between different values or even different
 TX mitigation strategies.  If a non-default value is used on the source
 and you want to maintain the same behavior, the target needs to be
 started the same way.

heh, IOW, I did miss how it's stored...on cmdline ;)

thanks,
-chris
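
The point settled above can be sketched in a few lines of C: the timeout is configuration supplied when the device is instantiated, while the migration stream carries only runtime state such as the timer-active flag. This is a toy model, not QEMU code. `ToyNet`, `ToyNetStream`, `toy_net_init`, `toy_net_save` and `toy_net_load` are hypothetical names, and the default of 150 simply mirrors the "150us" mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEFAULT_TX_TIMEOUT 150u /* mirrors the 150us default from the thread */

/* Hypothetical stand-in for the device state; not the real VirtIONet. */
typedef struct ToyNet {
    uint32_t tx_timeout;      /* configuration: set at instantiation, not migrated */
    int      tx_timer_active; /* runtime state: carried in the migration stream */
} ToyNet;

/* Hypothetical migration payload: only runtime state travels. */
typedef struct ToyNetStream {
    int tx_timer_active;
} ToyNetStream;

/* Instantiate a device; a timeout of 0 means "use the default". */
static void toy_net_init(ToyNet *n, uint32_t timeout)
{
    assert(n != NULL);
    n->tx_timeout = timeout ? timeout : TOY_DEFAULT_TX_TIMEOUT;
    n->tx_timer_active = 0;
}

/* Save writes runtime state only. */
static void toy_net_save(const ToyNet *n, ToyNetStream *s)
{
    assert(n != NULL && s != NULL);
    s->tx_timer_active = n->tx_timer_active;
}

/* Load restores runtime state and leaves the target's own timeout untouched. */
static void toy_net_load(ToyNet *n, const ToyNetStream *s)
{
    assert(n != NULL && s != NULL);
    n->tx_timer_active = s->tx_timer_active;
}
```

In this model a source started with a non-default timeout migrates cleanly to a target started with the default, but the target keeps its own value, which is why the target has to be started the same way if the behavior should match.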



[Qemu-devel] KVM call agenda for August 24

2010-08-23 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call cancelled [was: KVM call agenda for August 17]

2010-08-17 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

Today's call is cancelled.



[Qemu-devel] KVM Forum 2010: presentations online

2010-08-16 Thread Chris Wright
KVM Forum 2010 was quite a success, many thanks to all who participated!

For those who couldn't attend, the presentations are available online now:
(thanks to Andrew Cathrow for pushing them all up)

http://www.linux-kvm.org/page/KVM_Forum_2010#Presentations

We were also able to video the speakers, and will send a note when the
videos are available.
(and thanks again to Andrew Cathrow for making this happen)

thanks,
-chris



[Qemu-devel] Re: KVM Forum 2010: presentations online

2010-08-16 Thread Chris Wright
* Dor Laor (dl...@redhat.com) wrote:
 On 08/17/2010 12:50 AM, Chris Wright wrote:
 KVM Forum 2010 was quite a success, many thanks to all who participated!
 
 For those who couldn't attend, the presentations are available online now:
 (thanks to Andrew Cathrow for pushing them all up)
 
 http://www.linux-kvm.org/page/KVM_Forum_2010#Presentations
 
 I Beat you in a second ;-)

Assuming accurate local clocks...6 seconds even ;)




[Qemu-devel] KVM call agenda for August 17

2010-08-16 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] KVM call minutes for July 27

2010-07-27 Thread Chris Wright
0.13
- -rc0 tagged, propagating now
- no more features, bug fix only
- although a few things, like shared memory device, are still feasible

qemu64 cpu model
- currently model 2
  - this cpu simply does not exist at all in the real world
- with model 13 or higher, 32-bit Windows will enable MSI/MSI-X support
  - anyone aware of issues with simply bumping the model
- must retain compatibility with -M
- cpu models fully configurable in config files
  - should move default to config files
- raises a couple questions
  - does qemu64 need to have a single stable definition?
  - does default cpu make sense
- also the physical models are broken (Conroe, Penryn, etc..)
  - these are simply broken and need to change
- can create versions of base model (qemu64-v1/0.13/whatever)

probed_raw
- 79368c81 closed security hole
- qraw addressing theoretical issue and has too much magic
- any further discussion, list is ---> that way

qemu -help parsing
- libvirt current issues
  - qemu -help/-version was changed and broke libvirt (fixed 0.8.2)
  - libvirt improperly parses cache= (fixed 0.8.3)
- reverting version string change: f75ca1ae
- apply bruce's cache -help patch
- no further significant -help changes
- libvirt uses version only (and maintains the version meaning for
  downstreams)
- eventually capabilities fixes this
- minimal info capabilities that's usable tomorrow?
  - becomes a supported interface, deprecating it will be complicated
  - unclear if it buys anything, -help parsing working now, the interim
sol'n would be thrown away
- could isolate libvirt -help parser and toss it into qemu as a test



Re: [Qemu-devel] Re: KVM call agenda for July 27

2010-07-27 Thread Chris Wright
* Daniel P. Berrange (berra...@redhat.com) wrote:
 On Tue, Jul 27, 2010 at 07:17:06PM +0300, Avi Kivity wrote:
   On 07/27/2010 06:28 PM, Anthony Liguori wrote:
  
  If we add docs/deprecated-features.txt, schedule removal for at least 
  1 year in the future, and put a warning in the code that prints 
  whenever raw is probed, I think I could warm up to this.
  
  Since libvirt should be insulating users from this today, I think the 
  fall out might not be terrible.
  
  On a related note, we should ask libvirt to make qemu stderr output 
  available to its users, or perhaps an ABRT plugin to report such 
  messages from libvirt's logs.
 
 QEMU stderr+out is already recorded in /var/lib/libvirt/qemu/$GUESTNAME.log
 along with the env variables and argv used to spawn it. Or did you mean 
 provide an API + virsh command /virt-manager UI for accessing the logs ?

I read that to mean...propagate stderr from qemu to be right in front of
the user.  So that's output from virsh or in virt-manager.  Trouble is,
that's only useful (at best) when starting a guest.  Perhaps some
virt-manager thing (an exclamation point to show there's errors in the
log and a way to read them), and a virsh utility to match (although
that'd require the user to actually poll the interface, at which point
they can just as easily just look at the log).

thanks,
-chris



Re: [Qemu-devel] Re: KVM call agenda for July 27

2010-07-27 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
  On 07/27/2010 07:29 PM, Chris Wright wrote:
 
 QEMU stderr+out is already recorded in /var/lib/libvirt/qemu/$GUESTNAME.log
 along with the env variables and argv used to spawn it. Or did you mean
 provide an API + virsh command /virt-manager UI for accessing the logs ?
 I read that to mean...propagate stderr from qemu to be right in front of
 the user.
 
 Yes, that's what I meant.
 
 So that's output from virsh or in virt-manager.  Trouble is,
 that's only useful (at best) when starting a guest.  Perhaps some
 virt-manager thing (an exclamation point to show there's errors in the
 log and a way to read them), and a virsh utility to match (although
 that'd require the user to actually poll the interface, at which point
 they can just as easily just look at the log).
 
 If things work there's  no reason for the user to go look at the
 logs.  An exclamation point invites clicking.
 
 Even better would be an ABRT plugin, so if something goes
 (marginally) wrong, the siren pops up and you're invited to report
 the bug.

Despite some of the ABRT growing pains, ABRT plugin seems like a good
idea.  I don't know enough of the plugins to know if that requires
formatted output or just grepping for some known regexps.

thanks,
-chris



Re: [Qemu-devel] Re: [PATCH] Introduce a -libvirt-caps flag as a stop-gap

2010-07-27 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 Here are the possible things we can do:
 
 1) merge -libvirt-caps as an intermediate solution, stop caring
 about -help changes, when full caps are introduced, stop updating
 -libvirt-caps
 
 2) don't merge -libvirt-caps, stop caring about -help changes, put
 everything on getting full caps merged by 0.14
 
 3) don't merge -libvirt-caps, care about making -help changes, use
 -help as the caps mechanism until full caps get merged

3.5) same as 3) + add test case to qemu to test that -help parser from
libvirt isn't busted.

 We can't do (3).  I'm going to revert the -help changes for 0.13 so
 that old versions of libvirt work but not for master.

I suspect that if the breakage is seen, it'd be easy to fix.  Adding new
help items won't be the problem, just the subtle changes to the existing
output.  Suck?  Yes.  Workable until full caps?  Think so.

thanks,
-chris
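
The test case suggested in 3.5 could, in its simplest form, check that the tokens a downstream parser depends on still appear in the help text. The sketch below is only an illustration of that idea: `help_mentions` and `missing_tokens` are hypothetical helpers, it does not reproduce libvirt's actual parser, and any help text fed to it in a test would be sample data rather than real qemu output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the help text contains the given token, e.g. "cache=". */
static bool help_mentions(const char *help_text, const char *token)
{
    return help_text != NULL && token != NULL &&
           strstr(help_text, token) != NULL;
}

/* Counts how many of the expected tokens are missing from the help text. */
static int missing_tokens(const char *help_text,
                          const char *const *tokens, int n)
{
    int missing = 0;

    assert(tokens != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!help_mentions(help_text, tokens[i])) {
            missing++;
        }
    }
    return missing;
}
```

A non-zero result from `missing_tokens` would flag a change to existing output before it reaches a release, which is the kind of subtle breakage described above. Adding new help items would not trip it.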



[Qemu-devel] KVM call agenda for July 27

2010-07-26 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-15 Thread Chris Wright
* Paul Brook (p...@codesourcery.com) wrote:
   The right approach IMHO is to convert devices to use bus-specific
   functions to access memory.  The bus specific functions should have
   a device argument as the first parameter.
  
  As for ATS, the internal api to handle the device's dma request needs
  a notion of a translated vs. an untranslated request.  IOW, if qemu ever
  had a device with ATS support, the device would use its local cache to
  translate the dma address and then submit a translated request to the
  pci bus (effectively doing a raw cpu physical memory* in that case).
 
 Really? Can you provide any documentation to support this claim?
 My impression is that there is no difference between translated and 
 untranslated devices, and the translation is explicitly disabled by software.

ATS allows an I/O device to request a translation from the IOMMU.
The device can then cache that translation and use the translated address
in a PCIe memory transaction.  PCIe uses a couple of previously reserved
bits in the transaction layer packet header to describe the address
type for memory transactions.  The default (00) maps to legacy PCIe and
describes the memory address as untranslated.  This is the normal mode,
and could then incur a translation if an IOMMU is present and programmed
w/ page tables, etc. as it passes through the host bridge.

Another type is simply a transaction requesting a translation.  This is
new, and allows a device to request (and cache) a translation from the
IOMMU for subsequent use.

The third type is a memory transaction tagged as already translated.
This is the type of transaction an ATS capable I/O device will generate
when it was able to translate the memory address from its own cache.

Of course, there's also an invalidation request that the IOMMU can send
to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris
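
The three address types described above can be written down as a small enum plus a decision function for the host bridge. This is a toy model under stated assumptions: the message only says that 00 means untranslated, so the values 01 (translation request) and 10 (translated) are my recollection of the PCIe ATS encoding and should be checked against the spec. `AddrType`, `BridgeAction` and `bridge_action` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Address Type field of a memory transaction.
 * Only 00 = untranslated is stated in the thread; the other two values
 * are an assumption recalled from the PCIe ATS specification. */
typedef enum {
    AT_UNTRANSLATED        = 0x0, /* default, legacy PCIe behaviour */
    AT_TRANSLATION_REQUEST = 0x1, /* device asks the IOMMU for a translation */
    AT_TRANSLATED          = 0x2  /* device already translated from its cache */
} AddrType;

/* What the toy host bridge does with a transaction. */
typedef enum {
    ACT_ACCESS_DIRECT,          /* use the address as-is */
    ACT_TRANSLATE_AND_ACCESS,   /* walk the IOMMU page tables first */
    ACT_REPLY_WITH_TRANSLATION  /* answer the request so the device can cache it */
} BridgeAction;

static BridgeAction bridge_action(AddrType at, bool iommu_programmed)
{
    assert((unsigned)at <= 0x2); /* 0x3 is treated as reserved in this sketch */

    switch (at) {
    case AT_TRANSLATION_REQUEST:
        return ACT_REPLY_WITH_TRANSLATION;
    case AT_TRANSLATED:
        return ACT_ACCESS_DIRECT;
    case AT_UNTRANSLATED:
    default:
        /* The normal mode: translated only if an IOMMU is set up. */
        return iommu_programmed ? ACT_TRANSLATE_AND_ACCESS
                                : ACT_ACCESS_DIRECT;
    }
}
```

The sketch leaves out the invalidation request and any validation of translated requests, both of which a real implementation would need.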



Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-15 Thread Chris Wright
* Chris Wright (chr...@sous-sol.org) wrote:
 * Paul Brook (p...@codesourcery.com) wrote:
The right approach IMHO is to convert devices to use bus-specific
functions to access memory.  The bus specific functions should have
a device argument as the first parameter.
   
   As for ATS, the internal api to handle the device's dma request needs
   a notion of a translated vs. an untranslated request.  IOW, if qemu ever
   had a device with ATS support, the device would use its local cache to
   translate the dma address and then submit a translated request to the
   pci bus (effectively doing a raw cpu physical memory* in that case).
  
  Really? Can you provide any documentation to support this claim?

Wow...color me surprised...there's actually some apparently public
training docs that might help give a more complete view:

http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=0ab681ba7001e40cdb297ddaf279a8de82a7dc40

ATS discussion starts on slide 23.

  My impression is that there is no difference between translated and 
  untranslated devices, and the translation is explicitly disabled by 
  software.

And now that I re-read that sentence, I see what you are talking about.
Yes, there is the above notion as well.

A device can live in a 1:1 mapping of device address space to physical
memory.  This could be achieved in a few ways (all done by the OS software
programming the IOMMU).

One is to simply create a set of page tables that map 1:1 all of device
memory to physical memory.  Another is to somehow mark the device as
special (either omit translation tables or mark the translation entry
as effectively do not translate).  This is often referred to as Pass
Through mode.  But this is not the same as ATS.

Pass Through mode is the functional equivalent of disabling the
translation/isolation capabilities of the IOMMU.  It's typically used
when an OS wants to keep a device for itself and isn't interested in
the isolation properties of the IOMMU.  It then only creates isolating
translation tables for devices it's giving to unprivileged software
(e.g. Linux/KVM giving a device to a guest, Linux giving a device to
user space process, etc.)

 ATS allows an I/O device to request a translation from the IOMMU.
 The device can then cache that translation and use the translated address
 in a PCIe memory transaction.  PCIe uses a couple of previously reserved
 bits in the transaction layer packet header to describe the address
 type for memory transactions.  The default (00) maps to legacy PCIe and
 describes the memory address as untranslated.  This is the normal mode,
 and could then incur a translation if an IOMMU is present and programmed
 w/ page tables, etc. as it passes through the host bridge.
 
 Another type is simply a transaction requesting a translation.  This is
 new, and allows a device to request (and cache) a translation from the
 IOMMU for subsequent use.
 
 The third type is a memory transaction tagged as already translated.
 This is the type of transaction an ATS capable I/O device will generate
 when it was able to translate the memory address from its own cache.
 
 Of course, there's also an invalidation request that the IOMMU can send
 to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris



Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-15 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
 On 07/15/2010 07:52 PM, Chris Wright wrote:
 
 Really? Can you provide any documentation to support this claim?
 My impression is that there is no difference between translated and
 untranslated devices, and the translation is explicitly disabled by 
 software.
 ATS allows an I/O device to request a translation from the IOMMU.
 The device can then cache that translation and use the translated address
 in a PCIe memory transaction.  PCIe uses a couple of previously reserved
 bits in the transaction layer packet header to describe the address
 type for memory transactions.  The default (00) maps to legacy PCIe and
 describes the memory address as untranslated.  This is the normal mode,
 and could then incur a translation if an IOMMU is present and programmed
 w/ page tables, etc. as it passes through the host bridge.
 
 Another type is simply a transaction requesting a translation.  This is
 new, and allows a device to request (and cache) a translation from the
 IOMMU for subsequent use.
 
 The third type is a memory transaction tagged as already translated.
 This is the type of transaction an ATS capable I/O device will generate
 when it was able to translate the memory address from its own cache.
 
 Of course, there's also an invalidation request that the IOMMU can send
 to ATS capable I/O devices to invalidate the cached translation.
 
 For emulated device, it seems like we can ignore ATS completely, no?

Not if you want to emulate an ATS capable device ;)

Earlier upthread I said:

  IOW, if qemu ever had a device with ATS support...

So, that should've been a much bigger _IF_

thanks,
-chris



Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-15 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
 On 07/15/2010 08:17 PM, Chris Wright wrote:
 
 For emulated device, it seems like we can ignore ATS completely, no?
 Not if you want to emulate an ATS capable device ;)
 
 What I meant was that the whole request translation, invalidate, dma
 using a translated address thing is invisible to software.  We can
 emulate an ATS capable device by going through the iommu every time.

Well, I don't see any reason to completely ignore it.  It'd be really
useful for testing (I'd use it that way).  Esp to verify the
invalidation of the device IOTLBs.

But I think it's not a difficult thing to emulate once we have a proper
api encapsulating a device's dma request.

thanks,
-chris



Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support

2010-07-14 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 07/14/2010 03:13 PM, Paul Brook wrote:
 On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
 Memory accesses must go through the IOMMU layer.
 No. Devices should not know or care whether an IOMMU is present.
 There are real devices that care very much about an IOMMU. Basically all
 devices supporting ATS care about that. So I don't see a problem if the
 device emulation code of qemu also cares about present IOMMUs.
 
 You should be adding a DeviceState argument to
 cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
 transparently.
 That's not a good idea imho. With an IOMMU the device no longer accesses
 cpu physical memory. It accesses device virtual memory. Using
 cpu_physical_memory* functions in device code becomes misleading when
 the device virtual address space differs from cpu physical.
 Well, ok, the function name needs fixing too.  However I think the only thing
 missing from the current API is that it does not provide a way to determine
 which device is performing the access.
 
 I agree with Paul.

I do too.

 The right approach IMHO is to convert devices to use bus-specific
 functions to access memory.  The bus specific functions should have
 a device argument as the first parameter.

As for ATS, the internal api to handle the device's dma request needs
a notion of a translated vs. an untranslated request.  IOW, if qemu ever
had a device with ATS support, the device would use its local cache to
translate the dma address and then submit a translated request to the
pci bus (effectively doing a raw cpu physical memory* in that case).

thanks,
-chris
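
A minimal sketch of the kind of bus-level accessor discussed above: the device is the first parameter, and a flag distinguishes a translated request from an untranslated one. Everything here is hypothetical and only illustrates the shape of such an API. `ToyPCIDevice`, `toy_iommu_translate` and `toy_pci_dma_write` are not QEMU functions, and the "IOMMU" is reduced to a single linear window per device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAM_SIZE 4096u

static uint8_t toy_ram[TOY_RAM_SIZE]; /* stands in for guest physical memory */

/* Hypothetical device handle carrying its own translation context. */
typedef struct ToyPCIDevice {
    bool     behind_iommu;
    uint64_t iommu_base; /* device address 0 maps to this physical address */
} ToyPCIDevice;

/* Toy translation: one linear window per device. */
static uint64_t toy_iommu_translate(const ToyPCIDevice *dev, uint64_t dma_addr)
{
    return dev->behind_iommu ? dev->iommu_base + dma_addr : dma_addr;
}

/* Bus-level write. An untranslated request goes through the IOMMU; a
 * translated one (as an ATS capable device would submit) is used as-is.
 * Returns 0 on success, -1 if the access falls outside toy_ram. */
static int toy_pci_dma_write(ToyPCIDevice *dev, uint64_t addr,
                             const void *buf, size_t len, bool translated)
{
    uint64_t phys;

    assert(dev != NULL && buf != NULL);
    phys = translated ? addr : toy_iommu_translate(dev, addr);
    if (phys > TOY_RAM_SIZE || len > TOY_RAM_SIZE - phys) {
        return -1;
    }
    memcpy(&toy_ram[phys], buf, len);
    return 0;
}
```

Because the device is passed in, the accessor can find the right translation context without the device model knowing whether an IOMMU is present, and the translated path ends up as a raw physical access, as described above.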



[Qemu-devel] Re: KVM Call agenda for July 13th

2010-07-12 Thread Chris Wright
* Juan Quintela (quint...@redhat.com) wrote:
 
 Please send in any agenda items you are interested in covering.

0.13 ;-)



Re: [Qemu-devel] KVM Call agenda for June 29

2010-06-29 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 06/28/2010 02:03 PM, Juan Quintela wrote:
 Please send in any agenda items you are interested in covering.
 
 If we have a lack of agenda items I'll cancel the week's call.
 
 After last week's debacle, I will wait until 10 mins before the call to cancel
 it.
 
 Thanks for posting earlier this week.   I don't have anything to
 discuss so I'm in favor of cancelling this week.

OK, let's cancel this week.

thanks,
-chris



[Qemu-devel] KVM call agenda for June 29

2010-06-28 Thread Chris Wright
Please send in any agenda items you are interested in covering.

If we have a lack of agenda items I'll cancel the week's call.

thanks,
-chris


